Collectord

Kubernetes Centralized Logging with AWS S3, Athena, Glue and QuickSight

Glue tables

Collectord automatically creates a database named kubernetes with three tables: container_logs, events, and host_logs. You can find the database and its tables in the Glue catalog.

AWS Glue Databases

AWS Glue Tables

Collectord automatically updates partitions when it uploads new data to S3, so you do not need to run any crawlers to discover the data.
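To confirm that partitions are being registered, you can read them back from the Glue catalog. The sketch below is a minimal illustration rather than part of Collectord: the helper name list_partition_values is hypothetical, and it assumes you pass in a Glue client such as the one returned by boto3.client("glue"), with credentials that allow glue:GetPartitions. It relies only on the GetPartitions call and its NextToken pagination.

```python
def list_partition_values(glue_client, database="kubernetes", table="container_logs"):
    """Return the partition value lists registered in the Glue catalog for a table.

    glue_client is expected to behave like boto3.client("glue"); it is passed in
    so the function does not depend on how credentials are configured.
    """
    values = []
    kwargs = {"DatabaseName": database, "TableName": table}
    while True:
        page = glue_client.get_partitions(**kwargs)
        # Each partition carries its key values in the order of the table's PartitionKeys.
        values.extend(partition["Values"] for partition in page.get("Partitions", []))
        token = page.get("NextToken")
        if not token:
            return values
        kwargs["NextToken"] = token


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   for values in list_partition_values(boto3.client("glue")):
#       print(values)
```

Each returned entry lists the values in the same order as the table's PartitionKeys, so for container_logs the last element is the dt value.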

Table Schemas

The Collectord image includes a predefined schema for each table.

Table container_logs

{
  "DatabaseName": "kubernetes",
  "TableInput": {
    "Name": "container_logs",
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "classification": "json"
    },
    "PartitionKeys": [
      {
        "Name": "cluster",
        "Type": "string"
      },
      {
        "Name": "namespace",
        "Type": "string"
      },
      {
        "Name": "workload",
        "Type": "string"
      },
      {
        "Name": "pod_name",
        "Type": "string"
      },
      {
        "Name": "container_name",
        "Type": "string"
      },
      {
        "Name": "dt",
        "Type": "string"
      }
    ],
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "Location": "generated by the s3.output bucket and region",
      "Columns": [
        {
          "Name": "message",
          "Type": "string"
        },
        {
          "Name": "timestamp",
          "Type": "string"
        },
        {
          "Name": "pod_labels",
          "Type": "array<string>"
        },
        {
          "Name": "node_name",
          "Type": "string"
        },
        {
          "Name": "pod_ip",
          "Type": "string"
        },
        {
          "Name": "container_id",
          "Type": "string"
        },
        {
          "Name": "docker_labels",
          "Type": "array<string>"
        },
        {
          "Name": "docker_version",
          "Type": "string"
        },
        {
          "Name": "node_id",
          "Type": "string"
        },
        {
          "Name": "container_imageid",
          "Type": "string"
        },
        {
          "Name": "container_image",
          "Type": "string"
        },
        {
          "Name": "pod_id",
          "Type": "string"
        },
        {
          "Name": "host_ip",
          "Type": "string"
        },
        {
          "Name": "node_labels",
          "Type": "array<string>"
        },
        {
          "Name": "stream",
          "Type": "string"
        },
        {
          "Name": "namespace_labels",
          "Type": "array<string>"
        },
        {
          "Name": "job_name",
          "Type": "string"
        },
        {
          "Name": "job_id",
          "Type": "string"
        },
        {
          "Name": "job_labels",
          "Type": "array<string>"
        },
        {
          "Name": "cronjob_name",
          "Type": "string"
        },
        {
          "Name": "cronjob_id",
          "Type": "string"
        },
        {
          "Name": "cronjob_labels",
          "Type": "array<string>"
        },
        {
          "Name": "deployment_name",
          "Type": "string"
        },
        {
          "Name": "deployment_id",
          "Type": "string"
        },
        {
          "Name": "deployment_labels",
          "Type": "array<string>"
        },
        {
          "Name": "statefulset_name",
          "Type": "string"
        },
        {
          "Name": "statefulset_id",
          "Type": "string"
        },
        {
          "Name": "statefulset_labels",
          "Type": "array<string>"
        },
        {
          "Name": "replicaset_name",
          "Type": "string"
        },
        {
          "Name": "replicaset_id",
          "Type": "string"
        },
        {
          "Name": "replicaset_labels",
          "Type": "array<string>"
        },
        {
          "Name": "replicationcontroller_name",
          "Type": "string"
        },
        {
          "Name": "replicationcontroller_id",
          "Type": "string"
        },
        {
          "Name": "replicationcontroller_labels",
          "Type": "array<string>"
        },
        {
          "Name": "daemonset_name",
          "Type": "string"
        },
        {
          "Name": "daemonset_id",
          "Type": "string"
        },
        {
          "Name": "daemonset_labels",
          "Type": "array<string>"
        },
        {
          "Name": "file_path",
          "Type": "string"
        },
        {
          "Name": "volume_name",
          "Type": "string"
        },
        {
          "Name": "ec2_instance_id",
          "Type": "string"
        },
        {
          "Name": "ec2_instance_type",
          "Type": "string"
        },
        {
          "Name": "host",
          "Type": "string"
        }
      ],
      "SerdeInfo": {
        "Parameters": {
          "paths": "message, timestamp, pod_labels, node_name, pod_ip, container_id, docker_labels, docker_version, node_id, container_imageid, container_image, pod_id, host_ip, node_labels, stream, namespace_labels, job_name, job_id, job_labels, cronjob_name, cronjob_id, cronjob_labels, deployment_name, deployment_id, deployment_labels, statefulset_name, statefulset_id, statefulset_labels, replicaset_name, replicaset_id, replicaset_labels, replicationcontroller_name, replicationcontroller_id, replicationcontroller_labels, daemonset_name, daemonset_id, daemonset_labels, file_path, volume_name"
        },
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      }
    }
  }
}
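Because container_logs is partitioned by cluster, namespace, workload, pod_name, container_name, and dt, queries that filter on these keys let Athena skip partitions that do not match. The following sketch builds such a query as a string. The helper names build_query and quote_literal are hypothetical, and the filter values in the usage comment ("dev", "default") are placeholders. The format of dt values is not specified here, so the sketch does not assume one.

```python
# Partition keys of container_logs, in the order listed in the definition above.
PARTITION_KEYS = ("cluster", "namespace", "workload", "pod_name", "container_name", "dt")


def quote_literal(value):
    """Render a value as a single-quoted SQL string literal."""
    return "'" + str(value).replace("'", "''") + "'"


def build_query(filters, columns=("timestamp", "message"), limit=100):
    """Build a SELECT over kubernetes.container_logs filtered on partition keys only."""
    unknown = sorted(set(filters) - set(PARTITION_KEYS))
    if unknown:
        raise ValueError("not partition keys: " + ", ".join(unknown))
    select_list = ", ".join('"%s"' % name for name in columns)
    # Emit conditions in partition-key order so the output is deterministic.
    conditions = [
        '"%s" = %s' % (key, quote_literal(filters[key]))
        for key in PARTITION_KEYS
        if key in filters
    ]
    sql = 'SELECT %s FROM "kubernetes"."container_logs"' % select_list
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return "%s LIMIT %d" % (sql, int(limit))


# Usage with placeholder values:
#   print(build_query({"cluster": "dev", "namespace": "default"}))
```

Filtering on non-partition columns such as message still works in Athena, but only conditions on the partition keys reduce the amount of data scanned.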

Table events

{
  "DatabaseName": "kubernetes",
  "TableInput": {
    "Name": "events",
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "classification": "json"
    },
    "PartitionKeys": [
      {
        "Name": "cluster",
        "Type": "string"
      },
      {
        "Name": "namespace",
        "Type": "string"
      },
      {
        "Name": "dt",
        "Type": "string"
      }
    ],
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "Location": "{automatically generated by Collectord}",
      "Columns": [
        {
          "Name": "message",
          "Type": "string"
        },
        {
          "Name": "timestamp",
          "Type": "string"
        },
        {
          "Name": "namespace_labels",
          "Type": "array<string>"
        }
      ],
      "SerdeInfo": {
        "Parameters": {
          "paths": "message, timestamp, namespace_labels"
        },
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      }
    }
  }
}
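The definitions on this page share one layout: partition keys under TableInput.PartitionKeys and regular columns under TableInput.StorageDescriptor.Columns. As a small illustration, the sketch below loads a trimmed copy of the events definition and prints a summary. The function name summarize_table is hypothetical, and the embedded JSON keeps only the fields the sketch reads.

```python
import json

# The events definition from above, trimmed to the fields this sketch reads.
EVENTS_DEFINITION = json.loads("""
{
  "DatabaseName": "kubernetes",
  "TableInput": {
    "Name": "events",
    "PartitionKeys": [
      {"Name": "cluster", "Type": "string"},
      {"Name": "namespace", "Type": "string"},
      {"Name": "dt", "Type": "string"}
    ],
    "StorageDescriptor": {
      "Columns": [
        {"Name": "message", "Type": "string"},
        {"Name": "timestamp", "Type": "string"},
        {"Name": "namespace_labels", "Type": "array<string>"}
      ]
    }
  }
}
""")


def summarize_table(definition):
    """Return (qualified_name, partition_keys, columns) for a table definition."""
    table = definition["TableInput"]
    qualified_name = definition["DatabaseName"] + "." + table["Name"]
    partition_keys = [key["Name"] for key in table["PartitionKeys"]]
    columns = {col["Name"]: col["Type"] for col in table["StorageDescriptor"]["Columns"]}
    return qualified_name, partition_keys, columns


name, partition_keys, columns = summarize_table(EVENTS_DEFINITION)
print(name)
print("partitioned by:", ", ".join(partition_keys))
for column, column_type in columns.items():
    print("  %s: %s" % (column, column_type))
```

The same function can be applied to the container_logs and host_logs definitions, since they use the same keys.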

Table host_logs

{
  "DatabaseName": "kubernetes",
  "TableInput": {
    "Name": "host_logs",
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "classification": "json"
    },
    "PartitionKeys": [
      {
        "Name": "cluster",
        "Type": "string"
      },
      {
        "Name": "host",
        "Type": "string"
      },
      {
        "Name": "dt",
        "Type": "string"
      }
    ],
    "StorageDescriptor": {
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "Location": "{automatically generated by Collectord}",
      "Columns": [
        {
          "Name": "message",
          "Type": "string"
        },
        {
          "Name": "timestamp",
          "Type": "string"
        },
        {
          "Name": "docker_version",
          "Type": "string"
        },
        {
          "Name": "docker_labels",
          "Type": "array<string>"
        },
        {
          "Name": "node_labels",
          "Type": "array<string>"
        },
        {
          "Name": "file_path",
          "Type": "string"
        },
        {
          "Name": "priority",
          "Type": "string"
        },
        {
          "Name": "syslog_component",
          "Type": "string"
        },
        {
          "Name": "syslog_facility",
          "Type": "string"
        },
        {
          "Name": "syslog_pid",
          "Type": "string"
        },
        {
          "Name": "ec2_instance_id",
          "Type": "string"
        },
        {
          "Name": "ec2_instance_type",
          "Type": "string"
        }
      ],
      "SerdeInfo": {
        "Parameters": {
          "paths": "message, timestamp, docker_version, docker_labels, file_path, priority, syslog_component, syslog_facility, syslog_pid"
        },
        "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
      }
    }
  }
}

About Outcold Solutions

Outcold Solutions provides solutions for building centralized logging infrastructure and for monitoring Kubernetes, OpenShift, and Docker clusters. We provide centralized logging infrastructure on AWS services that is easy to set up. We offer Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing solutions for Linux and Windows containers that are easy to use and deploy.