Collectord

CloudWatch Logs names for LogGroup and LogStream when forwarding Kubernetes Logs

March 20, 2019

Collectord makes it very easy to forward logs and events from Kubernetes clusters to CloudWatch Logs. If you have not done so yet, take a look at the installation instructions and request a free 30-day trial license key.

With Collectord it is easy to configure the outputs for Kubernetes logs. You can override the default names for the CloudWatch Logs LogGroup and LogStream, and you can always override the names for a specific Pod or Kubernetes Workload.

Considerations for choosing LogGroup and LogStream names

When you choose the names for the LogGroup and LogStream, consider the following:

  • Group logs produced by similar sources under the same LogGroup. LogGroups allow you to configure alerts, and if you mix different types of data, it can be challenging to configure correct metric extractions.
    • As an example, it is a good idea to combine all nginx access logs from the various Pods within the same Deployment under one LogGroup, so you can easily configure a single alert for unexpected responses from the nginx servers.
    • On the contrary, combining all logs from the same namespace under one LogGroup, let's say for a Node.js application and a PostgreSQL database, will make it hard to write correct metric extractions, because the LogStreams under that LogGroup will carry data in different formats.
  • A LogGroup name can have a maximum length of 256 characters and must match the pattern [\.\-_/#A-Za-z0-9]+. Collectord automatically replaces invalid characters with the _ symbol.
  • A LogStream name can have a maximum length of 512 characters and must match the pattern [^:*]*. Collectord automatically replaces invalid characters with the _ symbol.
  • Every stream should have just one writer. This means that if you have multiple Pods running under the same Deployment, forwarding all of them to the LogGroup /kubernetes/example and the LogStream /nginx-logs can significantly slow down log forwarding: a request from the forwarding pipeline of the first Pod changes the sequenceNumber, which forces the second forwarding pipeline to resubmit its data with the new sequenceNumber. When only one forwarding pipeline writes the data, each request returns the sequenceNumber to use for the next batch.
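
Collectord performs the character replacement for you, but if you want to pre-validate your own LogGroup and LogStream patterns, the sanitization described above can be sketched in Go (the helper names here are ours, not Collectord's):

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical helpers mirroring the sanitization described above:
// characters outside the allowed patterns are replaced with "_".
var (
	// LogGroup names must match [\.\-_/#A-Za-z0-9]+
	invalidLogGroupChars = regexp.MustCompile(`[^.\-_/#A-Za-z0-9]`)
	// LogStream names must match [^:*]*
	invalidLogStreamChars = regexp.MustCompile(`[:*]`)
)

func sanitizeLogGroup(name string) string {
	return invalidLogGroupChars.ReplaceAllString(name, "_")
}

func sanitizeLogStream(name string) string {
	return invalidLogStreamChars.ReplaceAllString(name, "_")
}

func main() {
	fmt.Println(sanitizeLogGroup("/kubernetes/my cluster/container_logs"))
	// /kubernetes/my_cluster/container_logs
	fmt.Println(sanitizeLogStream("/nginx:stdout@host"))
	// /nginx_stdout@host
}
```

Note that the space in the cluster name and the colon in the stream name both become underscores, while characters like @ are valid in a LogStream and pass through untouched.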

Default LogGroup and LogStream formats

By default, Collectord forwards logs to LogGroups and LogStreams with the following formats:

  • Container logs:
    • LogGroup: /kubernetes/{{cluster}}/container_logs/{{namespace}}/{{::coalesce(daemonset_name, deployment_name, statefulset_name, cronjob_name, job_name, replicaset_name, pod_name)}}/
    • LogStream: /{{pod_name}}/{{container_name}}/{{container_id}}/{{stream}}@{{host}}
  • Application logs:
    • LogGroup: /kubernetes/{{cluster}}/container_logs/{{namespace}}/{{::coalesce(daemonset_name, deployment_name, statefulset_name, cronjob_name, job_name, replicaset_name, pod_name)}}/
    • LogStream: /{{pod_name}}/{{container_name}}/{{container_id}}/{{volume_name}}/{{file_path}}@{{host}}
  • Host Logs:
    • LogGroup: /kubernetes/{{cluster}}/host_logs/{{host}}
    • LogStream: /{{file_path}}
  • Events:
    • LogGroup: /kubernetes/{{cluster}}/events/{{namespace}}/
    • LogStream: /

You can easily change that by adjusting the configuration in the ConfigMap that we provide with the installation instructions:

apiVersion: v1
kind: ConfigMap
metadata:
  name: collectord-cloudwatch
  namespace: collectord-cloudwatch
  labels:
    app: collectord-cloudwatch
data:
  101-general.conf: |
    [general]
    # Review SLA at https://www.outcoldsolutions.com/docs/license-agreement/ and accept the license
    acceptLicense = false
    # Request the trial license with automated form https://www.outcoldsolutions.com/trial/request/
    license = 
    # If you are planning to setup log aggregation for multiple cluster, name the cluster
    fields.cluster = -

    [aws]
    # Specify AWS Region
    region = 

    [output.cloudwatch.logs]

  102-daemonset.conf: |

    # Container Log files
    [input.files]

    output.cloudwatch.logs.logstream = /{{pod_name}}/{{container_name}}/{{container_id}}/{{stream}}@{{host}}
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/container_logs/{{namespace}}/{{::coalesce(daemonset_name, deployment_name, statefulset_name, cronjob_name, job_name, replicaset_name, pod_name)}}/

    # Application Logs
    [input.app_logs]

    output.cloudwatch.logs.logstream = /{{pod_name}}/{{container_name}}/{{container_id}}/{{volume_name}}/{{file_path}}@{{host}}
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/container_logs/{{namespace}}/{{::coalesce(daemonset_name, deployment_name, statefulset_name, cronjob_name, job_name, replicaset_name, pod_name)}}/

    # Input all ^(([\w\-.]+\.log(.[\d\-]+)?)|(docker))$ files
    [input.files::logs]

    output.cloudwatch.logs.logstream = /{{file_path}}
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/host_logs/{{host}}

    # Input all ^(syslog|messages)(.\d+)?$ files
    [input.files::syslog]
    output.cloudwatch.logs.logstream = /{{file_path}}
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/host_logs/{{host}}

    [input.journald]
    output.cloudwatch.logs.logstream = /journal
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/host_logs/{{host}}

  103-addon.conf: |

    [input.kubernetes_events]
    output.cloudwatch.logs.logstream = /
    output.cloudwatch.logs.loggroup = /kubernetes/{{cluster}}/events/{{namespace}}/

Override the LogGroup and LogStream for the Deployments and Pods

By using annotations, you can change the LogGroup and LogStream names for a specific Workload or Pod.

As an example, you can define the LogGroup and LogStream names for a Pod:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    cloudwatch.collectord.io/logs-logstream: '/{{pod_name}}/{{container_name}}/{{container_id}}'
    cloudwatch.collectord.io/logs-loggroup: '/kubernetes/{{cluster}}/container_logs/{{namespace}}/nginx/'
spec:
  containers:
  - name: nginx
    image: nginx

Or you can use these annotations on a Deployment (or any other type of workload):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
  annotations:
    cloudwatch.collectord.io/logs-logstream: '/{{pod_name}}/{{container_name}}/{{container_id}}'
    cloudwatch.collectord.io/logs-loggroup: '/kubernetes/{{cluster}}/container_logs/{{namespace}}/nginx/'
spec:
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

Placeholders in the LogGroup and LogStream formats

The list of fields that you can use varies based on the type of data you are forwarding.

List of fields

Pod logs

  • cluster - the value that you define in the ConfigMap during installation with fields.cluster = -
  • namespace - namespace name
  • pod_name - pod name
  • container_name - container name
  • container_id - unique container id
  • container_image - container image
  • stream - stdout or stderr
  • node_name - the name of the node
  • pod_ip - pod IP address
  • node_id - unique node id
  • container_imageid - unique container image id
  • job_name - job name (if the Pod is scheduled by a Job)
  • job_id - job id (if the Pod is scheduled by a Job)
  • cronjob_name - cronjob name (if the Pod is scheduled by a CronJob)
  • cronjob_id - cronjob id (if the Pod is scheduled by a CronJob)
  • deployment_name - deployment name (if the Pod is scheduled by a Deployment)
  • deployment_id - deployment id (if the Pod is scheduled by a Deployment)
  • statefulset_name - statefulset name (if the Pod is scheduled by a StatefulSet)
  • statefulset_id - statefulset id (if the Pod is scheduled by a StatefulSet)
  • replicaset_name - replicaset name (if the Pod is scheduled by a ReplicaSet)
  • replicaset_id - replicaset id (if the Pod is scheduled by a ReplicaSet)
  • replicationcontroller_name - replicationcontroller name (if the Pod is scheduled by a ReplicationController)
  • replicationcontroller_id - replicationcontroller id (if the Pod is scheduled by a ReplicationController)
  • daemonset_name - daemonset name (if the Pod is scheduled by a DaemonSet)
  • daemonset_id - daemonset id (if the Pod is scheduled by a DaemonSet)
  • ec2_instance_id - EC2 instance id if running on EC2 instances
  • ec2_instance_type - EC2 instance type if running on EC2 instances
  • host - host name
  • timestamp - timestamp (use with the format function)

Application logs

  • cluster - the value that you define in the ConfigMap during installation with fields.cluster = -
  • namespace - namespace name
  • pod_name - pod name
  • container_name - container name
  • container_id - unique container id
  • container_image - container image
  • stream - stdout or stderr
  • node_name - the name of the node
  • pod_ip - pod IP address
  • node_id - unique node id
  • container_imageid - unique container image id
  • job_name - job name (if the Pod is scheduled by a Job)
  • job_id - job id (if the Pod is scheduled by a Job)
  • cronjob_name - cronjob name (if the Pod is scheduled by a CronJob)
  • cronjob_id - cronjob id (if the Pod is scheduled by a CronJob)
  • deployment_name - deployment name (if the Pod is scheduled by a Deployment)
  • deployment_id - deployment id (if the Pod is scheduled by a Deployment)
  • statefulset_name - statefulset name (if the Pod is scheduled by a StatefulSet)
  • statefulset_id - statefulset id (if the Pod is scheduled by a StatefulSet)
  • replicaset_name - replicaset name (if the Pod is scheduled by a ReplicaSet)
  • replicaset_id - replicaset id (if the Pod is scheduled by a ReplicaSet)
  • replicationcontroller_name - replicationcontroller name (if the Pod is scheduled by a ReplicationController)
  • replicationcontroller_id - replicationcontroller id (if the Pod is scheduled by a ReplicationController)
  • daemonset_name - daemonset name (if the Pod is scheduled by a DaemonSet)
  • daemonset_id - daemonset id (if the Pod is scheduled by a DaemonSet)
  • file_path - path of the log file
  • volume_name - name of the volume where the log file is located
  • ec2_instance_id - EC2 instance id if running on EC2 instances
  • ec2_instance_type - EC2 instance type if running on EC2 instances
  • host - host name
  • timestamp - timestamp (use with the format function)

Host logs

  • cluster - the value that you define in the ConfigMap during installation with fields.cluster = -
  • node_name - the name of the node
  • node_id - unique node id
  • file_path - path of the log file
  • ec2_instance_id - EC2 instance id if running on EC2 instances
  • ec2_instance_type - EC2 instance type if running on EC2 instances
  • host - host name
  • timestamp - timestamp (use with the format function)

Kubernetes Events

  • cluster - the value that you define in the ConfigMap during installation with fields.cluster = -
  • namespace - namespace name
  • timestamp - timestamp (use with the format function)

Placeholder functions

To configure the names for the LogGroup and LogStream, we also provide the following functions:

  • ::format(arg1) - can be applied to the timestamp field, where arg1 is the layout as defined by the Go language time.Format
  • ::coalesce(arg1, ..., argN) - the arguments arg1, ..., argN are evaluated in order, and the first non-null field is used
  • ::path_escape() - URL path escape
  • ::query_escape() - URL query escape
  • ::replace(arg1, arg2) - replaces arg1 with arg2 in the string

You can chain more than one function, for example:

::coalesce(cronjob_name, job_name)::replace(foo, bar)

Summary

As you can see, Collectord gives you rich control over the names of the LogGroup and LogStream. If the default patterns for the LogGroup and LogStream do not work for you, you can easily override them by changing the configuration and applying it to all forwarded data. As an alternative, you can specify them with annotations on Workloads and Pods.

collectord, kubernetes, eks, aws, cloudwatch, loggroup, logstream, cloudwatch logs

About Outcold Solutions

Outcold Solutions provides solutions for building centralized logging infrastructure and monitoring Kubernetes, OpenShift and Docker clusters. We provide an easy-to-set-up centralized logging infrastructure with AWS services. We offer Splunk applications, which give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers.