Kubernetes Centralized Logging with AWS S3, Athena, Glue and QuickSight

Querying data with Athena

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. (from AWS Athena)

In the Athena query dashboard, switch to the kubernetes database to run queries against its tables.

Athena Overview

Partitions

Because you pay for the amount of data scanned, always include a partition column in the WHERE clause to limit which files are going to be scanned. Let's look at an example. The table container_logs has two time-related columns: timestamp (time in RFC3339Nano format, compatible with the ISO 8601 format) and dt, which by default represents the date partition. If we need to write a query to count warning messages in a container on a specific date, we can filter using either the timestamp column or the dt column.

First, we run the query filtering on the partitioned column dt.

Athena Query with Partition

Then we run the same query filtering on the non-partitioned column timestamp.

Athena Query without Partition

Both queries return the same result, but the second query scanned 395.47MB while the first scanned only 15.18MB, making the second query about 26 times more expensive.
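For illustration, the two queries could look like the following sketch (the exact queries in the screenshots may differ, and the date value is hypothetical). Comparing the timestamp string works because ISO 8601 timestamps sort chronologically, but Athena still has to scan every file to evaluate it.

-- filters on the partition column dt: Athena reads only that day's files
select count(*)
from container_logs
where dt = '20190308'
  and message like '%WARN%'

-- filters on the non-partitioned timestamp string: Athena scans all files
select count(*)
from container_logs
where timestamp >= '2019-03-08T00:00:00Z'
  and timestamp < '2019-03-09T00:00:00Z'
  and message like '%WARN%'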

Verify the list of columns used as partitions in the table schemas of the AWS Glue Catalog.

SQL Syntax

Athena is built on Presto. See Functions and Operators in the Presto documentation to learn about the available functions.
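For example, because the timestamp column is stored as an ISO 8601 string, you can use Presto's from_iso8601_timestamp function to work with it as a real timestamp (a minimal sketch; the dt value is hypothetical and only there to limit the scan):

-- parse the timestamp string so date/time functions and ordering apply to it
select from_iso8601_timestamp(timestamp) as ts, message
from container_logs
where dt = '20190308'
order by ts desc
limit 10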

Bucket for query results

Athena writes query results to a separate S3 bucket that you specify in the settings (the default is s3://aws-athena-query-results-{AccountId}-{Region}/).

Athena Settings

In the Properties of this bucket, we recommend enabling default encryption to keep the data encrypted at rest.

Bucket default encryption

And specify a lifecycle policy to clean up old results after several days.

Bucket lifecycle policy

Querying container logs

As we learned, it is important to use partitions to improve query performance and reduce cost. We recommend creating a view on top of container_logs that always searches only the last N days, and using this view in all queries.

As an example, this is a view that only goes back 7 days.

CREATE OR REPLACE VIEW container_logs_7d AS
SELECT *
FROM container_logs
WHERE dt>=date_format(date_add('day', -7, now()), '%Y%m%d');
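The same pattern works for any window. For example, a view that goes back only one day:

CREATE OR REPLACE VIEW container_logs_1d AS
SELECT *
FROM container_logs
WHERE dt>=date_format(date_add('day', -1, now()), '%Y%m%d');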

Partitioned columns

  • cluster - value defined when you run collectord-s3
  • namespace - Kubernetes namespace name
  • workload - the name of the workload that schedules the Pod (DaemonSet, Deployment, StatefulSet, CronJob, Job, ReplicaSet, Pod)
  • pod_name - the name of the pod
  • container_name - the name of the container
  • dt - date of the logs
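Filtering on any of these columns prunes partitions, so combining them narrows the scan even further (a sketch; the namespace and container values are hypothetical):

-- both namespace and container_name are partition columns,
-- so Athena reads only the files for this container
select timestamp, message
from container_logs_7d
where namespace = 'default'
  and container_name = 'registry'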

Examples

Example 1. Searching for pattern in the message

To search for a specific word in the message string, you can use the LIKE operator

select timestamp, message, pod_name, host 
from container_logs_7d 
where message like '%WARN%'

Or you can use regular expressions

select timestamp, message, pod_name, host 
from container_logs_7d 
where regexp_like(message, '(WARN|ERROR)')

Example 2. Extracting the fields from the message

With Athena you can extract fields from the message using regular expressions. For example, suppose we have a Docker registry running inside Kubernetes that reports logs in the following format

time="2019-03-08T00:27:02.468507707Z" level=info msg=response go.version=go1.9.4 http.request.host="10.131.0.14:5000" http.request.id=53fd4d79-d2a5-43a9-9d8f-fbd157436245 http.request.method=GET http.request.remoteaddr="10.131.0.1:56376" http.request.uri=/healthz http.request.useragent=kube-probe/1.11+ http.response.duration="29.634µs" http.response.status=200 http.response.written=0 instance.id=d725b073-2743-4e4a-bf0b-a2e0472f7dcc

If we want to get the average response time by method, we can extract these fields with regular expressions

select avg(cast(duration as double)) as avg_duration, method
from (
  select 
    regexp_extract(message, 'http\.response\.duration="([\d\.]+)µs"', 1) as duration,
    regexp_extract(message, 'http\.request\.method=([\w]+)', 1) as method
  from container_logs_7d  
  where pod_name like 'docker-registry%'
) as t
group by method

Example 3. Search in container logs and filter by pod labels

select timestamp, message, pod_name, host 
from container_logs_7d 
where message like '%WARN%' and contains(pod_labels, 'app=example1')
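Because namespace is a partition column, adding it to the label filter reduces the amount of scanned data (a sketch; the namespace value is hypothetical):

select timestamp, message, pod_name, host
from container_logs_7d
where namespace = 'default'
  and message like '%WARN%'
  and contains(pod_labels, 'app=example1')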

Querying events

For events, you can also create a view that only searches the last 7 days.

CREATE OR REPLACE VIEW events_7d AS
SELECT *
FROM events
WHERE dt>=date_format(date_add('day', -7, now()), '%Y%m%d');

Partitioned columns

  • cluster - value defined when you run collectord-s3
  • namespace - Kubernetes namespace name
  • dt - date of the logs

Examples

The message in every event is a JSON representation of the event as it is defined by the Kubernetes API. You can extract fields from the JSON messages by using JSON Functions and Operators.

Query latest events

Because events can be triggered more than once, we group by the uid and find the latest timestamp for each uid

SELECT
  max(namespace) as namespace,
  max(timestamp) as timestamp,
  max(involved_object_kind) as involved_object_kind,
  max(involved_object_name) as involved_object_name,
  max(reason) as reason,
  max(message) as message,
  max(count) as count,
  uid
FROM 
(
  SELECT timestamp,
     namespace,
     json_extract_scalar(message, '$.object.metadata.uid') AS uid,
     json_extract_scalar(message, '$.object.involvedObject.kind') AS involved_object_kind, 
     json_extract_scalar(message, '$.object.involvedObject.name') AS involved_object_name,
     json_extract_scalar(message, '$.object.reason') AS reason,
     json_extract_scalar(message, '$.object.message') AS message,
     json_extract_scalar(message, '$.object.count') AS count
  FROM events_7d
) AS t
group by uid
order by 2 desc

Find all warnings

We can find events with warnings by using json_extract_scalar in the WHERE clause

SELECT timestamp, namespace, message
FROM events_7d
WHERE json_extract_scalar(message, '$.object.type') = 'Warning'
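The same approach can be combined with field extraction, for example to include the reason of each warning (a sketch that reuses the JSON paths from the previous query; the extracted message is aliased to avoid colliding with the message column):

SELECT timestamp,
   namespace,
   json_extract_scalar(message, '$.object.reason') AS reason,
   json_extract_scalar(message, '$.object.message') AS event_message
FROM events_7d
WHERE json_extract_scalar(message, '$.object.type') = 'Warning'
ORDER BY timestamp DESC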

Querying host logs

Similarly to container logs, create a view for the host logs that always limits the query to the last 7 days.

CREATE OR REPLACE VIEW host_logs_7d AS
SELECT *
FROM host_logs
WHERE dt>=date_format(date_add('day', -7, now()), '%Y%m%d');

Partitioned columns

  • cluster - value defined when you run collectord-s3
  • host - the host name
  • dt - date of the logs

Examples

Query logs from a specific host

select timestamp, syslog_component, message, file_path
from host_logs_7d
where host='master1.cluster-3-11.local.outcold.solutions'
order by timestamp desc

Query logs from the master nodes

select timestamp, host, syslog_component, message, file_path
from host_logs_7d
where contains(node_labels, 'node-role.kubernetes.io/master=true')
order by timestamp desc
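
Query logs of a specific syslog component

You can also filter on the syslog_component column, for example to see only kubelet logs (a sketch; the component name is an assumption and depends on what runs on your hosts):

select timestamp, host, message, file_path
from host_logs_7d
where syslog_component = 'kubelet'
order by timestamp desc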
