Collectord

Docker Centralized Logging with AWS S3, Athena, Glue and QuickSight

QuickSight for Dashboards and Reports

Amazon QuickSight is a fast, cloud-powered business intelligence (BI) service that makes it easy for you to deliver insights to everyone in your organization.

In this example, we will build custom dashboards with AWS QuickSight from the logs forwarded by Collectord.

Preparing the data

To prepare the logs, we use the following docker-compose definition, which contains one nginx container and multiple clients generating POST and GET requests.

version: "3.7"
services:

  nginx-server:
    image: nginx
  nginx-client-get-200:
    labels:
      io.collectord.s3.logs-output: devnull
    image: busybox
    deploy:
      replicas: 10
    command: ['/bin/sh', '-c', 'while true; do wget -qO- http://nginx-server:80; sleep 5; done']
    depends_on:
      - nginx-server
  nginx-client-post:
    labels:
      io.collectord.s3.logs-output: devnull
    image: busybox
    deploy:
      replicas: 4
    command: ['/bin/sh', '-c', 'while true; do wget -qO- --post-data=foo=x http://nginx-server:80; sleep 8; done']
    depends_on:
      - nginx-server
  nginx-client-get-404:
    labels:
      io.collectord.s3.logs-output: devnull
    image: busybox
    deploy:
      replicas: 5
    command: ['/bin/sh', '-c', 'while true; do wget -qO- http://nginx-server:80/404; sleep 10; done']
    depends_on:
      - nginx-server

After deploying the docker-compose definition, we scale the clients as follows:

docker-compose scale nginx-client-get-200=10 nginx-client-post=4 nginx-client-get-404=5
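
As a rough sanity check for the dashboards built later, the expected request volume can be estimated from the replica counts and sleep intervals in the compose file. The Python sketch below is only an upper-bound estimate: it assumes every request completes instantly and every replica runs for the whole window. The 200 and 405 status codes come from the sample log lines in this article; the 404 status for the /404 path is an assumption based on the service name.

```python
# Each entry: (replicas, seconds slept between requests, expected HTTP status).
# Replica counts and intervals are copied from the compose file above.
CLIENTS = {
    "nginx-client-get-200": (10, 5, 200),
    "nginx-client-post": (4, 8, 405),      # POST to the default page returns 405
    "nginx-client-get-404": (5, 10, 404),  # assumed status for the /404 path
}


def expected_requests(window_seconds):
    """Upper-bound estimate of access-log lines per status code."""
    totals = {}
    for replicas, interval, status in CLIENTS.values():
        count = replicas * window_seconds // interval
        totals[status] = totals.get(status, 0) + count
    return totals


if __name__ == "__main__":
    # Estimate for the 10-minute run described in this article.
    print(expected_requests(600))
```

If the counts in Athena or QuickSight differ widely from this estimate, it is worth checking that all client replicas are actually running.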

After running this workload for about 10 minutes, we can start seeing the data in Athena with the following query (we filter by the service name label com.docker.compose.service=nginx-server):

select timestamp, stream, message
from container_logs
where container_image='nginx'
  and container_name like '%nginx-server%'
  and contains(container_labels, 'com.docker.compose.service=nginx-server')
order by timestamp desc

The format of the logs looks similar to

172.17.0.50 - - [09/Mar/2019:00:14:15 +0000] "POST / HTTP/1.1" 405 157 "-" "Wget" "-"
172.17.0.38 - - [09/Mar/2019:00:14:15 +0000] "GET / HTTP/1.1" 200 612 "-" "Wget" "-"
172.17.0.33 - - [09/Mar/2019:00:14:16 +0000] "GET / HTTP/1.1" 200 612 "-" "Wget" "-"

There are also some errors from the stderr that we want to ignore.

Extracting the fields

Let's extract the fields from the logs using regular expressions. The Nginx documentation gives the following example of an access log format, which has the same layout as the log lines above:

log_format compression '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $bytes_sent '
                       '"$http_referer" "$http_user_agent" "$gzip_ratio"';

We can define the regular expression as ^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$, where we only need to be careful with the indexes of the capture groups. Let's create a view based on this regular expression that extracts the most interesting fields and converts the timestamp to a Unix timestamp to make it easily compatible with QuickSight.

-- View Example
CREATE OR REPLACE VIEW container_logs_nginx_7d AS
select 
  to_unixtime(from_iso8601_timestamp(timestamp)) as timestamp, 
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 1) as nginx_remote_addr,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 2) as nginx_remote_user,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 4) as nginx_request_method,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 5) as nginx_request_path,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 6) as nginx_request_http_version,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 7) as nginx_response_status,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 8) as nginx_bytes_sent,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 9) as nginx_referer,
  regexp_extract(message, '^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" ([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$', 10) as nginx_user_agent
from container_logs 
where container_image='nginx' and container_name like '%nginx-server%' and contains(container_labels, 'com.docker.compose.service=nginx-server') and stream='stdout' and dt>=date_format(date_add('day', -7, now()), '%Y%m%d');
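
Because the view depends on the capture group indexes being right, it can help to check the pattern locally against the sample log lines before creating the view. The Python sketch below uses the same pattern and the same index-to-field mapping as the view; the helper name extract_fields is hypothetical. Python's re module and Athena's regular expression engine are not identical, but this pattern only uses constructs common to both, so treat the result as a sanity check and not as proof of Athena's behavior.

```python
import re

# Same pattern as in the view definition.
NGINX_ACCESS_RE = re.compile(
    r'^([^\s]+) - ([^\s]+) \[(.+)\] "([^\s]+) ([^\s]+) ([^\s]+)" '
    r'([^\s]+) ([^\s]+) "(.+)" "(.+)" "(.+)"$'
)

# Field name -> capture group index, mirroring the view's columns.
# Group 3 ($time_local) and group 11 ($gzip_ratio) are not extracted.
FIELD_GROUPS = {
    "nginx_remote_addr": 1,
    "nginx_remote_user": 2,
    "nginx_request_method": 4,
    "nginx_request_path": 5,
    "nginx_request_http_version": 6,
    "nginx_response_status": 7,
    "nginx_bytes_sent": 8,
    "nginx_referer": 9,
    "nginx_user_agent": 10,
}


def extract_fields(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = NGINX_ACCESS_RE.match(line)
    if match is None:
        return None
    return {name: match.group(index) for name, index in FIELD_GROUPS.items()}


if __name__ == "__main__":
    sample = ('172.17.0.50 - - [09/Mar/2019:00:14:15 +0000] '
              '"POST / HTTP/1.1" 405 157 "-" "Wget" "-"')
    for name, value in extract_fields(sample).items():
        print(f"{name} = {value}")
```

Lines that do not follow the access log layout, such as error messages, return None here instead of partially filled fields.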

[Figure: Athena query with fields]

QuickSight Access

Make sure that QuickSight has access to Athena and to the S3 buckets where you store the logs and the query results. You can modify permissions only in the N. Virginia region: go to Manage QuickSight, Account Settings, and verify Connected products & services.

[Figure: QuickSight permissions]

QuickSight Data Set

With AWS QuickSight, we can create a new data set from Athena.

[Figure: QuickSight new data set]

Select the database kubernetes and the table container_logs_nginx_7d (the view we just created).

[Figure: QuickSight select table]

Creating the view is optional; you can always define a data set with custom SQL instead.

Before saving the data set, you can choose "Preview data" and modify the types of the fields. In this example, we changed timestamp to Date and nginx_bytes_sent to Int.
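
To make these two type changes concrete, the Python sketch below mirrors what happens to the values: the view turns an ISO 8601 timestamp into Unix epoch seconds, a Date-typed field reads those seconds back as a point in time, and an Int-typed field needs nginx_bytes_sent to be a clean number. The function names and the sample timestamp are hypothetical, and the code only approximates the Athena and QuickSight conversions.

```python
from datetime import datetime, timezone


def iso_to_unix(iso_timestamp):
    """Rough equivalent of to_unixtime(from_iso8601_timestamp(...)) in the view."""
    return datetime.fromisoformat(iso_timestamp).timestamp()


def unix_to_utc(seconds):
    """The point in time that a given epoch value represents, in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def to_int_or_none(value):
    """Convert an extracted string such as nginx_bytes_sent to an integer.

    Returns None for missing or non-numeric values instead of raising.
    """
    if value is None or not value.isdigit():
        return None
    return int(value)


if __name__ == "__main__":
    epoch = iso_to_unix("2019-03-09T00:14:15+00:00")  # hypothetical sample value
    print(epoch)
    print(unix_to_utc(epoch).isoformat())
    print(to_int_or_none("612"))
```

Rows where the regular expression did not match carry empty values, which is why the integer conversion here tolerates None.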

[Figure: Data set definition]

You can choose to store the data in SPICE (the data cache in QuickSight) or to execute the query in Athena every time you open a dashboard or an analysis. If you choose SPICE, make sure to configure when QuickSight should refresh the data stored there.

After that, click "Save & Analyze".

Building dashboards

By following the guidance in the QuickSight documentation, you can start building dashboards from the data in the data set.

[Figure: Dashboard]

Reports

You can configure QuickSight to deliver reports by email; see Sending Reports by Email.


About Outcold Solutions

Outcold Solutions provides solutions for building centralized logging infrastructure and monitoring Kubernetes, OpenShift and Docker clusters. We provide easy-to-set-up centralized logging infrastructure with AWS services. We offer Splunk applications, which give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing solutions for Linux and Windows containers that are easy to use and deploy.