Collectord

Kubernetes Centralized Logging with AWS S3, Athena, Glue and QuickSight

Troubleshooting

Verify configuration

Get the list of the pods

kubectl get pods -n collectord-s3

The output will be similar to

NAME                                  READY   STATUS    RESTARTS   AGE
collectord-s3-4n52x                   1/1     Running   0          96m
collectord-s3-addon-6b6bbdfdd-g8qhm   1/1     Running   0          96m

We deploy two workload types: a DaemonSet that runs on every node, and one Deployment add-on (collectord-s3-addon). Verify one pod from each workload (in the examples below, change the pod names to the pods running on your cluster).

kubectl exec -n collectord-s3 collectord-s3-addon-6b6bbdfdd-g8qhm -- /collectord verify
kubectl exec -n collectord-s3 collectord-s3-4n52x -- /collectord verify
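
To run the check on every collectord pod at once, a minimal shell loop works (a sketch; it only assumes the collectord-s3 namespace used in this guide):

# Run verify in every pod in the collectord-s3 namespace
for pod in $(kubectl get pods -n collectord-s3 -o name); do
  echo "=== ${pod} ==="
  kubectl exec -n collectord-s3 "${pod#pod/}" -- /collectord verify
done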

Each verify run prints a report similar to

Version = 6.0.300
Build date = 190308
Environment = kubernetes


  General:
  + conf: OK
  + db: OK
  + db-meta: OK
  + instanceID: OK
    instanceID = 2M563HM3871R8KDT6P74V17RD8
  + license load: OK
  + license expiration: OK
  + license connection: OK

  Kubernetes configuration:
  + api: OK
  + volumes root: OK
  + runtime: OK
    docker

  Docker configuration:
  + connect: OK
    containers = 22
  + path: OK
  + files: OK

  CRI-O configuration:
  - ignored: OK
    kubernetes uses other container runtime

  File Inputs:
  x input(syslog): FAILED
    no matches
  x input(logs): FAILED
    no matches
  x input(journald): FAILED
    err = stat /rootfs/var/log/journal/: no such file or directory

Errors: 3

The number of errors is reported at the end. Our example shows output from minikube, where some inputs are expected to fail:

  • input(syslog) - minikube does not persist syslog output to disk, we will not be able to see these logs in application
  • input(logs) - minikube does not have any host log files on /var/log
  • input(journald) - minikube does not persist journald on disk

If you find an error in the configuration, apply the change with kubectl apply -f ./collectord-s3.yaml and then recreate the pods; the easiest way is to delete all of them in our namespace with kubectl delete pods --all -n collectord-s3. The workloads will recreate them.
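For example:

kubectl apply -f ./collectord-s3.yaml
kubectl delete pods --all -n collectord-s3
kubectl get pods -n collectord-s3 -w

The last command just watches the pods come back up; press Ctrl+C to stop it.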

Collect diagnostic information

If you need to open a support case you can collect diagnostic information, including performance, metrics and configuration.

1. Collect diagnostic information by running the following command

Choose the pod from which you want to collect the diagnostic information.

The following command takes several minutes.

kubectl exec -n collectord-s3 collectord-s3-4n52x -- /collectord diag --stream 1>diag.tar.gz

You can extract the tar archive to verify the information that we collect. It includes information about performance, memory usage, basic telemetry metrics, the host Linux version, and basic information about the license.
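For example, to list or extract the archive locally:

tar -tzf diag.tar.gz
mkdir -p diag && tar -xzf diag.tar.gz -C diag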

2. Collect logs

kubectl logs -n collectord-s3 --timestamps collectord-s3-bwmwr  1>collectord-s3.log 2>&1

3. Run verify

kubectl exec -n collectord-s3 collectord-s3-bwmwr -- /collectord verify > verify.log

4. Prepare tar archive

tar -czvf collectord-s3-$(date +%s).tar.gz verify.log collectord-s3.log diag.tar.gz
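
If you collect these artifacts regularly, the four steps above can be combined into a small script (a sketch; the pod name is a placeholder, replace it with a collectord pod from your cluster):

NS=collectord-s3
POD=collectord-s3-4n52x   # placeholder: use a pod name from your cluster
kubectl exec -n "$NS" "$POD" -- /collectord diag --stream 1>diag.tar.gz
kubectl logs -n "$NS" --timestamps "$POD" 1>collectord-s3.log 2>&1
kubectl exec -n "$NS" "$POD" -- /collectord verify > verify.log
tar -czvf "collectord-s3-$(date +%s).tar.gz" verify.log collectord-s3.log diag.tar.gz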

Pod is not getting scheduled

Verify that the DaemonSet has scheduled pods on the nodes

kubectl get daemonset --namespace collectord-s3

If the numbers under DESIRED, CURRENT, READY or UP-TO-DATE are 0 in the output, something may be wrong with the configuration

NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
collectord-s3   0         0         0       0            0           <none>          1m
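
If DESIRED is 0, one thing worth checking is whether the DaemonSet's nodeSelector (if any) matches your nodes and whether node taints are tolerated, for example:

kubectl get daemonset collectord-s3 -n collectord-s3 -o jsonpath='{.spec.template.spec.nodeSelector}'
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints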

You can describe the current state of daemonset/collectord-s3 with

$ kubectl describe daemonsets --namespace collectord-s3

The output will describe the DaemonSet. The last lines list the events reported for it, for example

...
Events:
  Type     Reason            Age                From                  Message
  ----     ------            ----               ----                  -------
  Warning  FailedCreate      31m                daemonset-controller  Error creating: pods "collectord-s3-" is forbidden: SecurityContext.RunAsUser is forbidden

This error means that you are using Pod Security Policies. In that case, you need to allow our Cluster Role to use the privileged Pod Security Policy, with

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app: collectord-s3
  name: collectord-s3
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - privileged
- apiGroups:
  ...
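
After applying the updated Cluster Role, you can check that the service account is allowed to use the policy (a sketch; the service account name collectord-s3 is an assumption, use the one from your manifest):

kubectl auth can-i use podsecuritypolicy/privileged \
    --as=system:serviceaccount:collectord-s3:collectord-s3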

Failed to pull the image

When you run the command

$ kubectl get daemonsets --namespace collectord-s3

You may find that the number under READY does not match DESIRED

NAMESPACE   NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
default     collectord-s3   1         1         0       1            0           <none>          6m

Try to find the pods that Kubernetes failed to start

$ kubectl get pods --namespace collectord-s3

You may see that a collectord-s3- pod has an ImagePullBackOff error, as in the example below

NAMESPACE   NAME                  READY   STATUS             RESTARTS   AGE
default     collectord-s3-55t61   0/1     ImagePullBackOff   0          2m

In that case, you need to verify that your Kubernetes cluster has access to the hub.docker.com registry.
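
One quick way to check is to pull the image manually on one of the nodes (assuming Docker as the container runtime, as in this guide):

docker pull outcoldsolutions/collectord:6.0.301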

You can also run the command

$ kubectl describe pods --namespace collectord-s3

It shows details for each pod, including the events raised for it

Events:
  FirstSeen LastSeen    Count   From            SubObjectPath               Type        Reason      Message
  --------- --------    -----   ----            -------------               --------    ------      -------
  3m        2m      4   kubelet, localhost  spec.containers{collectord-s3}  Normal      Pulling     pulling image "hub.docker.com/outcoldsolutions/collectord:6.0.301"
  3m        1m      6   kubelet, localhost  spec.containers{collectord-s3}  Normal      BackOff     Back-off pulling image "hub.docker.com/outcoldsolutions/collectord:6.0.301"
  3m        1m      11  kubelet, localhost                      Warning     FailedSync  Error syncing pod

Blocked access to external registries

If you block external registries (hub.docker.com) for security reasons, you can copy the image to your own registry from one host that does have access to the external registry.

Copy the image from hub.docker.com to your own registry

$ docker pull outcoldsolutions/collectord:6.0.301

After that, you can re-tag it by prefixing it with your own registry

docker tag  outcoldsolutions/collectord:6.0.301 [YOUR_REGISTRY]/outcoldsolutions/collectord:6.0.301

And push it to your registry

docker push [YOUR_REGISTRY]/outcoldsolutions/collectord:6.0.301

After that, you will need to change the configuration yaml file to specify that you want to use the image from a different location

image: [YOUR_REGISTRY]/outcoldsolutions/collectord:6.0.301
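
After editing the manifest, re-apply it and recreate the pods so they pull the image from the new location:

kubectl apply -f ./collectord-s3.yaml
kubectl delete pods --all -n collectord-s3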

If you need to move the image between computers, you can export it to a tar file

$ docker image save outcoldsolutions/collectord:6.0.301 > collectord.tar

And load it on a different docker host

$ cat collectord.tar | docker image load
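
If both hosts can reach each other over SSH, the save and load steps can be combined into a single pipe (a sketch; user@remote-host is a placeholder):

$ docker image save outcoldsolutions/collectord:6.0.301 | ssh user@remote-host 'docker image load'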

Pod is crashing or running, but you don't see any data

Take a look at the pod logs using kubectl

kubectl logs -n collectord-s3 collectord-s3-4n52x
INFO 2019/03/09 21:31:46.124630 outcoldsolutions.com/collectord/main.go:294: Build date = 190308, version = 6.0.300
INFO 2019/03/09 21:31:46.124768 outcoldsolutions.com/collectord/main.go:92: reading configuration from /config/s3/kubernetes/daemonset/001-general.conf
INFO 2019/03/09 21:31:46.124868 outcoldsolutions.com/collectord/main.go:92: reading configuration from /config/s3/kubernetes/daemonset/002-daemonset.conf
INFO 2019/03/09 21:31:46.124956 outcoldsolutions.com/collectord/main.go:92: reading configuration from /config/s3/kubernetes/daemonset/secret/100-general.conf
INFO 2019/03/09 21:31:46.125019 outcoldsolutions.com/collectord/main.go:92: reading configuration from /config/s3/kubernetes/daemonset/user/101-general.conf
INFO 2019/03/09 21:31:46.125041 outcoldsolutions.com/collectord/main.go:92: reading configuration from /config/s3/kubernetes/daemonset/user/102-daemonset.conf
INFO 2019/03/09 21:31:46.125820 outcoldsolutions.com/collectord/main.go:282: InstanceID = 2M56FGJ0LIFOAKDT6P75HS0IP0, created = 2019-03-09 21:31:46.125808425 +0000 UTC m=+0.004556642
WARN 2019/03/09 21:31:46.327389 outcoldsolutions.com/core/aws/ec2_metadata/client.go:28: failed to retrieve /latest/meta-data/instance-id. Get http://169.254.169.254/latest/meta-data/instance-id: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
WARN 2019/03/09 21:31:46.527940 outcoldsolutions.com/core/aws/ec2_metadata/client.go:28: failed to retrieve /latest/meta-data/instance-type. Get http://169.254.169.254/latest/meta-data/instance-type: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
INFO 2019/03/09 21:31:46.552044 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:85: watching /rootfs/var/log//(glob = , match = ^(([\w\-.]+\.log(.[\d\-]+)?)|(docker))$)
INFO 2019/03/09 21:31:46.552146 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:85: watching /rootfs/var/log//(glob = , match = ^(syslog|messages)(.\d+)?$)
INFO 2019/03/09 21:31:46.552161 outcoldsolutions.com/collectord/environment/instance.go:1292: journald input: cannot get stat of the path /rootfs/var/log/journal/
INFO 2019/03/09 21:31:46.650901 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 44cf44d7-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.652303 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 46990b04-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.673293 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 66351eb6-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.687935 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching bac3e0e0-42b2-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.702287 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 2319f05f-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.716638 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 2330b439-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.717627 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 4305fa0e-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.719393 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 48fb8e3d-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.752050 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 663f7aea-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.769874 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 23320379-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.771253 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 242bb597-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.773658 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching 44cf598f-42a3-11e9-9920-0800277a19a4
INFO 2019/03/09 21:31:46.907883 outcoldsolutions.com/collectord/license/license_check_pipe.go:155: license-check kubernetes BG5183Q44IE2M 0 0 ouMQMBLYDcTo3Lfen4CBmD7geZL8J5Uh1ueP4JKA3ZA 1552167106 1552167106 6.0.300 1552003200 true true 0
INFO 2019/03/09 21:31:46.999861 outcoldsolutions.com/collectord/pipeline/output/s3/glue_service.go:263: database already exists .kubernetes
INFO 2019/03/09 21:31:47.129983 outcoldsolutions.com/collectord/pipeline/output/s3/glue_service.go:299: table already exists .kubernetes.host_logs
INFO 2019/03/09 21:31:47.273230 outcoldsolutions.com/collectord/pipeline/output/s3/glue_service.go:299: table already exists .kubernetes.container_logs
INFO 2019/03/09 21:31:47.444017 outcoldsolutions.com/collectord/pipeline/watcher/watcher.go:305: kubernetes_watcher - watching bacd39d7-42b2-11e9-9920-0800277a19a4

There could be warnings letting you know about an existing issue, such as an incorrect IAM policy for an AWS service or an invalid configuration.
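Since every log line is prefixed with its level, you can filter for problems directly, and if the pod is crash-looping, the --previous flag shows the logs of the previous container instance:

kubectl logs -n collectord-s3 collectord-s3-4n52x | grep -E '^(WARN|ERROR)'
kubectl logs -n collectord-s3 collectord-s3-4n52x --previous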

Documentation does not help?

Contact us.


About Outcold Solutions

Outcold Solutions provides solutions for building centralized logging infrastructure and monitoring Kubernetes, OpenShift and Docker clusters. We provide an easy-to-set-up centralized logging infrastructure based on AWS services. We offer Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring with easy-to-use and easy-to-deploy solutions for Linux and Windows containers.