SQR-031: EFD deployment and operation

  • Angelo Fausti

Latest Revision: 2019-07-24

Note

This technote is not yet published.

Instructions to deploy and operate the EFD

1   TL;DR

1.1   Tucson lab deployment

1.2   Summit deployment

Follow #com-efd at LSSTC Slack for updates.

2   Why are we doing this?

The EFD is a solution based on Kafka and InfluxDB for recording telemetry, commands, and events for LSST. It was prototyped and tested with the simulator for the M1M3 subsystem. The next logical step is to deploy and test the EFD with real hardware.

The Auxilary Telescope Camera (ATCamera) is being tested at the Tucson lab, and it presents an excellent opportunity for testing the EFD by recording data from these tests.

We might need to deploy the EFD at the Summit, Base facility, and LDF. Thus, the ability to deploy the EFD at different environments quickly and reproduce these deployments is crucial. To solve this problem we’ve adopted Docker and Kubernetes as our deployment platform, and a combination of Terragrunt, Terraform, and Helm to manage and automate the deployments.

In this technote, we demonstrate that we can deploy the EFD on a single machine with Docker and k3s (“kubes”), a lightweight Kubernetes using the same Terraform modules and Helm charts that we used at Google Could. We also provide instructions on how to operate and use the EFD system.

3   The ATCamera test environment

As of June 2019, the ATCamera test environment at the Tucson lab runs SAL 3.9. The following subsystems are being tested and produce data: ATCamera, ATHeaderService, ATArchiver, ATMonochromator, ATSpectrograph, and Electrometer.

Kafka writers are responsible for sending messages from each SAL topic to the Kafka brokers.

The EFD runs on a dedicated machine ts-csc-01 (140.252.32.142). The first step to deploy it is to provision a Kubernetes cluster.

4   Provisioning a k3s (“kubes”) cluster

k3s is a lightweight Kubernetes that runs in a container.

4.1   Requirements

We assume a Linux box running Centos 7. We’ve installed Docker CE and kubectl:

  • Docker CE 18.09.6
  • kubectl 1.14.1

Note

We also tried k3s locally, on Docker Desktop for Mac, but it can not route traffic from the host to the container (the --network host option does not work).

4.2   Configure Docker to start on boot

CentOS uses systemd to manage which services start when the system boots. Run the following to configure Docker to start on boot.

sudo systemctl enable docker

4.3   Run the k3s master

Run the k3s master with the following commands:

export K3S_PORT=6443
export K3S_URL=http://localhost:${K3S_PORT}
export K3S_TOKEN=$(date | base64)
export HOST_PATH=/data # change depending on your host
export CONTAINER_PATH=/opt/local-path-provisioner
sudo docker run  -d --restart always --tmpfs /run --tmpfs /var/run --volume ${HOST_PATH}:${CONTAINER_PATH} -e K3S_URL=${K3S_URL} -e K3S_TOKEN=${K3S_TOKEN} --privileged --network host --name master docker.io/rancher/k3s:v0.5.0-rc1 server --https-listen-port ${K3S_PORT} --no-deploy traefik

The --restart always option ensures that the k3s master is automatically restarted after a system reboot.

Data is persisted at $HOST_PATH with the --volume option (see also 4.4   Deploy the local-path provisioner).

The --network host option routes network traffic from the host to the container, we need that to reach the different services running inside k3s.

Note that we are not deploying Traefik Ingress Controller which is included in the k3s docker image, because the EFD already deploys the NGINX Ingress Controller.

To connect to the master you need to copy the kubeconfig file from the container, and set the KUBECONFIG environment variable so that kubectl knows how to connect to the cluster:

sudo docker cp master:/etc/rancher/k3s/k3s.yaml k3s.yaml
export KUBECONFIG=$(pwd)/k3s.yaml
kubectl cluster-info

Kubernetes master is running at https://localhost:6443
CoreDNS is running at https://localhost:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

To connect to the cluster from another machine, copy the k3s.yaml file and replace localhost by 140.252.32.142.

4.4   Deploy the local-path provisioner

The local-path provisioner creates hostPath persistent volumes on the node automatically. The directory /opt/local-path-provisioner is used as the path for provisioning. The provisioner is installed in the local-path-storage namespace by default.

kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

At this point you should see the following pods running in the cluster:

kubectl get pods --all-namespaces
NAMESPACE            NAME                                      READY   STATUS    RESTARTS   AGE
kube-system          coredns-695688789-r9gkt                   1/1     Running   0          5m
local-path-storage   local-path-provisioner-5d4b898474-vz2np   1/1     Running   0          4s

4.5   Add workers (optional)

If there are other machines, you can easily add workers to the cluster. Copy the node-token from the master:

sudo docker cp master:/var/lib/rancher/k3s/server/node-token node-token

and start the worker(s):

export SERVER_URL=https://<master external IP>:${K3S_PORT}
export NODE_TOKEN=$(cat node-token)
export WORKER=kube-0
export HOST_PATH=/data # change depending on your host
export CONTAINER_PATH=/opt/local-path-provisioner
sudo docker run -d --tmpfs /run --tmpfs /var/run -v ${HOST_PATH}:${CONTAINER_PATH} -e K3S_URL=${SERVER_URL} -e K3S_TOKEN=${NODE_TOKEN} --privileged --name ${WORKER} rancher/k3s:v0.5.0-rc1

Note

By default /opt/local-path-provisioner is used across all the nodes to store persistent volume data, see local-path provisioner configuration.

5   Deploy the EFD

Once the cluster is ready, we can deploy the EFD.

5.1   Requirements

  • AWS credentials (we save the deployment configuration to an S3 bucket and create names for our services on Route53)
  • TLS/SSL certificates for the lsst.codes domain (we share certificates via SQuaRE Dropbox account)
  • Deployment configuration for the EFD test environment (we share secrets via SQuaRE 1Password account)

Note

The current mechanism to share secrets and certificates is not ideal. We still need to integrate our EFD deployment with the Vault service implemented by SQuaRE.

We automate the deployment of the EFD with Terraform and Helm. Terragrunt is used to manage the different deployment environments (dev, test, and production) while keeping the Terraform modules environment-agnostic. We also use Terragrunt to save the Terraform configuration and the state of a particular deployment remotely.

Install Terragrunt, Terraform, and Helm.

git clone https://github.com/lsst-sqre/terragrunt-live-test.git
cd terragrunt-live-test
make all
export PATH="${PWD}/bin:${PATH}"

Install the SSL certificates (this step requires access to the SQuaRE Dropbox account).

make tls

5.2   Initialize the deployment environment

The following commands initialize the deployment environment. (Terragrunt uses an S3 bucket to save the deployment configuration, so this step requires the AWS credentials).

export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""

cd afausti/efd
make all
terragrunt init --terragrunt-source-update
terragrunt init

5.3   Deployment configuration

The EFD deployment configuration on k3s is defined by a set of Terraform variables listed in the terraform-efd-k3s repository.

Edit the terraform.tfvars file with the values obtained from the SQuaRE 1Password account. Search for terraform vars (efd test).

Finally, deploy the EFD with the following commands:

terragrunt plan
terragrunt apply

6   Testing the EFD

The EFD deployment can be tested using kafkacat a command line utility implemented with librdkafka the Apache Kafka C/C++ client library.

Run in producer mode (-P) to produce messages for a test topic:

kafkacat -P -b <kafka broker url> -t test_topic
Hello EFD!
^D

Run in Metadata listing mode (-L) to retrieve metadata from the cluster:

kafkacat -L -b <kafka broker url>

The -d option enables librdkafka debugging. For instance, -d broker can be used to debug connection issues with the cluster:

kafkacat -L -b <kafka broker url>  -d broker

7   Monitoring

The EFD deployment includes dashboards for monitoring the k3s cluster and Kafka instrumented by Prometheus metrics. You can login with your GitHub credentials if you are a member of the lsst-sqre organization.

8   Restarting the EFD manually

k3s is configured to automatically start after a system reboot (--restart-always flag). In case you need to start the k3s master manually, first check its status:

sudo docker ps -a

If k3s master status is Exited start with the following command:

sudo docker start master

After a few minutes, all Kubernetes pods should be running again:

export KUBECONFIG=$(pwd)/k3s.yaml
kubectl cluster-info
kubectl get pods --all-namespaces

9   Accessing EFD data

Use the Chronograf interface for time-series visualization and dashboarding.

In this notebook we we demonstrate how to extract data from the EFD using aioinflux, a Python client for InfluxDB, and proceed with data analysis using Pandas dataframes.