
Week 11 - Even More Kubernetes

Welcome to week 11. By now, your application should have Service and Ingress resources created. TLS should also be implemented via cert-manager. Your database should exist in a separate deployment, and your Kubernetes integration with GitLab should be functioning correctly.

We strongly recommend reviewing the Kubernetes lecture slides, as the theory is explained more thoroughly there than in this lab guide.

Development requirements

Health check endpoints

For the liveness endpoint /healthz, update your application so that it returns HTTP 200 ("ok") whenever the application is running. The endpoint should be very lightweight and must not depend on any external services.

For the readiness endpoint /readyz, update your application so that it returns HTTP 200 ("ok") when the application is ready to accept traffic. A common approach is to check whether the database responds properly.
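As an illustration, here is a minimal sketch of both endpoints, assuming a Flask backend and a hypothetical get_db_connection() helper for your database driver; adapt it to your own framework:

from flask import Flask

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Liveness: no external dependencies, just confirm the process responds.
    return "ok", 200

@app.route("/readyz")
def readyz():
    # Readiness: verify that the database answers a trivial query.
    try:
        get_db_connection().cursor().execute("SELECT 1")  # hypothetical helper
        return "ok", 200
    except Exception:
        return "database not ready", 503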

These endpoints are used to inform Kubernetes about the state of your application. More information is available in the operations requirements under the Probes section.

Complete

Implement health check endpoints in your application's backend. The required endpoints are the liveness endpoint /healthz and the readiness endpoint /readyz.

Metrics

Metrics should be served from a separate container port to avoid interfering with regular application traffic. It is best practice to expose the gathered metrics at the /metrics endpoint (a sketch follows the list below). Be mindful of possible timing issues, for example making sure the metrics server is listening before Prometheus first tries to scrape it.

To achieve proper observability, consider the four golden signals:

  • latency metrics describe how long requests take to complete,
  • traffic metrics describe the amount of demand, i.e. how many requests are made per unit of time,
  • error metrics describe how often requests fail,
  • saturation metrics measure how full the system is.
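To make this concrete, here is a sketch using Python's prometheus_client library; the metric names and port are illustrative, and start_http_server starts a dedicated HTTP server for /metrics on its own port, separate from the application's traffic port:

from prometheus_client import Counter, Histogram, start_http_server

# Traffic and errors: count requests, labelled by path and status code.
REQUESTS = Counter("app_http_requests_total", "Total HTTP requests", ["path", "status"])
# Latency: observe request durations in seconds.
LATENCY = Histogram("app_http_request_duration_seconds", "Request duration", ["path"])
# Business metric: e.g. how many users have registered.
USERS_REGISTERED = Counter("app_users_registered_total", "Registered users")

def handle_request(path):
    with LATENCY.labels(path=path).time():
        status = 200  # ... call the real handler here and capture its status ...
    REQUESTS.labels(path=path, status=str(status)).inc()

# Serve /metrics on a separate container port, apart from regular app traffic.
start_http_server(9090)

The error signal falls out of the status label (e.g. counting 5xx responses), and saturation is left to Grafana as described below.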

Complete

Considering these golden signals, create at least one metric for each of them except saturation, which is easier to implement from Grafana (3 metrics). Then add a metric that counts how many posts there are or how many users have registered (3 plus 1 makes 4 metrics in total).

Operations requirements

Storage

Longhorn is a lightweight, open-source, distributed block storage system for Kubernetes. It works by replicating volumes across nodes. The Longhorn manager orchestrates creation, replication, and rebuilds. It also supports snapshots, backups, volume expansion, and UI-based management.

Longhorn provides persistent storage across Pod restarts. Users can request storage through a Persistent Volume Claim (PVC), after which the Longhorn manager creates a PV. Once a PVC exists, a Pod can mount it and use the Persistent Volume for reading and writing data. To use Longhorn for your volumes, set storageClassName: longhorn in your PVC. The default storage system in this course’s student Kubernetes cluster is also Longhorn, so it will be used even if you do not explicitly specify it.
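For example, a minimal PVC sketch requesting a Longhorn volume (the name, namespace, and size are illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: example
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi

A Pod can then mount it by referencing persistentVolumeClaim.claimName: app-data in its volumes list.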

As many teams have implemented S3 as their storage provider, the MinIO S3 operator is installed in the student cluster. In production, S3 would typically run on a separate machine or cluster, but for this course, running it on the same node is sufficient. The operator lets users create and manage S3 buckets and access credentials directly through Kubernetes manifests.

Refer to the operator’s documentation for full details.

Example setups are available to spin up an S3 instance. Create all resources inside your team's namespace.

Because our S3 instance is on the same cluster and also uses Longhorn for its volumes, using the default Longhorn storage class would cause double replication. The S3 operator replicates volumes based on the number of servers, while Longhorn defaults to 3 volume replicas.

Important

To avoid double replication, use the longhorn-single storage class or configure the S3 tenant with only 1 server.
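For illustration, a trimmed Tenant sketch following the single-server route; the apiVersion and fields come from the MinIO operator's Tenant CRD, while the names and sizes are illustrative (see the operator documentation for the full spec):

apiVersion: minio.min.io/v2
kind: Tenant
metadata:
  name: s3-tenant
  namespace: example
spec:
  pools:
    - servers: 1              # a single server avoids operator-level replication
      volumesPerServer: 1
      volumeClaimTemplate:
        metadata:
          name: data
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: longhorn   # with more servers, use longhorn-single instead
          resources:
            requests:
              storage: 5Gi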

Complete

Use persistent storage to store your data — either a Longhorn volume or the S3 operator.

Probes

The kubelet uses liveness probes to know when to restart a container. For example, liveness probes can catch deadlocks, where the application is running but unable to make progress. Restarting the container can help maintain application availability.

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. One use of this signal is to control which Pods are used as backends for Services. A Pod is considered ready when its Ready condition is true. When a Pod is not ready, it is removed from Service load balancers.
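For example, a probe sketch for the container spec of your Deployment, assuming the application serves /healthz and /readyz on a container port named http (tune the delays and periods to your application's startup time):

containers:
  - name: backend
    # ...
    livenessProbe:
      httpGet:
        path: /healthz
        port: http
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /readyz
        port: http
      initialDelaySeconds: 5
      periodSeconds: 5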

Complete

Implement a livenessProbe and a readinessProbe in your Kubernetes Deployment.

For testing, port-forward the Service or Pod and then query your application at the /healthz and /readyz paths, like this:

kubectl port-forward svc/backend -n <your-team-namespace> 9911:<app-port> # in a separate terminal window
curl -i http://localhost:9911/readyz
# The response should be HTTP 200 with body "ok"

Scraping metrics with Prometheus

Prometheus is the de facto monitoring system for Kubernetes. It collects, stores, and queries metrics in a time-series format. First, Prometheus scrapes the endpoints it is monitoring, then stores the data, which is available for querying with the PromQL language. To make your application's metrics endpoint discoverable by Prometheus, you need to set the appropriate annotations in your Service or use a custom resource (CR) called ServiceMonitor.

Prometheus will scrape endpoints with the annotation prometheus.io/scrape: "true". Here's an example setup:

apiVersion: v1
kind: Service
metadata:
  name: app-backend
  namespace: example
  labels:
    app: app-backend
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"     # <— the metrics port that the application exposes, this will be scraped
    prometheus.io/path: "/metrics"
spec:
  selector:
    app: app-backend
  ports:
    - name: http
      port: 8080
      targetPort: http
    - name: metrics
      port: 9090                   # scrape metrics port
      targetPort: metrics          # port name set in the Deployment/Pod
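Alternatively, if the cluster runs the Prometheus Operator, a ServiceMonitor sketch selecting the Service above would look roughly like this (names are illustrative):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-backend
  namespace: example
  labels:
    app: app-backend
spec:
  selector:
    matchLabels:
      app: app-backend
  endpoints:
    - port: metrics      # matches the Service port name
      path: /metrics
      interval: 30s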

Complete

Implement Prometheus monitoring for your application.

To test, you can port-forward and curl your /metrics endpoint, similarly to the previous section.
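For example, assuming the Service above:

kubectl port-forward svc/app-backend -n <your-team-namespace> 9090:9090 # in a separate terminal window
curl -s http://localhost:9090/metrics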

Grafana dashboard

Grafana is a visualization and analytics tool that turns Prometheus metrics into interactive dashboards. It queries metrics from Prometheus using PromQL. Access Grafana at the course monitoring URL with your university credentials via "Sign in with OIDC-Keycloak". A job runs every 10 minutes that grants each team access to their dashboard folder once a user has logged in. The folder can be found under Dashboards with your team’s name. If your folder doesn't appear after 10 minutes, contact a TA.

Inside your team’s folder, you have permissions to create and edit dashboards. After creating a dashboard, you can add panels and run queries within them.
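As a starting point, here are a couple of illustrative PromQL queries, assuming a counter named app_http_requests_total as in the metrics sketch above:

# Traffic: per-second request rate over the last 5 minutes
rate(app_http_requests_total[5m])

# Status panel: 1 while a scrape target is up, 0 otherwise
up{namespace="<your-team-namespace>"}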

Complete

Create a dashboard in Grafana to monitor your application. The dashboard should have these panels:

  • Visualization of the 4 custom metrics from your application's /metrics endpoint.
  • 2 status panels to check whether your application and database are up or not.
  • At least 2 panels that visualize PV storage usage: one for your database and one for your application storage.
  • 2 application Pod resource usage panels: one for CPU and one for memory.

Documentation requirements

What do we expect your documentation to include?

Health and readiness endpoints

  • How are the health and readiness endpoints set up on the application side?
  • How are the readinessProbe and livenessProbe set up in Kubernetes?
  • Include information on how to test the endpoints.

Metrics

  • Which container port are the metrics served from?
  • What metrics are collected, and under what names?
  • How are they scraped? Include the Service configuration for the metrics endpoint.
  • How have you visualised important metrics in Grafana? Highlight the important queries.
  • A link to your Grafana dashboard.

Storage

  • Which persistent storage solution do you use? Describe its setup and architecture.

Tasks

  • Use persistent storage: an S3 bucket (via the MinIO operator) or a Longhorn volume.
  • Implement health and readiness endpoints in your application.
  • Configure liveness and readiness probes in Kubernetes.
  • Add metrics code to your application that exposes metrics at the /metrics endpoint on a separate container port.
  • Configure Prometheus scraping in Kubernetes.
  • Create a Grafana dashboard that has 10 panels:
    • visualization of 4 custom metrics from your application,
    • 2 status panels to check whether your database and application are up or not,
    • 2 panels that visualize storage usage, one for database and one for application storage (Longhorn PV or S3 PV),
    • 2 application Pod resource usage panels, one for CPU and the other for memory.
  • Write documentation.