Week 14 - Production

Last week, you migrated your Kubernetes deployments from static manifest files to a templated solution using Helm or Kustomize.

This week focuses on making your web application production-ready through automatic scaling and a dedicated production environment.

Operations requirements

Production deployment

Each team has a dedicated production namespace following the naming convention <team-name>-prod. Deploy your production configuration to this namespace using IaC: Helm values or Kustomize overlays.
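
For example, with Kustomize the per-environment configuration can be organised as overlays on top of a shared base. A minimal sketch (the directory layout and the backend deployment name are illustrative, not required):

base/
  deployment.yaml
  service.yaml
  kustomization.yaml
overlays/
  staging/
    kustomization.yaml
  prod/
    kustomization.yaml

The production overlay could then look roughly like this:

# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: team-name-prod
resources:
  - ../../base
replicas:
  - name: backend        # hypothetical deployment name
    count: 3             # e.g. more replicas in production than in staging

With Helm, the equivalent would be a single chart deployed with separate values files, e.g. values-staging.yaml and values-prod.yaml.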

Complete

Production deployment.

  • Deploy your application to your production namespace.
  • Use Helm or Kustomize to deploy different configurations to your namespaces.
    • The deployments don't need to differ much.
    • You just need to demonstrate that you can deploy different configurations using Helm or Kustomize.
  • Deployment needs to happen in your GitLab pipeline (see the job sketch below).
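
A rough idea of what the pipeline step could look like, assuming the Kustomize layout sketched above (the job name, stage, and image are illustrative, and the Helm alternative is shown as a comment; cluster authentication, for example via a GitLab agent or a kubeconfig CI variable, is omitted here):

# .gitlab-ci.yml excerpt -- adapt to your existing pipeline
deploy-prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl apply -k overlays/prod
    # or, with Helm:
    # - helm upgrade --install my-app ./chart -n team-name-prod -f values-prod.yaml
  environment: production
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH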

Automatic scaling

Autoscaling and dynamic resource management are core functionalities of Kubernetes. Traffic patterns change throughout the day, and it would be useful if your application scaled automatically based on actual demand.

Autoscaling helps to optimise available resources and reduce the bill sent by your cloud provider at the end of the month. It also ensures that the client-facing experience stays as smooth as possible.

The most basic way to add autoscaling to a deployment is using kubectl autoscale commands, which configure a Horizontal Pod Autoscaler (HPA), a resource available in core Kubernetes.

HPA example, taken from the official documentation:

kubectl autoscale deployment deployment-name --cpu-percent=50 --min=1 --max=10
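
The same autoscaler can also be defined declaratively, which fits better with Helm- or Kustomize-managed configuration. A minimal sketch of the equivalent HorizontalPodAutoscaler manifest (names and thresholds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deployment-name-hpa
  namespace: namespace-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

Note that CPU-utilisation scaling only works if your containers declare CPU resource requests.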

Kubernetes HPA makes scaling decisions based on the metrics available from the Kubernetes metrics API. This can be somewhat limiting if you want to use more sophisticated external metrics or scale based on events.

There are tools that simplify this process and provide extra options for configuring autoscaling. One of these tools is called KEDA, short for Kubernetes-based Event Driven Autoscaling. KEDA is quite versatile and can, for example, make scaling decisions based on Prometheus metrics:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: scale-my-app
  namespace: namespace-name
spec:
  scaleTargetRef:
    name: deployment-to-scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-stack-kube-prom-prometheus.prometheus-stack.svc.cluster.local:9090
        metricName: http_requests_per_second
        query: sum(rate(http_requests_total{namespace="namespace-name", pod=~"backend-.*"}[2m]))
        threshold: "10"

You will need to determine the appropriate metric and query for your application; the example above might not work for you!

Complete

Configure automatic scaling for your application.

  • It is your choice whether to use classic Kubernetes HPA or KEDA.
  • As your applications probably will not receive much traffic, configure the autoscaling such that it can be easily demonstrated (see the load-generation example below).
  • You do not need to autoscale everything. Choose those deployments where horizontal scaling would provide the most benefit.
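
One simple way to demonstrate scaling, adapted from the official HPA walkthrough, is to generate artificial load against your service from a throwaway pod (the service URL is a placeholder):

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never \
  -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://your-service; done"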

Affinity and topology constraints

Complete

Configure affinity or topology constraints.

  • A single node violently dying should not take your application offline.
  • You can prove that by showing us the affinity/topology constraint you use.
  • Do keep in mind that you have multiple deployments/statefulsets.
  • Check the lecture to get a starting point for your configurations; a small sketch is also shown below.
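
As a starting point, a topologySpreadConstraints block in the pod template spreads a deployment's replicas across nodes, so that losing a single node does not take every replica down. A minimal sketch (the label is hypothetical, and pod anti-affinity is an equally valid approach):

# snippet under spec.template.spec of a Deployment/StatefulSet
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname   # spread across nodes
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: backend                      # hypothetical pod label

Remember that this only helps if the workload runs at least two replicas.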

Chaos engineering

As mentioned in the lecture, we will use chaos engineering principles to test your application's fault tolerance.

To clarify:

  • We will run chaos experiments only in your production namespaces.
  • Your production application should have zero downtime.
  • We will simulate pod faults.
    • By injecting faults into random pods or outright killing them.
    • There are no exceptions; every pod in your production namespace could be a target.
  • We might simulate network faults.
    • Network disconnection, high latency, high packet loss rate.
  • We might simulate stress scenarios.
    • By injecting a rogue process into your containers.
  • The exact interval or extent of induced faults is not set in stone.
    • Consider it part of the chaos.

Documentation requirements

What do we expect your documentation to include?

Production deployment

  • How do your staging and production environments differ?
  • How do you deploy the different configurations?

Automatic scaling

  • Do you use HPA or KEDA for automatic scaling?
  • How is your autoscaling implemented? What metrics do you use?
  • Which deployments are you scaling? Why?

Affinity/topology constraints

  • How do you prevent your application from going offline if a single Kubernetes node fails?

Tasks

  • Set up your production environment in your <team-name>-prod namespace.
    • Using Helm or Kustomize in your pipeline.
  • Configure autoscaling for your application.
    • Make it easy to demonstrate.
  • Configure affinity or topology constraints.
    • A single node violently dying should not take your application offline.
  • Document your work.