Secure SSH access¶

In our course, we deploy applications to a Virtual Machine from GitLab CI/CD by storing a long-lived SSH private key in a CI/CD variable. This approach is intentionally simplified to keep the focus on DevOps principles and CI/CD logic without the overhead of managing additional security infrastructure.

However, as discussed in the lecture, this method carries significant security risks in real-world production environments. This document outlines why it's risky and explores the standard, more secure alternatives used in the industry.

Access methods¶

| Method | Security Posture | Target Setup Complexity | Scalability | Key Characteristic |
| --- | --- | --- | --- | --- |
| Static SSH Key | 🔴 Poor | ✅ Low | 🔴 Poor | Simple but insecure and brittle. |
| SSH Certificates | 🟢 Good | 🟡 Medium | 🔵 Excellent | Short-lived, centrally-issued credentials. |
| Identity-Aware Proxy | 🔵 Excellent | 🟠 High | 🔵 Excellent | Zero-trust access with no open inbound ports. |
| Pull-Based (GitOps) | 🔵 Excellent | 🟠 High | 🔵 Excellent | Inverted model; servers pull their desired state. |

Static SSH key¶

The method we use involves generating an SSH key pair, adding the public key to the server's ~/.ssh/authorized_keys file, and adding the private key as a protected variable (e.g., SSH_PRIVATE_KEY) in GitLab's CI/CD settings.

Why We Use It: It's the most straightforward way to grant a CI/CD runner programmatic access to a server. It requires no extra services, is quick to set up, and allows us to focus on the pipeline's deployment script itself.

Virtual machine setup¶

sequenceDiagram
    participant Developer
    participant Target_VM
    participant GitLab_CI/CD

    Developer->>Developer: Generates SSH Key Pair (public + private)
    Developer->>Target_VM: Copies public key to ~/.ssh/authorized_keys
    Developer->>GitLab_CI/CD: Adds private key as CI/CD variable (SSH_PRIVATE_KEY)
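
The setup steps above can be sketched in a few shell commands. This is a minimal illustration that runs entirely in a scratch directory; the key comment `gitlab-ci-deploy` and the file names are assumptions chosen for the example, not values from the course setup.

```shell
#!/bin/sh
# Work in a scratch directory so nothing on the real system is touched.
workdir="$(mktemp -d)"

# 1. Generate an Ed25519 key pair without a passphrase (a CI job cannot type one).
ssh-keygen -q -t ed25519 -N "" -C "gitlab-ci-deploy" -f "$workdir/ci_deploy_key"

# 2. Authorize the public key. On the real VM this file would be
#    ~/.ssh/authorized_keys of the deployment user.
authorized_keys="$workdir/authorized_keys"
cat "$workdir/ci_deploy_key.pub" >> "$authorized_keys"
chmod 600 "$authorized_keys"

# 3. The content of the private key file is what gets pasted into the
#    SSH_PRIVATE_KEY CI/CD variable. Print only the fingerprint here.
ssh-keygen -l -f "$workdir/ci_deploy_key.pub"
```

Note that nothing in these steps sets an expiry date: the key stays valid until someone removes the line from `authorized_keys`.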

CI/CD pipeline usage sequence¶

sequenceDiagram
    participant GitLab_CI/CD_Runner
    participant Target_VM

    GitLab_CI/CD_Runner->>Target_VM: Connects via SSH using stored private key
    activate Target_VM
    Target_VM-->>GitLab_CI/CD_Runner: Authenticates connection
    GitLab_CI/CD_Runner->>Target_VM: Executes deployment commands
    deactivate Target_VM

Why It's Bad in Production:

Permanent Credential: The private key is a static, long-lived secret. If an attacker gains access to your GitLab project (e.g., through a compromised developer account or a vulnerability in the CI/CD pipeline), they can steal this key and gain persistent, often privileged, access to your server.
No Automatic Rotation: SSH keys are not typically rotated automatically. A leaked key remains a valid credential until it is manually discovered and removed from the server, which can be months or years later.
- Rotation at scale is also difficult, because each private key and its public counterpart must be found, identified, and rotated across the fleet of servers.
Poor Auditability: Every deployment uses the same SSH key. It's difficult to distinguish in server logs which specific pipeline job or user action initiated a session.
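
To make the rotation problem concrete, here is a rough sketch of what revoking a single leaked key involves on one server. The helper name `revoke_key` and the placeholder key strings are invented for this example, and the function assumes the public key file has the plain `type key comment` layout.

```shell
#!/bin/sh
# revoke_key AUTH_FILE PUBKEY_FILE
# Removes every line of AUTH_FILE that contains the key material from PUBKEY_FILE.
revoke_key() {
  rk_blob="$(awk '{print $2}' "$2")"       # field 2 is the base64 key material
  rk_tmp="$(mktemp)"
  grep -vF -- "$rk_blob" "$1" > "$rk_tmp"  # keep all lines that do NOT match
  cat "$rk_tmp" > "$1"                     # overwrite in place, keeping file permissions
  rm -f "$rk_tmp"
}

# Demo with placeholder keys (not real key material).
demo_dir="$(mktemp -d)"
printf '%s\n' 'ssh-ed25519 AAAAexampleLEAKEDkey gitlab-ci-deploy' > "$demo_dir/leaked.pub"
printf '%s\n' \
  'ssh-ed25519 AAAAexampleLEAKEDkey gitlab-ci-deploy' \
  'ssh-ed25519 AAAAexampleADMINkey admin@laptop' > "$demo_dir/authorized_keys"

revoke_key "$demo_dir/authorized_keys" "$demo_dir/leaked.pub"
cat "$demo_dir/authorized_keys"   # only the admin key is left
```

With static keys, this has to be repeated for every user account on every server that ever trusted the key, which is why revocation tends to be slow and incomplete.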

Signed SSH certificates¶

This approach replaces static authorized SSH keys with temporary signed SSH certificates. A trusted Certificate Authority (CA) signs a certificate for the CI/CD runner that is valid for a very short time (e.g., 5 minutes) - just long enough to complete the deployment.

How It Works:

You operate an SSH CA. The target server is configured to trust this CA, not individual public keys.
The GitLab CI/CD job authenticates to the CA. Several authentication mechanisms are possible; HashiCorp Vault, for example, supports:
- Tokens
- Kerberos
- JWT/OIDC
- AWS
The CA signs the runner's ephemeral public key, creating a short-lived SSH certificate.
The runner uses this certificate to log into the server. Once the job is done, the certificate expires and becomes useless.
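
The signing step can be tried locally with plain OpenSSH tooling. The sketch below plays both roles (CA and CI job) in a scratch directory; in a real setup the CA key lives inside a service such as Vault and never touches the runner. The key ID `gitlab-job-12345` and the principal `deployer` are assumptions made up for the example.

```shell
#!/bin/sh
ca_dir="$(mktemp -d)"

# CA key pair. Generated locally here only for demonstration purposes.
ssh-keygen -q -t ed25519 -N "" -C "demo-ssh-ca" -f "$ca_dir/ca"

# Ephemeral key pair that the CI job creates for a single deployment.
ssh-keygen -q -t ed25519 -N "" -C "ci-job" -f "$ca_dir/job_key"

# Sign the job's public key:
#   -s  CA private key        -I  key identity (shows up in sshd logs)
#   -n  allowed principals    -V  validity window (here: 5 minutes from now)
ssh-keygen -q -s "$ca_dir/ca" -I "gitlab-job-12345" -n deployer -V +5m "$ca_dir/job_key.pub"

# The certificate is written next to the public key as job_key-cert.pub.
cert_info="$(ssh-keygen -L -f "$ca_dir/job_key-cert.pub")"
printf '%s\n' "$cert_info"
```

The printed certificate details include the key ID, the principal, and a `Valid:` line showing the five-minute window, after which `sshd` rejects the certificate.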

Virtual machine setup¶

sequenceDiagram
    participant Admin
    participant SSH_CA_Service
    participant Target_VM

    Admin->>SSH_CA_Service: Deploys & Configures SSH CA
    Admin->>SSH_CA_Service: Generates CA public/private key pair
    Admin->>Target_VM: Configures sshd to trust CA (TrustedUserCAKeys)
    Admin->>Admin: Configures CA to issue certs based on CI/CD identity

CI/CD pipeline usage sequence¶

sequenceDiagram
    participant GitLab_CI/CD_Runner
    participant SSH_CA_Service
    participant Target_VM

    GitLab_CI/CD_Runner->>SSH_CA_Service: Authenticates (e.g., with OIDC token, API key)
    activate SSH_CA_Service
    SSH_CA_Service-->>GitLab_CI/CD_Runner: Issues short-lived SSH certificate
    deactivate SSH_CA_Service
    GitLab_CI/CD_Runner->>Target_VM: Connects via SSH using short-lived certificate
    activate Target_VM
    Target_VM-->>GitLab_CI/CD_Runner: Validates certificate against trusted CA, authenticates
    GitLab_CI/CD_Runner->>Target_VM: Executes deployment commands
    deactivate Target_VM

What It Requires:

A Certificate Authority Service: This is the core component. Popular tools for this include HashiCorp Vault, step-ca, and Teleport. This service must be deployed and managed separately and securely.
Server Configuration: The SSH daemon (sshd) on the target VM must be configured to trust the CA by adding the CA's public key to a file specified by the TrustedUserCAKeys directive.

Example configuration:

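# Download the CA public key from Vault (adjust the host and mount path to your setup)
curl -o /etc/ssh/ssh-public-key.pem https://vault/v1/ssh/public_key

# /etc/ssh/sshd_config
# ...
# Trust user certificates signed by this CA
TrustedUserCAKeys /etc/ssh/ssh-public-key.pem
# Per-user list of certificate principals allowed to log in (%u expands to the local user name)
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u

# Allow certificates carrying <principal> to log in as root
mkdir -p /etc/ssh/auth_principals
echo "<principal>" > /etc/ssh/auth_principals/root

# Check the configuration for errors, then restart the daemon
# (on Debian and Ubuntu the unit is usually named ssh)
sshd -t && systemctl restart sshd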

Considerations:

Tools: HashiCorp Vault, Teleport

Pros:

Highly scalable and secure.
CI/CD runners do not need direct credentials to production servers.
Can be configured to provide a clear audit trail via the CA server.

Cons:

Requires running extra infrastructure for the central tool. Access to other systems depends on this service being available.
Target hosts need to be directly addressable over the network by the clients looking to connect.

Identity-aware proxies¶

This modern approach eliminates the need for direct SSH access from the internet entirely. An agent on the server establishes an outbound connection to a central control plane. The CI/CD job authenticates against this plane to get a connection brokered to the server.

How It Works:

An agent is installed on the target VM. It "phones home" to a central proxy service, creating a secure reverse tunnel. The server's SSH port does not need to be exposed to the internet.
The GitLab runner fetches a short-lived, verifiable identity token from GitLab itself (an OIDC ID token in JWT format, issued by GitLab's OIDC provider).
The runner presents this token to the proxy service to authenticate. The service validates the token and its claims (e.g., "this token is from project X, branch main").
If authorized, the proxy grants access to the target VM through the secure tunnel for the duration of the job.
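
The claims mentioned above travel in the payload segment of the token. As a rough illustration, the sketch below builds an unsigned demo token and decodes its payload. The helper names `decode_jwt_payload` and `b64url` and all claim values are made up for the example. Decoding is not verification: a real proxy first checks the token's signature against the issuer's published keys and only then trusts the claims.

```shell
#!/bin/sh
# decode_jwt_payload TOKEN: print the JSON claims of a JWT WITHOUT verifying it.
decode_jwt_payload() {
  # A JWT is header.payload.signature; the payload is base64url-encoded JSON.
  jp="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the '=' padding, so restore it before decoding.
  case $(( ${#jp} % 4 )) in
    2) jp="${jp}==" ;;
    3) jp="${jp}=" ;;
  esac
  printf '%s' "$jp" | base64 -d
}

# Encode stdin as base64url (no padding, URL-safe alphabet).
b64url() { base64 | tr -d '=\n' | tr '/+' '_-'; }

# Build an unsigned demo token; the claim values are invented for illustration.
claims='{"project_id":"42","ref_type":"branch","ref":"main","environment":"production"}'
demo_token="$(printf '%s' '{"alg":"none"}' | b64url).$(printf '%s' "$claims" | b64url).demo-signature"

decode_jwt_payload "$demo_token"
```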

Virtual machine setup¶

sequenceDiagram
    participant Admin
    participant Proxy_Service
    participant Target_VM

    Admin->>Proxy_Service: Deploys & Configures Proxy Service (e.g., Teleport/Boundary)
    Admin->>Target_VM: Installs & configures Proxy Agent on VM
    Target_VM-->>Proxy_Service: Agent establishes secure outbound connection
    Admin->>Admin: Configures Proxy Service to trust GitLab OIDC provider
    Admin->>Admin: Defines access policies based on GitLab identity/claims

CI/CD pipeline usage sequence¶

sequenceDiagram
    participant GitLab_CI/CD_Runner
    participant Proxy_Service
    participant Target_VM

    GitLab_CI/CD_Runner->>GitLab: Requests OIDC ID token (JWT)
    GitLab-->>GitLab_CI/CD_Runner: Provides OIDC token
    GitLab_CI/CD_Runner->>Proxy_Service: Presents OIDC token for authentication
    activate Proxy_Service
    Proxy_Service->>GitLab: Fetches OIDC signing keys to validate the token
    GitLab-->>Proxy_Service: Returns public keys
    Proxy_Service-->>GitLab_CI/CD_Runner: Grants temporary access/connection to Target_VM
    deactivate Proxy_Service
    GitLab_CI/CD_Runner->>Proxy_Service: Connects to Target_VM via Proxy
    activate Target_VM
    Proxy_Service->>Target_VM: Forwards connection request
    Target_VM-->>Proxy_Service: Authenticates local user
    Proxy_Service->>GitLab_CI/CD_Runner: Connection established
    GitLab_CI/CD_Runner->>Target_VM: Executes deployment commands
    deactivate Target_VM

What It Requires:

A Proxy/Control Plane Service: Solutions like Teleport, HashiCorp Boundary, and Tailscale SSH provide this functionality, either as a cloud service or a self-hosted application.
Agent Installation: An agent must be installed and configured on every target VM.

Considerations:

Tools: Teleport, HashiCorp Boundary, Tailscale

Pros:

Highly scalable and secure.
CI/CD runners never need credentials to production servers.
Provides a clear audit trail via the central proxy.
Target servers do not need to be exposed to the internet; connections happen via the central tool.

Cons:

Requires running extra infrastructure for the central proxy tool. Access to other systems is dependent on this service being available.
Higher agent setup complexity.
Operations only: You might still need another access method for situations where network errors prevent the agent from establishing its connections, or where the agent itself is down.

Pull-based (GitOps)¶

This pattern inverts the deployment model. Instead of the CI/CD pipeline "pushing" changes to the server, an agent on the server "pulls" the desired state from a trusted source.

How It Works:

The CI/CD pipeline runs tests, builds the application artifact (e.g., a binary or Docker image), and pushes it to a secure repository (e.g., an artifact registry or Docker Hub).
The final step of the pipeline is to update a configuration file in a dedicated Git repository (the "state" or "manifest" repo) to point to the new artifact version.
An agent on the target VM constantly monitors this Git repository. When it sees a change, it pulls the new artifact and performs the deployment locally.
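
The agent's core logic is a reconciliation step: compare the desired version from the state repository with what is currently deployed, and act only on a difference. The sketch below is a simplified stand-in rather than how Argo CD or FluxCD work internally; the function name `reconcile`, the file names, and the image reference are assumptions for illustration.

```shell
#!/bin/sh
# reconcile DESIRED_FILE DEPLOYED_FILE
# DESIRED_FILE comes from the state repository; DEPLOYED_FILE records what runs locally.
reconcile() {
  rc_desired="$(cat "$1")"
  rc_current="$(cat "$2" 2>/dev/null)"
  if [ "$rc_desired" = "$rc_current" ]; then
    echo "up to date: $rc_current"
    return 0
  fi
  echo "deploying $rc_desired (was: ${rc_current:-nothing})"
  # Placeholder for the real work: pull the artifact and restart the service.
  printf '%s\n' "$rc_desired" > "$2"
}

# Demo: a scratch directory stands in for a clone of the state repository.
state_dir="$(mktemp -d)"
echo "registry.example.com/my-app:1.4.2" > "$state_dir/desired_version"

first_run="$(reconcile "$state_dir/desired_version" "$state_dir/deployed_version")"
second_run="$(reconcile "$state_dir/desired_version" "$state_dir/deployed_version")"
printf '%s\n%s\n' "$first_run" "$second_run"
```

A real agent would wrap this in a loop that runs `git pull` on the state repository, calls the reconciliation step, and sleeps for an interval. Because the second run finds nothing to do, repeating it is safe.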

Virtual machine setup¶

sequenceDiagram
    participant Admin
    participant Artifact_Registry
    participant Git_State_Repo
    participant Target_VM

    Admin->>Artifact_Registry: Deploys & Configures Artifact Registry (e.g., Docker Registry)
    Admin->>Target_VM: Installs & configures GitOps Agent (e.g., Argo CD, FluxCD, custom script)
    Target_VM->>Git_State_Repo: Agent is configured to monitor Git_State_Repo
    Admin->>Admin: Defines deployment manifests in Git_State_Repo

CI/CD pipeline usage sequence¶

sequenceDiagram
    participant Developer
    participant GitLab_CI/CD_Runner
    participant Git_State_Repo
    participant Artifact_Registry
    participant Target_VM
    participant GitOps_Agent

    Developer->>GitLab_CI/CD_Runner: Triggers pipeline (e.g., commit to main app repo)
    GitLab_CI/CD_Runner->>GitLab_CI/CD_Runner: Builds application artifact
    GitLab_CI/CD_Runner->>Artifact_Registry: Pushes new application artifact
    GitLab_CI/CD_Runner->>Git_State_Repo: Updates deployment manifest (e.g., image tag)
    activate Git_State_Repo
    Git_State_Repo-->>GitOps_Agent: GitOps Agent detects change in manifest
    deactivate Git_State_Repo
    GitOps_Agent->>Artifact_Registry: Pulls new application artifact
    activate Artifact_Registry
    Artifact_Registry-->>GitOps_Agent: Provides artifact
    deactivate Artifact_Registry
    GitOps_Agent->>Target_VM: Deploys/Updates application on VM

What It Requires:

An Agent on the VM: A configuration management tool like Ansible (in pull mode), Puppet, or a simpler custom script can serve as the agent. For containerized workflows, this is the domain of tools like Argo CD or FluxCD.
An Artifact Repository: A place to store the built application artifacts.
A State Repository: A Git repository that serves as the single source of truth for what version should be running.

Considerations:

Tools: Argo CD, FluxCD (for Kubernetes); Ansible (in pull mode), custom agents.

Pros:

Highly scalable and secure.
CI/CD runners never need credentials to production servers.
Provides a clear audit trail and declarative state via Git.

Cons:

Higher initial setup complexity.
Requires a shift in thinking to a declarative, pull-based model.
With VMs, the target environments need read access to the Git repository, which is often not acceptable.

Benefits of modern systems¶

On the surface, it might look like trading one secret for another, but the argument against that view lies in the fundamental shift from static, possession-based credentials to dynamic, policy-based access.

The scope of what can be accessed (the server) might seem the same, but the how, who, when, and for how long are radically different and superior.

The credential is no longer the access itself

With a static SSH key, the key is the access. Possessing the key is sufficient to enter. With certificate authorities or identity proxies, the initial credential (e.g., a CI/CD job token) is not the access; it is a request for access.

Static Key: The secret is a long-lived, high-privilege credential.

Modern Methods: The secret is a short-lived token used to authenticate to a policy engine. The policy engine then authorizes a separate, ephemeral credential (the SSH certificate) or a proxied session. This separation of authentication and authorization is critical, and every operation is logged and audited.

Time-bound access: The blast radius of a leak is drastically reduced

Static Key Leak: A compromised SSH private key provides persistent, indefinite access until it is manually discovered and revoked on every single server where its public key is installed. An attacker can remain undetected for months.

Modern Methods Leak:

SSH Certificate: If the temporary certificate is stolen, it's only valid for a few minutes. By the time an attacker can use it, it has likely already expired.

OIDC Token/Proxy Token: These tokens are also short-lived (typically 5-60 minutes). A stolen token becomes useless very quickly. More importantly, the session it establishes is also temporary and logged.

Centralized control and immediate revocation

Management and control are moved from the edges (each individual VM) to a central system.

Static Key: Revocation is a distributed problem. To revoke access, you must connect to every single VM and remove the public key from the authorized_keys file. This is slow, error-prone, and doesn't scale.

Modern Methods: You have a central control plane (the CA or proxy). If you detect a breach, you can cut off access at this single point by disabling a user's identity, a specific service account, or a policy. A proxy enforces the change immediately across the entire fleet; with an SSH CA, no new certificates are issued and any certificate already in circulation expires within minutes.

Rich context and policy enforcement (Least Privilege)

Modern systems make decisions based on rich, verifiable context, not just possession of a secret. A GitLab OIDC token, for example, contains verifiable claims like:

project_id: Which project is this job from?
ref_type: Was this a branch or a tag?
ref: What is the name of the branch (e.g., main)?
environment: Is this deploying to production?

This allows the central policy engine to enforce highly granular rules that are impossible with static keys:

Static Key: The key grants access. That's it.

Modern Methods: The policy can state: "Only allow connections from a job running on the main branch of project X that is deploying to the production environment. Grant access as the deployer user for exactly 5 minutes." A job from a feature branch, even with a valid token, would be denied.
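
The quoted rule can be expressed as a small decision function. This is only a sketch of the idea, not the policy language of any particular product; the function name `authorize`, the project ID `42`, and the output format are assumptions. It also assumes the claims come from a token whose signature has already been verified.

```shell
#!/bin/sh
# authorize PROJECT_ID REF ENVIRONMENT
# Arguments are claims taken from an ID token whose signature was ALREADY verified.
authorize() {
  if [ "$1" = "42" ] && [ "$2" = "main" ] && [ "$3" = "production" ]; then
    # All conditions match: grant a narrowly scoped, time-limited session.
    echo "allow user=deployer ttl=5m"
    return 0
  fi
  echo "deny"
  return 1
}

authorize 42 main production            # prints: allow user=deployer ttl=5m
authorize 42 feature/login production   # prints: deny
```

A static key has no equivalent of the second call: there is no point at which a request carrying a valid key can be refused based on where it came from.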