DevOps

Building a Production CI/CD Pipeline: From Local Dev to Multi-Environment EKS

Entuit Engineering

Services

CI/CD Pipeline Build, Infrastructure

Technologies

Dagger, TypeScript, Kubernetes, Helm, EKS, GitHub Actions, AWS CDK, Kind

Key Results

6-stage pipeline running identically on local Kind and production EKS
Deploy time reduced from manual hours to 4 minutes automated
Zero 'works in CI but not locally' issues — same containers everywhere
Multi-environment promotion: dev → staging → production with approval gates

The Problem

CI/CD pipelines written in YAML are the infrastructure equivalent of technical debt that nobody tracks. They start simple — a few steps to lint, test, and deploy. Then someone adds caching. Then environment variables for staging vs. production. Then conditional logic for different branches. After six months, the pipeline is a 400-line YAML file that nobody fully understands, cannot be tested locally, and breaks in ways that take 15-minute CI cycles to debug.

We hit this wall on our own projects. The GitHub Actions workflow had grown to the point where debugging a deploy failure meant pushing a commit, waiting for CI, reading logs, pushing another commit, and repeating. Testing a pipeline change against a real Kubernetes cluster required pushing to a branch and waiting. There was no way to run the pipeline locally, so the feedback loop was measured in minutes per iteration instead of seconds.

The deeper problem was portability. The pipeline logic was locked into GitHub Actions syntax. If we ever needed to run the same pipeline in a different CI system — or help a client who used GitLab or CircleCI — we would be rewriting from scratch. The build logic was coupled to the platform.

The Approach

We rebuilt the entire pipeline using Dagger's TypeScript SDK. Dagger runs every pipeline step inside containers orchestrated by the Dagger Engine. The pipeline is a TypeScript program — real code with type checking, IDE autocomplete, and the ability to run the full pipeline with a single command on a laptop before pushing to CI.

The key design decision was building the pipeline to target two deployment environments with a single codebase: a local Kind cluster for development and AWS EKS for production. A single environment variable — DEPLOYMENT_TARGET — switches between the two. The pipeline logic, container builds, and Helm deployments are identical. Only the registry (local Docker registry vs. ECR) and the ingress configuration (Traefik vs. ALB) change.
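A minimal sketch of how such a switch can be expressed in the pipeline code follows; the target names, config fields, and registry addresses here are illustrative assumptions, not the project's actual identifiers.

```typescript
// Hypothetical sketch: per-target settings selected by DEPLOYMENT_TARGET.
// Field names and registry addresses are illustrative assumptions.
type DeploymentTarget = "kind" | "eks";

interface TargetConfig {
  registry: string;       // where the image and OCI chart are pushed
  ingressClass: string;   // Traefik locally, ALB on EKS
  environments: string[]; // namespaces to promote through
}

const targets: Record<DeploymentTarget, TargetConfig> = {
  kind: {
    registry: "localhost:5000",
    ingressClass: "traefik",
    environments: ["dev"],
  },
  eks: {
    registry: process.env.ECR_REPOSITORY_URI ?? "",
    ingressClass: "alb",
    environments: ["dev", "staging", "production"],
  },
};

const target = (process.env.DEPLOYMENT_TARGET ?? "kind") as DeploymentTarget;
export const config = targets[target];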

We also designed for multi-environment promotion from the start. On EKS, the pipeline deploys sequentially to three namespace-isolated environments — dev, staging, and production — each with its own Helm values overlay defining replica counts, resource limits, autoscaling policies, network policies, and security contexts.

The Solution

Six-Stage Pipeline

The pipeline runs six stages in sequence:

Lint and Test — ESLint and Vitest run inside isolated Node.js 22 Alpine containers. Dagger mounts the project source, installs dependencies, and executes the checks. The containers are identical whether running locally or in GitHub Actions — no environment-specific behavior.
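A minimal sketch of this stage using the Dagger TypeScript SDK's client API; the mount paths and npm script names are assumptions rather than the project's exact code.

```typescript
import { connect, Client } from "@dagger.io/dagger";

// Sketch of the lint/test stage: ESLint and Vitest run inside a
// node:22-alpine container. The npm script names are assumptions.
connect(
  async (client: Client) => {
    const source = client.host().directory(".", { exclude: ["node_modules"] });

    const base = client
      .container()
      .from("node:22-alpine")
      .withDirectory("/app", source)
      .withWorkdir("/app")
      .withExec(["npm", "ci"]);

    // The same container definition runs on a laptop and in GitHub Actions.
    await base.withExec(["npm", "run", "lint"]).stdout();
    await base.withExec(["npm", "test"]).stdout();
  },
  { LogOutput: process.stderr }
);
```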

Chart Lint — The Helm chart is validated against every environment's values file. Rather than hardcoding environments, the stage discovers all .yaml files in the environments/ directory automatically. Adding a new environment overlay is as simple as creating a file — the pipeline picks it up without any configuration change.
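In the same spirit, a hypothetical helper (taking the Dagger Client from the connect callback shown above) that lints the chart once per discovered overlay; the chart/ and environments/ paths and the helm image tag are assumptions.

```typescript
import { Client } from "@dagger.io/dagger";

// Sketch: lint the chart against every values overlay found in environments/.
// Directory layout and the helm image tag are assumptions.
export async function chartLint(client: Client): Promise<void> {
  const environments = client.host().directory("environments");
  const valuesFiles = (await environments.entries()).filter((name) =>
    name.endsWith(".yaml")
  );

  for (const values of valuesFiles) {
    await client
      .container()
      .from("alpine/helm:3.14.4")
      .withDirectory("/chart", client.host().directory("chart"))
      .withDirectory("/environments", environments)
      .withWorkdir("/chart")
      .withExec(["helm", "lint", ".", "--values", `/environments/${values}`])
      .stdout();
  }
}
```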

Build and Push — A multi-stage Docker build compiles TypeScript, then produces a minimal production image running as a non-root user. The image and the Helm chart (as an OCI artifact) are pushed to either the local registry or ECR, depending on the deployment target.
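A sketch of the image build and push, assuming Dagger's Directory.dockerBuild and Container.publish; the repository name and tag scheme are illustrative, and the chart push to the OCI registry follows the same pattern with a helm container.

```typescript
import { Client } from "@dagger.io/dagger";

// Sketch: build the multi-stage Dockerfile and push the resulting image to
// the registry selected by the deployment target (local registry or ECR).
// The "app" repository name and the tag argument are assumptions.
export async function buildAndPush(
  client: Client,
  registry: string,
  tag: string
): Promise<string> {
  const source = client.host().directory(".", { exclude: ["node_modules"] });

  // publish() returns the fully qualified image reference with digest.
  return source.dockerBuild().publish(`${registry}/app:${tag}`);
}
```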

Deploy — helm upgrade --install deploys the application using the OCI chart reference and the appropriate values overlay. On EKS, the pipeline creates the target namespace, substitutes the ECR repository URI into the values file, and deploys with --wait so it fails immediately if pods do not become healthy.
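A hedged sketch of that stage, mounting a kubeconfig into a helm container; for brevity it injects the image repository with --set rather than rewriting the values file as the real pipeline does, and the release name and paths are assumptions.

```typescript
import { Client, File } from "@dagger.io/dagger";

// Sketch: deploy the OCI chart into the target namespace and fail fast if
// pods never become healthy. Release name, chart reference format, and the
// environments/ layout are assumptions.
export async function deploy(
  client: Client,
  kubeconfig: File,
  environment: string,
  chartRef: string,
  imageRepository: string
): Promise<string> {
  return client
    .container()
    .from("alpine/helm:3.14.4")
    .withFile("/root/.kube/config", kubeconfig)
    .withDirectory("/environments", client.host().directory("environments"))
    .withExec([
      "helm", "upgrade", "--install", "app", chartRef,
      "--namespace", environment,
      "--create-namespace",
      "--values", `/environments/${environment}.yaml`,
      // The real pipeline substitutes the ECR URI into the values file;
      // --set keeps this sketch short.
      "--set", `image.repository=${imageRepository}`,
      "--wait", "--timeout", "5m",
    ])
    .stdout();
}
```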

Helm Test — A test pod runs inside the cluster, curling the application's health, readiness, and liveness endpoints from within the cluster network. This validates that the deploy actually works end-to-end, not just that Kubernetes accepted the manifests.
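The stage itself boils down to running helm test with log streaming, which could look roughly like this (release name assumed).

```typescript
import { Client, File } from "@dagger.io/dagger";

// Sketch: run the chart's in-cluster test pod and stream its logs.
export async function helmTest(
  client: Client,
  kubeconfig: File,
  environment: string
): Promise<string> {
  return client
    .container()
    .from("alpine/helm:3.14.4")
    .withFile("/root/.kube/config", kubeconfig)
    .withExec(["helm", "test", "app", "--namespace", environment, "--logs"])
    .stdout();
}
```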

Progressive Environment Configuration

Each environment has a different security and resilience posture, enforced through Helm values overlays:

Dev gets a single replica, minimal resources, and no network restrictions — optimized for fast iteration.

Staging gets two replicas, horizontal pod autoscaling (2-5 pods), and a pod disruption budget — mirroring production topology at smaller scale.

Production gets three replicas minimum, HPA scaling to ten, network policies restricting traffic, a read-only root filesystem, and all Linux capabilities dropped. The production values file enforces the security baseline that would pass an infrastructure audit.
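For comparison at a glance, the postures above can be summarized as follows. The real project encodes them in Helm values overlays; this TypeScript table is purely illustrative and omits any setting the overlays define beyond what is described here.

```typescript
// Illustrative summary of the per-environment overlays described above.
interface EnvironmentPosture {
  replicas: number;
  hpa?: { minReplicas: number; maxReplicas: number };
  podDisruptionBudget?: boolean;
  networkPolicies?: boolean;
  readOnlyRootFilesystem?: boolean;
  dropAllCapabilities?: boolean;
}

const posture: Record<string, EnvironmentPosture> = {
  dev: { replicas: 1 },
  staging: {
    replicas: 2,
    hpa: { minReplicas: 2, maxReplicas: 5 },
    podDisruptionBudget: true,
  },
  production: {
    replicas: 3,
    hpa: { minReplicas: 3, maxReplicas: 10 },
    networkPolicies: true,
    readOnlyRootFilesystem: true,
    dropAllCapabilities: true,
  },
};
```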

Infrastructure as Code

The EKS cluster, VPC, ECR repository, and AWS Load Balancer Controller are all defined in AWS CDK (TypeScript) and versioned alongside the application code. A single setup script deploys the entire infrastructure and writes a .env.eks file with all the environment variables the pipeline needs. Teardown is equally automated.
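A condensed sketch of what such a CDK stack might contain; construct IDs, the Kubernetes version, node sizing, and the controller version are assumptions.

```typescript
import * as cdk from "aws-cdk-lib";
import { aws_ec2 as ec2, aws_ecr as ecr, aws_eks as eks } from "aws-cdk-lib";
import { Construct } from "constructs";

// Sketch of the infrastructure stack: VPC, EKS cluster, and ECR repository.
// All identifiers and version numbers here are assumptions.
export class PipelineInfraStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "PipelineVpc", { maxAzs: 2 });

    const cluster = new eks.Cluster(this, "PipelineCluster", {
      vpc,
      version: eks.KubernetesVersion.V1_29,
      defaultCapacity: 2,
      // Installs the AWS Load Balancer Controller so ALB ingress works.
      albController: { version: eks.AlbControllerVersion.V2_6_2 },
    });

    const repository = new ecr.Repository(this, "AppRepository");

    // Outputs feed the values the pipeline reads from .env.eks.
    new cdk.CfnOutput(this, "EcrRepositoryUri", { value: repository.repositoryUri });
    new cdk.CfnOutput(this, "ClusterName", { value: cluster.clusterName });
  }
}
```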

GitHub Actions Integration

Two workflow types drive the CI/CD in GitHub Actions:

A CI workflow runs on every pull request — executing lint, test, and chart lint stages only. No deployment, no AWS credentials needed.

A reusable deploy workflow handles the build, deploy, and test stages for each environment. Three trigger workflows map branches to environments: develop auto-deploys to dev, staging auto-deploys to staging, and main deploys to production with GitHub Environment protection rules requiring manual approval.

AWS authentication uses OIDC federation — no long-lived access keys. Each deploy workflow presents GitHub's OIDC token to assume an IAM role scoped to ECR push and EKS deploy permissions.
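In CDK terms, the federation could be wired roughly like this; the repository slug and construct names are placeholders.

```typescript
import { aws_iam as iam } from "aws-cdk-lib";
import { Construct } from "constructs";

// Sketch of the OIDC federation: GitHub's identity provider plus an IAM role
// restricted to a single repository. The repo slug is a placeholder.
export function githubDeployRole(scope: Construct): iam.Role {
  const provider = new iam.OpenIdConnectProvider(scope, "GitHubOidc", {
    url: "https://token.actions.githubusercontent.com",
    clientIds: ["sts.amazonaws.com"],
  });

  return new iam.Role(scope, "GitHubDeployRole", {
    assumedBy: new iam.WebIdentityPrincipal(provider.openIdConnectProviderArn, {
      StringLike: {
        // Only workflows from this repository may assume the role.
        "token.actions.githubusercontent.com:sub": "repo:example-org/example-repo:*",
      },
    }),
    // In the real stack this role carries ECR push and EKS deploy permissions.
  });
}
```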

Results

The pipeline runs in approximately 4 minutes end-to-end, including build, push, deploy, and in-cluster health check. The previous manual process — SSH, git pull, restart, verify — took anywhere from 30 minutes to several hours depending on the complexity of the change and whether anything went wrong.

The most significant improvement is the local development experience. Running npm run pipeline executes the full six-stage pipeline against a local Kind cluster. Pipeline changes are testable in seconds, not CI cycles. The "push and pray" debugging loop is eliminated.

The dual-target architecture proved its value immediately. The same pipeline code that deploys to a local Kind cluster drives production EKS deployments. There is no translation layer between local development and production CI. What works on a laptop works in GitHub Actions.

The multi-environment promotion model with progressive security enforcement means production deployments are not just automated — they are hardened. Network policies, read-only filesystems, dropped capabilities, and pod disruption budgets are enforced by the Helm values, not by manual configuration. Security is a property of the deployment, not a checklist item.

The project is open source on GitHub, and we wrote a detailed technical walkthrough covering the full implementation.