How I Hardened Our Docker Supply Chain: A Practical Security Guide

How I stopped shipping “mystery meat” container images — and the tools that changed the way my team thinks about Docker security in 2025.

Why This Keeps Me Up at Night (And Should Keep You Up Too)

A few years back, I inherited a microservices platform that had been running in production for 18 months. When I finally sat down to audit the supply chain, I found images pulling from unverified base layers, zero provenance on third-party dependencies, and the :latest tag being deployed straight to production. We were, to put it bluntly, one compromised upstream package away from a catastrophe.

The SolarWinds attack, the Codecov Bash Uploader incident, the xz Utils backdoor: these aren’t abstract threats anymore. Supply chain attacks have become one of the most damaging vectors against containerized workloads. And most engineering teams are woefully underprepared.

This isn’t a “here’s how to run docker pull” tutorial. I’m going to walk you through the exact hardening strategy I now enforce across every production Kubernetes cluster I run: cryptographic image signing with Cosign, generating and attesting Software Bill of Materials (SBOMs), multi-stage build discipline, and automated vulnerability scanning with Trivy and Grype. We’ll go deep on each one.

The Threat Model: What “Supply Chain Attack” Actually Means in Docker Context

Before we harden anything, let’s be precise about what we’re defending against. In Docker’s supply chain, the attack surface spans four layers:

Base image poisoning — An attacker compromises a popular base image (node:18, python:3.12-slim) before or after it’s pushed to a registry.
Dependency confusion — A malicious package with the same name as an internal one gets resolved from a public registry.
Compromised build pipelines — CI systems injecting malicious artifacts during the build phase.
Registry tampering — An image is modified in transit or at rest before your cluster pulls it.

Each layer requires a different defense. Let’s build them one by one.

Layer 1: Multi-Stage Builds — Your First Line of Defense

I cannot stress this enough: if you are still using single-stage Dockerfiles in production, you are shipping a build environment as a runtime artifact. That means compilers, package managers, build caches, and often credentials that were used during the build, all sitting inside the final image.

Multi-stage builds solve this by keeping your build toolchain completely separate from your runtime environment.

The Pattern I Use for Node.js Services

# ---- Stage 1: Dependency Installation ----
FROM node:20-alpine AS deps
WORKDIR /app

# Copy only what's needed for npm install (layer cache optimization)
COPY package.json package-lock.json ./

# Use ci for reproducible installs; omit devDependencies
RUN npm ci --omit=dev && npm cache clean --force

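# ---- Stage 2: Build ----
FROM node:20-alpine AS builder
WORKDIR /app

# The build step needs devDependencies (compilers, bundlers), so do a full
# install here; the production-only node_modules from "deps" would break it
COPY package.json package-lock.json ./
RUN npm ci

COPY . .

RUN npm run build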

# ---- Stage 3: Production Runtime (minimal attack surface) ----
FROM node:20-alpine AS runtime

# Run as non-root user — ALWAYS
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

WORKDIR /app

# Copy ONLY the built artifact and production deps
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=deps --chown=appuser:appgroup /app/node_modules ./node_modules

EXPOSE 3000
CMD ["node", "dist/server.js"]

Why This Matters for Supply Chain Security

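The final image above contains no compilers, no devDependencies, and no source code, only the built output and production dependencies. One caveat: because the runtime stage is still node:20-alpine, the npm CLI that ships with that base image is still present; remove it in the final stage or move to a distroless base if you want it gone. Even so, the attack surface is dramatically reduced. A compromised build-only dependency never ships in the runtime image, so the blast radius is limited to what actually executes at runtime.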

Pro Tip: Pin your base images to a specific digest, not just a tag. Tags are mutable; digests are not.

# ❌ Mutable — tag can be overwritten at any time
FROM node:20-alpine

# ✅ Immutable — cryptographically pinned to a specific manifest
# (example digest for illustration; resolve the current one yourself before using it)
FROM node:20-alpine@sha256:b9bb8c6ce1c02b9e5e0fabb4a24c54e1f62a0e2e3de50a52a2e1e15d3f5e7c8d

I use crane digest node:20-alpine to fetch the current digest and lock it in my Dockerfiles, then update it deliberately as part of a controlled upgrade cycle.
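To keep unpinned base images from creeping back in, I find a small lint step useful. The following is a minimal sketch, not part of any tool mentioned here: find_unpinned_from is a hypothetical helper name, and it assumes a reasonably standard Dockerfile (one FROM per line, no build-arg substitution in the image name).

```shell
# Hypothetical helper: print every FROM line whose image is not pinned to a
# sha256 digest. Exits non-zero if any unpinned image is found.
# "scratch" and references to earlier build stages are skipped, since
# neither can carry a digest.
find_unpinned_from() {
  awk '
    toupper($1) == "FROM" {
      # Skip optional flags such as --platform=linux/amd64
      i = 2
      while (i <= NF && $i ~ /^--/) i++
      image = $i
      key = tolower(image)
      alias = (toupper($(i + 1)) == "AS") ? tolower($(i + 2)) : ""

      # Pinned means: name@sha256:<64 lowercase hex characters>
      n = split(image, parts, "@sha256:")
      pinned = (n == 2 && length(parts[2]) == 64 && parts[2] ~ /^[0-9a-f]+$/)

      if (image != "scratch" && !(key in stages) && !pinned) {
        printf "%s:%d: %s\n", FILENAME, NR, image
        bad = 1
      }
      # Record the stage alias only after checking the current line
      if (alias != "") stages[alias] = 1
    }
    END { exit bad }
  ' "$1"
}
```

Calling find_unpinned_from Dockerfile in CI prints one file:line: image entry per offender and fails the job, which turns the pinning rule into something enforced rather than remembered.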

Layer 2: Image Signing with Cosign — Cryptographic Provenance at Scale

If multi-stage builds reduce the attack surface, Cosign closes the verification gap between what you built and what you deployed. Without signing, your cluster has no way to know whether the image it’s about to pull is the one your CI pipeline produced — or something an attacker slipped in.

Cosign is part of the Sigstore project and has become the de facto standard for container image signing. It supports both keyless signing (via OIDC-based ephemeral keys, my preference in CI) and long-lived key pairs.

Setting Up Keyless Signing in GitHub Actions

Keyless signing is the mode I use for all my CI pipelines. Instead of managing long-lived key material, Cosign leverages the OIDC token from your CI provider (GitHub, GitLab, etc.) to get a short-lived certificate from Sigstore’s Fulcio CA. The signing event is also logged in Rekor, a transparency log, giving you an immutable audit trail.

# .github/workflows/build-and-sign.yml
name: Build, Scan, Sign

on:
  push:
    branches: [main]

jobs:
  build-sign:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write  # Required for keyless Cosign signing

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and Push
        id: build-push
        uses: docker/build-push-action@v5
        with:
          context: .
          # Push by digest only. push-by-digest cannot be combined with a tagged
          # name, so no tags are set here; the tag is applied after all checks pass.
          # The pushed digest is exposed as steps.build-push.outputs.digest.
          outputs: type=image,name=ghcr.io/${{ github.repository }},push-by-digest=true,name-canonical=true,push=true

      - name: Install Cosign
        uses: sigstore/cosign-installer@v3

      - name: Sign the image (keyless)
        # Cosign v2 signs keylessly by default; COSIGN_EXPERIMENTAL is no longer needed
        run: |
          cosign sign --yes \
            ghcr.io/${{ github.repository }}@${{ steps.build-push.outputs.digest }}

Verifying Signatures Before Deployment

Signing means nothing if you don’t verify. I enforce this at the admission controller level in Kubernetes using Policy Controller (from Sigstore), but you can also verify manually:

# Verify keyless signature — checks Rekor transparency log automatically
cosign verify \
  --certificate-identity="https://github.com/myorg/myrepo/.github/workflows/build-and-sign.yml@refs/heads/main" \
  --certificate-oidc-issuer="https://token.actions.githubusercontent.com" \
  ghcr.io/myorg/myrepo@sha256:<digest>

Common Pitfall: Engineers often sign the tag (myimage:latest) rather than the digest. Always sign by digest. Tags can be overwritten; a digest is immutable. Cosign resolves a tag to whatever digest it points to at signing time, so if someone pushes a new image to that tag between your build and your sign step, you end up signing an image your pipeline never built.
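One way to make the sign-by-digest rule hard to break is to wrap the signing call in a guard. This is a sketch under my own assumptions: is_digest_ref and sign_by_digest are hypothetical helper names, and the only real command invoked is cosign sign --yes, exactly as used in the workflow above.

```shell
# Returns 0 only when the reference ends in @sha256:<64 lowercase hex chars>.
is_digest_ref() {
  case "$1" in
    *@sha256:*) ;;
    *) return 1 ;;
  esac
  digest=${1##*@sha256:}
  [ "${#digest}" -eq 64 ] || return 1
  case "$digest" in
    *[!0-9a-f]*) return 1 ;;
  esac
  return 0
}

# Hypothetical wrapper: refuse to invoke cosign for tag-only references.
sign_by_digest() {
  if ! is_digest_ref "$1"; then
    echo "refusing to sign '$1': not a digest reference" >&2
    return 1
  fi
  cosign sign --yes "$1"
}
```

With this in place, sign_by_digest ghcr.io/myorg/myrepo:latest fails before Cosign is ever called, while a full @sha256: reference is passed through unchanged.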

Layer 3: Software Bill of Materials (SBOMs) — Know Exactly What’s in Your Images

An SBOM is a machine-readable inventory of every component in your software artifact: packages, libraries, licenses, versions. Think of it as a nutrition label for your container image. In 2025, with Executive Order 14028 pushing SBOM requirements into federal and enterprise procurement, and frameworks like SLSA raising the bar on build provenance, this is no longer optional for teams shipping to regulated industries.

I generate SBOMs using Syft and attach them as attestations to my images using Cosign, making them part of the immutable provenance chain.

Generating and Attesting an SBOM

# Install syft (or use the GitHub Action)
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

# Generate SBOM in SPDX JSON format (also supports CycloneDX)
syft ghcr.io/myorg/myrepo@sha256:<digest> \
  -o spdx-json=sbom.spdx.json

# Inspect the SBOM — look at what packages are present
cat sbom.spdx.json | jq '.packages[] | {name: .name, version: .versionInfo, license: .licenseConcluded}'

# In your GitHub Actions workflow, after building:
- name: Generate SBOM
  run: |
    syft ghcr.io/${{ github.repository }}@${{ steps.build-push.outputs.digest }} \
      -o spdx-json=sbom.spdx.json

- name: Attest SBOM to image
  # Keyless is the default in Cosign v2, so no COSIGN_EXPERIMENTAL env is needed
  run: |
    cosign attest --yes \
      --predicate sbom.spdx.json \
      --type spdxjson \
      ghcr.io/${{ github.repository }}@${{ steps.build-push.outputs.digest }}

Now your SBOM is cryptographically attached to the image. Anyone pulling the image can verify and download the SBOM:

cosign verify-attestation \
  --type spdxjson \
  --certificate-identity="..." \
  --certificate-oidc-issuer="..." \
  ghcr.io/myorg/myrepo@sha256:<digest> | jq '.payload | @base64d | fromjson'

Lesson Learned: The first time I generated an SBOM for what I thought was a lean Alpine-based service, it revealed 214 packages — including several with known CVEs that had been silently baked into a transitive dependency. We never would have found them without the SBOM because nothing was scanning that deep.

Layer 4: Automated Vulnerability Scanning with Trivy and Grype

Signing and SBOMs give you provenance. Vulnerability scanning gives you risk awareness. I run both Trivy and Grype in my pipelines — they use different databases and have different strengths, so running both catches more.

Trivy: Fast, CI-Friendly, Battle-Tested

Trivy by Aqua Security is my go-to for in-pipeline scanning. It’s fast, integrates with Kubernetes through the Trivy Operator (which stores scan reports as custom resources), and can scan not just container images but also filesystems, git repositories, and IaC configs.

# Install Trivy
brew install trivy  # macOS
# or via GitHub Releases for CI

# Scan the image by digest; fail on HIGH or CRITICAL vulns
trivy image \
  --exit-code 1 \
  --severity HIGH,CRITICAL \
  --ignore-unfixed \
  ghcr.io/myorg/myrepo@sha256:<digest>

# Output SARIF for GitHub Security tab integration
trivy image \
  --format sarif \
  --output trivy-results.sarif \
  ghcr.io/myorg/myrepo@sha256:<digest>

# Full GitHub Actions integration with SARIF upload
- name: Run Trivy vulnerability scanner
  # Pin to a release (ideally a full commit SHA), never a moving branch like master
  uses: aquasecurity/trivy-action@0.28.0
  with:
    image-ref: ghcr.io/${{ github.repository }}@${{ steps.build-push.outputs.digest }}
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
    exit-code: '1'
    ignore-unfixed: true

- name: Upload Trivy scan results to GitHub Security tab
  uses: github/codeql-action/upload-sarif@v3
  if: always()
  with:
    sarif_file: 'trivy-results.sarif'

Grype: The Second Opinion You Need

Grype (by Anchore, the same team that makes Syft) uses a different vulnerability database and tends to catch different things than Trivy. I run it as a complementary check.

# Install grype
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin

# Scan and fail on high/critical — also accepts SBOM as input
grype sbom:./sbom.spdx.json \
  --fail-on high \
  -o table

Pro Tip: Scan your SBOM rather than the live image when possible. It’s faster (no re-analysis of the image layers), and it’s testing the exact inventory you attested. This is particularly useful in CD pipelines where you want fast feedback.

Building a `.trivyignore` That Isn’t a Cop-Out

Every team eventually encounters a CVE they can’t immediately fix — the upstream package hasn’t patched it, or the vulnerability is in a path not exercised at runtime. The temptation is to blanket-ignore it. Resist that. Use a documented, time-bounded ignore with justification:

# .trivyignore
# CVE-2024-XXXXX — libssl vulnerability in OpenSSL 3.x
# Affected code path: TLS renegotiation (not used in our service)
# Upstream fix expected in OpenSSL 3.3.2 (target: Q3 2025 upgrade)
# Reviewed by: @your-security-team, 2025-01-15
# exp: makes Trivy stop honoring this entry after the date, forcing a re-review
CVE-2024-XXXXX exp:2025-09-30
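To keep the ignore file honest over time, a small check in CI can reject entries that lack a justification or an expiry. What follows is a sketch of that idea, not a Trivy feature: lint_trivyignore is a hypothetical helper, and the policy it encodes (every entry directly preceded by at least one comment line, and carrying Trivy's exp:YYYY-MM-DD suffix) is my own convention.

```shell
# Hypothetical lint for .trivyignore: every entry must
#   1. be directly preceded by at least one "#" comment line (the justification)
#   2. carry an exp:YYYY-MM-DD expiry
# Prints one line per violation and exits non-zero if any are found.
lint_trivyignore() {
  awk '
    /^[ \t]*$/ { has_comment = 0; next }   # a blank line ends a comment block
    /^[ \t]*#/ { has_comment = 1; next }   # comment lines count as justification
    {
      if (!has_comment) {
        printf "%s:%d: %s has no justification comment\n", FILENAME, NR, $1
        bad = 1
      }
      if ($0 !~ /exp:[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
        printf "%s:%d: %s has no exp: date\n", FILENAME, NR, $1
        bad = 1
      }
      has_comment = 0                      # each entry needs its own comment
    }
    END { exit bad }
  ' "$1"
}
```

Running lint_trivyignore .trivyignore before the scan step means an undocumented or open-ended ignore fails the build instead of quietly becoming permanent.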

Bringing It All Together: The Complete Hardened Pipeline

Here’s the full pipeline order I use in production, combining all four layers:

Build (multi-stage Dockerfile with pinned digest)
  └─> Push to registry (by digest, not tag)
       └─> Generate SBOM (Syft → SPDX JSON)
            └─> Scan SBOM for vulnerabilities (Grype)
                 └─> Scan image layers (Trivy)
                      └─> Sign image digest (Cosign keyless)
                           └─> Attest SBOM (Cosign + predicate)
                                └─> Tag image (only AFTER all checks pass)
                                     └─> Deploy (Kubernetes Policy Controller verifies signature)

Critically, tagging happens last. This means myimage:latest or myimage:v1.2.3 is only ever pointing to an image that has passed signing and scanning. This prevents the common failure mode where a tag gets updated to an unscanned image.
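The tag-last rule can be sketched as a single gate function. Treat this as an illustration under assumptions rather than my exact pipeline: promote_image is a hypothetical name, the SBOM is assumed to sit at ./sbom.spdx.json, and the tools (grype, trivy, cosign, crane) are assumed to be installed and already authenticated to the registry.

```shell
# Hypothetical gate: run every check against the digest, and apply the
# human-friendly tag only if all of them succeed.
promote_image() {
  ref=$1        # e.g. ghcr.io/myorg/myrepo@sha256:<digest>
  tag=$2        # e.g. v1.2.3
  identity=$3   # workflow identity that signed the image
  issuer=$4     # OIDC issuer, e.g. https://token.actions.githubusercontent.com

  # Gate 1: known vulnerabilities in the attested inventory
  grype "sbom:./sbom.spdx.json" --fail-on high || return 1

  # Gate 2: known vulnerabilities in the image layers
  trivy image --exit-code 1 --severity HIGH,CRITICAL --ignore-unfixed "$ref" || return 1

  # Gate 3: the digest carries a valid signature from the expected workflow
  cosign verify \
    --certificate-identity="$identity" \
    --certificate-oidc-issuer="$issuer" \
    "$ref" >/dev/null || return 1

  # Only reached when every gate above passed
  crane tag "$ref" "$tag"
}
```

Because each gate returns early on failure, the crane tag call is unreachable for an image that failed scanning or verification, which is the property the ordering above is meant to guarantee.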

Lessons Learned: What I Wish Someone Had Told Me

Start with digest pinning, not signing. It’s the highest-ROI change you can make in an afternoon and requires zero new tooling. Everything else builds on it.
SBOMs are only useful if you store and query them. Set up a lightweight database (even an S3 bucket + Athena will do) to query SBOMs across your fleet. Knowing what packages you have is only valuable when a new CVE drops and you need to answer “are we affected?” in minutes, not days.
Cosign keyless is superior to key-based in CI. Managing long-lived signing keys in CI is an operational and security burden. Keyless signing with OIDC eliminates the “where is the private key?” problem entirely.
Trivy false positives are real. Base OS packages often carry CVEs that are patched by the distro maintainer but still show up in vulnerability DBs. Use --ignore-unfixed judiciously and cross-reference with your distro’s security advisory feed.
Policy enforcement at admission time is the real goal. Scanning in CI is great; enforcing at the Kubernetes admission controller level means even manual kubectl apply of a rogue image gets blocked. Kyverno and Sigstore’s Policy Controller are both excellent for this.
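The "are we affected?" question from the SBOM lesson above can be answered with very little tooling if the SBOMs are simply kept as files. Here is a minimal sketch assuming a local directory of SPDX JSON files named *.spdx.json (a stand-in for whatever store you actually use) and jq on the PATH; find_package_in_sboms is a hypothetical helper name.

```shell
# Hypothetical helper: for every SBOM in a directory, print a tab-separated
# line "<file> <package> <version>" when the SBOM lists the given package.
find_package_in_sboms() {
  pkg=$1
  dir=$2
  for f in "$dir"/*.spdx.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    jq -r --arg pkg "$pkg" --arg file "$f" '
      .packages[]?
      | select(.name == $pkg)
      | [$file, .name, (.versionInfo // "unknown")]
      | @tsv
    ' "$f"
  done
}
```

Something like find_package_in_sboms xz-libs ./sboms then lists every service whose image ships that package along with the version, which is usually enough to triage a fresh advisory.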

Common Pitfalls to Avoid

Using USER root in production Dockerfiles — Even if it “works,” it violates the principle of least privilege. Always drop to a non-root user in the final stage.
Signing the :latest tag — Tags are mutable. Always sign by digest. Always.
Treating scanning as a one-time event — New CVEs are published daily. Scan images in your registry on a schedule, not just at build time. A scheduled CI job that re-runs Trivy against your deployed digests, or re-scans your stored SBOMs with Grype as the vulnerability databases update, gives you a continuous monitoring workflow.
Skipping the SBOM for base images — Your base image’s supply chain is also your supply chain. Syft can generate SBOMs for pulled images you don’t own.
Not testing your verification policy — Write an integration test that deliberately pushes an unsigned image and verifies your admission controller rejects it.
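The last pitfall, an untested verification policy, is cheap to guard against. The sketch below assumes you already have an unsigned test image pushed somewhere and a namespace where the policy is enforced; assert_unsigned_rejected and the pod name unsigned-probe are hypothetical, and only standard kubectl run and kubectl delete flags are used.

```shell
# Hypothetical smoke test: succeeds only if the cluster REFUSES to admit a
# pod that references an unsigned image.
assert_unsigned_rejected() {
  ref=$1   # unsigned image, ideally referenced by digest
  ns=$2    # namespace where the admission policy is enforced
  pod="unsigned-probe"

  if kubectl run "$pod" --image="$ref" --restart=Never --namespace="$ns" >/dev/null 2>&1; then
    # The pod was admitted: clean up and report the policy gap
    kubectl delete pod "$pod" --namespace="$ns" --ignore-not-found >/dev/null 2>&1
    echo "FAIL: unsigned image $ref was admitted in namespace $ns" >&2
    return 1
  fi
  echo "OK: unsigned image $ref was rejected"
}
```

One limitation worth knowing: kubectl run can also fail for unrelated reasons such as lost connectivity or missing RBAC, so a stricter version would inspect the error message for the admission webhook's denial before declaring success.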

Pro Tips for Senior Engineers

Use docker buildx build --sbom=true --provenance=true — Docker Buildx now natively supports SBOM and provenance generation via the --sbom and --provenance flags, attaching them as OCI image index attestations automatically.
Integrate Grype into your editor — The Anchore VS Code extension surfaces CVEs in your Dockerfile and dependency files before you even commit.
Automate base image upgrades — Use Renovate Bot with a digest-pinning configuration. It will automatically open PRs to update your pinned digests on a schedule, keeping you current without manual effort.
Consider distroless base images — Google’s distroless images contain no shell, no package manager, and no OS utilities. The attack surface is minimal. They’re my preferred base for compiled languages like Go and Rust.

Conclusion

Container supply chain security is not a checkbox. It’s a posture, one that requires defense-in-depth across your entire build and deploy lifecycle. Multi-stage builds reduce your runtime attack surface. Digest pinning eliminates tag mutability as a risk. Cosign and keyless signing provide cryptographic provenance. SBOMs give you a queryable inventory of what you built and shipped. Trivy and Grype catch known vulnerabilities before they reach production.

The good news: none of this requires a massive platform rewrite. You can layer these controls in incrementally, starting with Dockerfile hygiene this week and reaching full SBOM attestation and policy enforcement within a quarter. Start small, automate aggressively, and build the security culture on your team one PR at a time.

Did I miss something, or have a different approach to container hardening? I’d love to hear it. Drop me a comment or find me on X/Twitter.
