Name: Deployment Pipeline Design
Author: Wshobson
Install
$ npx skills add https://github.com/wshobson/agents --skill deployment-pipeline-design
Source file
SKILL.md (500 lines, Markdown)
---
name: deployment-pipeline-design
description: Design multi-stage CI/CD pipelines with approval gates, security checks, and deployment orchestration. Use this skill when designing zero-downtime deployment pipelines, implementing canary rollout strategies, setting up multi-environment promotion workflows, or debugging failed deployment gates in CI/CD.
---

# Deployment Pipeline Design

Architecture patterns for multi-stage CI/CD pipelines with approval gates, deployment strategies, and environment promotion workflows.

## Purpose

Design robust, secure deployment pipelines that balance speed with safety through proper stage organization, automated quality gates, and progressive delivery strategies. This skill covers both the structural design of pipeline architecture and the operational patterns for reliable production deployments.

## Input / Output

### What You Provide

- **Application type**: Language/runtime, containerized or bare-metal, monolith or microservices
- **Deployment target**: Kubernetes, ECS, VMs, serverless, or platform-as-a-service
- **Environment topology**: Number of environments (dev/staging/prod), region layout, air-gap requirements
- **Rollout requirements**: Acceptable downtime, rollback SLA, traffic splitting needs, canary vs blue-green preference
- **Gate constraints**: Approval teams, required test coverage thresholds, compliance scans (SAST, DAST, SCA)
- **Monitoring stack**: Prometheus, Datadog, CloudWatch, or other metrics sources used for automated promotion decisions

### What This Skill Produces

- **Pipeline configuration**: Stage definitions, job dependencies, parallelism, and caching strategy
- **Deployment strategy**: Chosen rollout pattern with annotated configuration (canary weights, blue-green switchover, rolling parameters)
- **Health check setup**: Shallow vs deep readiness probes, post-deployment smoke test scripts
- **Gate definitions**: Automated metric thresholds and manual approval
workflows
- **Rollback plan**: Automated rollback triggers and manual runbook steps

## When to Use

- Design CI/CD architecture for a new service or platform migration
- Implement deployment gates between environments
- Configure multi-environment pipelines with mandatory security scanning
- Establish progressive delivery with canary or blue-green strategies
- Debug pipelines where stages succeed but production behavior is wrong
- Reduce mean time to recovery by automating rollback on metric degradation

## Pipeline Stages

### Standard Pipeline Flow

```
┌─────────┐   ┌──────┐   ┌─────────┐   ┌────────┐   ┌──────────┐
│  Build  │ → │ Test │ → │ Staging │ → │ Approve│ → │Production│
└─────────┘   └──────┘   └─────────┘   └────────┘   └──────────┘
```

### Detailed Stage Breakdown

1. **Source** - Code checkout, dependency graph resolution
2. **Build** - Compile, package, containerize, sign artifacts
3. **Test** - Unit, integration, SAST/SCA security scans
4. **Staging Deploy** - Deploy to staging environment with smoke tests
5. **Integration Tests** - E2E, contract tests, performance baselines
6. **Approval Gate** - Manual or automated metric-based gate
7. **Production Deploy** - Canary, blue-green, or rolling strategy
8. **Verification** - Deep health checks, synthetic monitoring
9. **Rollback** - Automated rollback on failure signals

## Approval Gate Patterns

### Pattern 1: Manual Approval (GitHub Actions)

```yaml
production-deploy:
  needs: staging-deploy
  environment:
    name: production
    url: https://app.example.com
  runs-on: ubuntu-latest
  steps:
    - name: Deploy to production
      run: kubectl apply -f k8s/production/
```

Environment protection rules in GitHub enforce required reviewers before this job starts.
Configure reviewers at **Settings → Environments → production → Required reviewers**.

### Pattern 2: Time-Based Approval (GitLab CI)

```yaml
deploy:production:
  stage: deploy
  script:
    - deploy.sh production
  environment:
    name: production
  when: delayed
  start_in: 30 minutes
  only:
    - main
```

### Pattern 3: Multi-Approver (Azure Pipelines)

```yaml
stages:
  - stage: Production
    dependsOn: Staging
    jobs:
      # ManualValidation@0 is only supported in agentless jobs (pool: server),
      # so the approval runs as its own job ahead of the deployment job.
      - job: WaitForApproval
        pool: server
        steps:
          - task: ManualValidation@0
            inputs:
              notifyUsers: "team-leads@example.com"
              instructions: "Review staging metrics before approving"
      - deployment: Deploy
        dependsOn: WaitForApproval
        environment:
          name: production
          resourceType: Kubernetes
        strategy:
          runOnce:
            deploy:
              steps:
                - script: ./deploy.sh production   # placeholder deploy command
```

### Pattern 4: Automated Metric Gate

Use an AnalysisTemplate (Argo Rollouts) or a custom gate script to block promotion when error rates exceed a threshold:

```yaml
# Argo Rollouts AnalysisTemplate: blocks canary promotion automatically
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  metrics:
  - name: success-rate
    interval: 60s
    successCondition: "result[0] >= 0.95"
    failureCondition: "result[0] < 0.90"
    inconclusiveLimit: 3
    provider:
      prometheus:
        address: http://prometheus:9090
        query: |
          sum(rate(http_requests_total{status!~"5..",job="my-app"}[2m]))
          / sum(rate(http_requests_total{job="my-app"}[2m]))
```

## Deployment Strategies

### Decision Table

| Strategy     | Downtime | Rollback Speed | Cost Impact     | Best For                         |
|--------------|----------|----------------|-----------------|----------------------------------|
| Rolling      | None     | ~minutes       | None            | Most stateless services          |
| Blue-Green   | None     | Instant        | 2x infra (temp) | High-risk or database migrations |
| Canary       | None     | Instant        | Minimal         | High-traffic, metric-driven      |
| Recreate     | Yes      | Fast           | None            | Dev/test, batch jobs             |
| Feature Flag | None     | Instant        | None            | Gradual feature exposure         |

### 1. Rolling Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2         # at most 12 pods during rollout
      maxUnavailable: 1   # at least 9 pods always serving
```

Characteristics: gradual rollout, zero downtime, easy rollback, best for most applications.

### 2. Blue-Green Deployment

```bash
# Switch traffic from blue to green
kubectl apply -f k8s/green-deployment.yaml
kubectl rollout status deployment/my-app-green

# Flip the service selector
kubectl patch service my-app -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback instantly if needed
kubectl patch service my-app -p '{"spec":{"selector":{"version":"blue"}}}'
```

Characteristics: instant switchover, easy rollback, doubles infrastructure cost temporarily, good for high-risk deployments with long warm-up times.

### 3. Canary Deployment (Argo Rollouts)

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 10
  strategy:
    canary:
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
      steps:
        - setWeight: 10
        - pause: { duration: 5m }
        - setWeight: 25
        - pause: { duration: 5m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
```

Characteristics: gradual traffic shift, real-user metric validation, automated promotion or rollback, requires Argo Rollouts or a service mesh.

### 4. Feature Flags

```python
from flagsmith import Flagsmith

flagsmith = Flagsmith(environment_key="API_KEY")

# Flagsmith Python SDK v3+: fetch the environment's flags, then query them
flags = flagsmith.get_environment_flags()

if flags.is_feature_enabled("new_checkout_flow"):
    process_checkout_v2()
else:
    process_checkout_v1()
```

Characteristics: deploy without releasing, A/B testing, instant rollback per user segment, granular control independent of deployment.

## Pipeline Orchestration

### Multi-Stage Pipeline Example (GitHub Actions)

```yaml
name: Production Pipeline

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      image: ${{ steps.build.outputs.image }}
    steps:
      - uses: actions/checkout@v4
      - name: Build and push Docker image
        id: build
        run: |
          # Assumes the runner is already authenticated to the image registry
          IMAGE=myapp:${{ github.sha }}
          docker build -t "$IMAGE" .
          docker push "$IMAGE"
          echo "image=$IMAGE" >> "$GITHUB_OUTPUT"

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # each job starts from an empty workspace
      - name: Unit tests
        run: make test
      - name: Security scan
        # --exit-code 1 fails the job when matching vulnerabilities are found
        run: trivy image --exit-code 1 --severity HIGH,CRITICAL ${{ needs.build.outputs.image }}

  deploy-staging:
    needs: test
    environment:
      name: staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy to staging
        run: kubectl apply -f k8s/staging/

  integration-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run E2E tests
        run: npm run test:e2e

  deploy-production:
    needs: integration-test
    environment:
      name: production        # blocks here until required reviewers approve
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Canary deployment
        run: |
          kubectl apply -f k8s/production/
          kubectl argo rollouts promote my-app

  verify:
    needs: deploy-production
    runs-on: ubuntu-latest
    steps:
      - name: Deep health check
        run: |
          for i in {1..12}; do
            # "|| true" keeps the loop retrying when curl or jq fails
            STATUS=$(curl -sf https://app.example.com/health/ready | jq -r '.status' || true)
            [ "$STATUS" = "ok" ] && exit 0
            sleep 10
          done
          exit 1
      - name: Notify on success
        run: |
          curl -X POST -H 'Content-Type: application/json' ${{ secrets.SLACK_WEBHOOK }} \
            -d '{"text":"Production deployment successful: ${{ github.sha }}"}'
```

## Health Checks

### Shallow vs Deep Health Endpoints

A shallow `/ping` returns 200 even when downstream dependencies are broken.
Use a deep readiness endpoint that verifies actual dependencies before promoting traffic.

```python
# Assumes a FastAPI application; the check_* coroutines are application-specific
# helpers that return True when the dependency responds.
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# /health/ready: checks real dependencies, used by the pipeline gate
@app.get("/health/ready")
async def readiness():
    checks = {
        "database": await check_db_connection(),
        "cache":    await check_redis_connection(),
        "queue":    await check_queue_connection(),
    }
    status = "ok" if all(checks.values()) else "degraded"
    code = 200 if status == "ok" else 503
    return JSONResponse({"status": status, "checks": checks}, status_code=code)
```

### Post-Deployment Verification Script

```bash
#!/usr/bin/env bash
# verify-deployment.sh: run after every production deploy
set -euo pipefail

ENDPOINT="${1:?usage: verify-deployment.sh <base-url>}"
MAX_ATTEMPTS=12
SLEEP_SECONDS=10

for i in $(seq 1 "$MAX_ATTEMPTS"); do
  # curl -f fails on any non-2xx response, so a 503 "degraded" reply is reported as unreachable
  STATUS=$(curl -sf "$ENDPOINT/health/ready" | jq -r '.status' 2>/dev/null || echo "unreachable")
  if [ "$STATUS" = "ok" ]; then
    # (i - 1) sleeps have elapsed before the i-th attempt
    echo "Health check passed after $(( (i - 1) * SLEEP_SECONDS ))s"
    exit 0
  fi
  echo "Attempt $i/$MAX_ATTEMPTS: status=$STATUS, retrying in ${SLEEP_SECONDS}s"
  sleep "$SLEEP_SECONDS"
done

echo "Health check failed after $((MAX_ATTEMPTS * SLEEP_SECONDS))s"
exit 1
```

## Rollback Strategies

### Automated Rollback in Pipeline

```yaml
deploy-and-verify:
  steps:
    - name: Deploy new version
      run: kubectl apply -f k8s/

    - name: Wait for rollout
      run: kubectl rollout status deployment/my-app --timeout=5m

    - name: Post-deployment health check
      id: health
      run: ./scripts/verify-deployment.sh https://app.example.com

    - name: Rollback on failure
      if: failure()
      run: |
        kubectl rollout undo deployment/my-app
        echo "Rolled back to previous revision"
```

### Manual Rollback Commands

```bash
# List revision history with change-cause annotations
kubectl rollout history deployment/my-app

# Rollback to previous version
kubectl rollout undo deployment/my-app

# Rollback to a specific revision
kubectl rollout undo deployment/my-app --to-revision=3

# Verify rollback completed
kubectl rollout status deployment/my-app
```

For advanced rollback strategies including database migration rollbacks and Argo Rollouts abort flows, see [`references/advanced-strategies.md`](references/advanced-strategies.md).

## Monitoring and Metrics

### Key DORA Metrics to Track

| Metric                | Target (Elite) | How to Measure                       |
|-----------------------|----------------|--------------------------------------|
| Deployment Frequency  | Multiple/day   | Pipeline run count per day           |
| Lead Time for Changes | < 1 hour       | Commit timestamp → production deploy |
| Change Failure Rate   | < 5%           | Failed deploys / total deploys       |
| Mean Time to Recovery | < 1 hour       | Incident open → service restored     |

### Post-Deployment Metric Verification

```yaml
- name: Verify error rate post-deployment
  run: |
    sleep 60  # allow metrics to accumulate

    # jq -r strips the JSON quotes so bc receives a bare number
    ERROR_RATE=$(curl -sf "$PROMETHEUS_URL/api/v1/query" \
      --data-urlencode 'query=sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))' \
      | jq -r '.data.result[0].value[1]')

    echo "Current error rate: $ERROR_RATE"
    if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )); then
      echo "Error rate $ERROR_RATE exceeds 1% threshold, triggering rollback"
      exit 1
    fi
```

## Pipeline Best Practices

1. **Fail fast** - Run quick checks (lint, unit tests) before slow ones (E2E, security scans)
2. **Parallel execution** - Run independent jobs concurrently to minimize total pipeline time
3. **Caching** - Cache dependency layers and build artifacts between runs
4. **Artifact promotion** - Build once, promote the same artifact through all environments
5. **Environment parity** - Keep staging infrastructure as close to production as possible
6. **Secrets management** - Use secret stores (Vault, AWS Secrets Manager, GitHub encrypted secrets); never hardcode secrets
7. **Deployment windows** - Prefer low-traffic windows; enforce change freeze periods via gate policies
8. **Idempotent deploys** - Ensure re-running a deploy produces the same result
9. **Rollback automation** - Trigger rollback automatically on health check or metric threshold failure
10. **Annotate deployments** - Send deployment markers to monitoring tools (Datadog, Grafana) for correlation

## Troubleshooting

### Health check passes in pipeline but service is unhealthy in production

The pipeline health check is hitting a shallow `/ping` endpoint that returns 200 even when the database is unreachable. Use a deep readiness check that verifies actual dependencies (see Health Checks section above).

### Canary deployment never promotes to 100%

Argo Rollouts requires a valid `AnalysisTemplate` to auto-promote. If the Prometheus query returns no data (e.g., metric name changed), the analysis stays inconclusive and promotion stalls.
Add `inconclusiveLimit` so the rollout fails fast rather than hanging:

```yaml
spec:
  metrics:
  - name: error-rate
    failureCondition: "result[0] > 0.05"
    inconclusiveLimit: 2   # stop after 2 inconclusive results instead of hanging indefinitely
    provider:
      prometheus:
        query: |
          sum(rate(http_requests_total{status=~"5.."}[2m]))
          / sum(rate(http_requests_total[2m]))
```

### Staging deploy succeeds but production job never starts

Check that production environment protection rules are configured: a missing reviewer assignment means the approval gate waits indefinitely with no notification. In GitHub Actions, ensure `Required reviewers` is set to an existing user or team in **Settings → Environments → production**.

### Docker layer cache busted on every run causing slow builds

If `COPY . .` appears before dependency installation, any source file change invalidates the dependency layer. Reorder to copy dependency manifests first:

```dockerfile
# Good: dependencies cached separately from source code
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
```

### Rollback leaves database migrations applied to old code

A service rollback without a migration rollback causes schema/code mismatch errors.
Always make migrations backward-compatible (additive only) for at least one release cycle, and keep undo scripts versioned alongside the migration:

```bash
# migrations/V20240315__add_nullable_column.sql       (forward)
# migrations/V20240315__add_nullable_column.undo.sql  (backward)
```

Never run destructive migrations (DROP COLUMN, ALTER NOT NULL) until the old code version is fully retired from all environments.

## Advanced Topics

For platform-specific pipeline configurations, multi-region promotion workflows, and advanced Argo Rollouts patterns, see:

- [`references/advanced-strategies.md`](references/advanced-strategies.md): Extended YAML examples, platform-specific configs (GitHub Actions, GitLab CI, Azure Pipelines), multi-region canary patterns, and database migration rollback strategies

## Related Skills

- `github-actions-templates` - For GitHub Actions implementation patterns and reusable workflows
- `gitlab-ci-patterns` - For GitLab CI/CD pipeline implementation
- `secrets-management` - For secrets handling in CI/CD pipelines
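
## Appendix: Metric Gate Logic (Sketch)

The automated metric gate in Pattern 4 classifies each measurement as a pass, a failure, or inconclusive, and stops waiting once too many inconclusive results pile up. The following is a minimal Python sketch of that decision logic for use in a custom gate script. It is a simplified model and not Argo Rollouts' actual implementation: the names `GateResult`, `evaluate_sample`, and `run_gate` are hypothetical, the thresholds reuse the 0.95 / 0.90 values from the `AnalysisTemplate` above, and `None` is an assumed stand-in for a query that returned no data.

```python
from enum import Enum
from typing import Iterable, Optional


class GateResult(Enum):
    PASS = "pass"
    FAIL = "fail"
    INCONCLUSIVE = "inconclusive"


def evaluate_sample(success_rate: Optional[float],
                    pass_at: float = 0.95,
                    fail_below: float = 0.90) -> GateResult:
    """Classify one measurement; None models a query that returned no data."""
    if success_rate is None:
        return GateResult.INCONCLUSIVE
    if success_rate >= pass_at:
        return GateResult.PASS
    if success_rate < fail_below:
        return GateResult.FAIL
    # Between the two thresholds: neither clearly good nor clearly bad.
    return GateResult.INCONCLUSIVE


def run_gate(samples: Iterable[Optional[float]],
             inconclusive_limit: int = 3) -> GateResult:
    """Evaluate samples in order and return the overall gate decision.

    Any failing sample fails the gate immediately. More than
    `inconclusive_limit` inconclusive samples ends the run as inconclusive
    instead of waiting forever. Otherwise the gate passes.
    """
    inconclusive = 0
    for rate in samples:
        result = evaluate_sample(rate)
        if result is GateResult.FAIL:
            return GateResult.FAIL
        if result is GateResult.INCONCLUSIVE:
            inconclusive += 1
            if inconclusive > inconclusive_limit:
                return GateResult.INCONCLUSIVE
    return GateResult.PASS


if __name__ == "__main__":
    print(run_gate([0.99, 0.97, 0.96]))   # GateResult.PASS
    print(run_gate([0.99, 0.80, 0.99]))   # GateResult.FAIL
    print(run_gate([None, None, None], inconclusive_limit=2))  # GateResult.INCONCLUSIVE
```

In a pipeline, a script built on this sketch would map `PASS` to exit code 0 and the other two outcomes to a non-zero exit code, so the existing `if: failure()` rollback step can react to it.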