Claude Agent Skill · by Wshobson

On Call Handoff Patterns

This skill generates structured handoff documents and checklists for on-call shift transitions, covering active incidents, ongoing investigations, recent deployments, known issues, and upcoming events.

Install

```bash
npx skills add https://github.com/wshobson/agents --skill on-call-handoff-patterns
```
Works with Paperclip

How On Call Handoff Patterns fits into a Paperclip company.

On Call Handoff Patterns drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file

SKILL.md (331 lines)
---
name: on-call-handoff-patterns
description: Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use this skill when transitioning on-call responsibilities between engineers and ensuring the incoming responder has full situational awareness, when writing a shift summary that captures active incidents, ongoing investigations, and recent changes, when handing off mid-incident so a fresh engineer can take over the incident commander role without losing context, when onboarding a new engineer to the on-call rotation for the first time, or when auditing and improving the quality of existing handoff processes across teams.
---

# On-Call Handoff Patterns

Effective patterns for on-call shift transitions, ensuring continuity, context transfer, and reliable incident response across shifts.

## When to Use This Skill

- Transitioning on-call responsibilities
- Writing shift handoff summaries
- Documenting ongoing investigations
- Establishing on-call rotation procedures
- Improving handoff quality
- Onboarding new on-call engineers

## Core Concepts

### 1. Handoff Components

| Component                  | Purpose                 |
| -------------------------- | ----------------------- |
| **Active Incidents**       | What's currently broken |
| **Ongoing Investigations** | Issues being debugged   |
| **Recent Changes**         | Deployments, configs    |
| **Known Issues**           | Workarounds in place    |
| **Upcoming Events**        | Maintenance, releases   |

### 2. Handoff Timing

```
Recommended: 30 min overlap between shifts

Outgoing:
├── 15 min: Write handoff document
└── 15 min: Sync call with incoming

Incoming:
├── 15 min: Review handoff document
├── 15 min: Sync call with outgoing
└── 5 min: Verify alerting setup
```

## Templates

### Template 1: Shift Handoff Document

````markdown
# On-Call Handoff: Platform Team

**Outgoing**: @alice (2024-01-15 to 2024-01-22)
**Incoming**: @bob (2024-01-22 to 2024-01-29)
**Handoff Time**: 2024-01-22 09:00 UTC

---

## 🔴 Active Incidents

### None currently active

No active incidents at handoff time.

---

## 🟡 Ongoing Investigations

### 1. Intermittent API Timeouts (ENG-1234)

**Status**: Investigating
**Started**: 2024-01-20
**Impact**: ~0.1% of requests timing out

**Context**:
- Timeouts correlate with database backup window (02:00-03:00 UTC)
- Suspect backup process causing lock contention
- Added extra logging in PR #567 (deployed 01/21)

**Next Steps**:
- [ ] Review new logs after tonight's backup
- [ ] Consider moving backup window if confirmed

**Resources**:
- Dashboard: [API Latency](https://grafana/d/api-latency)
- Thread: #platform-eng (01/20, 14:32)

---

### 2. Memory Growth in Auth Service (ENG-1235)

**Status**: Monitoring
**Started**: 2024-01-18
**Impact**: None yet (proactive)

**Context**:
- Memory usage growing ~5% per day
- No memory leak found in profiling
- Suspect connection pool not releasing properly

**Next Steps**:
- [ ] Review heap dump from 01/21
- [ ] Consider restart if usage > 80%

**Resources**:
- Dashboard: [Auth Service Memory](https://grafana/d/auth-memory)
- Analysis doc: [Memory Investigation](https://docs/eng-1235)

---

## 🟢 Resolved This Shift

### Payment Service Outage (2024-01-19)

- **Duration**: 23 minutes
- **Root Cause**: Database connection exhaustion
- **Resolution**: Rolled back v2.3.4, increased pool size
- **Postmortem**: [POSTMORTEM-89](https://docs/postmortem-89)
- **Follow-up tickets**: ENG-1230, ENG-1231

---

## 📋 Recent Changes

### Deployments

| Service      | Version | Time        | Notes                      |
| ------------ | ------- | ----------- | -------------------------- |
| api-gateway  | v3.2.1  | 01/21 14:00 | Bug fix for header parsing |
| user-service | v2.8.0  | 01/20 10:00 | New profile features       |
| auth-service | v4.1.2  | 01/19 16:00 | Security patch             |

### Configuration Changes

- 01/21: Increased API rate limit from 1000 to 1500 RPS
- 01/20: Updated database connection pool max from 50 to 75

### Infrastructure

- 01/20: Added 2 nodes to Kubernetes cluster
- 01/19: Upgraded Redis from 6.2 to 7.0

---

## ⚠️ Known Issues & Workarounds

### 1. Slow Dashboard Loading

**Issue**: Grafana dashboards slow on Monday mornings
**Workaround**: Wait 5 min after 08:00 UTC for cache warm-up
**Ticket**: OPS-456 (P3)

### 2. Flaky Integration Test

**Issue**: `test_payment_flow` fails intermittently in CI
**Workaround**: Re-run failed job (usually passes on retry)
**Ticket**: ENG-1200 (P2)

---

## 📅 Upcoming Events

| Date        | Event                | Impact              | Contact       |
| ----------- | -------------------- | ------------------- | ------------- |
| 01/23 02:00 | Database maintenance | 5 min read-only     | @dba-team     |
| 01/24 14:00 | Major release v5.0   | Monitor closely     | @release-team |
| 01/25       | Marketing campaign   | 2x traffic expected | @platform     |

---

## 📞 Escalation Reminders

| Issue Type      | First Escalation     | Second Escalation |
| --------------- | -------------------- | ----------------- |
| Payment issues  | @payments-oncall     | @payments-manager |
| Auth issues     | @auth-oncall         | @security-team    |
| Database issues | @dba-team            | @infra-manager    |
| Unknown/severe  | @engineering-manager | @vp-engineering   |

---

## 🔧 Quick Reference

### Common Commands

```bash
# Check service health
kubectl get pods -A | grep -v Running

# Recent deployments
kubectl get events --sort-by='.lastTimestamp' | tail -20

# Database connections
psql -c "SELECT count(*) FROM pg_stat_activity;"

# Clear cache (emergency only)
redis-cli FLUSHDB
```

### Important Links

- [Runbooks](https://wiki/runbooks)
- [Service Catalog](https://wiki/services)
- [Incident Slack](https://slack.com/incidents)
- [PagerDuty](https://pagerduty.com/schedules)

---

## Handoff Checklist

### Outgoing Engineer

- [x] Document active incidents
- [x] Document ongoing investigations
- [x] List recent changes
- [x] Note known issues
- [x] Add upcoming events
- [x] Sync with incoming engineer

### Incoming Engineer

- [ ] Read this document
- [ ] Join sync call
- [ ] Verify PagerDuty is routing to you
- [ ] Verify Slack notifications working
- [ ] Check VPN/access working
- [ ] Review critical dashboards
````

### Template 2: Quick Handoff (Async)

```markdown
# Quick Handoff: @alice → @bob

## TL;DR
- No active incidents
- 1 investigation ongoing (API timeouts, see ENG-1234)
- Major release tomorrow (01/24) - be ready for issues

## Watch List
1. API latency around 02:00-03:00 UTC (backup window)
2. Auth service memory (restart if > 80%)

## Recent
- Deployed api-gateway v3.2.1 yesterday (stable)
- Increased rate limits to 1500 RPS

## Coming Up
- 01/23 02:00 - DB maintenance (5 min read-only)
- 01/24 14:00 - v5.0 release

## Questions?
I'll be available on Slack until 17:00 today.
```

### Template 3: Incident Handoff (Mid-Incident)

```markdown
# INCIDENT HANDOFF: Payment Service Degradation

**Incident Start**: 2024-01-22 08:15 UTC
**Current Status**: Mitigating
**Severity**: SEV2

---

## Current State

- Error rate: 15% (down from 40%)
- Mitigation in progress: scaling up pods
- ETA to resolution: ~30 min

## What We Know

1. Root cause: Memory pressure on payment-service pods
2. Triggered by: Unusual traffic spike (3x normal)
3. Contributing: Inefficient query in checkout flow

## What We've Done

- Scaled payment-service from 5 → 15 pods
- Enabled rate limiting on checkout endpoint
- Disabled non-critical features

## What Needs to Happen

1. Monitor error rate - should reach <1% in ~15 min
2. If not improving, escalate to @payments-manager
3. Once stable, begin root cause investigation

## Key People

- Incident Commander: @alice (handing off)
- Comms Lead: @charlie
- Technical Lead: @bob (incoming)

## Communication

- Status page: Updated at 08:45
- Customer support: Notified
- Exec team: Aware
```

## Troubleshooting

**Incoming engineer misses a critical issue because the handoff document was incomplete.**
Use the outgoing checklist as a gate: do not mark the handoff complete until every section has at least one entry (or an explicit "none"). Make incomplete handoffs a blameless postmortem action item.

**A 30-minute sync call is not possible due to timezone gaps.**
Fall back to the async quick handoff template (Template 2). Supplement with a short Loom or voice memo walking through the watch list. Ensure the incoming engineer has a direct contact method if they have follow-up questions.

**The incoming engineer inherits a mid-incident and is immediately overwhelmed.**
Use the incident handoff template (Template 3) specifically. The outgoing engineer should remain available on Slack for 15 minutes after handoff, even if off-call, to answer clarifying questions.

**On-call handoff documents are inconsistently formatted across teams.**
Adopt the shift handoff template organization-wide and store completed handoffs in a shared location (wiki, Notion, Confluence). Link each handoff from the on-call schedule entry in PagerDuty.

**Incoming engineer cannot verify their alerting is working before the outgoing engineer logs off.**
Add a standard step: the outgoing engineer fires a test alert and confirms the incoming engineer receives it in PagerDuty and Slack before ending the overlap window.

## Related Skills

- [incident-classification](../../skills/incident-classification/SKILL.md) — Classify and prioritize incidents that need to be included in the handoff document
- [postmortem-facilitation](../../skills/postmortem-facilitation/SKILL.md) — Turn resolved incidents from the shift into structured postmortems
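The checklist gate described in Troubleshooting can be automated. Below is a minimal shell sketch, not part of the skill itself: `check_handoff` and its section list are illustrative assumptions mirroring Template 1's headings, so adjust the list to whatever template your team adopts.

```shell
# check_handoff: verify a handoff document contains every required section.
# Prints each missing section and returns nonzero if any are absent,
# so it can gate "handoff complete" in a script or CI job.
check_handoff() {
  local doc="$1" missing=0 s
  # Section names assumed from Template 1; edit to match your template.
  local sections=(
    "Active Incidents"
    "Ongoing Investigations"
    "Recent Changes"
    "Known Issues"
    "Upcoming Events"
  )
  for s in "${sections[@]}"; do
    # Match any "## ..." heading containing the section name
    # (tolerates emoji prefixes like "## 🔴 Active Incidents").
    grep -q "## .*$s" "$doc" || { echo "MISSING: $s"; missing=1; }
  done
  return "$missing"
}
```

A wrapper can then refuse to close the overlap window while `check_handoff HANDOFF.md` fails, which operationalizes the "explicit entry or explicit none" rule above.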
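For the alert-verification step, one option is to fire a synthetic alert through PagerDuty's Events API v2 during the overlap window. This is a hedged sketch: `make_test_alert` is a hypothetical helper, and the routing-key argument is a placeholder you must replace with your service's Events API v2 integration key.

```shell
# make_test_alert: build a trigger payload for PagerDuty's Events API v2.
# Arguments: <routing_key> (Events v2 integration key, placeholder here)
#            <incoming>    (handle of the incoming engineer, for the summary)
make_test_alert() {
  local routing_key="$1" incoming="$2"
  cat <<EOF
{
  "routing_key": "${routing_key}",
  "event_action": "trigger",
  "payload": {
    "summary": "Handoff test alert - please ack, ${incoming}",
    "source": "on-call-handoff",
    "severity": "info"
  }
}
EOF
}

# Send it during the overlap window (requires network access):
# make_test_alert "$ROUTING_KEY" "@bob" \
#   | curl -sS https://events.pagerduty.com/v2/enqueue \
#       -H "Content-Type: application/json" -d @-
```

The outgoing engineer runs this, then confirms the incoming engineer actually received the page in PagerDuty and Slack before ending the overlap, closing the gap named in the last Troubleshooting item.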