Last updated: March 22, 2026

A playbook is a documented procedure that any team member can execute without needing the author present. For remote teams spread across time zones, a good playbook turns a “wake someone up” situation into a “follow these steps” situation. What makes that possible is a template that is well structured and complete.

This guide provides templates for three core playbook types: incident response, deployment, and onboarding.


Incident Response Playbook Template

Incident - [INCIDENT-NAME]

Severity - P1 / P2 / P3
Status - Active / Resolved
Incident Commander - @[owner]
Started - YYYY-MM-DD HH:MM UTC
Resolved - YYYY-MM-DD HH:MM UTC (fill when resolved)

---

What Is Happening

One paragraph plain-language description of the incident. What is affected? Who is affected? What is the user-visible impact?

> Example - "The payments API is returning 502 errors for ~40% of checkout attempts. Approximately 200 users per hour are unable to complete purchases. The error started at 14:23 UTC."

Current Status

> Example - "Identified root cause (database connection pool exhausted). Implementing fix. ETA 30 minutes."

Timeline

| Time (UTC) | Event |
|------------|-------|
| 14:23 | First error alerts fired |
| 14:31 | On-call acknowledged |
| 14:45 | Root cause identified |
| 15:10 | Fix deployed |
| 15:15 | Confirmed resolved |
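
The Started and Resolved fields (or the first and last timeline rows) give the incident duration, which is usually needed for the follow-up review. A minimal sketch, assuming GNU `date` with its `-d` flag; the function name `incident_duration_minutes` and the calendar date in the example are illustrative and not part of the template:

```bash
#!/usr/bin/env bash
# Minutes between two UTC timestamps given as "YYYY-MM-DD HH:MM".
# Assumes GNU date (-d); on macOS, install coreutils and use gdate instead.
incident_duration_minutes() {
  local started="$1" resolved="$2"
  local start_s end_s
  start_s=$(date -u -d "$started" +%s) || return 1
  end_s=$(date -u -d "$resolved" +%s) || return 1
  echo $(( (end_s - start_s) / 60 ))
}

# Times taken from the example timeline; the date itself is a placeholder.
incident_duration_minutes "2026-01-15 14:23" "2026-01-15 15:15"
```

For the example timeline (14:23 to 15:15) this prints 52.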

Impact

- Services affected: payments-api, checkout-frontend
- Error rate: ~40%
- Users affected: ~200/hour
- Revenue impact: ~$4,000/hour estimate
- External customers notified: [Yes/No] via [status.yourcompany.com]

---

Diagnosis Steps

Run these commands to gather context:

```bash
# Check service health
kubectl get pods -n production | grep payments

# View recent error logs
kubectl logs -n production deployment/payments-api --since=30m | grep ERROR | tail -50

# Check database connectivity
kubectl exec -n production deployment/payments-api -- \
  pg_isready -h $DB_HOST -p 5432

# View active DB connections
psql -h $DB_HOST -U postgres -c \
  "SELECT count(*), state FROM pg_stat_activity GROUP BY state;"
```

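When the error log is noisy, grouping identical messages often points at the root cause faster than reading the last 50 lines. A rough sketch, assuming each log line looks like `<timestamp> ERROR <message>`; that format and the helper name `top_errors` are assumptions, so adjust the `sed` expression to your real logs:

```bash
#!/usr/bin/env bash
# Read log lines on stdin, group ERROR lines by message, and print the most
# frequent messages first. The optional argument limits the rows (default 5).
# Assumed line format: "<timestamp> ERROR <message>".
top_errors() {
  grep 'ERROR' \
    | sed -E 's/^.*ERROR[[:space:]]+//' \
    | sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# Usage:
#   kubectl logs -n production deployment/payments-api --since=30m | top_errors 10
```

Each output row is a count followed by the message, so a single dominant line (for example, a connection pool error) stands out immediately.
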
Mitigation Options

| Option | Risk | ETA | Steps |
|--------|------|-----|-------|
| Restart pods | Low | 2 min | `kubectl rollout restart deployment/payments-api -n production` |
| Roll back deployment | Low | 5 min | `kubectl rollout undo deployment/payments-api -n production` |
| Increase DB pool size | Medium | 15 min | Edit `DB_POOL_SIZE` env var and redeploy |
| Enable maintenance mode | Medium | 2 min | Set `MAINTENANCE_MODE=true` in config and redeploy |

Resolution

Describe what was done to resolve the incident and what the root cause was.

Follow-up Actions


Deployment Playbook Template

Deployment - [SERVICE-NAME] v[VERSION]

Deployer - @[name]
Date - YYYY-MM-DD
Environment - staging / production
Deploy type - Standard / Hotfix / Rollback
PR / Release - [link]

---

Pre-Deployment Checklist

Code
- [ ] PR approved by required reviewers
- [ ] All CI checks passing
- [ ] CHANGELOG updated
- [ ] Migration scripts reviewed (if applicable)

Staging Verified
- [ ] Deployed to staging successfully
- [ ] Smoke tests passing on staging
- [ ] New feature tested on staging

Dependencies
- [ ] Dependent services notified
- [ ] External APIs/webhooks compatible with new version
- [ ] Feature flags configured for gradual rollout

Rollback Plan
- [ ] Previous version noted: `v[PREVIOUS_VERSION]`
- [ ] Rollback command tested: `kubectl rollout undo deployment/[service] -n production`
- [ ] Database migrations are reversible: [Yes / No; explain if No]
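
If the filled-in checklist is kept as a Markdown file, a small gate can refuse to proceed while any box is still unchecked. This is a sketch only; the function names `count_unchecked` and `checklist_gate` and the idea of storing the checklist as a file are assumptions, not something the template requires:

```bash
#!/usr/bin/env bash
# Count unchecked items ("- [ ]") in a Markdown checklist read from stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_unchecked() {
  grep -c -E '^[[:space:]]*- \[ \]' || true
}

# Return non-zero (and explain why) while the checklist file has open items.
checklist_gate() {
  local file="$1" open
  open=$(count_unchecked < "$file")
  if [ "$open" -gt 0 ]; then
    echo "Blocked: $open unchecked item(s) in $file" >&2
    return 1
  fi
  echo "Checklist complete: $file"
}

# Usage:
#   checklist_gate path/to/filled-in-checklist.md && echo "OK to deploy"
```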

---

Deployment Steps

1. Announce

Post in the #deployments Slack channel:

Deploying [service] v[version] to production.
Changes - [brief summary]
Risk - Low / Medium / High
Rollback ready - yes
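
The announcement has only four variable fields, so it can be generated rather than typed. A hedged sketch: `format_deploy_announcement` and `post_to_slack` are hypothetical helper names, `SLACK_WEBHOOK` is assumed to hold a Slack incoming-webhook URL, and the version number in the usage line is made up:

```bash
#!/usr/bin/env bash
# Build the deploy announcement from its four variable fields.
format_deploy_announcement() {
  local service="$1" version="$2" summary="$3" risk="$4"
  printf 'Deploying %s v%s to production.\n' "$service" "$version"
  printf 'Changes - %s\n' "$summary"
  printf 'Risk - %s\n' "$risk"
  printf 'Rollback ready - yes\n'
}

# Post text from stdin to a Slack incoming webhook.
# jq wraps the raw text in a JSON object so quotes and newlines are escaped.
post_to_slack() {
  jq -Rsc '{text: .}' \
    | curl -s -X POST -H 'Content-type: application/json' --data @- "$SLACK_WEBHOOK"
}

# Usage:
#   format_deploy_announcement payments-api 2.4.1 "Raise DB pool size" Low | post_to_slack
```

Keeping the formatter separate from the poster means the message can be previewed in a terminal before anything is sent.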


2. Deploy

```bash
# Tag and push (if not automated)
git tag v[version]
git push origin v[version]

# Trigger deploy (if manual)
kubectl set image deployment/[service] [service]=[registry]/[service]:v[version] -n production

# Wait for rollout
kubectl rollout status deployment/[service] -n production --timeout=5m
```

3. Verify

```bash
# Check pods are running
kubectl get pods -n production -l app=[service]

# Confirm new version
kubectl describe deployment/[service] -n production | grep Image

# Check error rate (first 5 minutes); run this every 60 seconds, 5 times.
# -G with --data-urlencode stops curl from treating {} and [] in the query as URL globs.
curl -sG "https://monitoring.yourcompany.com/api/v1/query" \
  --data-urlencode "query=rate(http_requests_total{service='[service]',status=~'5..'}[1m])" \
  | jq '.data.result[0].value[1]'
```

4. Post-Deploy

Rollback Procedure

If error rate increases or critical errors appear:

```bash
# Immediate rollback
kubectl rollout undo deployment/[service] -n production
kubectl rollout status deployment/[service] -n production

# Verify rollback
kubectl describe deployment/[service] -n production | grep Image
```

Post in #deployments:

ROLLBACK - [service] rolled back to v[previous_version].
Reason - [brief explanation]
Investigation ongoing in #incidents
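
"If error rate increases" is easier to act on at 3 a.m. when it is an explicit threshold. A minimal sketch of that decision; the helper name `should_rollback` and the 0.05 threshold in the usage example are assumptions to be replaced with your own service-level numbers:

```bash
#!/usr/bin/env bash
# Succeed (exit 0) when the observed 5xx rate is strictly above the threshold.
# awk does the comparison because [ ] cannot compare floating-point values.
should_rollback() {
  local rate="$1" threshold="$2"
  awk -v r="$rate" -v t="$threshold" 'BEGIN { exit !(r + 0 > t + 0) }'
}

# Usage (rate would come from your monitoring query):
#   if should_rollback "$rate" 0.05; then
#     kubectl rollout undo deployment/[service] -n production
#   fi
```

Writing the threshold into the playbook removes a judgment call from whoever is deploying, which matters most when the author is asleep in another time zone.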

---

Onboarding Playbook Template

Onboarding - [ENGINEER_NAME]

Start Date - YYYY-MM-DD
Role - [role]
Manager - @[manager]
Buddy - @[buddy]
Team - [team name]

---

Week 1 - Foundation

Day 1. Access and Setup

IT/Admin tasks (Manager)
- [ ] Google Workspace account created
- [ ] GitHub org invitation sent
- [ ] Slack invitation sent
- [ ] 1Password team invitation sent
- [ ] PagerDuty account created (if on-call eligible)
- [ ] AWS/GCP/Azure console access configured

Environment setup (New Hire)

```bash
# Clone the onboarding repo for setup scripts
git clone git@github.com:your-org/onboarding.git
cd onboarding && make setup
```
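
Setup scripts tend to fail in confusing ways when a prerequisite is missing, so a quick preflight check can save a new hire's first morning. A sketch under assumptions: `missing_tools` is a hypothetical helper, and only `git` and `make` are taken from the commands above; extend the list to match what your setup really needs:

```bash
#!/usr/bin/env bash
# Print every listed command that is not on PATH.
# Returns 0 when all are present and 1 when at least one is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# git and make are needed by the clone and setup commands above.
if missing_tools git make; then
  echo "All prerequisites found"
else
  echo "Install the tools listed above, then re-run"
fi
```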

Day 2-3. Codebase Orientation

Day 4-5. Process and Context


Week 2 - Contributing


Ongoing - 30/60/90 Day Goals

30 Days

60 Days

90 Days


Key Resources

| Resource | Link |
|----------|------|
| Architecture overview | [link] |
| API documentation | [link] |
| Deployment runbook | [link] |
| Incident response playbook | [link] |
| On-call rotation | [link] |
| Team calendar | [link] |


---

Storing and Accessing Playbooks

Playbooks rot if they're not maintained. The best storage is wherever your team already looks:

```text
# Notion database with properties
Title - [text]
Type - [Incident / Deployment / Onboarding / Process]
Owner - [person]
Last Reviewed - [date]
Status - [Active / Draft / Archived]

# GitHub Wiki (version-controlled)
docs/
  playbooks/
    incident-response.md
    deployment-standard.md
    onboarding.md
```

Trigger playbook reminders via GitHub Actions to review stale playbooks:

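```yaml
name: Playbook Review Reminder
on:
  schedule:
    - cron: '0 9 1 * *'  # 09:00 UTC on the first of each month

jobs:
  remind:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, needed for last-commit dates
      - name: Check for stale playbooks
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          # A fresh checkout resets file modification times, so `find -mtime`
          # never matches. Use each playbook's last commit date instead.
          cutoff=$(date -d '90 days ago' +%s)
          for file in docs/playbooks/*.md; do
            last_commit=$(git log -1 --format=%ct -- "$file")
            if [ -n "$last_commit" ] && [ "$last_commit" -lt "$cutoff" ]; then
              curl -s -X POST \
                -H 'Content-type: application/json' \
                --data "{\"text\":\"STALE PLAYBOOK: $file. Please review and update.\"}" \
                "$SLACK_WEBHOOK"
            fi
          done
```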
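
The same staleness check can be run locally before a review meeting. This sketch measures age from each file's last commit rather than its modification time, since modification times are reset on every fresh clone; the function names, the 90-day default, and the `docs/playbooks` default path are assumptions that mirror the layout shown earlier:

```bash
#!/usr/bin/env bash
# Succeed when (now - last_changed) exceeds max_age_days (default 90).
# Both timestamps are Unix epoch seconds.
is_stale() {
  local last_changed="$1" now="$2" max_age_days="${3:-90}"
  [ $(( now - last_changed )) -gt $(( max_age_days * 86400 )) ]
}

# Print playbooks whose last commit is older than the limit.
# Must be run inside the git repository that holds the playbooks.
list_stale_playbooks() {
  local dir="${1:-docs/playbooks}" now file last
  now=$(date +%s)
  for file in "$dir"/*.md; do
    [ -e "$file" ] || continue
    last=$(git log -1 --format=%ct -- "$file")
    [ -n "$last" ] || continue  # skip files that were never committed
    if is_stale "$last" "$now"; then
      echo "$file"
    fi
  done
}

# Usage:
#   list_stale_playbooks docs/playbooks
```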
