How SLO Implementation fits into a Paperclip company

SLO Implementation drops into any Paperclip agent that handles reliability work such as defining SLIs, SLOs, and error budgets. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering or tool wiring.


Source file

SKILL.md (333 lines, markdown)

---
name: slo-implementation
description: Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
---

# SLO Implementation

Framework for defining and implementing Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.

## Purpose

Implement measurable reliability targets using SLIs, SLOs, and error budgets to balance reliability with innovation velocity.

## When to Use

- Define service reliability targets
- Measure user-perceived reliability
- Implement error budgets
- Create SLO-based alerts
- Track reliability goals

## SLI/SLO/SLA Hierarchy

```
SLA (Service Level Agreement)
  ↓ Contract with customers
SLO (Service Level Objective)
  ↓ Internal reliability target
SLI (Service Level Indicator)
  ↓ Actual measurement
```

## Defining SLIs

### Common SLI Types

#### 1. Availability SLI

```promql
# Successful requests / Total requests
sum(rate(http_requests_total{status!~"5.."}[28d]))
/
sum(rate(http_requests_total[28d]))
```

#### 2. Latency SLI

```promql
# Requests below latency threshold / Total requests
sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
/
sum(rate(http_request_duration_seconds_count[28d]))
```

#### 3. Durability SLI

```
# Successful writes / Total writes
sum(storage_writes_successful_total)
/
sum(storage_writes_total)
```

**Reference:** See `references/slo-definitions.md`

## Setting SLO Targets

### Availability SLO Examples

| SLO %  | Downtime/Month | Downtime/Year |
| ------ | -------------- | ------------- |
| 99%    | 7.2 hours      | 3.65 days     |
| 99.9%  | 43.2 minutes   | 8.76 hours    |
| 99.95% | 21.6 minutes   | 4.38 hours    |
| 99.99% | 4.32 minutes   | 52.56 minutes |

Monthly figures assume a 30-day month; yearly figures assume a 365-day year.

### Choose Appropriate SLOs

**Consider:**

- User expectations
- Business requirements
- Current performance
- Cost of reliability
- Competitor benchmarks

**Example SLOs:**

```yaml
slos:
  - name: api_availability
    target: 99.9
    window: 28d
    sli: |
      sum(rate(http_requests_total{status!~"5.."}[28d]))
      /
      sum(rate(http_requests_total[28d]))

  - name: api_latency_p95
    target: 99
    window: 28d
    sli: |
      sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
      /
      sum(rate(http_request_duration_seconds_count[28d]))
```

## Error Budget Calculation

### Error Budget Formula

```
Error Budget = 1 - SLO Target
```

**Example:**

- SLO: 99.9% availability
- Error Budget: 0.1% = 43.2 minutes/month
- Current Error: 0.05% = 21.6 minutes/month
- Remaining Budget: 50%

### Error Budget Policy

```yaml
error_budget_policy:
  - remaining_budget: 100%
    action: Normal development velocity
  - remaining_budget: 50%
    action: Consider postponing risky changes
  - remaining_budget: 10%
    action: Freeze non-critical changes
  - remaining_budget: 0%
    action: Feature freeze, focus on reliability
```

**Reference:** See `references/error-budget.md`

## SLO Implementation

### Prometheus Recording Rules

```yaml
# SLI Recording Rules
groups:
  - name: sli_rules
    interval: 30s
    rules:
      # Availability SLI
      - record: sli:http_availability:ratio
        expr: |
          sum(rate(http_requests_total{status!~"5.."}[28d]))
          /
          sum(rate(http_requests_total[28d]))

      # Latency SLI (requests < 500ms)
      - record: sli:http_latency:ratio
        expr: |
          sum(rate(http_request_duration_seconds_bucket{le="0.5"}[28d]))
          /
          sum(rate(http_request_duration_seconds_count[28d]))

  - name: slo_rules
    interval: 5m
    rules:
      # SLO compliance (1 = meeting SLO, 0 = violating)
      - record: slo:http_availability:compliance
        expr: sli:http_availability:ratio >= bool 0.999

      - record: slo:http_latency:compliance
        expr: sli:http_latency:ratio >= bool 0.99

      # Error budget remaining (percentage)
      - record: slo:http_availability:error_budget_remaining
        expr: |
          (sli:http_availability:ratio - 0.999) / (1 - 0.999) * 100

      # Error budget burn rate (observed error ratio / allowed error ratio)
      - record: slo:http_availability:burn_rate_5m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total[5m]))
          )) / (1 - 0.999)

      # The alerting rules below also reference 30m, 1h, and 6h burn rates;
      # they follow the same pattern with a different range window.
      - record: slo:http_availability:burn_rate_30m
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[30m]))
            /
            sum(rate(http_requests_total[30m]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_1h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[1h]))
            /
            sum(rate(http_requests_total[1h]))
          )) / (1 - 0.999)

      - record: slo:http_availability:burn_rate_6h
        expr: |
          (1 - (
            sum(rate(http_requests_total{status!~"5.."}[6h]))
            /
            sum(rate(http_requests_total[6h]))
          )) / (1 - 0.999)
```

### SLO Alerting Rules

```yaml
groups:
  - name: slo_alerts
    interval: 1m
    rules:
      # Fast burn: 14.4x rate, 1 hour window
      # Consumes 2% error budget in 1 hour (for a 30-day SLO window)
      - alert: SLOErrorBudgetBurnFast
        expr: |
          slo:http_availability:burn_rate_1h > 14.4
          and
          slo:http_availability:burn_rate_5m > 14.4
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Fast error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Slow burn: 6x rate, 6 hour window
      # Consumes 5% error budget in 6 hours (for a 30-day SLO window)
      - alert: SLOErrorBudgetBurnSlow
        expr: |
          slo:http_availability:burn_rate_6h > 6
          and
          slo:http_availability:burn_rate_30m > 6
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Slow error budget burn detected"
          description: "Error budget burning at {{ $value }}x rate"

      # Error budget exhausted
      - alert: SLOErrorBudgetExhausted
        expr: slo:http_availability:error_budget_remaining < 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "SLO error budget exhausted"
          description: "Error budget remaining: {{ $value }}%"
```

## SLO Dashboard

**Grafana Dashboard Structure:**

```
┌────────────────────────────────────┐
│ SLO Compliance (Current)           │
│ ✓ 99.95% (Target: 99.9%)          │
├────────────────────────────────────┤
│ Error Budget Remaining: 65%        │
│ ████████░░ 65%                     │
├────────────────────────────────────┤
│ SLI Trend (28 days)                │
│ [Time series graph]                │
├────────────────────────────────────┤
│ Burn Rate Analysis                 │
│ [Burn rate by time window]         │
└────────────────────────────────────┘
```

**Example Queries:**

```promql
# Current SLO compliance
sli:http_availability:ratio * 100

# Error budget remaining
slo:http_availability:error_budget_remaining

# Days until error budget exhausted (at the 28-day average burn rate):
# remaining fraction * window length / burn rate
(slo:http_availability:error_budget_remaining / 100)
* 28
/
((1 - sli:http_availability:ratio) / (1 - 0.999))
```

## Multi-Window Burn Rate Alerts

```yaml
# Combination of short and long windows reduces false positives
rules:
  - alert: SLOBurnRateHigh
    expr: |
      (
        slo:http_availability:burn_rate_1h > 14.4
        and
        slo:http_availability:burn_rate_5m > 14.4
      )
      or
      (
        slo:http_availability:burn_rate_6h > 6
        and
        slo:http_availability:burn_rate_30m > 6
      )
    labels:
      severity: critical
```

## SLO Review Process

### Weekly Review

- Current SLO compliance
- Error budget status
- Trend analysis
- Incident impact

### Monthly Review

- SLO achievement
- Error budget usage
- Incident postmortems
- SLO adjustments

### Quarterly Review

- SLO relevance
- Target adjustments
- Process improvements
- Tooling enhancements

## Best Practices

1. **Start with user-facing services**
2. **Use multiple SLIs** (availability, latency, etc.)
3. **Set achievable SLOs** (don't aim for 100%)
4. **Implement multi-window alerts** to reduce noise
5. **Track error budget** consistently
6. **Review SLOs regularly**
7. **Document SLO decisions**
8. **Align with business goals**
9. **Automate SLO reporting**
10. **Use SLOs for prioritization**

## Related Skills

- `prometheus-configuration` - For metric collection
- `grafana-dashboards` - For SLO visualization
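
The error budget arithmetic from the Error Budget Calculation section can be checked offline with a small script. The following is a minimal sketch, not part of the skill itself: the `ErrorBudget` class and `policy_action` function are hypothetical names, and reading each `remaining_budget` value in the policy as an "at or below" threshold is an assumption, since the policy does not say how values between the listed levels are handled.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper mirroring 'Error Budget = 1 - SLO Target'."""

    slo_target: float  # as a fraction, e.g. 0.999 for 99.9%
    window_days: int   # e.g. 30 for the monthly downtime table

    @property
    def budget_fraction(self) -> float:
        # Fraction of requests (or time) that is allowed to fail.
        return 1.0 - self.slo_target

    @property
    def allowed_downtime_minutes(self) -> float:
        # Budget expressed as full-outage minutes over the window.
        return self.budget_fraction * self.window_days * MINUTES_PER_DAY

    def remaining_percent(self, observed_sli: float) -> float:
        # Same formula as the error_budget_remaining recording rule:
        # (sli - target) / (1 - target) * 100
        return (observed_sli - self.slo_target) / self.budget_fraction * 100.0


def policy_action(remaining_percent: float) -> str:
    # Assumption: each policy level applies at or below its threshold.
    if remaining_percent <= 0:
        return "Feature freeze, focus on reliability"
    if remaining_percent <= 10:
        return "Freeze non-critical changes"
    if remaining_percent <= 50:
        return "Consider postponing risky changes"
    return "Normal development velocity"


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, window_days=30)
    print(f"Allowed downtime: {budget.allowed_downtime_minutes:.1f} minutes")
    remaining = budget.remaining_percent(observed_sli=0.9995)
    print(f"Remaining budget: {remaining:.1f}%")
    print(f"Policy action: {policy_action(remaining)}")
```

With a 99.9% target over 30 days this reproduces the 43.2 minutes from the availability table, and an observed SLI of 99.95% leaves roughly half the budget, matching the worked example.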

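The burn-rate thresholds used in the alerting rules (14.4x over 1 hour, 6x over 6 hours) follow from simple arithmetic: a burn rate multiplied by the share of the SLO window that the alert window covers gives the fraction of budget consumed. The sketch below illustrates that relationship and the short-plus-long window check; all function names are hypothetical and chosen only for illustration.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error ratio the SLO allows."""
    return error_ratio / (1.0 - slo_target)


def budget_consumed_percent(rate: float, alert_window_hours: float,
                            slo_window_days: float) -> float:
    """Percent of the total error budget burned if `rate` holds for the alert window."""
    return rate * alert_window_hours / (slo_window_days * 24.0) * 100.0


def should_alert(long_window_rate: float, short_window_rate: float,
                 threshold: float) -> bool:
    """Multi-window check: both the long and the short window must exceed the threshold."""
    return long_window_rate > threshold and short_window_rate > threshold


def days_until_exhausted(remaining_percent: float, slo_window_days: float,
                         rate: float) -> float:
    """Remaining budget fraction * window length / burn rate."""
    if rate <= 0:
        return float("inf")  # nothing is being burned
    return remaining_percent / 100.0 * slo_window_days / rate


if __name__ == "__main__":
    for rate, hours in ((14.4, 1), (6, 6)):
        for window in (30, 28):
            pct = budget_consumed_percent(rate, hours, window)
            print(f"{rate}x for {hours}h over a {window}d window: {pct:.2f}% of budget")
```

Under a 30-day window the two thresholds correspond to 2% and 5% of the budget, as the rule comments state; with the 28-day window used by the recording rules the same thresholds work out to about 2.14% and 5.36%, so the thresholds may need adjusting if exact percentages matter.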