Name: Prometheus Configuration
Author: Wshobson

Install

npx skills add https://github.com/wshobson/agents --skill prometheus-configuration

Works with Paperclip

How Prometheus Configuration fits into a Paperclip company.

Prometheus Configuration drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.


Source file

SKILL.md (394 lines, Markdown)


---
name: prometheus-configuration
description: Set up Prometheus for comprehensive metric collection, storage, and monitoring of infrastructure and applications. Use when implementing metrics collection, setting up monitoring infrastructure, or configuring alerting systems.
---

# Prometheus Configuration

Complete guide to Prometheus setup, metric collection, scrape configuration, and recording rules.

## Purpose

Configure Prometheus for comprehensive metric collection, alerting, and monitoring of infrastructure and applications.

## When to Use

- Set up Prometheus monitoring
- Configure metric scraping
- Create recording rules
- Design alert rules
- Implement service discovery

## Prometheus Architecture

```
┌──────────────┐
│ Applications │ ← Instrumented with client libraries
└──────┬───────┘
       │ /metrics endpoint
       ↓
┌──────────────┐
│  Prometheus  │ ← Scrapes metrics periodically
│    Server    │
└──────┬───────┘
       │
       ├─→ Alertmanager (alerts)
       ├─→ Grafana (visualization)
       └─→ Long-term storage (Thanos/Cortex)
```

## Installation

### Kubernetes with Helm

```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Persistent storage is requested through storageSpec.volumeClaimTemplate
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
```

### Docker Compose

```yaml
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"

volumes:
  prometheus-data:
```

## Configuration File
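The scrape jobs in this file lean on `relabel_configs`. For an `action: replace` rule, Prometheus joins the values of `source_labels` with a separator (default `;`), matches the joined string against `regex` anchored at both ends, and on a match writes the expanded `replacement` into `target_label`; with no match the labels are left unchanged. The Python sketch below models only that replace step, so a regex such as the `kubernetes-pods` address rewrite can be dry-run locally. It is a simplified illustration, not Prometheus's implementation: it uses Python's `re` module instead of Go's RE2 (they agree on these patterns, not on every construct), it omits details such as group references inside `target_label`, and the function names `relabel_replace` and `go_replacement_to_python` are hypothetical.

```python
import re


def go_replacement_to_python(replacement: str) -> str:
    """Translate Go-style group references ("$1", "${1}") into Python's "\\g<1>" form."""
    return re.sub(
        r"\$\{(\d+)\}|\$(\d+)",
        lambda m: "\\g<" + (m.group(1) or m.group(2)) + ">",
        replacement,
    )


def relabel_replace(
    labels: dict[str, str],
    source_labels: list[str],
    target_label: str,
    regex: str = "(.*)",
    replacement: str = "$1",
    separator: str = ";",
) -> dict[str, str]:
    """Simplified model of a single `action: replace` relabel step (hypothetical helper)."""
    # Missing source labels contribute an empty string to the joined value.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends and lets "." match newlines,
    # so fullmatch with DOTALL is the closest Python equivalent.
    match = re.fullmatch(regex, value, re.DOTALL)
    result = dict(labels)
    if match is not None:
        result[target_label] = match.expand(go_replacement_to_python(replacement))
    return result


if __name__ == "__main__":
    # A pod discovered on container port 8080 but annotated to be scraped on 9102.
    pod = {
        "__address__": "10.1.2.3:8080",
        "__meta_kubernetes_pod_annotation_prometheus_io_port": "9102",
    }
    rewritten = relabel_replace(
        pod,
        source_labels=["__address__", "__meta_kubernetes_pod_annotation_prometheus_io_port"],
        target_label="__address__",
        regex=r"([^:]+)(?::\d+)?;(\d+)",
        replacement="$1:$2",
    )
    print(rewritten["__address__"])  # 10.1.2.3:9102
```

If the pod has no `prometheus.io/port` annotation, the joined value ends in `;`, the `(\d+)` group cannot match, and `__address__` keeps its discovered value, which is why that rule is safe to apply to every pod.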
**prometheus.yml:**

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: "production"
    region: "us-west-2"

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

# Load rule files
rule_files:
  - /etc/prometheus/rules/*.yml

# Scrape configurations
scrape_configs:
  # Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Node exporters
  - job_name: "node-exporter"
    static_configs:
      - targets:
          - "node1:9100"
          - "node2:9100"
          - "node3:9100"
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: "([^:]+)(:[0-9]+)?"
        replacement: "${1}"

  # Kubernetes pods with annotations
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels:
          [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod

  # Application metrics
  - job_name: "my-app"
    static_configs:
      - targets:
          - "app1.example.com:9090"
          - "app2.example.com:9090"
    metrics_path: "/metrics"
    scheme: "https"
    tls_config:
      ca_file: /etc/prometheus/ca.crt
      cert_file: /etc/prometheus/client.crt
      key_file: /etc/prometheus/client.key
```

**Reference:** See `assets/prometheus.yml.template`

## Scrape Configurations

### Static Targets

```yaml
scrape_configs:
  - job_name: "static-targets"
    static_configs:
      - targets: ["host1:9100", "host2:9100"]
        labels:
          env: "production"
          region: "us-west-2"
```

### File-based Service Discovery

```yaml
scrape_configs:
  - job_name: "file-sd"
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.json
          - /etc/prometheus/targets/*.yml
        refresh_interval: 5m
```

**targets/production.json:**

```json
[
  {
    "targets": ["app1:9090", "app2:9090"],
    "labels": {
      "env": "production",
      "service": "api"
    }
  }
]
```

### Kubernetes Service Discovery

```yaml
scrape_configs:
  - job_name: "kubernetes-services"
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels:
          [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

**Reference:** See `references/scrape-configs.md`

## Recording Rules

Create pre-computed metrics for frequently queried expressions:

```yaml
# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: api_metrics
    interval: 15s
    rules:
      # HTTP request rate per service
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))

      # Error rate (5xx requests per second) and error percentage
      - record: job:http_requests_errors:rate5m
        expr: sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))

      - record: job:http_requests_error_rate:percentage
        expr: |
          (job:http_requests_errors:rate5m / job:http_requests:rate5m) * 100

      # P95 latency
      - record: job:http_request_duration:p95
        expr: |
          histogram_quantile(0.95,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

  - name: resource_metrics
    interval: 30s
    rules:
      # CPU utilization percentage
      - record: instance:node_cpu:utilization
        expr: |
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

      # Memory utilization percentage
      - record: instance:node_memory:utilization
        expr: |
          100 - ((node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100)

      # Disk usage percentage
      - record: instance:node_disk:utilization
        expr: |
          100 - ((node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100)
```

**Reference:** See `references/recording-rules.md`

## Alert Rules

```yaml
# /etc/prometheus/rules/alert_rules.yml
groups:
  - name: availability
    interval: 30s
    rules:
      - alert: ServiceDown
        expr: up{job="my-app"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.instance }} is down"
          description: "{{ $labels.job }} has been down for more than 1 minute"

      - alert: HighErrorRate
        expr: job:http_requests_error_rate:percentage > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate for {{ $labels.job }}"
          description: "Error rate is {{ $value }}% (threshold: 5%)"

      - alert: HighLatency
        expr: job:http_request_duration:p95 > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency for {{ $labels.job }}"
          description: "P95 latency is {{ $value }}s (threshold: 1s)"

  - name: resources
    interval: 1m
    rules:
      - alert: HighCPUUsage
        expr: instance:node_cpu:utilization > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}%"

      - alert: HighMemoryUsage
        expr: instance:node_memory:utilization > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%"

      - alert: DiskSpaceLow
        expr: instance:node_disk:utilization > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is {{ $value }}%"
```

## Validation

```bash
# Validate configuration
promtool check config prometheus.yml

# Validate rules
promtool check rules /etc/prometheus/rules/*.yml

# Test query
promtool query instant http://localhost:9090 'up'
```

**Reference:** See `scripts/validate-prometheus.sh`

## Best Practices

1. **Use consistent naming** for metrics (prefix_name_unit)
2. **Set appropriate scrape intervals** (15-60s typical)
3. **Use recording rules** for expensive queries
4. **Implement high availability** (multiple Prometheus instances)
5. **Configure retention** based on storage capacity
6. **Use relabeling** for metric cleanup
7. **Monitor Prometheus itself**
8. **Implement federation** for large deployments
9. **Use Thanos/Cortex** for long-term storage
10. **Document custom metrics**

## Troubleshooting

**Check scrape targets:**

```bash
curl http://localhost:9090/api/v1/targets
```

**Check configuration:**

```bash
curl http://localhost:9090/api/v1/status/config
```

**Test query:**

```bash
curl 'http://localhost:9090/api/v1/query?query=up'
```

## Related Skills

- `grafana-dashboards` - For visualization
- `slo-implementation` - For SLO monitoring
- `distributed-tracing` - For request tracing
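As a scripted complement to the `curl` checks under Troubleshooting, the sketch below reads the JSON returned by `/api/v1/targets` and lists every active target whose `health` field is not `up`, together with its `scrapeUrl` and `lastError`. It assumes a Prometheus server at `http://localhost:9090`, as in the examples above, and uses only the Python standard library; the helper names `fetch_targets` and `unhealthy_targets` are hypothetical, not part of any Prometheus tooling.

```python
import json
import urllib.request


def unhealthy_targets(response: dict) -> list[tuple[str, str, str]]:
    """Return (job, scrapeUrl, lastError) for each active target not reporting "up"."""
    # The HTTP API wraps every payload in {"status": ..., "data": ...}.
    if response.get("status") != "success":
        raise ValueError(f"Prometheus API error: {response.get('error', 'unknown')}")
    problems = []
    for target in response["data"]["activeTargets"]:
        # "health" is "up", "down", or "unknown" (not scraped yet).
        if target.get("health") != "up":
            problems.append(
                (
                    target.get("labels", {}).get("job", ""),
                    target.get("scrapeUrl", ""),
                    target.get("lastError", ""),
                )
            )
    return problems


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/targets from a Prometheus server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)
```

Against a running server, `for job, url, error in unhealthy_targets(fetch_targets()): print(job, url, error)` prints one line per failing target. Keeping the parsing separate from the HTTP call makes the filter easy to test against a saved response.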
