Claude Agent Skill · by Wshobson

Grafana Dashboards

Creates production-ready Grafana dashboards with proper JSON structure, variable templating, and alert configuration. Follows RED/USE monitoring methodologies a

Install
Terminal · npx
$ npx skills add https://github.com/wshobson/agents --skill grafana-dashboards
Source file
SKILL.md (382 lines)
---
name: grafana-dashboards
description: Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
---

# Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

## Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

## When to Use

- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs

## Dashboard Design Principles

### 1. Hierarchy of Information

```
┌─────────────────────────────────────┐
│  Critical Metrics (Big Numbers)     │
├─────────────────────────────────────┤
│  Key Trends (Time Series)           │
├─────────────────────────────────────┤
│  Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘
```

### 2. RED Method (Services)

- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time

### 3. USE Method (Resources)

- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count

## Dashboard Structure

### API Monitoring Dashboard

```json
{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}
```

**Reference:** See `assets/api-dashboard.json`

## Panel Types

### 1. Stat Panel (Single Value)

```json
{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
```

### 2. Time Series Graph

```json
{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "yaxes": [
    { "format": "percent", "max": 100, "min": 0 },
    { "format": "short" }
  ]
}
```

### 3. Table Panel

```json
{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}
```

### 4. Heatmap

```json
{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}
```

## Variables

### Query Variables

```json
{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}
```

### Use Variables in Queries

```
sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
```

## Alerts in Dashboards

```json
{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
```

## Dashboard Provisioning

**dashboards.yml:**

```yaml
apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
```

## Common Dashboard Patterns

### Infrastructure Dashboard

**Key Panels:**

- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status

**Reference:** See `assets/infrastructure-dashboard.json`

### Database Dashboard

**Key Panels:**

- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries

**Reference:** See `assets/database-dashboard.json`

### Application Dashboard

**Key Panels:**

- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length

## Best Practices

1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**

## Dashboard as Code

### Terraform Provisioning

```hcl
resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}
```

### Ansible Provisioning

```yaml
- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana
```

## Related Skills

- `prometheus-configuration` - For metric collection
- `slo-implementation` - For SLO dashboards
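
As a complement to the Dashboard as Code section, the dashboard JSON itself can be generated rather than hand-edited. The following is a minimal sketch, not part of the skill's assets: it rebuilds the three RED panels from the API Monitoring Dashboard example, filters them with the `$namespace` and `$service` variables, and assigns `gridPos` values on Grafana's 24-column grid. The function names (`red_panels`, `layout_panels`, `build_dashboard`) and the `SELECTOR` constant are hypothetical, and the panel type defaults to `"graph"` only to match the examples above (newer Grafana versions use `"timeseries"`).

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid

# Hypothetical default label selector reusing the $namespace and $service variables.
SELECTOR = 'namespace="$namespace", service=~"$service"'


def red_panels(selector=SELECTOR, panel_type="graph"):
    """Return Rate, Errors, and Duration panels without any layout information."""
    requests = "http_requests_total{" + selector + "}"
    errors = "http_requests_total{" + selector + ', status=~"5.."}'
    buckets = "http_request_duration_seconds_bucket{" + selector + "}"
    queries = [
        ("Request Rate", "sum(rate(" + requests + "[5m])) by (service)"),
        ("Error Rate %",
         "(sum(rate(" + errors + "[5m])) / sum(rate(" + requests + "[5m]))) * 100"),
        ("P95 Latency",
         "histogram_quantile(0.95, sum(rate(" + buckets + "[5m])) by (le, service))"),
    ]
    return [
        {"title": title, "type": panel_type, "targets": [{"expr": expr}]}
        for title, expr in queries
    ]


def layout_panels(panels, width=12, height=8):
    """Assign gridPos to each panel, filling rows left to right and wrapping when full."""
    x = y = 0
    placed = []
    for panel in panels:
        if x + width > GRID_COLUMNS:  # current row is full, start a new one
            x = 0
            y += height
        placed.append({**panel, "gridPos": {"x": x, "y": y, "w": width, "h": height}})
        x += width
    return placed


def build_dashboard(title, panels):
    """Wrap laid-out panels in the same envelope as the API Monitoring example."""
    return {
        "dashboard": {
            "title": title,
            "tags": ["api", "production"],
            "timezone": "browser",
            "refresh": "30s",
            "panels": layout_panels(panels),
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_dashboard("API Monitoring", red_panels()), indent=2))
```

Redirecting the script's output to a file under `dashboards/` would give the Terraform or Ansible snippets something to deploy. Keeping layout in one function means a panel can be added or reordered without recomputing every `gridPos` by hand.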