npx skills add https://github.com/wshobson/agents --skill grafana-dashboards

How Grafana Dashboards fits into a Paperclip company.
Grafana Dashboards drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
---
name: grafana-dashboards
description: Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
---

# Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

## Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

## When to Use

- Visualize Prometheus metrics
- Create custom dashboards
- Implement SLO dashboards
- Monitor infrastructure
- Track business KPIs

## Dashboard Design Principles

### 1. Hierarchy of Information

```
┌─────────────────────────────────────┐
│ Critical Metrics (Big Numbers)      │
├─────────────────────────────────────┤
│ Key Trends (Time Series)            │
├─────────────────────────────────────┤
│ Detailed Metrics (Tables/Heatmaps)  │
└─────────────────────────────────────┘
```

### 2. RED Method (Services)

- **Rate** - Requests per second
- **Errors** - Error rate
- **Duration** - Latency/response time

### 3. USE Method (Resources)

- **Utilization** - % time resource is busy
- **Saturation** - Queue length/wait time
- **Errors** - Error count

## Dashboard Structure

### API Monitoring Dashboard

```json
{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}
```

**Reference:** See `assets/api-dashboard.json`

## Panel Types

### 1. Stat Panel (Single Value)

```json
{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
```

### 2. Time Series Graph

```json
{
  "type": "graph",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "yaxes": [
    { "format": "percent", "max": 100, "min": 0 },
    { "format": "short" }
  ]
}
```

### 3. Table Panel

```json
{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}
```

### 4. Heatmap

```json
{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}
```

## Variables

### Query Variables

```json
{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}
```

### Use Variables in Queries

```
sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
```

## Alerts in Dashboards

```json
{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": { "params": [5], "type": "gt" },
        "operator": { "type": "and" },
        "query": { "params": ["A", "5m", "now"] },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
```

## Dashboard Provisioning

**dashboards.yml:**

```yaml
apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
```

## Common Dashboard Patterns

### Infrastructure Dashboard

**Key Panels:**

- CPU utilization per node
- Memory usage per node
- Disk I/O
- Network traffic
- Pod count by namespace
- Node status

**Reference:** See `assets/infrastructure-dashboard.json`

### Database Dashboard

**Key Panels:**

- Queries per second
- Connection pool usage
- Query latency (P50, P95, P99)
- Active connections
- Database size
- Replication lag
- Slow queries

**Reference:** See `assets/database-dashboard.json`

### Application Dashboard

**Key Panels:**

- Request rate
- Error rate
- Response time (percentiles)
- Active users/sessions
- Cache hit rate
- Queue length

## Best Practices

1. **Start with templates** (Grafana community dashboards)
2. **Use consistent naming** for panels and variables
3. **Group related metrics** in rows
4. **Set appropriate time ranges** (default: Last 6 hours)
5. **Use variables** for flexibility
6. **Add panel descriptions** for context
7. **Configure units** correctly
8. **Set meaningful thresholds** for colors
9. **Use consistent colors** across dashboards
10. **Test with different time ranges**

## Dashboard as Code

### Terraform Provisioning

```hcl
resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}
```

### Ansible Provisioning

```yaml
- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana
```

## Related Skills

- `prometheus-configuration` - For metric collection
- `slo-implementation` - For SLO dashboards
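The `gridPos` values in the API Monitoring Dashboard JSON place panels on Grafana's 24-column grid. As a minimal sketch of the dashboard-as-code idea, the Python below computes those positions instead of hand-writing them. The names `layout_panels` and `API_PANELS` are hypothetical helpers introduced for illustration; they are not part of Grafana or of this skill, and the panel definitions simply reuse the queries shown in the API Monitoring Dashboard.

```python
import json

GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid

# Hypothetical input: (title, PromQL expression, width in grid columns)
API_PANELS = [
    ("Request Rate", "sum(rate(http_requests_total[5m])) by (service)", 12),
    (
        "Error Rate %",
        '(sum(rate(http_requests_total{status=~"5.."}[5m])) '
        "/ sum(rate(http_requests_total[5m]))) * 100",
        12,
    ),
    (
        "P95 Latency",
        "histogram_quantile(0.95, "
        "sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
        24,
    ),
]


def layout_panels(panels, height=8):
    """Assign gridPos left to right, wrapping to a new row at 24 columns."""
    placed = []
    x, y = 0, 0
    for title, expr, width in panels:
        if not 0 < width <= GRID_COLUMNS:
            raise ValueError(f"panel {title!r} has invalid width {width}")
        if x + width > GRID_COLUMNS:
            # Panel does not fit on the current row: start the next one.
            x, y = 0, y + height
        placed.append(
            {
                "title": title,
                "type": "graph",
                "targets": [{"expr": expr}],
                "gridPos": {"x": x, "y": y, "w": width, "h": height},
            }
        )
        x += width
    return placed


panels = layout_panels(API_PANELS)
dashboard = {"dashboard": {"title": "API Monitoring", "panels": panels}}
print(json.dumps(dashboard, indent=2))
```

With the three panels above, the two 12-column panels share the first row and the 24-column latency panel wraps to `y = 8`, which matches the hand-written `gridPos` values in the API Monitoring Dashboard. The printed JSON could be saved under the provisioning path or passed to the Terraform `config_json` attribute, after adding whatever fields (alerts, legends, tags) the dashboard needs.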