Name: Azure Resource Health Diagnose
Author: GitHub

Install

$ npx skills add https://github.com/github/awesome-copilot --skill azure-resource-health-diagnose

Works with Paperclip

How Azure Resource Health Diagnose fits into a Paperclip company.

Azure Resource Health Diagnose drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.


Source file

SKILL.md

---
name: azure-resource-health-diagnose
description: 'Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.'
---

# Azure Resource Health & Issue Diagnosis

This workflow analyzes a specific Azure resource to assess its health status, diagnose potential issues using logs and telemetry data, and develop a comprehensive remediation plan for any problems discovered.

## Prerequisites
- Azure MCP server configured and authenticated
- Target Azure resource identified (name and optionally resource group/subscription)
- Resource must be deployed and running to generate logs/telemetry
- Prefer Azure MCP tools (`azmcp-*`) over direct Azure CLI when available

## Workflow Steps

### Step 1: Get Azure Best Practices
**Action**: Retrieve diagnostic and troubleshooting best practices
**Tools**: Azure MCP best practices tool
**Process**:
1. **Load Best Practices**:
   - Execute Azure best practices tool to get diagnostic guidelines
   - Focus on health monitoring, log analysis, and issue resolution patterns
   - Use these practices to inform diagnostic approach and remediation recommendations

### Step 2: Resource Discovery & Identification
**Action**: Locate and identify the target Azure resource
**Tools**: Azure MCP tools + Azure CLI fallback
**Process**:
1. **Resource Lookup**:
   - If only resource name provided: Search across subscriptions using `azmcp-subscription-list`
   - Use `az resource list --name <resource-name>` to find matching resources
   - If multiple matches found, prompt user to specify subscription/resource group
   - Gather detailed resource information:
     - Resource type and current status
     - Location, tags, and configuration
     - Associated services and dependencies

2. **Resource Type Detection**:
   - Identify resource type to determine appropriate diagnostic approach:
     - **Web Apps/Function Apps**: Application logs, performance metrics, dependency tracking
     - **Virtual Machines**: System logs, performance counters, boot diagnostics
     - **Cosmos DB**: Request metrics, throttling, partition statistics
     - **Storage Accounts**: Access logs, performance metrics, availability
     - **SQL Database**: Query performance, connection logs, resource utilization
     - **Application Insights**: Application telemetry, exceptions, dependencies
     - **Key Vault**: Access logs, certificate status, secret usage
     - **Service Bus**: Message metrics, dead letter queues, throughput

### Step 3: Health Status Assessment
**Action**: Evaluate current resource health and availability
**Tools**: Azure MCP monitoring tools + Azure CLI
**Process**:
1. **Basic Health Check**:
   - Check resource provisioning state and operational status
   - Verify service availability and responsiveness
   - Review recent deployment or configuration changes
   - Assess current resource utilization (CPU, memory, storage, etc.)

2. **Service-Specific Health Indicators**:
   - **Web Apps**: HTTP response codes, response times, uptime
   - **Databases**: Connection success rate, query performance, deadlocks
   - **Storage**: Availability percentage, request success rate, latency
   - **VMs**: Boot diagnostics, guest OS metrics, network connectivity
   - **Functions**: Execution success rate, duration, error frequency

### Step 4: Log & Telemetry Analysis
**Action**: Analyze logs and telemetry to identify issues and patterns
**Tools**: Azure MCP monitoring tools for Log Analytics queries
**Process**:
1. **Find Monitoring Sources**:
   - Use `azmcp-monitor-workspace-list` to identify Log Analytics workspaces
   - Locate Application Insights instances associated with the resource
   - Identify relevant log tables using `azmcp-monitor-table-list`

2. **Execute Diagnostic Queries**:
   Use `azmcp-monitor-log-query` with targeted KQL queries based on resource type:

   **General Error Analysis**:
   ```kql
   // Recent errors and exceptions
   union isfuzzy=true
       AzureDiagnostics,
       AppServiceHTTPLogs,
       AppServiceAppLogs,
       AzureActivity
   | where TimeGenerated > ago(24h)
   | where Level == "Error" or ResultType != "Success"
   | summarize ErrorCount=count() by Resource, ResultType, bin(TimeGenerated, 1h)
   | order by TimeGenerated desc
   ```

   **Performance Analysis**:
   ```kql
   // Performance degradation patterns
   Perf
   | where TimeGenerated > ago(7d)
   | where ObjectName == "Processor" and CounterName == "% Processor Time"
   | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
   | where avg_CounterValue > 80
   ```

   **Application-Specific Queries**:
   ```kql
   // Application Insights - Failed requests
   requests
   | where timestamp > ago(24h)
   | where success == false
   | summarize FailureCount=count() by resultCode, bin(timestamp, 1h)
   | order by timestamp desc

   // Database - Connection failures
   AzureDiagnostics
   | where ResourceProvider == "MICROSOFT.SQL"
   | where Category == "SQLSecurityAuditEvents"
   | where action_name_s == "CONNECTION_FAILED"
   | summarize ConnectionFailures=count() by bin(TimeGenerated, 1h)
   ```

3. **Pattern Recognition**:
   - Identify recurring error patterns or anomalies
   - Correlate errors with deployment times or configuration changes
   - Analyze performance trends and degradation patterns
   - Look for dependency failures or external service issues

### Step 5: Issue Classification & Root Cause Analysis
**Action**: Categorize identified issues and determine root causes
**Process**:
1. **Issue Classification**:
   - **Critical**: Service unavailable, data loss, security breaches
   - **High**: Performance degradation, intermittent failures, high error rates
   - **Medium**: Warnings, suboptimal configuration, minor performance issues
   - **Low**: Informational alerts, optimization opportunities

2. **Root Cause Analysis**:
   - **Configuration Issues**: Incorrect settings, missing dependencies
   - **Resource Constraints**: CPU/memory/disk limitations, throttling
   - **Network Issues**: Connectivity problems, DNS resolution, firewall rules
   - **Application Issues**: Code bugs, memory leaks, inefficient queries
   - **External Dependencies**: Third-party service failures, API limits
   - **Security Issues**: Authentication failures, certificate expiration

3. **Impact Assessment**:
   - Determine business impact and affected users/systems
   - Evaluate data integrity and security implications
   - Assess recovery time objectives and priorities

### Step 6: Generate Remediation Plan
**Action**: Create a comprehensive plan to address identified issues
**Process**:
1. **Immediate Actions** (Critical issues):
   - Emergency fixes to restore service availability
   - Temporary workarounds to mitigate impact
   - Escalation procedures for complex issues

2. **Short-term Fixes** (High/Medium issues):
   - Configuration adjustments and resource scaling
   - Application updates and patches
   - Monitoring and alerting improvements

3. **Long-term Improvements** (All issues):
   - Architectural changes for better resilience
   - Preventive measures and monitoring enhancements
   - Documentation and process improvements

4. **Implementation Steps**:
   - Prioritized action items with specific Azure CLI commands
   - Testing and validation procedures
   - Rollback plans for each change
   - Monitoring to verify issue resolution

### Step 7: User Confirmation & Report Generation
**Action**: Present findings and get approval for remediation actions
**Process**:
1. **Display Health Assessment Summary**:
   ```
   🏥 Azure Resource Health Assessment

   📊 Resource Overview:
   • Resource: [Name] ([Type])
   • Status: [Healthy/Warning/Critical]
   • Location: [Region]
   • Last Analyzed: [Timestamp]

   🚨 Issues Identified:
   • Critical: X issues requiring immediate attention
   • High: Y issues affecting performance/reliability
   • Medium: Z issues for optimization
   • Low: N informational items

   🔍 Top Issues:
   1. [Issue Type]: [Description] - Impact: [High/Medium/Low]
   2. [Issue Type]: [Description] - Impact: [High/Medium/Low]
   3. [Issue Type]: [Description] - Impact: [High/Medium/Low]

   🛠️ Remediation Plan:
   • Immediate Actions: X items
   • Short-term Fixes: Y items
   • Long-term Improvements: Z items
   • Estimated Resolution Time: [Timeline]

   ❓ Proceed with detailed remediation plan? (y/n)
   ```

2. **Generate Detailed Report**:
   ````markdown
   # Azure Resource Health Report: [Resource Name]

   **Generated**: [Timestamp]  
   **Resource**: [Full Resource ID]  
   **Overall Health**: [Status with color indicator]

   ## 🔍 Executive Summary
   [Brief overview of health status and key findings]

   ## 📊 Health Metrics
   - **Availability**: X% over last 24h
   - **Performance**: [Average response time/throughput]
   - **Error Rate**: X% over last 24h
   - **Resource Utilization**: [CPU/Memory/Storage percentages]

   ## 🚨 Issues Identified

   ### Critical Issues
   - **[Issue 1]**: [Description]
     - **Root Cause**: [Analysis]
     - **Impact**: [Business impact]
     - **Immediate Action**: [Required steps]

   ### High Priority Issues
   - **[Issue 2]**: [Description]
     - **Root Cause**: [Analysis]
     - **Impact**: [Performance/reliability impact]
     - **Recommended Fix**: [Solution steps]

   ## 🛠️ Remediation Plan

   ### Phase 1: Immediate Actions (0-2 hours)
   ```bash
   # Critical fixes to restore service
   [Azure CLI commands with explanations]
   ```

   ### Phase 2: Short-term Fixes (2-24 hours)
   ```bash
   # Performance and reliability improvements
   [Azure CLI commands with explanations]
   ```

   ### Phase 3: Long-term Improvements (1-4 weeks)
   ```bash
   # Architectural and preventive measures
   [Azure CLI commands and configuration changes]
   ```

   ## 📈 Monitoring Recommendations
   - **Alerts to Configure**: [List of recommended alerts]
   - **Dashboards to Create**: [Monitoring dashboard suggestions]
   - **Regular Health Checks**: [Recommended frequency and scope]

   ## ✅ Validation Steps
   - [ ] Verify issue resolution through logs
   - [ ] Confirm performance improvements
   - [ ] Test application functionality
   - [ ] Update monitoring and alerting
   - [ ] Document lessons learned

   ## 📝 Prevention Measures
   - [Recommendations to prevent similar issues]
   - [Process improvements]
   - [Monitoring enhancements]
   ````

## Error Handling
- **Resource Not Found**: Provide guidance on resource name/location specification
- **Authentication Issues**: Guide user through Azure authentication setup
- **Insufficient Permissions**: List required RBAC roles for resource access
- **No Logs Available**: Suggest enabling diagnostic settings and waiting for data
- **Query Timeouts**: Break down analysis into smaller time windows
- **Service-Specific Issues**: Provide generic health assessment with limitations noted

## Success Criteria
- ✅ Resource health status accurately assessed
- ✅ All significant issues identified and categorized
- ✅ Root cause analysis completed for major problems
- ✅ Actionable remediation plan with specific steps provided
- ✅ Monitoring and prevention recommendations included
- ✅ Clear prioritization of issues by business impact
- ✅ Implementation steps include validation and rollback procedures
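
The severity levels from Step 5 and the remediation phases from Step 6 map onto each other mechanically, so the bookkeeping can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the skill: the `Issue` class and the `severity_counts` and `remediation_phases` helpers are hypothetical names, and the mapping assumes exactly what the workflow states (Critical issues get immediate actions, High/Medium issues get short-term fixes, and all issues feed long-term improvements).

```python
from collections import Counter
from dataclasses import dataclass

# Severity levels from Step 5, ordered from most to least urgent.
SEVERITIES = ("Critical", "High", "Medium", "Low")


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one diagnosed issue."""
    title: str
    severity: str  # expected to be one of SEVERITIES


def severity_counts(issues):
    """Count issues per severity, as needed for the summary in Step 7."""
    counts = Counter(issue.severity for issue in issues)
    return {level: counts.get(level, 0) for level in SEVERITIES}


def remediation_phases(issues):
    """Group issues into the three phases described in Step 6.

    Raises ValueError if an issue carries an unknown severity.
    """
    ordered = sorted(issues, key=lambda i: SEVERITIES.index(i.severity))
    return {
        "Immediate Actions": [i for i in ordered if i.severity == "Critical"],
        "Short-term Fixes": [i for i in ordered if i.severity in ("High", "Medium")],
        "Long-term Improvements": ordered,  # all issues, most urgent first
    }
```

The counts returned by `severity_counts` could fill the "Issues Identified" lines of the summary, while the lengths of the lists from `remediation_phases` could fill the "Remediation Plan" item counts.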
