
Azure Reliability
Audit Azure Functions reliability in a live subscription and apply zone redundancy, storage resilience, and multi-region failover with confirmed CLI or IaC steps.
Overview
Azure Reliability is an agent skill most often used in Operate (also Ship) that assesses Azure Functions reliability and drives confirmed remediation for zone redundancy, resilient storage, health probes, and multi-region failover.
Install
npx skills add https://github.com/microsoft/azure-skills --skill azure-reliability
What is this skill?
- Feature-pivoted reliability assessment table for deployed Azure Functions workloads
- Zone redundancy and ZRS storage configuration guidance with staged remediation
- Multi-region failover setup including multi-region IaC generation
- Discovery via Azure Resource Graph queries and Azure CLI commands (MCP tools)
- User-confirmed end-to-end remediation flow—scan, checklist, then CLI or IaC patches
- Supported services in v1.0.1: Azure Functions only; App Service and Container Apps are planned for a future version
- Workflow: scan deployed resources → feature-pivoted checklist → staged remediation with user confirmation
Adoption & trust: 58.7k installs on skills.sh; 1.2k GitHub stars; 2/3 security scanners passed (skills.sh audits).
What problem does it solve?
You shipped Azure Functions but do not know if zone redundancy, storage, probes, and regional failover are actually configured—or where single points of failure remain.
Who is it for?
Indie builders with Azure Functions in production who want a guided HA/DR audit before scaling traffic or answering customer uptime questions.
Skip if: Teams only planning greenfield architecture on paper, workloads outside Azure Functions in v1, or anyone who wants changes applied without review and confirmation gates.
When should I use this skill?
Use it when the user mentions "assess reliability", "check reliability", "zone redundant", "multi-region failover", "high availability", "disaster recovery", "single points of failure", or "reliability posture" for Azure Functions.
What do I get? / Deliverables
You get a scanned reliability checklist for your Functions app and staged, user-approved CLI or IaC changes that improve high availability and disaster recovery posture.
- Reliability assessment table pivoted by feature area
- Staged remediation plan with CLI commands or IaC patches
Journey fit
Spans multiple journey phases - primary shelf plus alternate fits below.
Reliability posture is an ongoing production concern; canonical shelf is Operate because the skill scans deployed resources and remediates live Azure workloads. Infra is the right subphase: zone redundancy, ZRS, health probes, and failover are infrastructure availability patterns, not day-to-day error triage.
Where it fits
Before a public launch, scan Functions hosting, storage, and probes so you are not shipping on a single zone or brittle storage SKU.
After an incident or traffic spike, reassess redundancy and walk confirmed CLI or IaC patches for zone redundancy and failover.
Validate that health probe and availability settings align with how you expect alerts and load balancing to behave.
When uptime complaints rise, use the assessment table to separate app bugs from underlying Azure availability gaps.
How it compares
Use instead of ad-hoc Azure portal clicking or generic “best practices” chat—this skill ties Resource Graph discovery to a Functions-specific checklist and remediation path.
Common Questions / FAQ
Who is azure-reliability for?
Solo and small-team builders running Azure Functions who need to verify zone redundancy, storage resilience, and failover without becoming a full-time SRE.
When should I use azure-reliability?
In Operate when hardening production infra; in Ship when launch prep includes HA checks; whenever you ask to assess reliability, find single points of failure, or enable multi-region failover for Functions.
Is azure-reliability safe to install?
It is designed around user confirmation before remediation, but it can invoke Azure CLI and graph queries against live subscriptions—review the Security Audits panel on this page and scope credentials least-privilege before running.
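As a minimal sketch of least-privilege scoping, the helper below prints (without running) an `az role assignment create` command that grants Reader on one resource group, which the skill's prerequisites list as sufficient for assessment. The function name `print_reader_assignment` and the placeholder values are hypothetical; remediation would still need Contributor, granted separately and only when changes are approved.

```shell
# Hypothetical helper: prints a Reader role assignment limited to one
# resource group so it can be reviewed before anyone executes it.
# $1 = principal ID or sign-in name, $2 = subscription ID, $3 = resource group
print_reader_assignment() {
  printf 'az role assignment create --assignee %s --role Reader --scope /subscriptions/%s/resourceGroups/%s\n' "$1" "$2" "$3"
}

# Example with placeholder values (replace before use):
print_reader_assignment "<principal-id>" "<sub-id>" "<rg-name>"
```

Printing the command instead of executing it keeps the review step with the person who owns the subscription.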
SKILL.md - Azure Reliability
# Azure Reliability Assessment & Configuration

## Quick Reference

| Property | Details |
|---|---|
| Best for | Reliability posture assessment, zone redundancy enablement, multi-region failover setup |
| Primary capabilities | Reliability assessment table, Zone Redundancy Configuration, Multi-Region IaC Generation |
| Supported services | Azure Functions (App Service and Container Apps planned for a future version) |
| MCP tools | Azure Resource Graph queries, Azure CLI commands |

## When to Use This Skill

Activate this skill when user wants to:

- "Assess my Functions app's reliability"
- "Check the reliability of my resource group" (Functions resources only)
- "Is my function app zone redundant?"
- "Make my function app zone redundant"
- "Set up multi-region failover for my Functions app"
- "Check my reliability posture"
- "Find single points of failure" (in Functions workloads)
- "Enable high availability for my Functions app"
- "Check disaster recovery readiness"
- "Improve my Functions app's resilience"

> **Scope note:** This skill currently covers **Azure Functions** only. If the user asks about Azure App Service or Azure Container Apps reliability, acknowledge that support is planned but not yet available, and only proceed with the parts that apply to Functions resources in scope.

## Prerequisites

- Authentication: user is logged in to Azure via `az login`
- Permissions: Reader access on target subscription/resource group (for assessment)
- Permissions: Contributor access (for configuration changes)
- Azure Resource Graph extension: `az extension add --name resource-graph`

## MCP Tools

| Tool | Purpose |
|------|---------|
| `mcp_azure_mcp_extension_cli_generate` | Generate `az` CLI commands for resource queries and configuration |
| `mcp_azure_mcp_subscription_list` | List available subscriptions |
| `mcp_azure_mcp_group_list` | List resource groups |

Primary query method: Azure Resource Graph via `az graph query` (requires `az extension add --name resource-graph`).

## Assessment Workflow

### Phase 1: Discover Resources

1. **Identify scope**: Ask user for resource group, subscription, or app name
2. **Query Azure Resource Graph** to discover all resources in scope
3. **Classify resources** by service type (Functions, Storage, etc.). If non-Functions compute (App Service sites that aren't Function Apps, Container Apps) is found, **note it but do not deep-dive**; those services are planned for a future version of this skill.

**Important:** Always scope queries to the user's specified resource group or subscription.
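
One way to keep that scoping consistent is to build every discovery query through a single helper. The sketch below is illustrative only: the function names are hypothetical, and the base filter (`microsoft.web/sites` with a `kind` containing `functionapp`) is an assumption about how Function Apps appear in Resource Graph, not something this skill's text specifies. It prints the `az graph query` command rather than running it, so the command can be reviewed first.

```shell
# Hypothetical helper: builds a Resource Graph query limited to one
# resource group and, optionally, one app name.
# $1 = resource group (required), $2 = app name (optional)
build_scoped_query() {
  # Assumed base filter for Function Apps; adjust if your resources differ.
  _q="resources | where type =~ 'microsoft.web/sites' and kind contains 'functionapp'"
  _q="$_q | where resourceGroup =~ '$1'"
  if [ -n "${2:-}" ]; then
    _q="$_q | where name =~ '$2'"
  fi
  printf '%s\n' "$_q"
}

# Prints (does not run) the full CLI command, scoped to one subscription.
# $1 = subscription ID, $2 = resource group, $3 = app name (optional)
print_graph_command() {
  printf 'az graph query -q "%s" --subscriptions %s\n' \
    "$(build_scoped_query "$2" "${3:-}")" "$1"
}

# Example with placeholder values (replace before use):
print_graph_command "<sub-id>" "<rg-name>"
```

Routing every query through one builder makes it harder to issue an unscoped, subscription-wide query by accident.
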
Add these filters to every Resource Graph query:

- Resource group: `| where resourceGroup =~ '<rg-name>'`
- Subscription: Use `--subscriptions <sub-id>` flag on `az graph query`
- App name: `| where name =~ '<app-name>'`

### Phase 2: Assess Reliability

Two-step assessment: **platform-level discovery first, then per-service deep dive.**

**Step 1: Platform discovery (find what's there).** Use these to enumerate resources in scope and detect cross-cutting reliability gaps:

| Platform check | Reference |
|---|---|
| Zone redundancy: discovery | [references/zone-redundancy-checks.md](references/zone-redundancy-checks.md) |
| Storage redundancy (cross-service) | [references/storage-redundancy-checks.md](references/storage-redundancy-checks.md) |
| Multi-region & global load balancers | [references/multi-region-che