Grafana Dashboards

Name: Grafana Dashboards
Author: wshobson

wshobson/agents

10.2k installs
38.3k repo stars
Updated July 22, 2026

How to create effective Grafana dashboards that visualize system and application metrics, apply monitoring methodologies, and integrate alerts for production observability.

About

This skill covers designing and managing Grafana dashboards for comprehensive system observability using metrics from Prometheus and other sources. Developers use it to visualize application and infrastructure metrics, implement monitoring patterns like RED (Rate, Errors, Duration) and USE (Utilization, Saturation, Errors), and configure alerts. Key workflows include structuring dashboard hierarchies with stat panels and time series graphs, using variables for multi-service filtering, provisioning dashboards via code (Terraform/Ansible), and creating specialized dashboards for APIs, databases, and infrastructure monitoring.

Implement RED and USE monitoring methods for services and resources
Build hierarchical dashboards with stat panels, time series, tables, and heatmaps
Use query variables to filter by namespace, service, and other dimensions
Configure panel-level alerts with thresholds and notification channels
Provision dashboards as code using Terraform or Ansible for reproducible monitoring infrastructure

Grafana Dashboards by the numbers

10,242 all-time installs (skills.sh)
+222 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #25 of 1,453 DevOps & CI/CD skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

grafana-dashboards capabilities & compatibility

Capabilities: design dashboard hierarchies and information arc · write promql queries for metric aggregation and · configure stat panels, time series graphs, table · define query variables for multi dimensional fil · set up alert conditions and notification channel · provision dashboards via terraform or ansible · apply red and use monitoring methodologies
Works with: grafana
Use cases: devops · debugging
Platforms: macOS · Linux · Windows
Runs: Hosted SaaS
Pricing: Free

From the docs

What grafana-dashboards says it does

Create and manage production-ready Grafana dashboards for comprehensive system observability.

skill:wshobson/agents#grafana-dashboards > header

npx skills add https://github.com/wshobson/agents --skill grafana-dashboards

Installs	10.2k
Repo stars	★ 38.3k
Security audit	2 / 3 scanners passed
Last updated	July 22, 2026
Repository	wshobson/agents ↗

What it does

Design production dashboards to visualize system metrics, track application health, and set up alerts for infrastructure and service observability.

Who is it for?

DevOps engineers, SREs, backend developers, and platform teams building observability for applications and infrastructure.

Skip if: you need exploratory data analysis, business intelligence dashboards without operational context, or real-time log analysis (use Loki instead).

When should I use this skill?

Use it when monitoring a production service, setting up infrastructure observability, implementing SLOs, creating on-call runbooks, or standardizing metrics visualization.

What you get

A production-ready Grafana dashboard that visualizes key metrics, implements appropriate thresholds and alerts, and supports multi-dimensional filtering via variables.

Grafana dashboard JSON files
Alert rules and notification channels
Terraform/Ansible provisioning code

By the numbers

RED method covers 3 core service metrics: rate, errors, duration
USE method covers 3 resource metrics: utilization, saturation, errors
4 primary panel types shown: stat, graph (time series), table, heatmap

Files

SKILL.md (Markdown)

Grafana Dashboards

Create and manage production-ready Grafana dashboards for comprehensive system observability.

Purpose

Design effective Grafana dashboards for monitoring applications, infrastructure, and business metrics.

When to Use

Visualize Prometheus metrics
Create custom dashboards
Implement SLO dashboards
Monitor infrastructure
Track business KPIs

Dashboard Design Principles

1. Hierarchy of Information

┌─────────────────────────────────────┐
│  Critical Metrics (Big Numbers)     │
├─────────────────────────────────────┤
│  Key Trends (Time Series)           │
├─────────────────────────────────────┤
│  Detailed Metrics (Tables/Heatmaps) │
└─────────────────────────────────────┘
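
The three tiers above map onto Grafana's panel grid, which is 24 columns wide and places each panel with a gridPos object (x, y, w, h). As a minimal sketch, the hypothetical helper layout_rows below (not part of the skill) assigns gridPos values tier by tier, splitting each row's width evenly among its panels. The panel titles are illustrative assumptions.

```python
GRID_WIDTH = 24  # Grafana's dashboard grid is 24 columns wide


def layout_rows(rows, row_height=8):
    """Assign gridPos to panels given as a list of rows (lists of titles).

    Hypothetical helper: the panels in each row share the full width equally.
    """
    panels = []
    y = 0
    for row in rows:
        width = GRID_WIDTH // len(row)
        for i, title in enumerate(row):
            panels.append({
                "title": title,
                "gridPos": {"x": i * width, "y": y, "w": width, "h": row_height},
            })
        y += row_height
    return panels


tiers = [
    ["Request Rate", "Error Rate %", "P95 Latency", "Uptime"],  # big numbers
    ["Requests Over Time", "Latency Over Time"],                # key trends
    ["Per-Service Detail"],                                     # tables/heatmaps
]
panels = layout_rows(tiers)
```

Keeping the top tier narrow and the bottom tier full-width is one way to preserve the reading order from summary to detail.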

2. RED Method (Services)

Rate - Requests per second
Errors - Error rate
Duration - Latency/response time
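
As a rough illustration, the three RED signals can each be expressed as one PromQL query. The sketch below builds those strings for a single service, using the metric names that appear in this skill's examples (http_requests_total, http_request_duration_seconds_bucket). The function name red_queries, the service label value, and the choice of P95 for duration are assumptions made for illustration.

```python
def red_queries(service, window="5m"):
    """Return PromQL strings for Rate, Errors, and Duration of one service."""
    sel = f'service="{service}"'
    return {
        # Rate: requests per second
        "rate": f"sum(rate(http_requests_total{{{sel}}}[{window}]))",
        # Errors: fraction of requests answered with a 5xx status
        "errors": (
            f'sum(rate(http_requests_total{{{sel},status=~"5.."}}[{window}]))'
            f" / sum(rate(http_requests_total{{{sel}}}[{window}]))"
        ),
        # Duration: 95th percentile latency from histogram buckets
        "duration": (
            "histogram_quantile(0.95, "
            f"sum(rate(http_request_duration_seconds_bucket{{{sel}}}[{window}])) by (le))"
        ),
    }


queries = red_queries("checkout")
print(queries["rate"])
```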

3. USE Method (Resources)

Utilization - % time resource is busy
Saturation - Queue length/wait time
Errors - Error count

Dashboard Structure

API Monitoring Dashboard

{
  "dashboard": {
    "title": "API Monitoring",
    "tags": ["api", "production"],
    "timezone": "browser",
    "refresh": "30s",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(http_requests_total[5m])) by (service)",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "Error Rate %",
        "type": "graph",
        "targets": [
          {
            "expr": "(sum(rate(http_requests_total{status=~\"5..\"}[5m])) / sum(rate(http_requests_total[5m]))) * 100",
            "legendFormat": "Error Rate"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": { "params": [5], "type": "gt" },
              "operator": { "type": "and" },
              "query": { "params": ["A", "5m", "now"] },
              "reducer": { "type": "avg" },
              "type": "query"
            }
          ]
        },
        "gridPos": { "x": 12, "y": 0, "w": 12, "h": 8 }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
            "legendFormat": "{{service}}"
          }
        ],
        "gridPos": { "x": 0, "y": 8, "w": 24, "h": 8 }
      }
    ]
  }
}

Reference: See assets/api-dashboard.json
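
The P95 Latency panel relies on histogram_quantile, which estimates a percentile from cumulative bucket counts rather than from raw samples. The following is a simplified Python model of that estimation (linear interpolation inside the bucket that contains the target rank). It is a sketch for building intuition, not Prometheus's actual implementation, and the bucket values are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from (upper_bound, count) buckets.

    Simplified model: buckets are sorted by upper bound, counts are
    cumulative, and the last bucket's upper bound is +Inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            # Interpolate linearly between the bucket's bounds.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Hypothetical cumulative counts for le = 0.1, 0.5, 1.0, +Inf (seconds).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(round(p95, 4))
```

With these made-up buckets the estimate lands between 0.5 s and 1.0 s, which shows why bucket boundaries limit the precision of any percentile panel.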

Panel Types

1. Stat Panel (Single Value)

{
  "type": "stat",
  "title": "Total Requests",
  "targets": [
    {
      "expr": "sum(http_requests_total)"
    }
  ],
  "options": {
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    },
    "orientation": "auto",
    "textMode": "auto",
    "colorMode": "value"
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "value": 0, "color": "green" },
          { "value": 80, "color": "yellow" },
          { "value": 90, "color": "red" }
        ]
      }
    }
  }
}
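
In absolute mode, a value takes the color of the highest step it has reached. The sketch below is a simplified model of that lookup using the steps from the panel above. The function name threshold_color is hypothetical, and Grafana's own handling (for example of a null base step) may differ in detail.

```python
def threshold_color(value, steps):
    """Return the color of the highest step whose value is <= the input.

    Simplified model of absolute thresholds; steps must be sorted ascending.
    """
    color = steps[0]["color"]  # the first step acts as the base color
    for step in steps:
        if value >= step["value"]:
            color = step["color"]
    return color


steps = [
    {"value": 0, "color": "green"},
    {"value": 80, "color": "yellow"},
    {"value": 90, "color": "red"},
]
for sample in (42, 80, 95):
    print(sample, threshold_color(sample, steps))
```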

2. Time Series Graph

{
  "type": "timeseries",
  "title": "CPU Usage",
  "targets": [
    {
      "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
    }
  ],
  "fieldConfig": {
    "defaults": { "unit": "percent", "min": 0, "max": 100 }
  }
}

3. Table Panel

{
  "type": "table",
  "title": "Service Status",
  "targets": [
    {
      "expr": "up",
      "format": "table",
      "instant": true
    }
  ],
  "transformations": [
    {
      "id": "organize",
      "options": {
        "excludeByName": { "Time": true },
        "indexByName": {},
        "renameByName": {
          "instance": "Instance",
          "job": "Service",
          "Value": "Status"
        }
      }
    }
  ]
}

4. Heatmap

{
  "type": "heatmap",
  "title": "Latency Heatmap",
  "targets": [
    {
      "expr": "sum(rate(http_request_duration_seconds_bucket[5m])) by (le)",
      "format": "heatmap"
    }
  ],
  "dataFormat": "tsbuckets",
  "yAxis": {
    "format": "s"
  }
}

Variables

Query Variables

{
  "templating": {
    "list": [
      {
        "name": "namespace",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_pod_info, namespace)",
        "refresh": 1,
        "multi": false
      },
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(kube_service_info{namespace=\"$namespace\"}, service)",
        "refresh": 1,
        "multi": true
      }
    ]
  }
}

Use Variables in Queries

sum(rate(http_requests_total{namespace="$namespace", service=~"$service"}[5m]))
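
The reason service uses the regex matcher =~ while namespace uses = is that a multi-value variable expands to an alternation of the selected values. The Python sketch below is a simplified model of that substitution. The function name interpolate and the selected values are assumptions, and Grafana's real escaping rules are more involved.

```python
import re


def interpolate(query, variables):
    """Substitute $name placeholders; list values become a regex alternation.

    Simplified model of variable interpolation for a Prometheus query.
    """
    def render(match):
        value = variables[match.group(1)]
        if isinstance(value, list):
            return "(" + "|".join(re.escape(v) for v in value) + ")"
        return value

    return re.sub(r"\$(\w+)", render, query)


template = (
    'sum(rate(http_requests_total{namespace="$namespace", '
    'service=~"$service"}[5m]))'
)
expanded = interpolate(
    template, {"namespace": "prod", "service": ["checkout", "cart"]}
)
print(expanded)
```

Using = with a multi-value variable would compare the label against the whole alternation string as a literal, so the query would typically return nothing.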

Alerts in Dashboards

{
  "alert": {
    "name": "High Error Rate",
    "conditions": [
      {
        "evaluator": {
          "params": [5],
          "type": "gt"
        },
        "operator": { "type": "and" },
        "query": {
          "params": ["A", "5m", "now"]
        },
        "reducer": { "type": "avg" },
        "type": "query"
      }
    ],
    "executionErrorState": "alerting",
    "for": "5m",
    "frequency": "1m",
    "message": "Error rate is above 5%",
    "noDataState": "no_data",
    "notifications": [{ "uid": "slack-channel" }]
  }
}
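
To make the fields above concrete: the condition averages query A over the last 5 minutes (reducer avg), compares the result with 5 (evaluator gt), is checked every minute (frequency), and must keep holding for 5 minutes (for) before the alert fires. The sketch below is a simplified Python model of that logic with made-up samples. The function names are hypothetical and the state handling is an approximation of Grafana's behavior.

```python
from statistics import mean


def condition_met(samples, threshold=5.0):
    """Reducer "avg" plus evaluator "gt": is the window's mean above threshold?"""
    return bool(samples) and mean(samples) > threshold


def alert_states(results, for_minutes=5):
    """Map per-minute condition results to ok / pending / alerting.

    Simplified model: evaluations are one minute apart ("frequency": "1m"),
    and the alert fires once the condition has held for `for_minutes`.
    """
    states = []
    held = None  # minutes the condition has held continuously
    for met in results:
        if not met:
            held = None
            states.append("ok")
            continue
        held = 0 if held is None else held + 1
        states.append("alerting" if held >= for_minutes else "pending")
    return states


# Hypothetical error-rate percentages seen in consecutive evaluation windows.
windows = [[1.0, 2.0], [6.0, 7.0]] + [[8.0, 9.0]] * 6
results = [condition_met(w) for w in windows]
states = alert_states(results)
print(states)
```

The pending period is what keeps a brief error spike from paging anyone.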

Dashboard Provisioning

dashboards.yml:

apiVersion: 1

providers:
  - name: "default"
    orgId: 1
    folder: "General"
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    allowUiUpdates: true
    options:
      path: /etc/grafana/dashboards
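
One detail worth checking before dropping files into that path: the API Monitoring example on this page wraps the dashboard in a top-level "dashboard" key, which is the shape of an HTTP API payload, whereas file-based provisioning reads the bare dashboard model. The sketch below shows one way to unwrap files before copying them. The function names and directory arguments are assumptions for illustration.

```python
import json
from pathlib import Path


def unwrap_dashboard(doc):
    """Return the bare dashboard model, dropping an API-style wrapper if present."""
    return doc.get("dashboard", doc)


def prepare(src_dir, dest_dir):
    """Write unwrapped copies of every *.json file in src_dir into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(src_dir).glob("*.json")):
        model = unwrap_dashboard(json.loads(path.read_text()))
        (dest / path.name).write_text(json.dumps(model, indent=2))
        written.append(path.name)
    return written
```

A call such as prepare("dashboards", "/etc/grafana/dashboards") would then stage the files for the provider defined above.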

Common Dashboard Patterns

Infrastructure Dashboard

Key Panels:

CPU utilization per node
Memory usage per node
Disk I/O
Network traffic
Pod count by namespace
Node status

Reference: See assets/infrastructure-dashboard.json

Database Dashboard

Key Panels:

Queries per second
Connection pool usage
Query latency (P50, P95, P99)
Active connections
Database size
Replication lag
Slow queries

Reference: See assets/database-dashboard.json

Application Dashboard

Key Panels:

Request rate
Error rate
Response time (percentiles)
Active users/sessions
Cache hit rate
Queue length

Best Practices

1. Start with templates (Grafana community dashboards)
2. Use consistent naming for panels and variables
3. Group related metrics in rows
4. Set appropriate time ranges (default: Last 6 hours)
5. Use variables for flexibility
6. Add panel descriptions for context
7. Configure units correctly
8. Set meaningful thresholds for colors
9. Use consistent colors across dashboards
10. Test with different time ranges
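
Some of these practices can be checked mechanically before a dashboard is committed. As a minimal sketch, the hypothetical lint_panels function below flags panels that lack a description (practice 6) or a unit under fieldConfig.defaults (practice 7). It assumes a bare dashboard model with a top-level panels list, and the sample dashboard is made up.

```python
def lint_panels(dashboard):
    """Return (panel title, problem) pairs for a few checkable practices."""
    problems = []
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        if not panel.get("description"):
            problems.append((title, "missing description"))
        unit = panel.get("fieldConfig", {}).get("defaults", {}).get("unit")
        if not unit:
            problems.append((title, "missing unit"))
    return problems


sample = {
    "panels": [
        {
            "title": "CPU Usage",
            "description": "Non-idle CPU time per instance",
            "fieldConfig": {"defaults": {"unit": "percent"}},
        },
        {"title": "Request Rate"},
    ]
}
problems = lint_panels(sample)
print(problems)
```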

Dashboard as Code

Terraform Provisioning

resource "grafana_dashboard" "api_monitoring" {
  config_json = file("${path.module}/dashboards/api-monitoring.json")
  folder      = grafana_folder.monitoring.id
}

resource "grafana_folder" "monitoring" {
  title = "Production Monitoring"
}

Ansible Provisioning

- name: Deploy Grafana dashboards
  copy:
    src: "{{ item }}"
    dest: /etc/grafana/dashboards/
  with_fileglob:
    - "dashboards/*.json"
  notify: restart grafana

Related Skills

prometheus-configuration - For metric collection
slo-implementation - For SLO dashboards

Related skills

Azure Deploy: Safely execute production deployments of already-prepared applications to Microsoft Azure. (478k · 1.3k)

Azure Validate: Run deep pre-deployment checks on Azure configuration, infrastructure definitions, RBAC roles, and managed identities before pushing to production. (477k · 1.3k)

Github Actions Docs: Get precise, docs-grounded answers about GitHub Actions workflows, syntax, security, and migration instead of relying on stale knowledge. (275k · 72)

Setup Pre Commit: Automatically run Prettier, type checking, and tests on every commit via Husky and lint-staged. (161k · 188k)

Deploy To Vercel: Safely turn any local project into a live Vercel preview with one instruction. (97.8k · 29.5k)

Vercel Cli With Tokens: Deploy projects to Vercel from agents and scripts using token authentication instead of interactive browser login. (73.4k · 29.5k)

How it compares

Choose grafana-dashboards when you need an opinionated Grafana layout and SLO structure built on Prometheus metrics, rather than generic chart tutorials or guidance for non-Prometheus backends.

FAQ

What monitoring method should I use for my dashboard?

Use RED (Rate, Errors, Duration) for services and USE (Utilization, Saturation, Errors) for infrastructure resources. Organize panels hierarchically, with critical metrics at the top, trends in the middle, and detailed metrics at the bottom.

How do I make dashboards flexible across multiple services?

Use query variables (e.g., namespace, service) defined via label_values() queries on your datasource. Reference them in metric queries as $variable to filter data dynamically.

Can I version control my dashboards?

Yes. Export dashboards as JSON, store the files in Git, and provision them via Terraform or Ansible. With file-based provisioning configured, Grafana reloads dashboards automatically when the files change.

Is Grafana Dashboards safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

DevOps & CI/CD · monitoring · infra
