
Service Mesh Observability
Stand up metrics, distributed tracing, and mesh dashboards so you can debug service-to-service latency and hit SLOs on Istio or Linkerd.
Install
npx skills add https://github.com/wshobson/agents --skill service-mesh-observability
What is this skill?
- Three-pillars model: metrics, distributed traces, and access or audit logs for mesh traffic
- Golden signals table for mesh latency, errors, throughput, and saturation with alert thresholds
- Patterns for dependency visualization, bottleneck finding, and connectivity troubleshooting
- Guidance for defining SLOs on service-to-service communication
- Istio and Linkerd oriented deployment and debugging workflows
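The alert thresholds this skill pairs with the golden signals (P99 latency above 500 ms, 5xx error rate above 1%) could be expressed as Prometheus alerting rules. The following is a minimal sketch, not taken from the skill itself: the group name, alert names, `for` durations, and severity labels are illustrative assumptions, while the metric names and thresholds come from the skill's own table and queries.

```yaml
# Hypothetical Prometheus rule file; names and durations are assumptions.
groups:
  - name: mesh-golden-signals
    rules:
      # Latency: P99 request duration above 500 ms, per destination service
      - alert: MeshHighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
            by (le, destination_service_name)
          ) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency above 500 ms for {{ $labels.destination_service_name }}"

      # Errors: share of 5xx responses above 1%, per destination service
      - alert: MeshHigh5xxRate
        expr: |
          100 * sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m])) by (destination_service_name)
            / sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)
          > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "5xx error rate above 1% for {{ $labels.destination_service_name }}"
```

Grouping both sides of the error-rate division by `destination_service_name` yields one alert per service rather than one mesh-wide alert. Whether that granularity suits a given deployment is a judgment call.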
Adoption & trust: 6.7k installs on skills.sh; 36.5k GitHub stars; 3/3 security scanners passed (skills.sh audits).
Journey fit
Primary fit
Mesh observability is chiefly a production concern—keeping communication healthy after deploy—so Operate is the primary journey shelf. Monitoring covers golden signals, tracing, dashboards, and SLO alerting across the data plane.
Common Questions / FAQ
Is Service Mesh Observability safe to install?
skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.
SKILL.md
# Service Mesh Observability

Complete guide to observability patterns for Istio, Linkerd, and service mesh deployments.

## When to Use This Skill

- Setting up distributed tracing across services
- Implementing service mesh metrics and dashboards
- Debugging latency and error issues
- Defining SLOs for service communication
- Visualizing service dependencies
- Troubleshooting mesh connectivity

## Core Concepts

### 1. Three Pillars of Observability

```
┌─────────────────────────────────────────────────────┐
│                    Observability                    │
├─────────────────┬─────────────────┬─────────────────┤
│    Metrics      │     Traces      │      Logs       │
│                 │                 │                 │
│ • Request rate  │ • Span context  │ • Access logs   │
│ • Error rate    │ • Latency       │ • Error details │
│ • Latency P50   │ • Dependencies  │ • Debug info    │
│ • Saturation    │ • Bottlenecks   │ • Audit trail   │
└─────────────────┴─────────────────┴─────────────────┘
```

### 2. Golden Signals for Mesh

| Signal         | Description               | Alert Threshold   |
| -------------- | ------------------------- | ----------------- |
| **Latency**    | Request duration P50, P99 | P99 > 500ms       |
| **Traffic**    | Requests per second       | Anomaly detection |
| **Errors**     | 5xx error rate            | > 1%              |
| **Saturation** | Resource utilization      | > 80%             |

## Templates

### Template 1: Istio with Prometheus & Grafana

```yaml
# Install Prometheus
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
  namespace: istio-system
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'istio-mesh'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - istio-system
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            action: keep
            regex: istio-telemetry
---
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: istio-mesh
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: istiod
  endpoints:
    - port: http-monitoring
      interval: 15s
```

### Template 2: Key Istio Metrics Queries

```promql
# Request rate by service
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)

# Error rate (5xx)
sum(rate(istio_requests_total{reporter="destination", response_code=~"5.."}[5m]))
  / sum(rate(istio_requests_total{reporter="destination"}[5m])) * 100

# P99 latency
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
  by (le, destination_service_name))

# TCP connections
sum(istio_tcp_connections_opened_total{reporter="destination"}) by (destination_service_name)

# Request size
histogram_quantile(0.99,
  sum(rate(istio_request_bytes_bucket{reporter="destination"}[5m]))
  by (le, destination_service_name))
```

### Template 3: Jaeger Distributed Tracing

```yaml
# Jaeger installation for Istio
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 100.0 # 100% in dev, lower in prod
        zipkin:
          address: jaeger-collector.istio-system:9411
---
# Jaeger deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
        - name: jaeger
          image: jae