Dt Obs Problems

Name: Dt Obs Problems
Author: dynatrace

dynatrace/dynatrace-for-ai·Apache-2.0

Query and investigate Dynatrace DAVIS-detected problems—active triage, root cause, and trends—without leaving your coding agent.

Overview

dt-obs-problems is an agent skill for the Operate phase that queries and investigates Dynatrace DAVIS-detected problems, including root cause, impact, and correlation with other telemetry.

Install

npx skills add https://github.com/dynatrace/dynatrace-for-ai --skill dt-obs-problems

What is this skill?

Active problem triage with prioritized open issues, category, user impact, and display IDs
Root cause investigation for a specific problem (e.g. P-12345) with affected entities and blast radius
Problem trending and history for recurring or pattern-based incidents
Filters and queries aligned to triggers: Kubernetes-affected problems, problems by service, recurring problems
Explicit scope boundary: DAVIS problems only—not distributed traces, host-only metrics, or ad-hoc log search
3 documented problem-analysis workflows: active triage, root cause investigation, and problem trending

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 697 installs on skills.sh; 87 GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

You see alerts or user impact but need a fast, structured view of active Dynatrace problems, their root cause entity, and who else is affected.

Who is it for?

Solo builders or tiny teams on-call in Dynatrace who want agent-driven triage of active problems, root-cause lookup by display ID, and service- or Kubernetes-scoped problem lists.

Skip if: you need arbitrary DQL explained, product documentation Q&A, generic log searching, distributed tracing deep dives, or host-level resource monitoring without a DAVIS problem context.

When should I use this skill?

User asks about active problems, root cause analysis, problem impact, affected users, problem display IDs (e.g. P-12345), recurring or trending problems, blast radius, or problems filtered by Kubernetes or service—within

What do I get? / Deliverables

You get a prioritized active problem list, a root-cause and blast-radius breakdown for a specific problem ID, or trend/history context you can act on in Dynatrace or downstream runbooks.

Prioritized list of active problems with category, user impact, and display IDs
Root cause entity identification with affected entities and blast radius for a given problem
Problem history or trending summary for recurring incidents

Journey fit

Primary fit

Operate · Monitoring & observability

Problem detection, prioritization, and root-cause analysis are production observability tasks that belong on the Operate shelf once the product is live. Monitoring is the canonical home for listing active problems, assessing blast radius, and correlating issues with telemetry rather than one-off debugging or generic log search.

Also useful

Operate · Error tracking

How it compares

Use for Dynatrace-native problem and root-cause workflows instead of ad-hoc log grep or trace skills that do not start from a detected problem.

Common Questions / FAQ

Who is dt-obs-problems for?

It is for solo and indie builders (and small teams) who ship to production monitored by Dynatrace and want their coding agent to list, prioritize, and investigate DAVIS problems using the same triggers they would type in chat.

When should I use dt-obs-problems?

Use it in Operate when you need active problem triage, root cause for a display ID like P-12345, recurring problem history, blast-radius checks, or filters such as problems affecting Kubernetes or a named service, especially during deploys, incidents, and customer-impact investigations.

Is dt-obs-problems safe to install?

Treat it like any third-party agent skill that can call observability APIs: review the Security Audits panel on this Prism page, confirm what network and API access your agent grants, and scope Dynatrace tokens to least privilege before enabling it in production workflows.

SKILL.md


# Problem Analysis Skill

Analyze Dynatrace AI-detected problems including root cause identification, impact assessment, and correlation with logs and metrics.

---

## Use Cases

### 1. Active Problem Triage
- **Goal:** List and prioritize currently active problems
- **Trigger:** "active problems", "what problems are open", "current issues", "availability issues"
- **Done:** Prioritized list of active problems with category, user impact, and display IDs

### 2. Root Cause Investigation
- **Goal:** Identify the root cause entity for a specific problem
- **Trigger:** "root cause of P-12345", "what caused this problem", "which entity is the root cause"
- **Done:** Root cause entity identified with affected entity list and blast radius

### 3. Problem Trending
- **Goal:** Analyze problem patterns over time to identify recurring issues
- **Trigger:** "recurring problems", "problem history", "problem trends last 30 days"
- **Done:** Trend data showing problem frequency, recurring root causes, and resolution times
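
The trigger phrases in the three use cases suggest a simple routing step. The following is a minimal Python sketch, assuming plain substring matching on the user's request. The function name `route_request` and the workflow labels are hypothetical and not part of the skill; only the phrases themselves come from the list above.

```python
from typing import Optional

# Trigger phrases taken from the three use cases; workflow labels are illustrative.
TRIGGERS = {
    "active_triage": [
        "active problems",
        "what problems are open",
        "current issues",
        "availability issues",
    ],
    "root_cause": [
        "root cause",
        "what caused this problem",
        "which entity is the root cause",
    ],
    "trending": [
        "recurring problems",
        "problem history",
        "problem trends",
    ],
}


def route_request(text: str) -> Optional[str]:
    """Return the first workflow whose trigger phrase appears in the request, else None."""
    lowered = text.lower()
    for workflow, phrases in TRIGGERS.items():
        for phrase in phrases:
            if phrase in lowered:
                return workflow
    return None
```

A request that matches none of the phrases returns `None`, which mirrors the skill's scope boundary: requests outside DAVIS problem analysis should not be routed to any of the three workflows.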

---

## Overview

Dynatrace automatically detects anomalies, performance degradations, and failures across your environment, creating **problems** that aggregate related alert, warning, and info-level events and provide root cause and impact insights.

### What are Problems?

Problems are automatically detected software and infrastructure health and resilience issues that:

- **Automatically correlate** related alert, warning, and info-level events across services, infrastructure, frontend applications, and user sessions
- **Identify root causes** using causal analysis of Smartscape dependencies
- **Assess business impact** by tracking affected users and services
- **Reduce alert noise** by grouping related symptoms into single problems that share the same root cause and impact
- **Track problem lifecycle** from early detection through resolution

### Event Kinds

The `event.kind` field (stable, permission) identifies the high-level event type:

| `event.kind` value | Description |
|---|---|
| `DAVIS_EVENT` | Davis-detected infrastructure/application events |
| `BIZ_EVENT` | Business events (ingested via API or captured from spans) |
| `RUM_EVENT` | Real User Monitoring events |
| `AUDIT_EVENT` | Administrative/security audit events |

`event.provider` (stable, permission) identifies the event source.
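
To illustrate how `event.kind` separates Davis events from the other event types, here is a minimal Python sketch that filters records already fetched from Dynatrace. The sample records are made up and the helper `davis_events` is hypothetical; only the `event.kind` field name and its values come from the table above.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def davis_events(records: Iterable[Record]) -> List[Record]:
    """Keep only records whose event.kind is DAVIS_EVENT."""
    return [r for r in records if r.get("event.kind") == "DAVIS_EVENT"]


# Made-up sample records, for illustration only.
sample = [
    {"event.kind": "DAVIS_EVENT", "event.category": "ERROR"},
    {"event.kind": "BIZ_EVENT"},
    {"event.kind": "RUM_EVENT"},
]
```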

## Problem Categories

Common `event.category` values:

| Category | Description | Example |
|----------|-------------|---------|
| **AVAILABILITY** | Infrastructure or service unavailable | Web service returns no data, synthetic test actively fails, database connection lost |
| **ERROR** | Increased error rates beyond baseline | API error rate jumped from 0.1% to 15% |
| **SLOWDOWN** | Performance degradation | Response time increased from 200ms to 5000ms |
| **RESOURCE** | Resource saturation | Container memory at 95%, causing OOM kills |
| **CUSTOM** | Custom anomaly detections | Business KPI (orders/minute) dropped below threshold |
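
For active triage, the categories can drive a sort order. The sketch below is one possible approach in Python, not the skill's own logic: the priority ranking is an assumption, `sort_for_triage` is a hypothetical helper, and because the table lists only common values, unknown categories are sorted last rather than rejected.

```python
from enum import Enum
from typing import Any, Dict, List


class ProblemCategory(Enum):
    AVAILABILITY = "AVAILABILITY"
    ERROR = "ERROR"
    SLOWDOWN = "SLOWDOWN"
    RESOURCE = "RESOURCE"
    CUSTOM = "CUSTOM"


# Assumed triage order (lower sorts first); adjust to your own on-call policy.
ASSUMED_PRIORITY = {
    ProblemCategory.AVAILABILITY: 0,
    ProblemCategory.ERROR: 1,
    ProblemCategory.SLOWDOWN: 2,
    ProblemCategory.RESOURCE: 3,
    ProblemCategory.CUSTOM: 4,
}


def sort_for_triage(problems: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort problem records by the assumed category priority; unknown categories go last."""

    def rank(problem: Dict[str, Any]) -> int:
        try:
            return ASSUMED_PRIORITY[ProblemCategory(problem.get("event.category"))]
        except ValueError:
            # Category missing or not in the table above.
            return len(ASSUMED_PRIORITY)

    return sorted(problems, key=rank)
```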

## Problem Lifecycle

```text
Detection → ACTIVE → Under Investigation → CLOSED
```

- **ACTIVE**: Currently occurring issues requiring attention
- **CLOSED**: Resolved issues used for historical analysis
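
The two lifecycle states map onto the triage and trending use cases: ACTIVE problems feed the triage list, and CLOSED problems feed historical analysis such as resolution times. The following is a minimal Python sketch over already-fetched records. The field names `event.status`, `event.start`, and `event.end` are assumptions not confirmed by this document, and both helper functions are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def active_problems(problems: List[Record]) -> List[Record]:
    """Return problems still in the ACTIVE state (assumed field: event.status)."""
    return [p for p in problems if p.get("event.status") == "ACTIVE"]


def mean_resolution_time(problems: List[Record]) -> Optional[timedelta]:
    """Average duration of CLOSED problems, or None if there are none.

    Assumes event.start and event.end hold datetime values.
    """
    durations = [
        p["event.end"] - p["event.start"]
        for p in problems
        if p.get("event.status") == "CLOSED"
        and "event.start" in p
        and "event.end" in p
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```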

## Essential Fields

### Common Field Name Mistakes

| ❌ WRONG | ✅ CORRECT | Description |
|---|---|---|
