
Dt Obs Problems
Query and investigate Dynatrace DAVIS-detected problems—active triage, root cause, and trends—without leaving your coding agent.
Overview
dt-obs-problems is an agent skill for the Operate phase that queries and investigates Dynatrace DAVIS-detected problems including root cause, impact, and correlation with other telemetry.
Install
npx skills add https://github.com/dynatrace/dynatrace-for-ai --skill dt-obs-problems
What is this skill?
- Active problem triage with prioritized open issues, category, user impact, and display IDs
- Root cause investigation for a specific problem (e.g. P-12345) with affected entities and blast radius
- Problem trending and history for recurring or pattern-based incidents
- Filters and queries aligned to triggers: Kubernetes-affected problems, problems by service, recurring problems
- Explicit scope boundary: DAVIS problems only—not distributed traces, host-only metrics, or ad-hoc log search
- 3 documented problem-analysis workflows: active triage, root cause investigation, and problem trending
Adoption & trust: 697 installs on skills.sh; 87 GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You see alerts or user impact but need a fast, structured view of active Dynatrace problems, their root cause entity, and who else is affected.
Who is it for?
Solo builders or tiny teams on-call in Dynatrace who want agent-driven triage of active problems, root-cause lookup by display ID, and service- or Kubernetes-scoped problem lists.
Skip if: you need arbitrary DQL explained, product documentation Q&A, generic log searching, distributed tracing deep dives, or host-level resource monitoring without a DAVIS problem context.
When should I use this skill?
User asks about active problems, root cause analysis, problem impact, affected users, problem display IDs (e.g. P-12345), recurring or trending problems, blast radius, or problems filtered by Kubernetes or service—within
What do I get? / Deliverables
You get a prioritized active problem list, a root-cause and blast-radius breakdown for a specific problem ID, or trend/history context you can act on in Dynatrace or downstream runbooks.
- Prioritized list of active problems with category, user impact, and display IDs
- Root cause entity identification with affected entities and blast radius for a given problem
- Problem history or trending summary for recurring incidents
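As a rough illustration of what a "prioritized list of active problems" could look like, the sketch below filters in-memory records to active ones and orders them by category and user impact. The `Problem` class, its field names, and the `CATEGORY_RANK` ordering are assumptions made up for this example; the skill itself does not document a ranking, and this is not the Dynatrace API.

```python
from dataclasses import dataclass

# Hypothetical urgency order, for illustration only. The category names come
# from the skill's Problem Categories table; the ranking itself is an assumption.
CATEGORY_RANK = {"AVAILABILITY": 0, "ERROR": 1, "SLOWDOWN": 2, "RESOURCE": 3, "CUSTOM": 4}


@dataclass
class Problem:
    display_id: str      # e.g. "P-12345"
    category: str        # AVAILABILITY, ERROR, SLOWDOWN, RESOURCE, or CUSTOM
    status: str          # "ACTIVE" or "CLOSED"
    affected_users: int  # illustrative user-impact figure


def triage(problems):
    """Return only ACTIVE problems, ordered by category rank, then by impact."""
    active = [p for p in problems if p.status == "ACTIVE"]
    # Unknown categories sort last; more affected users sort first within a category.
    return sorted(
        active,
        key=lambda p: (CATEGORY_RANK.get(p.category, len(CATEGORY_RANK)), -p.affected_users),
    )
```

Sorting on a tuple keeps the ordering rule in one place, so changing the assumed ranking only means editing `CATEGORY_RANK`.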
Journey fit
Problem detection, prioritization, and root-cause analysis are production observability tasks that belong on the Operate shelf once the product is live. Monitoring is the canonical home for listing active problems, assessing blast radius, and correlating issues with telemetry rather than one-off debugging or generic log search.
How it compares
Use for Dynatrace-native problem and root-cause workflows instead of ad-hoc log grep or trace skills that do not start from a detected problem.
Common Questions / FAQ
Who is dt-obs-problems for?
It is for solo and indie builders (and small teams) who ship to production monitored by Dynatrace and want their coding agent to list, prioritize, and investigate DAVIS problems using the same triggers they would type in chat.
When should I use dt-obs-problems?
Use it in Operate when you need active problem triage, root cause for a display ID like P-12345, recurring problem history, blast-radius checks, or filters such as problems affecting Kubernetes or a named service—especially during deploys, incidents, and customer-impact investiga
Is dt-obs-problems safe to install?
Treat it like any third-party agent skill that can call observability APIs: review the Security Audits panel on this Prism page, confirm what network and API access your agent grants, and scope Dynatrace tokens to least privilege before enabling it in production workflows.
SKILL.md
# Problem Analysis Skill

Analyze Dynatrace AI-detected problems including root cause identification, impact assessment, and correlation with logs and metrics.

---

## Use Cases

### 1. Active Problem Triage

- **Goal:** List and prioritize currently active problems
- **Trigger:** "active problems", "what problems are open", "current issues", "availability issues"
- **Done:** Prioritized list of active problems with category, user impact, and display IDs

### 2. Root Cause Investigation

- **Goal:** Identify the root cause entity for a specific problem
- **Trigger:** "root cause of P-12345", "what caused this problem", "which entity is the root cause"
- **Done:** Root cause entity identified with affected entity list and blast radius

### 3. Problem Trending

- **Goal:** Analyze problem patterns over time to identify recurring issues
- **Trigger:** "recurring problems", "problem history", "problem trends last 30 days"
- **Done:** Trend data showing problem frequency, recurring root causes, and resolution times

---

## Overview

Dynatrace automatically detects anomalies, performance degradations, and failures across your environment, creating **problems** that aggregate related alert, warning and info-level events and provide root cause and impact insights.

### What are Problems?

Problems are automatically detected, software and infrastructure health and resilience issues that:

- **Automatically correlate** related alert, warning, and info-level events across services, infrastructure, frontend applications, and user sessions
- **Identify root causes** using causal analysis of Smartscape dependencies
- **Assess business impact** by tracking affected users and services
- **Reduce alert noise** by grouping related symptoms into single problems that share the same root cause and impact
- **Track problem lifecycle** from early detection through resolution

### Event Kinds

The `event.kind` field (stable, permission) identifies the high-level event type:

| `event.kind` value | Description |
|---|---|
| `DAVIS_EVENT` | Davis-detected infrastructure/application events |
| `BIZ_EVENT` | Business events (ingested via API or captured from spans) |
| `RUM_EVENT` | Real User Monitoring events |
| `AUDIT_EVENT` | Administrative/security audit events |

`event.provider` (stable, permission) identifies the event source.

## Problem Categories

Common `event.category` values:

| Category | Description | Example |
|----------|-------------|---------|
| **AVAILABILITY** | Infrastructure or service unavailable | Web service returns no data, synthetic test actively fails, database connection lost |
| **ERROR** | Increased error rates beyond baseline | API error rate jumped from 0.1% to 15% |
| **SLOWDOWN** | Performance degradation | Response time increased from 200ms to 5000ms |
| **RESOURCE** | Resource saturation | Container memory at 95%, causing OOM kills |
| **CUSTOM** | Custom anomaly detections | Business KPI (orders/minute) dropped below threshold |

## Problem Lifecycle

```text
Detection → ACTIVE → Under Investigation → CLOSED
```

- **ACTIVE**: Currently occurring issues requiring attention
- **CLOSED**: Resolved issues used for historical analysis

## Essential Fields

### Common Field Name Mistakes

| ❌ WRONG | ✅ CORRECT | Description |
|------