
ADK Architecture
Implement correct checkpoint, interrupt, and resume behavior when building multi-step agents with Google ADK Python and human-in-the-loop tools.
Overview
ADK Architecture is an agent skill for the Build phase that explains Google ADK Python checkpoint, interrupt, resume, and HITL workflow behavior for reliable multi-step agents.
Install
npx skills add https://github.com/google/adk-python --skill adk-architecture
What is this skill?
- Documents HITL interrupt → persist → resume → continue lifecycle for ADK workflows
- Explains long_running_tool_ids propagation and matching FunctionResponse on resume
- Clarifies run_id reuse on resume versus fresh run_id on new dispatches
- Tables rerun_on_resume behavior for partial versus full human responses on workflow nodes
- Covers session event persistence and invocation_id reconstruction in the Runner
- 4-step HITL pattern: interrupt, persist, resume, continue
- rerun_on_resume behavior matrix for partial vs all FunctionResponses
Adoption & trust: 1 install on skills.sh; 20k GitHub stars; 3/3 security scanners passed (skills.sh audits); trending (+100% hot-view momentum).
What problem does it solve?
Your ADK agent hangs or double-runs after human approval because interrupt persistence and resume_inputs handling do not match the framework lifecycle.
Who is it for?
Indie developers building production-grade ADK agents with workflows, long-running tools, or mandatory human approval steps.
Skip if: Teams only prompting a single LLM call with no ADK graphs, or builders who have not chosen Python ADK as their agent stack.
When should I use this skill?
You are implementing or debugging ADK Python workflows with interrupts, session resume, or long_running_tool_ids.
What do I get? / Deliverables
You can design and debug ADK graphs with correct interrupt propagation, session resume, and rerun_on_resume semantics before shipping HITL features.
- Architecture-aligned interrupt and resume design for your ADK graph
- Debugging checklist for partial FRs, run_id, and rerun_on_resume cases
Journey fit
ADK architecture is core agent framework knowledge during product construction, not a launch or growth tactic. The agent-tooling subphase is where orchestration lifecycles, HITL, and workflow graphs are designed and coded.
How it compares
Framework lifecycle reference for ADK—not a generic ‘build an AI app’ starter or an MCP integration skill.
Common Questions / FAQ
Who is adk-architecture for?
Solo builders and small teams implementing Google ADK Python agents that need HITL, checkpoints, and workflow resume behavior to be correct under load.
When should I use adk-architecture?
Use it during Build while designing agent-tooling graphs, before Ship when reviewing interrupt/resume edge cases, and when debugging Operate incidents involving stuck long-running tools.
Is adk-architecture safe to install?
See the Security Audits panel on this Prism page; the skill is documentation-oriented architecture guidance and does not by itself execute shell or network actions.
SKILL.md
# Checkpoint and Resume Lifecycle

HITL (Human-in-the-Loop) follows this pattern:

1. **Interrupt**: Node yields an event with `long_running_tool_ids`. Each ancestor propagates the interrupt upward via `ctx.interrupt_ids`.
2. **Persist**: Only the leaf node's interrupt event is persisted to session. Workflow sets `ctx._interrupt_ids` directly (no internal event needed).
3. **Resume**: User sends a `FunctionResponse` message. The Runner scans session events to find the matching `invocation_id`, then reconstructs node state from persisted events.
4. **Continue**: The interrupted node receives the FR and continues execution. Downstream nodes receive the resumed node's output.

## run_id on resume

Resumed nodes reuse the same `run_id` from the original execution. From the node's perspective, the execution never paused: events before and after the resume share the same `run_id`. Fresh dispatches (first run, loop re-trigger) get a new `run_id`.

## Resume behavior by `rerun_on_resume`

A node with multiple interrupt IDs may receive partial FRs (only some resolved). The behavior depends on `rerun_on_resume`:

**`rerun_on_resume=True`** (Workflow, orchestration nodes):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | PENDING | Re-execute immediately with partial `resume_inputs`. Node handles remaining interrupts internally (e.g., Workflow dispatches resolved children, keeps unresolved as WAITING). |
| All | PENDING | Re-execute with all `resume_inputs`. |

This is critical for Workflow: when one child's FR arrives, it re-runs immediately to dispatch that resolved child. It doesn't wait for all children's FRs.

**`rerun_on_resume=False`** (leaf nodes, simple HITL):

| FRs received | Status | Behavior |
|---|---|---|
| Partial | WAITING | Stay waiting. Need all FRs. |
| All | COMPLETED | Auto-complete. Output = aggregated `resolved_responses`. No re-execution. |
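
The two `rerun_on_resume` tables above can be read as a single decision function. The following is a minimal sketch of that reading, not ADK code: `NodeStatus`, `resume_decision`, and the dict-shaped "aggregated" output are hypothetical names and assumptions introduced only for illustration. The tables do not cover the case where no FR matches, so the sketch rejects it.

```python
from enum import Enum


class NodeStatus(Enum):
    """Hypothetical status labels mirroring the table's Status column."""
    PENDING = "PENDING"
    WAITING = "WAITING"
    COMPLETED = "COMPLETED"


def resume_decision(interrupt_ids, resume_inputs, rerun_on_resume):
    """Return (status, reexecute, output) for a node being resumed.

    Hypothetical helper, not part of ADK. `interrupt_ids` is the set of
    IDs the node is waiting on; `resume_inputs` maps resolved IDs to
    their responses.
    """
    # Keep only responses that answer one of this node's interrupts.
    resolved = {k: v for k, v in resume_inputs.items() if k in interrupt_ids}
    if not resolved:
        # Assumption: the tables only describe partial or full responses.
        raise ValueError("no FunctionResponse matches this node's interrupts")

    all_resolved = set(resolved) == set(interrupt_ids)

    if rerun_on_resume:
        # Partial or all: the node is re-executed and sorts out the rest itself.
        return NodeStatus.PENDING, True, None
    if all_resolved:
        # Auto-complete without re-execution; output is the aggregated responses
        # (modeled here as a plain dict, which is an assumption).
        return NodeStatus.COMPLETED, False, resolved
    # Partial responses on a non-rerun node: keep waiting.
    return NodeStatus.WAITING, False, None
```

For a node waiting on `fc-1` and `fc-2`, a lone `fc-1` response yields a re-execution when `rerun_on_resume=True` and a continued wait when it is `False`, which matches the "Partial" rows of both tables.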

## Resume with prior output and interrupts

A node can produce output AND interrupt in the same execution (e.g., a Workflow where child A completes with output and child B interrupts). On resume:

- Some interrupt IDs are resolved (provided in `resume_inputs`)
- Remaining interrupt IDs carry forward via `prior_interrupt_ids`
- Prior output carries forward via `prior_output`
- NodeRunner pre-populates ctx with these values before re-executing

```python
runner = NodeRunner(
    node=node,
    parent_ctx=ctx,
    run_id=prior_run_id,  # reuse
    prior_output=cached_output,
    prior_interrupt_ids={'fc-2'},  # still unresolved
)
child_ctx = await runner.run(
    node_input=input,
    resume_inputs={'fc-1': response},
)
```

# Context

## Architecture

The runtime uses two scoping objects:

- **InvocationContext**: singleton per invocation. Holds shared state (session, services, event queue) accessible by all nodes. Pydantic model at `agents/invocation_context.py`.
- **Context**: one per node execution. Holds per-node results (output, route, interrupt_ids) and provides the API surface for node code. At `agents/context.py`.

Every Context holds a reference to the same InvocationContext (`_invocation_context`). Service access (artifacts, memory, auth) is delegated through it.

```
Root Context                      ← created by Runner from IC
└── Context [runner.node]         ← the root node (e.g., Workflow)
    ├── Context [child_a]         ← child node A
    └── Context [child_b]         ← child node B
        └── Context [grandchild]  ← nested child
```

The Runner creates `root_ctx = Context(ic)` as the tree root and passes it as `parent_ctx` to `NodeRunner(node=self.node)`. The root Context has no node_path or run_id; it exists solely as the parent for the Runner's root node. All Contexts in the tree share the same InvocationContext singleton.

InvocationContext contents:

- `session`, `agent`, `user_content`
- `invocation_id`, `app_name`, `user_id`
- Services: `artifact_service`, `memory_service`, `credential_service`
- `run_config`, `li