Adk Architecture

Name: Adk Architecture
Author: google

google/adk-python

1 installs
20.9k repo stars
Updated July 28, 2026

adk-architecture skill documents ADK architectural knowledge - graph orchestration, resumption, execution flow, node contracts, observability, and LLM context orchestration.

About

The adk-architecture skill documents ADK architectural knowledge: graph orchestration, resumption, execution flow, node contracts, observability, and LLM context orchestration. Use this skill whenever you need to understand the architecture, event flow, or state management of the ADK system, or when designing or modifying core components. It triggers on phrases such as "how does X work" and "design of".

Platform-specific setup patterns for adk-architecture.
Evidence-backed steps from upstream SKILL.md.
When-to-use criteria for adk-architecture versus alternatives.

Adk Architecture by the numbers

1 all-time installs (skills.sh)
Ranked #2,003 of 2,742 Automation & Workflows skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

adk-architecture capabilities & compatibility

Capabilities: adk architecture quick start · adk architecture when to use guidance · adk architecture integration patterns

npx skills add https://github.com/google/adk-python --skill adk-architecture


Installs	1
repo stars	★ 20.9k
Security audit	3 / 3 scanners passed
Last updated	July 28, 2026
Repository	google/adk-python ↗

How do I use adk-architecture correctly?

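ADK architectural knowledge: graph orchestration, resumption, execution flow, node contracts, observability, and LLM context orchestration. Use this skill whenever you need to understand the architecture, event flow, or state management of the ADK system, or when designing or modifying core components.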

Who is it for?

Teams implementing adk-architecture workflows from the catalog.

Skip if: requirements clearly match a different specialized stack.

When should I use this skill?

A user asks about adk-architecture or ADK architectural knowledge: graph orchestration, resumption, execution flow, node contracts, observability, or LLM context orchestration.

What you get

Working adk-architecture setup with validated configuration and next steps.

Files

SKILL.md (Markdown)

ADK Architecture Guide

Core Interfaces (references/interfaces/)

BaseNode — node contract, output/streaming, state/routing, HITL, configuration
Workflow — graph orchestration, dynamic nodes (tracking/dedup/resume), transitive dynamic nodes, interrupt propagation, design rules for node authors
Runner — The public interface for executing workflows and agents. Documents entrance methods run and run_async.
Agent — Blueprint defining identity, instructions, and tools. Documents that run is the preferred entrance method.
BaseAgent — Base class for all agents. Defines the contract for subclassing with _run_impl as the primary override point.
Event — Core data structure for state reconstruction and communication. Represents a conversation turn or action.

Key Principles (references/principles/)

API Principles — stability, backward compatibility, and self-containment. Use when making design choices that affect the public API surface.

Runtime Knowledge (references/architecture/)

Context — 1:1 node-context mapping, InvocationContext singleton, property reference
NodeRunner — two communication channels, execution flow, output delegation. Internal runtime details.
Runner Roles — Runner vs NodeRunner vs Workflow separation. Explains why they are separate to avoid deadlocks.
Checkpoint and Resume — HITL lifecycle, rerun_on_resume, run_id
Observability — span-on-Context design, NodeRunner integration, correlated logs, metrics
LLM Context Orchestration — relationship between events and LLM context, task delegation translation, branch isolation. Use when modifying event processing, context preparation for LLMs, or debugging context pollution issues.

Checkpoint and Resume Lifecycle

HITL (Human-in-the-Loop) follows this pattern:

1. Interrupt: Node yields an event with long_running_tool_ids. Each ancestor propagates the interrupt upward via ctx.interrupt_ids.
2. Persist: Only the leaf node's interrupt event is persisted to session. Workflow sets ctx._interrupt_ids directly (no internal event needed).
3. Resume: User sends a FunctionResponse (FR) message. The Runner scans session events to find the matching invocation_id, then reconstructs node state from persisted events.
4. Continue: The interrupted node receives the FR and continues execution. Downstream nodes receive the resumed node's output.

run_id on resume

Resumed nodes reuse the same run_id from the original execution. From the node's perspective, the execution never paused — events before and after the resume share the same run_id.

Fresh dispatches (first run, loop re-trigger) get a new run_id.

Resume behavior by `rerun_on_resume`

A node with multiple interrupt IDs may receive partial FRs (only some resolved). The behavior depends on rerun_on_resume:

`rerun_on_resume=True` (Workflow, orchestration nodes):

FRs received	Status	Behavior
Partial	PENDING	Re-execute immediately with partial `resume_inputs`. Node handles remaining interrupts internally (e.g., Workflow dispatches resolved children, keeps unresolved as WAITING).
All	PENDING	Re-execute with all `resume_inputs`.

This is critical for Workflow — when one child's FR arrives, it re-runs immediately to dispatch that resolved child. It doesn't wait for all children's FRs.

`rerun_on_resume=False` (leaf nodes, simple HITL):

FRs received	Status	Behavior
Partial	WAITING	Stay waiting. Need all FRs.
All	COMPLETED	Auto-complete. Output = aggregated `resolved_responses`. No re-execution.
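The two tables can be condensed into a single decision function. The following is a minimal sketch, not ADK code: `Status` and `status_on_resume` are hypothetical names, and the "no FR received yet" branch is an assumption, since the tables do not cover it.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    WAITING = "WAITING"
    COMPLETED = "COMPLETED"


def status_on_resume(rerun_on_resume, interrupt_ids, resume_inputs):
    """Return the node status after a resume, following the two tables above."""
    pending_ids = set(interrupt_ids)
    resolved = pending_ids & set(resume_inputs)
    if not resolved:
        # Assumption: with no matching FR the node simply keeps waiting.
        return Status.WAITING
    if rerun_on_resume:
        # Partial or complete: re-execute right away with what has arrived.
        return Status.PENDING
    if resolved == pending_ids:
        # Leaf node holding every FR: auto-complete without re-execution.
        return Status.COMPLETED
    # Leaf node with only some FRs: stay waiting.
    return Status.WAITING
```

For example, a Workflow-style node (`rerun_on_resume=True`) with interrupts `fc-1` and `fc-2` goes back to PENDING as soon as `fc-1` is answered, whereas a leaf node in the same situation stays WAITING.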

Resume with prior output and interrupts

A node can produce output AND interrupt in the same execution (e.g., a Workflow where child A completes with output and child B interrupts). On resume:

Some interrupt IDs are resolved (provided in resume_inputs)
Remaining interrupt IDs carry forward via prior_interrupt_ids
Prior output carries forward via prior_output
NodeRunner pre-populates ctx with these values before re-executing

runner = NodeRunner(
    node=node, parent_ctx=ctx,
    run_id=prior_run_id,  # reuse
    prior_output=cached_output,
    prior_interrupt_ids={'fc-2'},  # still unresolved
)
child_ctx = await runner.run(
    node_input=input,
    resume_inputs={'fc-1': response},
)

Context

Architecture

The runtime uses two scoping objects:

InvocationContext: singleton per invocation. Holds shared state (session, services, event queue) accessible by all nodes. Pydantic model at agents/invocation_context.py.

Context: one per node execution. Holds per-node results (output, route, interrupt_ids) and provides the API surface for node code. At agents/context.py.

Every Context holds a reference to the same InvocationContext (_invocation_context). Service access (artifacts, memory, auth) is delegated through it.

Root Context                      ← created by Runner from IC
└── Context [runner.node]         ← the root node (e.g., Workflow)
    ├── Context [child_a]         ← child node A
    └── Context [child_b]         ← child node B
        └── Context [grandchild]  ← nested child

The Runner creates root_ctx = Context(ic) as the tree root and passes it as parent_ctx to NodeRunner(node=self.node). The root Context has no node_path or run_id — it exists solely as the parent for the Runner's root node. All Contexts in the tree share the same InvocationContext singleton.

InvocationContext contents:

session, agent, user_content
invocation_id, app_name, user_id
Services: artifact_service, memory_service, credential_service
run_config, live_request_queue
process_queue — shared event queue consumed by the main loop

1:1 node-context mapping

Every node execution gets its own Context instance. The relationship is strictly 1:1: one node, one Context. The Context tree mirrors the node execution tree.

NodeRunner creates the child Context from the parent's Context via _create_child_context(). The child inherits:

_invocation_context — same singleton (shared session, services)
node_path — parent path + node name (e.g., wf/child_a)
run_id — unique per execution (reused on resume)
event_author — inherited from parent
schedule_dynamic_node_internal — inherited from parent

The child does NOT inherit output, route, or interrupt_ids — those are per-execution results, starting fresh (unless resume carries forward prior_output / prior_interrupt_ids).
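The inheritance rules can be illustrated with a small stand-in. This is a sketch under stated assumptions: the dataclasses and `create_child_context` below are simplified, hypothetical versions of the real classes and of `_create_child_context()`, and the run_id format is invented for illustration.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InvocationContext:
    invocation_id: str


@dataclass
class Context:
    invocation_context: InvocationContext
    node_path: str = ""
    run_id: str = ""
    event_author: str = ""
    # Per-execution results: never copied from the parent.
    output: Any = None
    route: Any = None
    interrupt_ids: set = field(default_factory=set)


def create_child_context(parent, node_name, run_id=None):
    """Build a child Context that shares the parent's InvocationContext."""
    path = f"{parent.node_path}/{node_name}" if parent.node_path else node_name
    return Context(
        invocation_context=parent.invocation_context,  # same singleton
        node_path=path,                                # parent path + node name
        run_id=run_id or f"{node_name}_{uuid.uuid4().hex[:6]}",  # reused on resume
        event_author=parent.event_author,              # inherited
    )


ic = InvocationContext("inv-1")
root = Context(ic)
wf = create_child_context(root, "wf")
wf.output = "parent result"
child = create_child_context(wf, "child_a")
```

Here `child.node_path` is `wf/child_a`, `child.invocation_context` is the same object as `ic`, and `child.output` starts as `None` even though the parent already holds an output.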

Node result properties

These properties on Context are the primary mechanism for communicating results between nodes:

`ctx.output`: the node's result value. Set once per execution. Can be set via yield value (framework sets it) or ctx.output = X directly. Second write raises ValueError.

`ctx.route`: routing value for conditional edges. Set independently of output. Workflow-specific.

`ctx.interrupt_ids`: accumulated interrupt IDs. Read-only for user code. Set by framework when node yields an Event with long_running_tool_ids.

Output and interrupts can coexist — the orchestrator's _finalize decides what to propagate. The orchestrator reads these properties after the child node finishes.
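The write-once and read-only semantics described above can be sketched with plain Python properties. `NodeResults` is a hypothetical stand-in, not the real Context class; it only shows one way such rules could be enforced.

```python
class NodeResults:
    """Stand-in for the result-bearing part of a Context."""

    _UNSET = object()

    def __init__(self):
        self._output = self._UNSET
        self.route = None            # independent of output
        self._interrupt_ids = set()  # written by the framework only

    @property
    def output(self):
        return None if self._output is self._UNSET else self._output

    @output.setter
    def output(self, value):
        if self._output is not self._UNSET:
            raise ValueError("output already set for this execution")
        self._output = value

    @property
    def interrupt_ids(self):
        # No setter: user code gets a read-only snapshot.
        return frozenset(self._interrupt_ids)


results = NodeResults()
results.output = {"score": 0.9}
results.route = "approve"
```

A second assignment to `results.output` raises `ValueError`, and assigning to `results.interrupt_ids` raises `AttributeError` because the property has no setter.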

Class hierarchy

ReadonlyContext          (agents/readonly_context.py)
  └── Context            (agents/context.py)

ReadonlyContext — read-only view used in callbacks and plugins:

user_content, invocation_id, agent_name
state (returns MappingProxyType — immutable view)
session, user_id, run_config

Context(ReadonlyContext) — full read-write context for node execution. Extends ReadonlyContext with mutable state, node results, workflow metadata, and service methods. See property reference below.

Property reference

Category	Properties
State & actions	`state` (mutable `State`), `actions` (EventActions)
Node results	`output`, `route`, `interrupt_ids` (read-only)
Workflow	`node_path`, `run_id`, `triggered_by`, `in_nodes`, `resume_inputs`, `retry_count`, `event_author`
Methods	`run_node()`, `get_next_child_run_id()`
Artifacts	`load_artifact()`, `save_artifact()`, `list_artifacts()`
Memory	`search_memory()`, `add_session_to_memory()`, `add_events_to_memory()`, `add_memory()`
Auth	`request_credential()`, `load_credential()`, `save_credential()`
Tools	`request_confirmation()`, `function_call_id`

LLM Context Orchestration from Events

Core Principle

In ADK, there is a clear distinction between the Event Stream and the LLM Context:

Events are the Ground Truth: They are immutable records of what has happened in a session (user messages, model responses, tool calls, results). They serve as the audit log and persistence state.
LLM Context is an Orchestrated View: The context passed to an LLM is not merely a dump of the raw event log. It is a carefully orchestrated view, filtered and transformed to match the specific role, task, and branch of the agent currently executing.

Orchestration Strategies

The framework orchestrates the translation of events into LLM context using several strategies:

1. Task Delegation Translation

When a coordinator agent delegates a task to a sub-agent (Task Agent) via a tool call:

Source Event: Coordinator calls a tool like request_task_<sub_agent_name>(args...).
Orchestrated Context:
The arguments in the request_task_<sub_agent_name> tool call are extracted and placed in the System Instruction (SI) or treated as the core instruction for the sub-agent.
The first user message presented to the sub-agent is synthesized to represent the goal (e.g., "Finish task of [sub_agent_name] with arguments [args]").
Goal: Isolate the sub-agent from the coordinator's full history and give it a crisp, clear starting point.
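A minimal sketch of the translation step, assuming the tool name carries the sub-agent name after the `request_task_` prefix and that the synthesized message follows the example wording above. The function name and the JSON rendering of the arguments are illustrative choices, not the framework's actual behavior.

```python
import json

TASK_PREFIX = "request_task_"


def delegation_to_first_message(tool_name, args):
    """Turn a delegation tool call into (sub_agent_name, first user message)."""
    if not tool_name.startswith(TASK_PREFIX):
        raise ValueError(f"not a delegation tool call: {tool_name}")
    sub_agent = tool_name[len(TASK_PREFIX):]
    rendered_args = json.dumps(args, sort_keys=True)
    message = f"Finish task of {sub_agent} with arguments {rendered_args}"
    return sub_agent, message


sub_agent, first_message = delegation_to_first_message(
    "request_task_researcher", {"topic": "otel"}
)
```

The sub-agent then starts from `first_message` alone rather than from the coordinator's full history.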

2. Branch Isolation

In complex workflows with parallel execution:

Source Events: Events from all nodes and branches are stored in the same session chronologically.
Orchestrated Context: The framework filters events by branch (e.g., node:path.name). An agent only sees events that belong to its own execution path.
Goal: Prevent cross-node event pollution and ensure deterministic behavior in isolated tasks.
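Branch filtering can be pictured as a simple predicate over the chronological event list. The sketch below makes several assumptions: events carry an optional `branch` string, branches are dot-separated, and an agent sees unbranched events plus events on its own branch or an ancestor branch. The real matching rule may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    author: str
    text: str
    branch: Optional[str] = None  # None: not tied to any branch


def visible_events(events, current_branch):
    """Keep only events on the current branch or one of its ancestors."""

    def on_path(event_branch):
        return (
            event_branch is None
            or event_branch == current_branch
            or current_branch.startswith(event_branch + ".")
        )

    return [e for e in events if on_path(e.branch)]


session_events = [
    Event("user", "hi"),
    Event("node_a", "A1", branch="wf.a"),
    Event("node_b", "B1", branch="wf.b"),
    Event("wf", "plan", branch="wf"),
]
seen_by_a = [e.text for e in visible_events(session_events, "wf.a")]
```

With these inputs, the agent on `wf.a` sees `hi`, `A1`, and `plan`, but not the sibling event `B1`.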

3. History Trimming and Compaction

To prevent context window overflow and stale instruction loops:

Source Events: A long history of retries, tool calls, and interactions.
Orchestrated Context: The framework may trim older events or summarize them (event compaction). In task mode, it might keep only the essential setup events, ignoring stale retry loops that would otherwise confuse the LLM.
Goal: Maintain a focused and efficient context window for the LLM.

Summary

The relationship is one of Source vs. View. Events are the source of truth for the session, while LLM context is a highly orchestrated view of that truth, tailored for the active agent.

NodeRunner

NodeRunner is the per-node executor. It drives BaseNode.run(), creates the child Context, enriches events, and writes results to ctx.

Two communication channels

The runtime has two distinct channels for data flow:

Context: parent ↔ child communication. Output, route, state, resume_inputs, and interrupt_ids flow through ctx. The orchestrator reads ctx after the child completes to decide what to do next.

Event: persistence and streaming. Events are appended to the session and streamed to the caller. They carry message, state deltas, function calls, and interrupt markers.

A node writes to ctx to communicate with its parent. A node yields Events to persist data and stream messages to the user.

Execution flow

Orchestrator
  │
  ├─ NodeRunner(node=child, parent_ctx=ctx)
  │    │
  │    ├─ _create_child_context()     → child Context
  │    ├─ _execute_node()             → iterate node.run()
  │    │    ├─ _track_event_in_context()  → write to ctx
  │    │    └─ _enqueue_event()           → enrich + persist
  │    ├─ _flush_output_and_deltas()  → emit deferred output
  │    └─ return child ctx
  │
  └─ reads ctx.output, ctx.route, ctx.interrupt_ids

1. Create child Context — inherits _invocation_context (shared singleton), builds node_path from parent, assigns run_id.

2. Iterate `node.run()` — for each yielded Event:

Track in context — _track_event_in_context writes output, route, and interrupt_ids from the event to ctx (source of truth).

Enrich — _enrich_event stamps metadata before persistence:

event.author: node name (or event_author override)
event.invocation_id: from InvocationContext
event.node_info.path: full path (e.g., wf/child_a)
event.node_info.run_id: unique per execution
event.node_info.output_for: ancestor paths when use_as_output=True

Flush deltas — for non-partial events, _flush_deltas moves pending state/artifact deltas from ctx.actions onto the event before enqueueing.

Enqueue — ic.enqueue_event puts the event on the shared process queue for session persistence.

3. Flush deferred output — if ctx.output was set directly (not via yield), _flush_output_and_deltas emits the output Event after _run_impl returns. Bundles any remaining state/artifact deltas onto the same Event.

4. Return child ctx — the orchestrator reads ctx.output, ctx.route, and ctx.interrupt_ids.
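The four steps can be compressed into a toy runner. This is a sketch with hypothetical stand-ins (`Event`, `Ctx`, `run_node`) rather than the real NodeRunner. It covers creating the child context, tracking, enriching, and enqueueing, and deliberately leaves out delta flushing and deferred output.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Event:
    output: Any = None
    route: Optional[str] = None
    author: str = ""
    path: str = ""


@dataclass
class Ctx:
    node_path: str
    output: Any = None
    route: Optional[str] = None
    queue: list = field(default_factory=list)  # stands in for the process queue


async def run_node(node_name, run_impl, parent, node_input):
    # 1. Create the child context; it shares the parent's queue.
    ctx = Ctx(node_path=f"{parent.node_path}/{node_name}", queue=parent.queue)
    # 2. Iterate the node's async generator.
    async for event in run_impl(ctx, node_input):
        # Track in context: ctx is the source of truth for the parent.
        if event.output is not None:
            ctx.output = event.output
        if event.route is not None:
            ctx.route = event.route
        # Enrich, then enqueue for persistence.
        event.author, event.path = node_name, ctx.node_path
        ctx.queue.append(event)
    # 4. Return the child context for the orchestrator to read.
    return ctx


async def double(ctx, node_input):
    yield Event(output=node_input * 2, route="done")


parent = Ctx(node_path="wf")
child_ctx = asyncio.run(run_node("child_a", double, parent, 21))
```

After the call, the orchestrator reads `child_ctx.output` and `child_ctx.route`, while the enriched event sits on the shared queue with its author and path stamped.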

Output delegation (`use_as_output`)

When a child is scheduled with use_as_output=True, its output Event also counts as the parent's output. NodeRunner:

Sets ctx._output_delegated = True on the parent
Skips emitting the parent's own output Event
Stamps event.node_info.output_for with ancestor paths

Observability

Design: span on Context

Each Context carries a _span field. Since Context forms a 1:1 parent-child tree with node executions (see Context), span hierarchy follows naturally — no separate span management needed.

Root Context._span (invocation)     ← Runner sets this
└── ctx[workflow]._span             ← NodeRunner creates
    ├── ctx[child_a]._span          ← NodeRunner creates
    │   ├── (call_llm span)         ← auto-parented
    │   └── (execute_tool span)     ← auto-parented
    ├── ctx[child_b]._span          ← NodeRunner creates
    │   └── ctx[grandchild]._span   ← nested
    └── ctx[child_c]._span          ← ctx.run_node()

Runner creates root_ctx and the invocation span, storing it as root_ctx._span. This becomes the parent for all node spans.

NodeRunner creates each node's span, explicitly parented to parent_ctx._span, stores it on child_ctx._span, and closes it before returning (see NodeRunner for the execution flow).

Always use `ctx._span` explicitly — never rely on OTel's implicit "current span" context. In a concurrent asyncio.Task runtime, implicit context can be unreliable across concurrent nodes. All tracing operations (attributes, logs, child spans) should go through ctx._span.

Span lifecycle:

1. NodeRunner.run() creates span via tracer.start_span(), parented to parent_ctx._span, stored on ctx._span
2. Node executes; all tracing goes through ctx._span explicitly
3. NodeRunner.run() calls ctx._span.end() before returning
4. BatchSpanProcessor buffers ended spans, exports periodically
5. OTLPSpanExporter sends batch to the OTLP endpoint

Interrupted nodes: Span ends immediately when NodeRunner returns — not left open waiting for resume. Otherwise the span would be invisible to the backend until resume (which could be minutes, hours, or never). The resumed execution starts a fresh span in a new Runner.run_async() call (same invocation_id, different trace — possibly on a different server).
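The explicit-parent rule can be shown without any tracing library. The sketch below uses a stand-in `Span` class and a hypothetical `run_node` helper so it runs on the standard library alone. With the real OpenTelemetry API, the corresponding explicit-parent call would be `tracer.start_span(name, context=trace.set_span_in_context(parent_span))`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    ended: bool = False

    def end(self):
        self.ended = True


@dataclass
class Ctx:
    span: Optional[Span] = None


async def run_node(name, parent_ctx, body=None):
    # The parent comes from parent_ctx, never from an implicit "current span".
    ctx = Ctx(span=Span(name, parent=parent_ctx.span))
    try:
        if body is not None:
            await body(ctx)
    finally:
        ctx.span.end()  # always closed before returning, even on error
    return ctx


children = []


async def workflow_body(ctx):
    # Two children run concurrently; both receive the same explicit parent.
    children.extend(
        await asyncio.gather(run_node("child_a", ctx), run_node("child_b", ctx))
    )


async def main():
    root = Ctx(span=Span("invocation"))
    wf = await run_node("workflow", root, workflow_body)
    return root, wf


root, wf = asyncio.run(main())
```

Because each child receives its parent through `parent_ctx`, the hierarchy stays correct regardless of how the event loop interleaves the concurrent tasks.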

NodeRunner integration

Context changes — add _span field:

class Context(ReadonlyContext):
    _span: Span | None = None

NodeRunner.run() lifecycle:

1. Create child ctx
2. Create span, parented to parent_ctx._span
3. Store on ctx._span
4. Set node attributes (name, path, run_id, type)
5. Execute node
   Node can add custom attributes to ctx._span during execution (e.g., SingleAgentReactNode adds gen_ai.agent.name, gen_ai.request_model)
   On interrupt: mark span node.interrupted = True
   On error: set span status ERROR, record exception
6. Set result attributes (has_output, interrupted, resumed)
7. Close span (ctx._span.end()): always, even on interrupt
8. Return ctx

Key points:

Use tracer.start_span() with explicit parent context from parent_ctx._span; never rely on implicit OTel context in concurrent async code
Span always ends before run() returns, even on interrupt

Span attributes and semantic conventions

Set at span creation (available for sampling decisions):

Attribute	Source	Example
`node.name`	`self._node.name`	`"call_llm"`
`node.path`	`ctx.node_path`	`"wf/child_a"`
`node.run_id`	`self._run_id`	`"child_a_abc123"`
`node.type`	`type(self._node).__name__`	`"CallLlmNode"`

Set after execution (result attributes):

Attribute	Source	Example
`node.has_output`	`ctx.output is not None`	`true`
`node.interrupted`	`bool(ctx.interrupt_ids)`	`false`
`node.resumed`	`bool(resume_inputs)`	`false`

GenAI semantic conventions for node spans:

gen_ai.operation.name = "invoke_agent" for agent nodes
gen_ai.operation.name = "execute_tool" for tool nodes
gen_ai.agent.name, gen_ai.tool.name as appropriate
Span kind: INTERNAL (in-process orchestration)

Correlated logs

Use the OTel Logs API for point-in-time occurrences within a node's span. Context provides emit_log() as a convenience: it wraps set_span_in_context(self._span) internally, so callers don't manage OTel context themselves:

# On Context:
def emit_log(self, body: str, **attributes):
    span_ctx = set_span_in_context(self._span)
    otel_logger.emit(
        LogRecord(body=body, attributes=attributes),
        context=span_ctx,
    )

# Usage:
ctx.emit_log('node.event.yielded',
    has_output=event.output is not None,
    has_message=event.content is not None,
)

Python logging

Use the google_adk logger namespace:

Level	What to log
`DEBUG`	Node started, node completed, event enqueued
`INFO`	Node interrupted, node resumed, dynamic node scheduled
`WARNING`	Node timeout, retry triggered
`ERROR`	Node failed, unhandled exception

import logging

logger = logging.getLogger("google_adk." + __name__)

logger.debug(
    'Node %s started (run_id=%s, path=%s)',
    node.name, run_id, ctx.node_path,
)

Use %-style formatting (lazy evaluation) for logging, not f-strings.
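The difference is easy to observe with the standard library alone. In this sketch the logger name suffix and the `Expensive` class are illustrative. The class counts how often it is converted to a string, so the cost of eager formatting becomes visible.

```python
import logging

logger = logging.getLogger("google_adk.example")
logger.setLevel(logging.INFO)  # DEBUG records are dropped


class Expensive:
    """Counts how many times it is rendered to a string."""

    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive repr"


value = Expensive()

# %-style: arguments are only formatted if the record is actually emitted.
logger.debug("Node %s started", value)
calls_after_lazy = Expensive.calls

# f-string: the string is built before logger.debug() is even called.
logger.debug(f"Node {value} started")
calls_after_eager = Expensive.calls
```

With DEBUG disabled, the %-style call never touches `__str__`, while the f-string call pays the formatting cost for a message that is then discarded.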

Metrics (future)

Metric	Type	Description
`node.execution.duration`	Histogram	Per node type
`node.execution.count`	Counter	Per node type and status
`node.interrupt.count`	Counter	HITL interrupts
`node.resume.count`	Counter	Resumed executions
`workflow.active_nodes`	UpDownCounter	Currently executing

BaseNode

BaseNode is the primitive unit of execution in the workflow runtime. Every computation — LLM calls, tool execution, orchestration — is a node. It is a Pydantic BaseModel subclass.

The node contract

Every node follows a two-method pattern:

run() is @final: it normalizes yields to Events. Never override.
_run_impl() is the extension point: subclasses implement their logic here as an async generator.

class MyNode(BaseNode):
    async def _run_impl(self, *, ctx, node_input):
        result = do_work(node_input)
        yield result  # becomes Event(output=result)

Why this split: run() guarantees consistent normalization regardless of what the subclass does. The subclass only thinks about its domain logic.

Normalization rules (run() applies these to each yield):

None → skipped
Event → pass through
RequestInput → interrupt Event
any other value → Event(output=value)
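The four rules map directly onto a small dispatch function. The sketch below uses stand-in `Event` and `RequestInput` classes. The `interrupt_id` field on `RequestInput` is a hypothetical simplification, and the real interrupt Event likely carries more information.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    output: Any = None
    long_running_tool_ids: set = field(default_factory=set)


@dataclass
class RequestInput:
    interrupt_id: str  # hypothetical field, for illustration only


def normalize(item):
    """Apply the normalization rules to one yielded item."""
    if item is None:
        return None  # skipped
    if isinstance(item, Event):
        return item  # pass through unchanged
    if isinstance(item, RequestInput):
        return Event(long_running_tool_ids={item.interrupt_id})  # interrupt Event
    return Event(output=item)  # any other value becomes the output


yielded = [None, Event(output=1), RequestInput("fc-1"), "plain value"]
events = [e for e in map(normalize, yielded) if e is not None]
```

Four yielded items produce three Events here, because the `None` is dropped before anything is enqueued.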

Generator conventions:

A node can yield three types of data:

Output: the node's result value. Flows between nodes (parent reads ctx.output after child completes). At most one per execution (second raises ValueError).

Message: user-visible content streamed to the end user (e.g., progress text, partial responses). Multiple allowed.

Route: Workflow-specific concept. Triggers conditional edges in the graph. Set via ctx.route or event.actions.route.

Additional rules:

Yielding nothing produces no output event
yield None is silently skipped

A custom node interacts with the runtime through two arguments:

`ctx` (Context) — communicate results to the parent node
`node_input` — data passed by the parent/orchestrator

Output and streaming

Three ways to produce output (pick one per execution):

# 1. Yield a value (most common)
async def _run_impl(self, *, ctx, node_input):
    yield compute(node_input)

# 2. Set ctx.output directly
async def _run_impl(self, *, ctx, node_input):
    ctx.output = compute(node_input)
    return
    yield  # generator contract

# 3. Yield an Event with output
async def _run_impl(self, *, ctx, node_input):
    yield Event(output=compute(node_input))

A second output raises ValueError — at most one per execution.

Streaming messages — yield Events with message to send user-visible text (message is an alias for content on Event):

async def _run_impl(self, *, ctx, node_input):
    yield Event(message='working...')
    yield compute(node_input)  # this is the output

State and routing

Mutating state:

async def _run_impl(self, *, ctx, node_input):
    ctx.state['key'] = 'value'  # recorded as state_delta
    yield result

Setting route for conditional edges:

async def _run_impl(self, *, ctx, node_input):
    ctx.route = 'approve' if node_input > 0.8 else 'reject'
    yield node_input

Advanced: child nodes and HITL

Running child nodes via ctx.run_node():

async def _run_impl(self, *, ctx, node_input):
    child_result = await ctx.run_node(some_node, node_input)
    yield f'child said: {child_result}'

Requires rerun_on_resume = True on the calling node.

Requesting interrupt (HITL):

async def _run_impl(self, *, ctx, node_input):
    if ctx.resume_inputs and 'fc-1' in ctx.resume_inputs:
        yield f'approved: {ctx.resume_inputs["fc-1"]}'
        return
    yield Event(long_running_tool_ids={'fc-1'})

Configuration reference

Field	Type	Default	Purpose
`name`	`str`	required	Unique identifier
`description`	`str`	`''`	Human-readable description
`rerun_on_resume`	`bool`	`False`	Re-execute on resume (required for `ctx.run_node()`)
`wait_for_output`	`bool`	`False`	Stay WAITING until output is yielded (for join nodes)
`retry_config`	`RetryConfig | None`	`None`
`timeout`	`float | None`	`None`
`input_schema`	`SchemaType | None`	`None`
`output_schema`	`SchemaType | None`	`None`

Workflow

Workflow is a graph-based orchestration node. It extends BaseNode and implements _run_impl() as a scheduling loop that drives static graph nodes and tracks dynamic nodes spawned by ctx.run_node().

Two kinds of child nodes

Workflow manages two kinds of child nodes:

Static (graph) nodes: declared in edges, compiled into a WorkflowGraph. Scheduled by the orchestration loop via triggers and asyncio.Tasks. Tracked in _LoopState.nodes by node name.

Dynamic nodes: spawned at runtime via ctx.run_node() from inside a graph node's _run_impl. Tracked in _LoopState.dynamic_nodes by full node_path. Managed by DynamicNodeScheduler.

Static and dynamic nodes share the same _LoopState.interrupt_ids set, so the Workflow sees a unified view of all pending interrupts.

Implementing a graph node

A graph node is a regular BaseNode placed in a Workflow's edges. The Workflow wraps it in a NodeRunner, creates a child Context, and reads ctx.output, ctx.route, and ctx.interrupt_ids after it completes.

Output — two paths. At most one per execution. The Workflow reads the output to pass downstream.

# Yield (persisted immediately)
async def _run_impl(self, *, ctx, node_input):
    yield compute(node_input)

# ctx (deferred until node end)
async def _run_impl(self, *, ctx, node_input):
    ctx.output = compute(node_input)
    return
    yield

Routing — two paths. The Workflow uses the route to select conditional edges.

# Yield (persisted immediately)
async def _run_impl(self, *, ctx, node_input):
    yield Event(route='approve' if node_input > 0.8 else 'reject')

# ctx (deferred until node end)
async def _run_impl(self, *, ctx, node_input):
    ctx.route = 'approve' if node_input > 0.8 else 'reject'
    yield node_input

State — two paths. ctx.state deltas are flushed onto the next yielded Event, or a final Event at node end.

# Yield (persisted immediately)
async def _run_impl(self, *, ctx, node_input):
    yield Event(state={'count': 1})

# ctx (flushed onto next/final Event)
async def _run_impl(self, *, ctx, node_input):
    ctx.state['count'] = 1
    yield result

Interrupts — yield only (ctx.interrupt_ids is read-only). The Workflow marks the node WAITING and propagates the interrupt IDs upward. On resume, if rerun_on_resume=True (default for Workflow), the node is re-executed with ctx.resume_inputs populated.

async def _run_impl(self, *, ctx, node_input):
    if ctx.resume_inputs and 'fc-1' in ctx.resume_inputs:
        yield f'approved: {ctx.resume_inputs["fc-1"]}'
        return
    yield Event(long_running_tool_ids={'fc-1'})

Dynamic nodes via ctx.run_node()

A graph node can spawn child nodes at runtime:

class Orchestrator(BaseNode):
    rerun_on_resume: bool = True  # required

    async def _run_impl(self, *, ctx, node_input):
        result = await ctx.run_node(some_node, node_input)
        yield f'child returned: {result}'

Requirements

The calling node must have rerun_on_resume = True. Without this, the Workflow cannot re-execute the node on resume to let it re-acquire its dynamic children's results.

Tracking

Dynamic nodes are tracked by full node_path, not by name alone. The path is parent_path/child_name:

wf/graph_node_a/dynamic_child     ← dynamic node under graph_node_a
wf/graph_node_a/dynamic_child/inner  ← transitive dynamic node

The child_name comes from either:

The name parameter on ctx.run_node(node, name='explicit')
The node's own name field (default)

Each unique node_path is tracked exactly once in _LoopState.dynamic_nodes. This enables:

Dedup: if the same path is encountered again (after resume), the cached output is returned without re-execution.

Resume: if the node was interrupted, its state is reconstructed from session events via lazy scan.

Dedup and resume protocol (DynamicNodeScheduler)

When ctx.run_node() is called, the scheduler checks three cases:

1. Fresh — no prior events for this node_path. Execute via NodeRunner, record output or interrupts in _LoopState.

2. Completed — prior events show the node produced output. Return cached output immediately. No re-execution.

3. Waiting — prior events show the node was interrupted:

Unresolved interrupts → propagate interrupt IDs to the caller (via _LoopState.interrupt_ids). The caller raises NodeInterruptedError.

All resolved → re-execute with resume_inputs from the resolved function responses.

State reconstruction is lazy: the scheduler scans session events only on the first ctx.run_node() call for a given path, not upfront. This avoids scanning for dynamic nodes that won't be re-invoked.
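The three cases can be sketched as one synchronous function. Everything here is a simplified stand-in: the `prior` dictionary plays the role of state reconstructed from session events, `execute` stands for running the node through NodeRunner, and the local `NodeInterruptedError` class only mimics the real exception.

```python
class NodeInterruptedError(Exception):
    """Stand-in for the runtime's interrupt exception."""

    def __init__(self, interrupt_ids):
        super().__init__(f"unresolved interrupts: {sorted(interrupt_ids)}")
        self.interrupt_ids = set(interrupt_ids)


def schedule(node_path, prior, resume_inputs, execute):
    """Decide what a run_node-style call does for one node_path.

    prior maps node_path -> {"output": ..., "interrupt_ids": set()}.
    execute(resume) runs the node; resume is None on a fresh run.
    """
    record = prior.get(node_path)
    if record is None:
        return execute(None)  # 1. Fresh: no prior events for this path
    if record["output"] is not None:
        return record["output"]  # 2. Completed: cached output, no re-execution
    unresolved = record["interrupt_ids"] - set(resume_inputs)
    if unresolved:
        raise NodeInterruptedError(unresolved)  # 3a. Waiting, still unresolved
    resolved = {k: resume_inputs[k] for k in record["interrupt_ids"]}
    return execute(resolved)  # 3b. Waiting, all resolved: re-execute


calls = []


def execute(resume):
    calls.append(resume)
    return "ran"


fresh_result = schedule("wf/a/child", {}, {}, execute)
cached_result = schedule(
    "wf/a/child",
    {"wf/a/child": {"output": "cached", "interrupt_ids": set()}},
    {},
    execute,
)
```

Only the fresh call reaches `execute`; the second call returns the cached output without running the node again.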

Interrupt propagation

When a dynamic child interrupts:

1. DynamicNodeScheduler._record_result sets the child's status to WAITING and adds its interrupt IDs to _LoopState.interrupt_ids.
2. ctx.run_node() checks child_ctx.interrupt_ids. If non-empty, it propagates them to the calling node's ctx._interrupt_ids and raises NodeInterruptedError.
3. NodeRunner catches NodeInterruptedError in _execute_node and records the interrupt on the calling node's Context.
4. The Workflow's _handle_completion sees the interrupt and marks the graph node as WAITING.

On resume, the Workflow re-executes the graph node (because rerun_on_resume=True). The graph node calls ctx.run_node() again, which hits the scheduler. The scheduler lazily scans events, finds the resolved FR, and either returns cached output or re-executes the dynamic child with resume_inputs.

Output delegation (use_as_output)

ctx.run_node(node, use_as_output=True) makes the dynamic child's output count as the calling node's output:

class Delegator(BaseNode):
    rerun_on_resume: bool = True

    async def _run_impl(self, *, ctx, node_input):
        # child's output becomes this node's output
        await ctx.run_node(worker, node_input, use_as_output=True)

Sets ctx._output_delegated = True on the parent
NodeRunner stamps event.node_info.output_for with ancestor paths
Only one use_as_output=True per execution (second raises ValueError)

Dynamic nodes from dynamic nodes (transitive)

A dynamic node can itself call ctx.run_node(), creating a transitive chain:

class Outer(BaseNode):
    rerun_on_resume: bool = True

    async def _run_impl(self, *, ctx, node_input):
        result = await ctx.run_node(Inner(name='inner'), 'data')
        yield result

class Inner(BaseNode):
    rerun_on_resume: bool = True

    async def _run_impl(self, *, ctx, node_input):
        sub = await ctx.run_node(Leaf(name='leaf'), node_input)
        yield f'inner got: {sub}'

This works because:

All dynamic nodes in the subtree are tracked by the same enclosing Workflow. The scheduler is inherited down the Context tree automatically.

Each level gets a unique node_path:

wf/graph_node/outer/inner/leaf

Nested interrupts are correctly attributed: the scheduler matches events from any descendant under a given path.

Only a nested orchestration node (another Workflow or SingleAgentReactNode) takes over scheduling. Regular nodes inherit the enclosing Workflow's scheduler.

Scoping

Each Workflow has its own DynamicNodeScheduler and _LoopState. A nested Workflow creates a new scheduler, so dynamic nodes within it are scoped to that inner Workflow — not mixed with the outer Workflow's state.

event_author

Workflow sets ctx.event_author = self.name at the start of _run_impl. This propagates to all child Contexts via NodeRunner. All events emitted by children carry this author, giving the UI consistent attribution.

An inner orchestration node (nested Workflow, SingleAgentReactNode) overrides event_author with its own name, so events are attributed to the nearest orchestration ancestor.
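The attribution rule can be sketched with a small context tree. `ToyContext` and the `is_orchestrator` flag are hypothetical; in ADK the override is done by the orchestration node itself rather than by a flag.

```python
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Hypothetical context carrying only the event author."""

    event_author: str

    def child(self, node_name, is_orchestrator=False):
        # Orchestration nodes take over attribution; regular nodes inherit it.
        author = node_name if is_orchestrator else self.event_author
        return ToyContext(event_author=author)


outer = ToyContext(event_author="outer_wf")
plain = outer.child("fetch")
inner_wf = outer.child("inner_wf", is_orchestrator=True)
grandchild = inner_wf.child("summarize")
```

Here `plain` is attributed to `outer_wf`, while `grandchild` is attributed to `inner_wf`, its nearest orchestration ancestor.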

Orchestration loop lifecycle

_run_impl
  ├─ SETUP: resume from events OR seed start triggers
  ├─ ctx._schedule_dynamic_node_internal = DynamicNodeScheduler
  ├─ LOOP:
  │    ├─ _schedule_ready_nodes → pop triggers, create NodeRunners
  │    ├─ asyncio.wait(FIRST_COMPLETED)
  │    └─ _handle_completion → update state, buffer downstream
  ├─ await dynamic_pending_tasks
  ├─ _collect_remaining_interrupts
  └─ FINALIZE: set ctx.output or ctx._interrupt_ids

Key behaviors:

Concurrency: max_concurrency limits parallel graph nodes. Dynamic nodes are excluded (they run inline, so throttling them would deadlock).

Terminal output: nodes with no outgoing edges are terminal. Their output is delegated to the Workflow's own output via output_for. Only one terminal node may produce output.

Loop edges: a completed node can be re-triggered by a downstream edge pointing back to it. Its status resets to PENDING.
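The schedule / wait / handle-completion cycle with a concurrency cap can be sketched in plain asyncio. This is a simplified illustration, not the Workflow implementation: `run_graph`, the `nodes` and `edges` dictionaries, and the `peak` counter are hypothetical, and the sketch covers only static graph nodes (no dynamic nodes, interrupts, or resume).

```python
import asyncio


async def run_graph(nodes, edges, start, max_concurrency=2):
    """nodes: name -> async fn(input); edges: name -> list of downstream names."""
    ready = [(start, None)]                # buffered triggers: (node name, input)
    running = {}                           # asyncio.Task -> node name
    outputs = {}
    peak = 0                               # highest number of nodes running at once
    while ready or running:
        # Schedule ready nodes up to the concurrency limit.
        while ready and len(running) < max_concurrency:
            name, node_input = ready.pop(0)
            task = asyncio.create_task(nodes[name](node_input))
            running[task] = name
        peak = max(peak, len(running))
        done, _ = await asyncio.wait(list(running), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            # Handle completion: record output, buffer downstream triggers.
            name = running.pop(task)
            outputs[name] = task.result()
            for downstream in edges.get(name, []):
                ready.append((downstream, outputs[name]))
    return outputs, peak


async def demo():
    async def source(_):
        return 1

    async def double(x):
        await asyncio.sleep(0.01)
        return x * 2

    nodes = {"source": source, "a": double, "b": double, "c": double}
    edges = {"source": ["a", "b", "c"]}
    return await run_graph(nodes, edges, "source", max_concurrency=2)


outputs, peak = asyncio.run(demo())
```

With three nodes fanned out from `source` and a limit of two, the third node is started only after one of the first two completes.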

Resume from session events

On resume (ctx.resume_inputs is non-empty), the Workflow reconstructs static node states from session events:

1. Scan: single forward pass through events for this invocation. For each direct child, track output, interrupts, and resolved FRs.

2. Derive status per child:

Unresolved interrupts → WAITING
All interrupts resolved → PENDING (re-run with resume_inputs)
Has output → COMPLETED

Partial resume across children: if child A's interrupt is resolved but child B's is not, A becomes PENDING (re-runs) while B stays WAITING. The Workflow re-interrupts with B's remaining IDs.

Partial resume within a child: if a single child emitted multiple interrupts (e.g., fc-1 and fc-2) and only fc-1 is resolved:

rerun_on_resume=True (e.g., nested Workflow): re-run with partial resume_inputs so it can dispatch resolved grandchildren internally. Remaining interrupts propagate back up.
rerun_on_resume=False: stay WAITING until all interrupts are resolved.

3. Seed triggers: PENDING nodes get triggers so the loop re-executes them with resume_inputs.

Dynamic node state is not scanned upfront — it's lazily reconstructed by DynamicNodeScheduler when ctx.run_node() is called during the re-execution.
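The status derivation for static children can be sketched as a pure function. The `Status` enum and `derive_status` signature are hypothetical. The precedence (interrupts checked before output) and the `None` result for a child with neither output nor interrupts are assumptions of this sketch, since the text above does not specify them.

```python
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    PENDING = "pending"
    COMPLETED = "completed"


def derive_status(has_output, interrupt_ids, resolved_ids, rerun_on_resume):
    """Derive one child's status from what the event scan recorded for it."""
    interrupts = set(interrupt_ids)
    resolved = interrupts & set(resolved_ids)
    unresolved = interrupts - resolved
    if interrupts:
        if not unresolved:
            return Status.PENDING          # all resolved: re-run with resume_inputs
        if resolved and rerun_on_resume:
            return Status.PENDING          # partial resume: child dispatches internally
        return Status.WAITING              # still blocked on unresolved interrupts
    if has_output:
        return Status.COMPLETED
    return None                            # assumption: nothing to derive from events


partial_rerun = derive_status(False, ["fc-1", "fc-2"], ["fc-1"], rerun_on_resume=True)
partial_wait = derive_status(False, ["fc-1", "fc-2"], ["fc-1"], rerun_on_resume=False)
```

The two calls at the end correspond to the partial-resume case within a single child: the same events yield PENDING or WAITING depending on `rerun_on_resume`.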

Key design rules for node authors

1. Set `rerun_on_resume = True` if your node calls ctx.run_node(). The Workflow must be able to re-execute your node so it can re-acquire dynamic children's results.

2. Use deterministic names for dynamic children. The name parameter on ctx.run_node() determines the node_path, which is the dedup/resume key. Non-deterministic names break resume.

3. Always `await` ctx.run_node() directly. Do not wrap in asyncio.create_task() — the task won't be tracked by the scheduler, errors are swallowed, and cancellation on interrupt won't work.

4. Yield output after all dynamic children complete. If your node calls ctx.run_node() and then yields, the output is emitted only after all children finish. This is the expected pattern.

5. Handle `NodeInterruptedError` only if you need custom logic. Normally, ctx.run_node() raises NodeInterruptedError when a child interrupts. NodeRunner catches it automatically. Only catch it yourself if you need to clean up or adjust state before the interrupt propagates.

6. Don't set `ctx.event_author` unless your node is an orchestration node (like Workflow or SingleAgentReactNode). The Workflow sets it for you and it propagates to all descendants.
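Rule 2 can be made concrete with a toy lookup. The `recorded` dictionary, `child_path`, and `lookup` are hypothetical; they only model the idea that the node path built from the child's name is the key used to find earlier results on resume.

```python
import uuid


def child_path(parent_path, name):
    # The child's name determines its node path, which is the dedup/resume key.
    return f"{parent_path}/{name}"


# Outputs recorded before the interrupt, keyed by node path (illustrative data).
recorded = {"wf/planner/step_0": "done"}


def lookup(parent_path, name):
    return recorded.get(child_path(parent_path, name))


stable = lookup("wf/planner", "step_0")                       # same name on every run
unstable = lookup("wf/planner", f"step_{uuid.uuid4().hex}")   # new name on every run
```

The deterministic name finds the earlier result, while the random name produces a path that was never recorded, so in this model the child would run again from scratch.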


FAQ

What does adk-architecture do?

adk-architecture skill documents ADK architectural knowledge - graph orchestration, resumption, execution flow, node contracts, observability, and LLM context orchestration.

When should I use adk-architecture?

Use it whenever you need to understand the architecture, event flow, or state management of the ADK system, or when designing or modifying core components.

Is this skill safe to install?

Review the Security Audits panel on this page before installing in production.
