Apify Generate Output Schema

Name: Apify Generate Output Schema
Author: apify

apify/agent-skills

Generate Apify Actor dataset, output, and key-value store schema JSON from real source code so Console displays run results correctly.

Overview

Apify Generate Output Schema is an agent skill for the Build phase that generates and updates Apify Actor output schema files by analyzing source code.

Install

npx skills add https://github.com/apify/agent-skills --skill apify-generate-output-schema

What is this skill?

Produces dataset_schema.json, output_schema.json, and key_value_store_schema.json when applicable
Code-first analysis: infer fields from what the Actor actually pushes, never guess
Mandatory nullable: true on fields for unpredictable web/API outputs
Cross-checks TypeScript types against runtime push code and reuses repo schema patterns
Updates actor.json to register schemas for Apify Console
Phase 1 workflow step: discover Actor structure before generating schemas

Compatible agents: Claude Code, Cursor, Codex, any compatible agent

Adoption & trust: 5.2k installs on skills.sh; 2.1k GitHub stars; 3/3 security scanners passed (skills.sh audits).

What problem does it solve?

Your Apify Actor runs but Console cannot display results well because dataset and output schemas are missing, stale, or disconnected from what the code actually stores.

Who is it for?

Solo builders shipping or maintaining Apify Actors who want schemas derived from code rather than hand-waved field lists.

Skip if: your project has no Apify Actor, or you only need generic JSON Schema for non-Apify APIs.

When should I use this skill?

Creating or updating Actor output schemas for Apify Console display.

What do I get? / Deliverables

Schema JSON files and an updated actor.json that reflect the fields the Actor actually pushes, with every field marked nullable and every example anonymized, so Apify Console can present runs accurately.

dataset_schema.json and output_schema.json aligned to code
key_value_store_schema.json when applicable
Updated actor.json registering output schemas

Journey fit

Primary fit

Build: Integrations & version control

Output schemas are created while building or updating Apify Actors, which is integration work on the scraping/automation product itself. The skill wires Apify Console presentation to code-derived dataset and store shapes, a typical third-party platform integration task during Build.

How it compares

Use as an Apify-specific schema generator tied to actor.json, not as a general-purpose OpenAPI or Prisma schema skill.

Common Questions / FAQ

Who is apify-generate-output-schema for?

Developers building Apify Actors who need Console-ready output schemas synchronized with their scraping or automation code.

When should I use apify-generate-output-schema?

When creating output schemas for a new Actor, or updating them after you change what the Actor writes to the dataset or key-value store.

Is apify-generate-output-schema safe to install?

The skill reads Actor source locally; review Security Audits on this Prism page and avoid putting real PII in schema examples per the skill’s anonymization rules.

SKILL.md


# Generate Actor output schema

You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` (if the Actor uses the key-value store), and update `actor.json`.

## Core principles

- **Analyze code first**: Read the Actor's source to understand what data it actually pushes to the dataset — never guess
- **Every field is nullable**: APIs and websites are unpredictable — always set `"nullable": true`
- **Anonymize examples**: Never use real user IDs, usernames, or personal data in examples
- **Verify against code**: If TypeScript types exist, cross-check the schema against both the type definition AND the code that produces the values
- **Reuse existing patterns**: Before generating schemas, check if other Actors in the same repository already have output schemas — match their structure, naming conventions, description style, and formatting
- **Don't reinvent the wheel**: Reuse existing type definitions, interfaces, and utilities from the codebase instead of creating duplicate definitions
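
The "every field is nullable" principle lends itself to a mechanical check. The sketch below is a hypothetical helper, not part of the skill or of any Apify library: the name `find_non_nullable` is made up for illustration, and it assumes field definitions live under `fields.properties`, as in the Phase 2 template. It inspects top-level fields only and does not descend into nested objects.

```python
from typing import Any


def find_non_nullable(dataset_schema: dict[str, Any]) -> list[str]:
    """Return names of top-level fields that do not set "nullable": true.

    Hypothetical helper; assumes field definitions sit under fields.properties.
    """
    properties = dataset_schema.get("fields", {}).get("properties", {})
    return sorted(
        name
        for name, definition in properties.items()
        if definition.get("nullable") is not True
    )


# Anonymized, made-up example data.
example_schema = {
    "actorSpecification": 1,
    "fields": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "nullable": True, "example": "Example product"},
            "price": {"type": "number", "example": 19.99},
        },
    },
}

print(find_non_nullable(example_schema))  # ['price']
```

A check like this could run after schema generation to flag fields that still need `"nullable": true`.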

---

## Phase 1: Discover Actor structure

**Goal**: Locate the Actor and understand its output

Initial request: $ARGUMENTS

**Actions**:
1. Create todo list with all phases
2. Find the `.actor/` directory containing `actor.json`
3. Read `actor.json` to understand the Actor's configuration
4. Check if `dataset_schema.json`, `output_schema.json`, and `key_value_store_schema.json` already exist
5. **Search for existing schemas in the repository**: Look for other `.actor/` directories or schema files (e.g., `**/dataset_schema.json`, `**/output_schema.json`, `**/key_value_store_schema.json`) to learn the repo's conventions — match their description style, field naming, example formatting, and overall structure
6. Find all places where data is pushed to the dataset:
   - **JavaScript/TypeScript**: Search for `Actor.pushData(`, `dataset.pushData(`, `Dataset.pushData(`
   - **Python**: Search for `Actor.push_data(`, `dataset.push_data(`, `Dataset.push_data(`
7. Find all places where data is stored in the key-value store:
   - **JavaScript/TypeScript**: Search for `Actor.setValue(`, `keyValueStore.setValue(`, `KeyValueStore.setValue(`
   - **Python**: Search for `Actor.set_value(`, `key_value_store.set_value(`, `KeyValueStore.set_value(`
8. Find output type definitions — **reuse them directly** instead of recreating from scratch:
   - **TypeScript**: Look for output type interfaces/types (e.g., in `src/types/`, `src/types/output.ts`). If an interface or type already defines the output shape, derive the schema fields from it — do not create a parallel definition
   - **Python**: Look for TypedDict, dataclass, or Pydantic model definitions. Use the existing field names, types, and docstrings as the source of truth
9. Check for existing shared schema utilities or helper functions in the codebase that handle schema generation or validation — reuse them rather than creating new logic
10. If inline `storages.dataset` or `storages.keyValueStore` config exists in `actor.json`, note it for migration
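
Steps 2 to 4 and step 10 above could be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's own implementation: `discover_actor` is a hypothetical name, the schema files are assumed to sit next to `actor.json` inside `.actor/`, and an inline storage config is assumed to appear as a JSON object (rather than a string) under `storages.dataset` or `storages.keyValueStore`.

```python
import json
from pathlib import Path

SCHEMA_FILES = (
    "dataset_schema.json",
    "output_schema.json",
    "key_value_store_schema.json",
)


def discover_actor(repo_root: Path) -> list[dict]:
    """Summarize every .actor/actor.json found under repo_root (hypothetical helper)."""
    findings = []
    for actor_json in sorted(repo_root.rglob("actor.json")):
        # Only consider actor.json files that live directly in a .actor/ directory.
        if actor_json.parent.name != ".actor":
            continue
        config = json.loads(actor_json.read_text(encoding="utf-8"))
        storages = config.get("storages")
        if not isinstance(storages, dict):
            storages = {}
        findings.append(
            {
                "actor_dir": actor_json.parent,
                # Schema files that already exist next to actor.json.
                "existing_schemas": [
                    name for name in SCHEMA_FILES if (actor_json.parent / name).is_file()
                ],
                # Assumption: an object value means the config is inline and
                # should be noted for migration.
                "inline_storages": [
                    key
                    for key in ("dataset", "keyValueStore")
                    if isinstance(storages.get(key), dict)
                ],
            }
        )
    return findings
```

Calling `discover_actor(Path("."))` from the repository root returns one entry per Actor, which also helps with step 5, since sibling Actors that already have schemas show up in the same list.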

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
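
To report where each value comes from, the call patterns from steps 6 and 7 can be turned into a plain text search. The sketch below is a minimal illustration: `find_storage_writes` and the sample source are hypothetical, and a regex search only locates call sites. Aliased or wrapped calls are missed, and the pushed field names and types still have to be read from the code around each hit.

```python
import re

# Call patterns listed in Phase 1, steps 6 and 7 (JavaScript/TypeScript and Python).
DATASET_CALLS = re.compile(r"\b(?:Actor|dataset|Dataset)\.(?:pushData|push_data)\(")
KV_CALLS = re.compile(
    r"\b(?:Actor|keyValueStore|KeyValueStore|key_value_store)\.(?:setValue|set_value)\("
)


def find_storage_writes(source: str, filename: str = "<source>") -> list[tuple[str, str, int, str]]:
    """Return (kind, filename, line number, line text) for each storage write found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if DATASET_CALLS.search(line):
            hits.append(("dataset", filename, lineno, line.strip()))
        if KV_CALLS.search(line):
            hits.append(("key-value store", filename, lineno, line.strip()))
    return hits


# Made-up TypeScript snippet used only to exercise the search.
sample = """\
await Actor.pushData({ title, price });
console.log('done');
await Actor.setValue('SUMMARY', { count });
"""

for kind, filename, lineno, text in find_storage_writes(sample, "src/main.ts"):
    print(f"{kind}: {filename}:{lineno}: {text}")
# dataset: src/main.ts:1: await Actor.pushData({ title, price });
# key-value store: src/main.ts:3: await Actor.setValue('SUMMARY', { count });
```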

---

## Phase 2: Generate `dataset_schema.json`

**Goal**: Create a complete dataset schema with field definitions and display views

### File structure

```json
{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            // ALL output fields here: every field the Actor can produce,
            // ...
        }
    }
    // ...
}
```
