Agent Designer

Name: Agent Designer
Author: alirezarezvani

alirezarezvani/claude-skills

614 installs
23.5k repo stars
Updated July 17, 2026

agent-designer is an AI skill that analyzes multi-agent execution logs and returns success rates, cost breakdowns, latency distributions, and concrete optimization recommendations for developers operating agent systems.

About

agent-designer is a Python-based Agent Evaluator skill that ingests JSON execution logs containing tasks, actions, results, elapsed time, and token usage. It computes task success rate, average cost per task, latency distributions, error patterns, and tool usage efficiency, then surfaces bottlenecks and improvement opportunities. Developers reach for agent-designer when multi-agent pipelines are running but performance is opaque: spend is high, steps are slow, or failures recur, and the cause needs quantified diagnosis. The output is a structured performance report with bottleneck analysis and optimization recommendations suitable for the next engineering iteration.

Analyzes JSON execution logs from agent runs
Calculates success rate, average cost per task, latency percentiles, and tool efficiency
Identifies recurring error patterns and performance bottlenecks
Generates prioritized optimization recommendations
Outputs structured performance report with bottleneck analysis

Agent Designer by the numbers

614 all-time installs (skills.sh)
Ranked #1,549 of 16,565 AI & Agent Building skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 31, 2026 (Skillselion catalog sync)

npx skills add https://github.com/alirezarezvani/claude-skills --skill agent-designer

Installs	614
Repo stars	★ 23.5k
Security audit	3 / 3 scanners passed
Last updated	July 17, 2026
Repository	alirezarezvani/claude-skills

How do you analyze multi-agent execution log performance?

Automatically analyze multi-agent execution logs and receive success rates, cost breakdowns, latency distributions, and concrete optimization recommendations.

Who is it for?

Engineers operating multi-agent systems who already capture execution logs and need quantitative bottleneck analysis before changing orchestration.

Skip if: Teams still designing prompts from scratch or lacking structured JSON logs with tasks, actions, timing, and token counts.

When should I use this skill?

Multi-agent execution logs exist and the task is measuring success rate, cost, latency, or identifying inefficient tool usage.

What you get

Performance report with success rate, cost breakdown, latency distribution, error patterns, and prioritized optimization recommendations.

performance report
bottleneck analysis
optimization recommendations

Files

SKILL.md (Markdown)

Agent Designer — Multi-Agent System Architecture

Design multi-agent architectures, generate tool schemas, and evaluate execution logs with three deterministic tools. The scripts are the workflow: do not freehand an architecture when the planner can score one from requirements.

When to use

Designing a new multi-agent system from requirements (pattern choice, roles, comms)
Generating provider-ready tool schemas (Anthropic + OpenAI formats) from plain tool descriptions
Evaluating execution logs: success rate, latency distribution, cost, bottlenecks

When NOT to use: Claude Code Workflow-tool automations → workflow-builder; single-agent workflow scaffolds → agent-workflow-designer; multi-agent fan-out at runtime → agenthub.

Pattern decision table

Choose	When	Watch out for
Single agent	One bounded task, < ~5 tools	Don't add agents you don't need
Supervisor	Central decomposition, specialists report back	Supervisor becomes the bottleneck
Pipeline	Strictly sequential stages with handoffs	Rigid order; slowest stage gates throughput
Hierarchical	Multiple org layers, > ~8 agents	Communication overhead per level
Swarm	Parallel peers, fault tolerance over predictability	Hard to debug; needs consensus rules

The planner applies this scoring deterministically — run it rather than picking by feel.
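
Read top to bottom, the table works as a rough decision procedure. The sketch below encodes it as a small Python function. The function name, its parameters, the order of the checks, and the hard cut-offs (5 tools, 8 agents) are illustrative assumptions derived from the table's approximate thresholds; this is not the scoring logic of agent_planner.py.

```python
def suggest_pattern(num_agents: int, num_tools: int,
                    sequential: bool = False,
                    needs_fault_tolerance: bool = False) -> str:
    """Hypothetical heuristic that mirrors the decision table above."""
    if num_agents <= 1 and num_tools < 5:
        return "single_agent"      # one bounded task, few tools
    if num_agents > 8:
        return "hierarchical"      # many agents need layered coordination
    if sequential:
        return "pipeline"          # strictly ordered stages with handoffs
    if needs_fault_tolerance:
        return "swarm"             # parallel peers, resilience over predictability
    return "supervisor"            # central decomposition as the default


print(suggest_pattern(num_agents=1, num_tools=3))                    # single_agent
print(suggest_pattern(num_agents=4, num_tools=10, sequential=True))  # pipeline
print(suggest_pattern(num_agents=12, num_tools=30))                  # hierarchical
```

A real selection would weigh the "Watch out for" column as well, which is why the planner's output should take precedence over a heuristic like this one.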

Workflow

All paths are relative to this skill folder. Each step's JSON output is the design input for the next step.

1. Design the architecture

Write a requirements JSON (copy assets/sample_system_requirements.json — keys: goal, tasks[], constraints{max_response_time, budget_per_task, concurrent_tasks}, team_size):

python3 agent_planner.py requirements.json --format json -o arch

Emits arch.json with architecture_design (pattern, agents, communication links), mermaid_diagram, and implementation_roadmap. Read architecture_design.pattern and the per-agent role list; present the mermaid diagram to the user.
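
As a minimal sketch of the requirements file from this step, the snippet below builds a dictionary with the keys named above and serializes it. Only the key names come from the step description; every value, the shape of each tasks[] item, and the units are assumptions, so treat assets/sample_system_requirements.json as the authoritative template.

```python
import json

# Key names follow the step description; all values are illustrative.
requirements = {
    "goal": "Answer customer support tickets",
    "tasks": ["classify ticket", "search knowledge base", "draft reply"],
    "constraints": {
        "max_response_time": 30000,  # assumed unit: milliseconds
        "budget_per_task": 0.10,     # assumed unit: USD
        "concurrent_tasks": 5,
    },
    "team_size": 3,
}

# Serialize; the resulting text can be saved as requirements.json.
requirements_json = json.dumps(requirements, indent=2)
print(requirements_json)
```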

2. Generate tool schemas

Describe each agent's tools in plain JSON (copy assets/sample_tool_descriptions.json), then:

python3 tool_schema_generator.py tool_descriptions.json --validate -o tools

Emits tools.json (tool_schemas, validation_summary) plus provider-specific tools_anthropic.json / tools_openai.json. Gate: every tool must print `✓ Valid`. Fix any invalid schema before proceeding — never hand an agent an unvalidated schema.
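
The two provider formats differ mainly in where the JSON Schema for the arguments lives. The sketch below wraps one parameter schema both ways: Anthropic tool definitions place it under `input_schema`, while OpenAI function tools place it under `function.parameters`. The helper names and the `web_search` tool are hypothetical, and the files written by tool_schema_generator.py may carry additional fields.

```python
from typing import Any, Dict


def to_anthropic(name: str, description: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Anthropic tool format: the JSON Schema sits under `input_schema`."""
    return {"name": name, "description": description, "input_schema": params}


def to_openai(name: str, description: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """OpenAI function-tool format: the JSON Schema sits under `function.parameters`."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": params},
    }


# Hypothetical tool used only for illustration.
params = {
    "type": "object",
    "properties": {"query": {"type": "string", "description": "Search terms"}},
    "required": ["query"],
}

anthropic_tool = to_anthropic("web_search", "Search the web", params)
openai_tool = to_openai("web_search", "Search the web", params)
print(anthropic_tool["input_schema"] == openai_tool["function"]["parameters"])  # True
```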

3. Evaluate execution logs

Once the system runs (or against assets/sample_execution_logs.json for a dry run):

python3 agent_evaluator.py execution_logs.json --detailed -o eval

Emits eval.json with summary, agent_metrics, bottleneck_analysis, error_analysis, cost_breakdown, sla_compliance, and optimization_recommendations, plus split files (eval_errors.json, eval_recommendations.json).
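
For orientation, here is a sketch of a single log entry built from the fields of the `ExecutionLog` dataclass in agent_evaluator.py. The field names come from that dataclass; the values, the shape of each item in `actions`, and wrapping the entries in a bare top-level list are assumptions, so check assets/sample_execution_logs.json for the real layout.

```python
import json
from datetime import datetime


def iso_to_dt(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


start, end = "2026-01-01T10:00:00Z", "2026-01-01T10:00:02Z"

# Illustrative entry; only the key names are taken from ExecutionLog.
entry = {
    "task_id": "task-001",
    "agent_id": "researcher",
    "task_type": "research_task",
    "task_description": "Summarize recent pricing changes",
    "start_time": start,
    "end_time": end,
    "duration_ms": int((iso_to_dt(end) - iso_to_dt(start)).total_seconds() * 1000),
    "status": "success",  # success, failure, partial, or timeout
    "actions": [{"type": "tool_call", "tool": "web_search"}],  # assumed shape
    "results": {"summary": "three changes found"},
    "tokens_used": {"input_tokens": 800, "output_tokens": 200, "total_tokens": 1000},
    "cost_usd": 0.02,
    "error_details": None,
    "tools_used": ["web_search"],
    "retry_count": 0,
    "metadata": {},
}

print(json.dumps([entry], indent=2))
```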

4. Verification loop

The design is not done until:

1. tool_schema_generator.py --validate reports 0 invalid schemas.
2. agent_evaluator.py on a pilot run reports 0 critical issues (the tool prints CRITICAL: N critical issues when found). If N > 0, apply the top item in eval_recommendations.json, re-run the pilot, and re-evaluate.
3. Compare your outputs against expected_outputs/ to confirm the schema shape you're consuming hasn't drifted.
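
The critical-issue gate can be automated by scanning the evaluator's printed output for the CRITICAL line. The helper below is a hypothetical sketch: the function name is invented, the regular expression assumes the message matches the wording quoted above, and the sample strings are made up for the demonstration.

```python
import re


def critical_issue_count(evaluator_output: str) -> int:
    """Return N from a 'CRITICAL: N critical issues' line, or 0 if none is present."""
    match = re.search(r"CRITICAL:\s*(\d+)\s+critical issues", evaluator_output)
    return int(match.group(1)) if match else 0


# Made-up output strings, used only to exercise the helper.
failing_output = "CRITICAL: 2 critical issues\n"
passing_output = "no critical line here\n"

print(critical_issue_count(failing_output))  # 2
print(critical_issue_count(passing_output))  # 0
```

In practice the text could come from `subprocess.run([...], capture_output=True, text=True)`. Whether the evaluator writes this line to stdout or stderr is not stated here, so checking both streams is the safer choice.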

References

references/agent_architecture_patterns.md — pattern trade-offs in depth
references/tool_design_best_practices.md — schema, idempotency, error-handling rules
references/evaluation_methodology.md — metric definitions the evaluator implements
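
As a standalone illustration of one of those metrics, the evaluator script computes its 95th-percentile latency by linear interpolation between the two closest ranks of the sorted durations. The sketch below reproduces that approach outside the class; the function name and the sample durations are illustrative.

```python
import statistics
from typing import List


def percentile(data: List[float], pct: float) -> float:
    """Percentile by linear interpolation between the two closest ranks."""
    if not data:
        return 0.0
    ordered = sorted(data)
    index = (pct / 100) * (len(ordered) - 1)  # fractional position in the sorted list
    lower = int(index)
    upper = min(lower + 1, len(ordered) - 1)  # clamp so pct=100 stays in range
    weight = index - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


durations_ms = [100, 200, 300, 400, 500]
print(statistics.median(durations_ms))        # 300
print(round(percentile(durations_ms, 95), 6))  # 480.0
```

With five samples the 95th percentile falls at position 3.8, so the result sits 80% of the way from 400 ms to 500 ms.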

#!/usr/bin/env python3
"""
Agent Evaluator - Multi-Agent System Performance Analysis

Takes agent execution logs (task, actions taken, results, time, tokens used)
and evaluates performance: task success rate, average cost per task, latency
distribution, error patterns, and tool usage efficiency. It also identifies
bottlenecks and improvement opportunities.

Input: execution logs JSON
Output: performance report + bottleneck analysis + optimization recommendations
"""

import json
import argparse
import sys
import statistics
from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass, asdict
from collections import defaultdict, Counter
from datetime import datetime, timedelta
import re


@dataclass
class ExecutionLog:
    """Single execution log entry"""
    task_id: str
    agent_id: str
    task_type: str
    task_description: str
    start_time: str
    end_time: str
    duration_ms: int
    status: str  # success, failure, partial, timeout
    actions: List[Dict[str, Any]]
    results: Dict[str, Any]
    tokens_used: Dict[str, int]  # input_tokens, output_tokens, total_tokens
    cost_usd: float
    error_details: Optional[Dict[str, Any]]
    tools_used: List[str]
    retry_count: int
    metadata: Dict[str, Any]


@dataclass
class PerformanceMetrics:
    """Performance metrics for an agent or system"""
    total_tasks: int
    successful_tasks: int
    failed_tasks: int
    partial_tasks: int
    timeout_tasks: int
    success_rate: float
    failure_rate: float
    average_duration_ms: float
    median_duration_ms: float
    percentile_95_duration_ms: float
    min_duration_ms: int
    max_duration_ms: int
    total_tokens_used: int
    average_tokens_per_task: float
    total_cost_usd: float
    average_cost_per_task: float
    cost_per_token: float
    throughput_tasks_per_hour: float
    error_rate: float
    retry_rate: float


@dataclass
class ErrorAnalysis:
    """Error pattern analysis"""
    error_type: str
    count: int
    percentage: float
    affected_agents: List[str]
    affected_task_types: List[str]
    common_patterns: List[str]
    suggested_fixes: List[str]
    impact_level: str  # high, medium, low


@dataclass
class BottleneckAnalysis:
    """System bottleneck analysis"""
    bottleneck_type: str  # agent, tool, communication, resource
    location: str
    severity: str  # critical, high, medium, low
    description: str
    impact_on_performance: Dict[str, float]
    affected_workflows: List[str]
    optimization_suggestions: List[str]
    estimated_improvement: Dict[str, float]


@dataclass
class OptimizationRecommendation:
    """Performance optimization recommendation"""
    category: str  # performance, cost, reliability, scalability
    priority: str  # high, medium, low
    title: str
    description: str
    implementation_effort: str  # low, medium, high
    expected_impact: Dict[str, Any]
    estimated_cost_savings: Optional[float]
    estimated_performance_gain: Optional[float]
    implementation_steps: List[str]
    risks: List[str]
    prerequisites: List[str]


@dataclass
class EvaluationReport:
    """Complete evaluation report"""
    summary: Dict[str, Any]
    system_metrics: PerformanceMetrics
    agent_metrics: Dict[str, PerformanceMetrics]
    task_type_metrics: Dict[str, PerformanceMetrics]
    tool_usage_analysis: Dict[str, Any]
    error_analysis: List[ErrorAnalysis]
    bottleneck_analysis: List[BottleneckAnalysis]
    optimization_recommendations: List[OptimizationRecommendation]
    trends_analysis: Dict[str, Any]
    cost_breakdown: Dict[str, Any]
    sla_compliance: Dict[str, Any]
    metadata: Dict[str, Any]


class AgentEvaluator:
    """Evaluate multi-agent system performance from execution logs"""
    
    def __init__(self):
        self.error_patterns = self._define_error_patterns()
        self.performance_thresholds = self._define_performance_thresholds()
        self.cost_benchmarks = self._define_cost_benchmarks()
    
    def _define_error_patterns(self) -> Dict[str, Dict[str, Any]]:
        """Define common error patterns and their classifications"""
        return {
            "timeout": {
                "patterns": [r"timeout", r"timed out", r"deadline exceeded"],
                "category": "performance",
                "severity": "high",
                "common_fixes": [
                    "Increase timeout values",
                    "Optimize slow operations",
                    "Add retry logic with exponential backoff",
                    "Parallelize independent operations"
                ]
            },
            "rate_limit": {
                "patterns": [r"rate limit", r"too many requests", r"quota exceeded"],
                "category": "resource",
                "severity": "medium",
                "common_fixes": [
                    "Implement request throttling",
                    "Add circuit breaker pattern",
                    "Use request queuing",
                    "Negotiate higher limits"
                ]
            },
            "authentication": {
                "patterns": [r"unauthorized", r"authentication failed", r"invalid credentials"],
                "category": "security",
                "severity": "high",
                "common_fixes": [
                    "Check credential rotation",
                    "Implement token refresh logic",
                    "Add authentication retry",
                    "Verify permission scopes"
                ]
            },
            "network": {
                "patterns": [r"connection refused", r"network error", r"dns resolution"],
                "category": "infrastructure",
                "severity": "high",
                "common_fixes": [
                    "Add network retry logic",
                    "Implement fallback endpoints",
                    "Use connection pooling",
                    "Add health checks"
                ]
            },
            "validation": {
                "patterns": [r"validation error", r"invalid input", r"schema violation"],
                "category": "data",
                "severity": "medium",
                "common_fixes": [
                    "Strengthen input validation",
                    "Add data sanitization",
                    "Improve error messages",
                    "Add input examples"
                ]
            },
            "resource": {
                "patterns": [r"out of memory", r"disk full", r"cpu overload"],
                "category": "resource",
                "severity": "critical",
                "common_fixes": [
                    "Scale up resources",
                    "Optimize memory usage",
                    "Add resource monitoring",
                    "Implement graceful degradation"
                ]
            }
        }
    
    def _define_performance_thresholds(self) -> Dict[str, Any]:
        """Define performance thresholds for different metrics"""
        return {
            "success_rate": {"excellent": 0.98, "good": 0.95, "acceptable": 0.90, "poor": 0.80},
            "average_duration": {"excellent": 1000, "good": 3000, "acceptable": 10000, "poor": 30000},
            "error_rate": {"excellent": 0.01, "good": 0.03, "acceptable": 0.05, "poor": 0.10},
            "retry_rate": {"excellent": 0.05, "good": 0.10, "acceptable": 0.20, "poor": 0.40},
            "cost_per_task": {"excellent": 0.01, "good": 0.05, "acceptable": 0.10, "poor": 0.25},
            "throughput": {"excellent": 100, "good": 50, "acceptable": 20, "poor": 5}  # tasks per hour
        }
    
    def _define_cost_benchmarks(self) -> Dict[str, Any]:
        """Define cost benchmarks for different operations"""
        return {
            "token_costs": {
                "gpt-4": {"input": 0.00003, "output": 0.00006},
                "gpt-3.5-turbo": {"input": 0.000002, "output": 0.000002},
                "claude-3": {"input": 0.000015, "output": 0.000075}
            },
            "operation_costs": {
                "simple_task": 0.005,
                "complex_task": 0.050,
                "research_task": 0.020,
                "analysis_task": 0.030,
                "generation_task": 0.015
            }
        }
    
    def parse_execution_logs(self, logs_data: List[Dict[str, Any]]) -> List[ExecutionLog]:
        """Parse raw execution logs into structured format"""
        logs = []
        
        for log_entry in logs_data:
            try:
                log = ExecutionLog(
                    task_id=log_entry.get("task_id", ""),
                    agent_id=log_entry.get("agent_id", ""),
                    task_type=log_entry.get("task_type", "unknown"),
                    task_description=log_entry.get("task_description", ""),
                    start_time=log_entry.get("start_time", ""),
                    end_time=log_entry.get("end_time", ""),
                    duration_ms=log_entry.get("duration_ms", 0),
                    status=log_entry.get("status", "unknown"),
                    actions=log_entry.get("actions", []),
                    results=log_entry.get("results", {}),
                    tokens_used=log_entry.get("tokens_used", {"total_tokens": 0}),
                    cost_usd=log_entry.get("cost_usd", 0.0),
                    error_details=log_entry.get("error_details"),
                    tools_used=log_entry.get("tools_used", []),
                    retry_count=log_entry.get("retry_count", 0),
                    metadata=log_entry.get("metadata", {})
                )
                logs.append(log)
            except Exception as e:
                print(f"Warning: Failed to parse log entry: {e}", file=sys.stderr)
                continue
        
        return logs
    
    def calculate_performance_metrics(self, logs: List[ExecutionLog]) -> PerformanceMetrics:
        """Calculate performance metrics from execution logs"""
        if not logs:
            return PerformanceMetrics(
                total_tasks=0, successful_tasks=0, failed_tasks=0, partial_tasks=0,
                timeout_tasks=0, success_rate=0.0, failure_rate=0.0,
                average_duration_ms=0.0, median_duration_ms=0.0, percentile_95_duration_ms=0.0,
                min_duration_ms=0, max_duration_ms=0, total_tokens_used=0,
                average_tokens_per_task=0.0, total_cost_usd=0.0, average_cost_per_task=0.0,
                cost_per_token=0.0, throughput_tasks_per_hour=0.0, error_rate=0.0, retry_rate=0.0
            )
        
        total_tasks = len(logs)
        successful_tasks = sum(1 for log in logs if log.status == "success")
        failed_tasks = sum(1 for log in logs if log.status == "failure")
        partial_tasks = sum(1 for log in logs if log.status == "partial")
        timeout_tasks = sum(1 for log in logs if log.status == "timeout")
        
        success_rate = successful_tasks / total_tasks if total_tasks > 0 else 0.0
        failure_rate = (failed_tasks + timeout_tasks) / total_tasks if total_tasks > 0 else 0.0
        
        durations = [log.duration_ms for log in logs if log.duration_ms > 0]
        if durations:
            average_duration_ms = statistics.mean(durations)
            median_duration_ms = statistics.median(durations)
            percentile_95_duration_ms = self._percentile(durations, 95)
            min_duration_ms = min(durations)
            max_duration_ms = max(durations)
        else:
            average_duration_ms = median_duration_ms = percentile_95_duration_ms = 0.0
            min_duration_ms = max_duration_ms = 0
        
        total_tokens = sum(log.tokens_used.get("total_tokens", 0) for log in logs)
        average_tokens_per_task = total_tokens / total_tasks if total_tasks > 0 else 0.0
        
        total_cost = sum(log.cost_usd for log in logs)
        average_cost_per_task = total_cost / total_tasks if total_tasks > 0 else 0.0
        cost_per_token = total_cost / total_tokens if total_tokens > 0 else 0.0
        
        # Calculate throughput (tasks per hour) over the observed wall-clock window
        throughput_tasks_per_hour = 0.0
        if len(logs) > 1:
            # default="" avoids a ValueError when no log carries a timestamp
            start_time = min((log.start_time for log in logs if log.start_time), default="")
            end_time = max((log.end_time for log in logs if log.end_time), default="")
            if start_time and end_time:
                try:
                    start_dt = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
                    end_dt = datetime.fromisoformat(end_time.replace("Z", "+00:00"))
                    time_diff_hours = (end_dt - start_dt).total_seconds() / 3600
                    if time_diff_hours > 0:
                        throughput_tasks_per_hour = total_tasks / time_diff_hours
                except (ValueError, TypeError, AttributeError):
                    # Malformed timestamp, or a mix of naive and timezone-aware values
                    pass
        
        error_rate = sum(1 for log in logs if log.error_details) / total_tasks if total_tasks > 0 else 0.0
        retry_rate = sum(1 for log in logs if log.retry_count > 0) / total_tasks if total_tasks > 0 else 0.0
        
        return PerformanceMetrics(
            total_tasks=total_tasks,
            successful_tasks=successful_tasks,
            failed_tasks=failed_tasks,
            partial_tasks=partial_tasks,
            timeout_tasks=timeout_tasks,
            success_rate=success_rate,
            failure_rate=failure_rate,
            average_duration_ms=average_duration_ms,
            median_duration_ms=median_duration_ms,
            percentile_95_duration_ms=percentile_95_duration_ms,
            min_duration_ms=min_duration_ms,
            max_duration_ms=max_duration_ms,
            total_tokens_used=total_tokens,
            average_tokens_per_task=average_tokens_per_task,
            total_cost_usd=total_cost,
            average_cost_per_task=average_cost_per_task,
            cost_per_token=cost_per_token,
            throughput_tasks_per_hour=throughput_tasks_per_hour,
            error_rate=error_rate,
            retry_rate=retry_rate
        )
    
    def _percentile(self, data: List[float], percentile: int) -> float:
        """Calculate percentile value from data"""
        if not data:
            return 0.0
        sorted_data = sorted(data)
        index = (percentile / 100) * (len(sorted_data) - 1)
        if index.is_integer():
            return sorted_data[int(index)]
        else:
            lower_index = int(index)
            upper_index = lower_index + 1
            weight = index - lower_index
            return sorted_data[lower_index] * (1 - weight) + sorted_data[upper_index] * weight
    
    def analyze_errors(self, logs: List[ExecutionLog]) -> List[ErrorAnalysis]:
        """Analyze error patterns in execution logs"""
        error_analyses = []
        
        # Collect all errors
        errors = []
        for log in logs:
            if log.error_details:
                errors.append({
                    "error": log.error_details,
                    "agent_id": log.agent_id,
                    "task_type": log.task_type,
                    "task_id": log.task_id
                })
        
        if not errors:
            return error_analyses
        
        # Group errors by pattern
        error_groups = defaultdict(list)
        unclassified_errors = []
        
        for error in errors:
            error_message = str(error.get("error", {})).lower()
            classified = False
            
            for pattern_name, pattern_info in self.error_patterns.items():
                for pattern in pattern_info["patterns"]:
                    if re.search(pattern, error_message):
                        error_groups[pattern_name].append(error)
                        classified = True
                        break
                if classified:
                    break
            
            if not classified:
                unclassified_errors.append(error)
        
        # Analyze each error group
        total_errors = len(errors)
        
        for error_type, error_list in error_groups.items():
            count = len(error_list)
            percentage = (count / total_errors) * 100 if total_errors > 0 else 0.0
            
            affected_agents = list(set(error["agent_id"] for error in error_list))
            affected_task_types = list(set(error["task_type"] for error in error_list))
            
            # Extract common patterns from error messages
            common_patterns = self._extract_common_patterns([str(e["error"]) for e in error_list])
            
            # Get suggested fixes
            pattern_info = self.error_patterns.get(error_type, {})
            suggested_fixes = pattern_info.get("common_fixes", [])
            
            # Determine impact level
            if percentage > 20 or pattern_info.get("severity") == "critical":
                impact_level = "high"
            elif percentage > 10 or pattern_info.get("severity") == "high":
                impact_level = "medium"
            else:
                impact_level = "low"
            
            error_analysis = ErrorAnalysis(
                error_type=error_type,
                count=count,
                percentage=percentage,
                affected_agents=affected_agents,
                affected_task_types=affected_task_types,
                common_patterns=common_patterns,
                suggested_fixes=suggested_fixes,
                impact_level=impact_level
            )
            
            error_analyses.append(error_analysis)
        
        # Handle unclassified errors
        if unclassified_errors:
            count = len(unclassified_errors)
            percentage = (count / total_errors) * 100
            
            error_analysis = ErrorAnalysis(
                error_type="unclassified",
                count=count,
                percentage=percentage,
                affected_agents=list(set(error["agent_id"] for error in unclassified_errors)),
                affected_task_types=list(set(error["task_type"] for error in unclassified_errors)),
                common_patterns=self._extract_common_patterns([str(e["error"]) for e in unclassified_errors]),
                suggested_fixes=["Review and classify error patterns", "Add specific error handling"],
                impact_level="medium" if percentage > 10 else "low"
            )
            
            error_analyses.append(error_analysis)
        
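        # Sort by impact level (high, then medium, then low), then by count
        impact_rank = {"high": 2, "medium": 1, "low": 0}
        error_analyses.sort(key=lambda x: (impact_rank.get(x.impact_level, 0), x.count), reverse=True)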
        
        return error_analyses
    
    def _extract_common_patterns(self, error_messages: List[str]) -> List[str]:
        """Extract common patterns from error messages"""
        if not error_messages:
            return []
        
        # Simple pattern extraction - find common phrases
        word_counts = Counter()
        for message in error_messages:
            words = re.findall(r'\w+', message.lower())
            for word in words:
                if len(word) > 3:  # Ignore short words
                    word_counts[word] += 1
        
        # Return most common words/patterns
        common_patterns = [word for word, count in word_counts.most_common(5) 
                          if count > 1]
        
        return common_patterns
    
    def identify_bottlenecks(self, logs: List[ExecutionLog], 
                           agent_metrics: Dict[str, PerformanceMetrics]) -> List[BottleneckAnalysis]:
        """Identify system bottlenecks"""
        bottlenecks = []
        
        # Agent performance bottlenecks
        for agent_id, metrics in agent_metrics.items():
            if metrics.success_rate < 0.8:
                severity = "critical" if metrics.success_rate < 0.5 else "high"
                bottlenecks.append(BottleneckAnalysis(
                    bottleneck_type="agent",
                    location=agent_id,
                    severity=severity,
                    description=f"Agent {agent_id} has low success rate ({metrics.success_rate:.1%})",
                    impact_on_performance={
                        "success_rate_impact": (0.95 - metrics.success_rate) * 100,
                        "cost_impact": metrics.average_cost_per_task * metrics.failed_tasks
                    },
                    affected_workflows=self._get_agent_workflows(agent_id, logs),
                    optimization_suggestions=[
                        "Review and improve agent logic",
                        "Add better error handling",
                        "Optimize tool usage",
                        "Consider agent specialization"
                    ],
                    estimated_improvement={
                        "success_rate_gain": min(0.15, 0.95 - metrics.success_rate),
                        "cost_reduction": metrics.average_cost_per_task * 0.2
                    }
                ))
            
            if metrics.average_duration_ms > 30000:  # 30 seconds
                severity = "high" if metrics.average_duration_ms > 60000 else "medium"
                bottlenecks.append(BottleneckAnalysis(
                    bottleneck_type="agent",
                    location=agent_id,
                    severity=severity,
                    description=f"Agent {agent_id} has high latency ({metrics.average_duration_ms/1000:.1f}s avg)",
                    impact_on_performance={
                        "latency_impact": metrics.average_duration_ms - 10000,
                        "throughput_impact": max(0, 50 - metrics.total_tasks)
                    },
                    affected_workflows=self._get_agent_workflows(agent_id, logs),
                    optimization_suggestions=[
                        "Profile and optimize slow operations",
                        "Implement caching strategies",
                        "Parallelize independent tasks",
                        "Optimize API calls"
                    ],
                    estimated_improvement={
                        "latency_reduction": min(0.5, (metrics.average_duration_ms - 10000) / metrics.average_duration_ms),
                        "throughput_gain": 1.3
                    }
                ))
        
        # Tool usage bottlenecks
        tool_usage = self._analyze_tool_usage(logs)
        for tool, usage_stats in tool_usage.items():
            if usage_stats.get("error_rate", 0) > 0.2:
                bottlenecks.append(BottleneckAnalysis(
                    bottleneck_type="tool",
                    location=tool,
                    severity="high" if usage_stats["error_rate"] > 0.4 else "medium",
                    description=f"Tool {tool} has high error rate ({usage_stats['error_rate']:.1%})",
                    impact_on_performance={
                        "reliability_impact": usage_stats["error_rate"] * usage_stats["usage_count"],
                        "retry_overhead": usage_stats.get("retry_count", 0) * 1000  # ms
                    },
                    affected_workflows=usage_stats.get("affected_workflows", []),
                    optimization_suggestions=[
                        "Review tool implementation",
                        "Add better error handling for tool",
                        "Implement tool fallbacks",
                        "Consider alternative tools"
                    ],
                    estimated_improvement={
                        "error_reduction": usage_stats["error_rate"] * 0.7,
                        "performance_gain": 1.2
                    }
                ))
        
        # Communication bottlenecks
        communication_analysis = self._analyze_communication_patterns(logs)
        if communication_analysis.get("high_latency_communications", 0) > 5:
            bottlenecks.append(BottleneckAnalysis(
                bottleneck_type="communication",
                location="inter_agent_communication",
                severity="medium",
                description="High latency in inter-agent communications detected",
                impact_on_performance={
                    "communication_overhead": communication_analysis.get("avg_communication_latency", 0),
                    "coordination_efficiency": 0.8  # Assumed impact
                },
                affected_workflows=communication_analysis.get("affected_workflows", []),
                optimization_suggestions=[
                    "Optimize message serialization",
                    "Implement message batching",
                    "Add communication caching",
                    "Consider direct communication patterns"
                ],
                estimated_improvement={
                    "communication_latency_reduction": 0.4,
                    "overall_efficiency_gain": 1.15
                }
            ))
        
        # Resource bottlenecks
        resource_analysis = self._analyze_resource_usage(logs)
        if resource_analysis.get("high_token_usage_tasks", 0) > 10:
            bottlenecks.append(BottleneckAnalysis(
                bottleneck_type="resource",
                location="token_usage",
                severity="medium",
                description="High token usage detected in multiple tasks",
                impact_on_performance={
                    "cost_impact": resource_analysis.get("excess_token_cost", 0),
                    "latency_impact": resource_analysis.get("token_processing_overhead", 0)
                },
                affected_workflows=resource_analysis.get("high_usage_workflows", []),
                optimization_suggestions=[
                    "Optimize prompt engineering",
                    "Implement response caching",
                    "Use more efficient models for simple tasks",
                    "Add token usage monitoring"
                ],
                estimated_improvement={
                    "cost_reduction": 0.3,
                    "efficiency_gain": 1.1
                }
            ))
        
        # Sort bottlenecks by severity and impact
        severity_order = {"critical": 0, "high": 1, "medium": 2, "low": 3}
        bottlenecks.sort(key=lambda x: (severity_order[x.severity], 
                                       -sum(x.impact_on_performance.values())))
        
        return bottlenecks
    
    def _get_agent_workflows(self, agent_id: str, logs: List[ExecutionLog]) -> List[str]:
        """Get workflows affected by a specific agent"""
        workflows = set()
        for log in logs:
            if log.agent_id == agent_id:
                workflows.add(log.task_type)
        return list(workflows)
    
    def _analyze_tool_usage(self, logs: List[ExecutionLog]) -> Dict[str, Dict[str, Any]]:
        """Analyze tool usage patterns"""
        tool_stats = defaultdict(lambda: {
            "usage_count": 0,
            "error_count": 0,
            "total_duration": 0,
            "affected_workflows": set(),
            "retry_count": 0
        })
        
        for log in logs:
            for tool in log.tools_used:
                stats = tool_stats[tool]
                stats["usage_count"] += 1
                stats["total_duration"] += log.duration_ms
                stats["affected_workflows"].add(log.task_type)
                
                if log.error_details:
                    stats["error_count"] += 1
                if log.retry_count > 0:
                    stats["retry_count"] += log.retry_count
        
        # Calculate derived metrics
        result = {}
        for tool, stats in tool_stats.items():
            result[tool] = {
                "usage_count": stats["usage_count"],
                "error_rate": stats["error_count"] / stats["usage_count"] if stats["usage_count"] > 0 else 0,
                "avg_duration": stats["total_duration"] / stats["usage_count"] if stats["usage_count"] > 0 else 0,
                "affected_workflows": list(stats["affected_workflows"]),
                "retry_count": stats["retry_count"]
            }
        
        return result
    
    def _analyze_communication_patterns(self, logs: List[ExecutionLog]) -> Dict[str, Any]:
        """Analyze communication patterns between agents"""
        # This is a simplified analysis - in a real system, you'd have more detailed communication logs
        communication_actions = []
        for log in logs:
            for action in log.actions:
                if action.get("type") in ["message", "delegate", "coordinate", "respond"]:
                    communication_actions.append({
                        "duration": action.get("duration_ms", 0),
                        "success": action.get("success", True),
                        "workflow": log.task_type
                    })
        
        if not communication_actions:
            return {}
        
        avg_latency = sum(action["duration"] for action in communication_actions) / len(communication_actions)
        high_latency_count = sum(1 for action in communication_actions if action["duration"] > 5000)
        
        return {
            "total_communications": len(communication_actions),
            "avg_communication_latency": avg_latency,
            "high_latency_communications": high_latency_count,
            "affected_workflows": list(set(action["workflow"] for action in communication_actions))
        }
    
    def _analyze_resource_usage(self, logs: List[ExecutionLog]) -> Dict[str, Any]:
        """Analyze resource usage patterns"""
        token_usage = [log.tokens_used.get("total_tokens", 0) for log in logs]
        
        if not token_usage:
            return {}
        
        avg_tokens = sum(token_usage) / len(token_usage)
        high_usage_threshold = avg_tokens * 2
        high_usage_tasks = sum(1 for tokens in token_usage if tokens > high_usage_threshold)
        
        # Estimate excess cost
        excess_tokens = sum(max(0, tokens - avg_tokens) for tokens in token_usage)
        excess_cost = excess_tokens * 0.00002  # Rough estimate
        
        return {
            "avg_token_usage": avg_tokens,
            "high_token_usage_tasks": high_usage_tasks,
            "excess_token_cost": excess_cost,
            "token_processing_overhead": high_usage_tasks * 500,  # Estimated overhead in ms
            # Deduplicate so each workflow is reported once, matching the other analyses
            "high_usage_workflows": sorted({log.task_type for log in logs
                                            if log.tokens_used.get("total_tokens", 0) > high_usage_threshold})
        }
    
    def generate_optimization_recommendations(self, 
                                            system_metrics: PerformanceMetrics,
                                            error_analyses: List[ErrorAnalysis],
                                            bottlenecks: List[BottleneckAnalysis]) -> List[OptimizationRecommendation]:
        """Generate optimization recommendations based on analysis"""
        recommendations = []
        
        # Reliability recommendations
        if system_metrics.success_rate < 0.9:
            recommendations.append(OptimizationRecommendation(
                category="reliability",
                priority="high",
                title="Improve System Reliability",
                description=f"System success rate is {system_metrics.success_rate:.1%}, below target of 90%",
                implementation_effort="medium",
                expected_impact={
                    "success_rate_improvement": min(0.1, 0.95 - system_metrics.success_rate),
                    "cost_reduction": system_metrics.average_cost_per_task * 0.15
                },
                estimated_cost_savings=system_metrics.total_cost_usd * 0.1,
                estimated_performance_gain=1.2,
                implementation_steps=[
                    "Identify and fix top error patterns",
                    "Implement better error handling and retries",
                    "Add comprehensive monitoring and alerting",
                    "Implement graceful degradation patterns"
                ],
                risks=["Temporary increase in complexity", "Potential initial performance overhead"],
                prerequisites=["Error analysis completion", "Monitoring infrastructure"]
            ))
        
        # Cost optimization recommendations
        if system_metrics.average_cost_per_task > 0.1:
            recommendations.append(OptimizationRecommendation(
                category="cost",
                priority="medium",
                title="Optimize Token Usage and Costs",
                description=f"Average cost per task (${system_metrics.average_cost_per_task:.3f}) is above optimal range",
                implementation_effort="low",
                expected_impact={
                    "cost_reduction": system_metrics.average_cost_per_task * 0.3,
                    "efficiency_improvement": 1.15
                },
                estimated_cost_savings=system_metrics.total_cost_usd * 0.3,
                estimated_performance_gain=1.05,
                implementation_steps=[
                    "Implement prompt optimization",
                    "Add response caching for repeated queries",
                    "Use smaller models for simple tasks",
                    "Implement token usage monitoring and alerts"
                ],
                risks=["Potential quality reduction with smaller models"],
                prerequisites=["Token usage analysis", "Caching infrastructure"]
            ))
        
        # Latency optimization recommendations
        if system_metrics.average_duration_ms > 10000:
            recommendations.append(OptimizationRecommendation(
                category="performance",
                priority="high",
                title="Reduce Task Latency",
                description=f"Average task duration ({system_metrics.average_duration_ms/1000:.1f}s) exceeds target",
                implementation_effort="high",
                expected_impact={
                    "latency_reduction": min(0.5, (system_metrics.average_duration_ms - 5000) / system_metrics.average_duration_ms),
                    "throughput_improvement": 1.5
                },
                estimated_cost_savings=None,
                estimated_performance_gain=1.4,
                implementation_steps=[
                    "Profile and optimize slow operations",
                    "Implement parallel processing where possible",
                    "Add caching for expensive operations",
                    "Optimize API calls and reduce round trips"
                ],
                risks=["Increased system complexity", "Potential resource usage increase"],
                prerequisites=["Performance profiling tools", "Caching infrastructure"]
            ))
        
        # Error-based recommendations
        high_impact_errors = [ea for ea in error_analyses if ea.impact_level == "high"]
        if high_impact_errors:
            for error_analysis in high_impact_errors[:3]:  # Top 3 high impact errors
                recommendations.append(OptimizationRecommendation(
                    category="reliability",
                    priority="high",
                    title=f"Address {error_analysis.error_type.title()} Errors",
                    description=f"{error_analysis.error_type.title()} errors occur in {error_analysis.percentage:.1f}% of cases",
                    implementation_effort="medium",
                    expected_impact={
                        "error_reduction": error_analysis.percentage / 100,
                        "reliability_improvement": 1.1
                    },
                    estimated_cost_savings=system_metrics.total_cost_usd * (error_analysis.percentage / 100) * 0.5,
                    estimated_performance_gain=None,
                    implementation_steps=error_analysis.suggested_fixes,
                    risks=["May require significant code changes"],
                    prerequisites=["Root cause analysis", "Testing framework"]
                ))
        
        # Bottleneck-based recommendations
        critical_bottlenecks = [b for b in bottlenecks if b.severity in ["critical", "high"]]
        for bottleneck in critical_bottlenecks[:2]:  # Top 2 critical bottlenecks
            recommendations.append(OptimizationRecommendation(
                category="performance",
                priority="high" if bottleneck.severity == "critical" else "medium",
                title=f"Address {bottleneck.bottleneck_type.title()} Bottleneck",
                description=bottleneck.description,
                implementation_effort="medium",
                expected_impact=bottleneck.estimated_improvement,
                estimated_cost_savings=None,
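                # Use the first "*_gain" multiplier; the first dict value may be a reduction fraction instead
                estimated_performance_gain=next(
                    (value for key, value in (bottleneck.estimated_improvement or {}).items()
                     if key.endswith("gain")),
                    1.1
                ),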
                implementation_steps=bottleneck.optimization_suggestions,
                risks=["System downtime during implementation", "Potential cascade effects"],
                prerequisites=["Impact assessment", "Rollback plan"]
            ))
        
        # Scalability recommendations
        if system_metrics.throughput_tasks_per_hour < 20:
            recommendations.append(OptimizationRecommendation(
                category="scalability",
                priority="medium",
                title="Improve System Scalability",
                description="Current throughput indicates potential scalability issues",
                implementation_effort="high",
                expected_impact={
                    "throughput_improvement": 2.0,
                    "scalability_headroom": 5.0
                },
                estimated_cost_savings=None,
                estimated_performance_gain=2.0,
                implementation_steps=[
                    "Implement horizontal scaling for agents",
                    "Add load balancing and resource pooling",
                    "Optimize resource allocation algorithms",
                    "Implement auto-scaling policies"
                ],
                risks=["High implementation complexity", "Increased operational overhead"],
                prerequisites=["Infrastructure scaling capability", "Monitoring and metrics"]
            ))
        
        # Sort recommendations by priority and impact
        priority_order = {"high": 0, "medium": 1, "low": 2}
        recommendations.sort(key=lambda x: (
            priority_order[x.priority],
            -(x.estimated_performance_gain or 0),
            -(x.estimated_cost_savings or 0)
        ))
        
        return recommendations
    
    def generate_report(self, logs: List[ExecutionLog]) -> EvaluationReport:
        """Generate complete evaluation report"""
        
        # Calculate system metrics
        system_metrics = self.calculate_performance_metrics(logs)
        
        # Calculate per-agent metrics
        agents = set(log.agent_id for log in logs)
        agent_metrics = {}
        for agent_id in agents:
            agent_logs = [log for log in logs if log.agent_id == agent_id]
            agent_metrics[agent_id] = self.calculate_performance_metrics(agent_logs)
        
        # Calculate per-task-type metrics
        task_types = set(log.task_type for log in logs)
        task_type_metrics = {}
        for task_type in task_types:
            task_logs = [log for log in logs if log.task_type == task_type]
            task_type_metrics[task_type] = self.calculate_performance_metrics(task_logs)
        
        # Analyze tool usage
        tool_usage_analysis = self._analyze_tool_usage(logs)
        
        # Analyze errors
        error_analysis = self.analyze_errors(logs)
        
        # Identify bottlenecks
        bottleneck_analysis = self.identify_bottlenecks(logs, agent_metrics)
        
        # Generate optimization recommendations
        optimization_recommendations = self.generate_optimization_recommendations(
            system_metrics, error_analysis, bottleneck_analysis)
        
        # Generate trends analysis (simplified)
        trends_analysis = self._generate_trends_analysis(logs)
        
        # Generate cost breakdown
        cost_breakdown = self._generate_cost_breakdown(logs, agent_metrics)
        
        # Check SLA compliance
        sla_compliance = self._check_sla_compliance(system_metrics)
        
        # Create summary
        summary = {
            "evaluation_period": {
                # default=None avoids a ValueError when no log carries a timestamp
                "start_time": min((log.start_time for log in logs if log.start_time), default=None),
                "end_time": max((log.end_time for log in logs if log.end_time), default=None),
                "total_duration_hours": system_metrics.total_tasks / system_metrics.throughput_tasks_per_hour if system_metrics.throughput_tasks_per_hour > 0 else 0
            },
            "overall_health": self._assess_overall_health(system_metrics),
            "key_findings": self._extract_key_findings(system_metrics, error_analysis, bottleneck_analysis),
            "critical_issues": len([b for b in bottleneck_analysis if b.severity == "critical"]),
            "improvement_opportunities": len(optimization_recommendations)
        }
        
        # Create metadata
        metadata = {
            "generated_at": datetime.now().isoformat(),
            "evaluator_version": "1.0",
            "total_logs_processed": len(logs),
            "agents_analyzed": len(agents),
            "task_types_analyzed": len(task_types),
            "analysis_completeness": "full"
        }
        
        return EvaluationReport(
            summary=summary,
            system_metrics=system_metrics,
            agent_metrics=agent_metrics,
            task_type_metrics=task_type_metrics,
            tool_usage_analysis=tool_usage_analysis,
            error_analysis=error_analysis,
            bottleneck_analysis=bottleneck_analysis,
            optimization_recommendations=optimization_recommendations,
            trends_analysis=trends_analysis,
            cost_breakdown=cost_breakdown,
            sla_compliance=sla_compliance,
            metadata=metadata
        )
    
    def _generate_trends_analysis(self, logs: List[ExecutionLog]) -> Dict[str, Any]:
        """Generate trends analysis (simplified version)"""
        # Group logs by time periods (daily)
        daily_metrics = defaultdict(list)
        
        for log in logs:
            if log.start_time:
                try:
                    date = log.start_time.split('T')[0]  # Extract date part
                    daily_metrics[date].append(log)
                except (AttributeError, TypeError):  # start_time is not a string
                    continue
        
        trends = {}
        if len(daily_metrics) > 1:
            daily_success_rates = {}
            daily_avg_durations = {}
            daily_costs = {}
            
            for date, date_logs in daily_metrics.items():
                if date_logs:
                    metrics = self.calculate_performance_metrics(date_logs)
                    daily_success_rates[date] = metrics.success_rate
                    daily_avg_durations[date] = metrics.average_duration_ms
                    daily_costs[date] = metrics.total_cost_usd
            
            trends = {
                "daily_success_rates": daily_success_rates,
                "daily_avg_durations": daily_avg_durations,
                "daily_costs": daily_costs,
                "trend_direction": {
                    "success_rate": "stable",  # Simplified
                    "duration": "stable",
                    "cost": "stable"
                }
            }
        
        return trends
    
    def _generate_cost_breakdown(self, logs: List[ExecutionLog], 
                                agent_metrics: Dict[str, PerformanceMetrics]) -> Dict[str, Any]:
        """Generate cost breakdown analysis"""
        total_cost = sum(log.cost_usd for log in logs)
        
        # Cost by agent
        agent_costs = {}
        for agent_id, metrics in agent_metrics.items():
            agent_costs[agent_id] = metrics.total_cost_usd
        
        # Cost by task type
        task_type_costs = defaultdict(float)
        for log in logs:
            task_type_costs[log.task_type] += log.cost_usd
        
        # Token cost breakdown
        total_tokens = sum(log.tokens_used.get("total_tokens", 0) for log in logs)
        
        return {
            "total_cost": total_cost,
            "cost_by_agent": dict(agent_costs),
            "cost_by_task_type": dict(task_type_costs),
            "cost_per_token": total_cost / total_tokens if total_tokens > 0 else 0,
            "top_cost_drivers": sorted(task_type_costs.items(), key=lambda x: x[1], reverse=True)[:5]
        }
    
    def _check_sla_compliance(self, metrics: PerformanceMetrics) -> Dict[str, Any]:
        """Check SLA compliance"""
        thresholds = self.performance_thresholds
        
        compliance = {
            "success_rate": {
                "target": 0.95,
                "actual": metrics.success_rate,
                "compliant": metrics.success_rate >= 0.95,
                "gap": max(0, 0.95 - metrics.success_rate)
            },
            "average_latency": {
                "target": 10000,  # 10 seconds
                "actual": metrics.average_duration_ms,
                "compliant": metrics.average_duration_ms <= 10000,
                "gap": max(0, metrics.average_duration_ms - 10000)
            },
            "error_rate": {
                "target": 0.05,  # 5%
                "actual": metrics.error_rate,
                "compliant": metrics.error_rate <= 0.05,
                "gap": max(0, metrics.error_rate - 0.05)
            }
        }
        
        overall_compliance = all(sla["compliant"] for sla in compliance.values())
        
        return {
            "overall_compliant": overall_compliance,
            "sla_details": compliance,
            "compliance_score": sum(1 for sla in compliance.values() if sla["compliant"]) / len(compliance)
        }
    
    def _assess_overall_health(self, metrics: PerformanceMetrics) -> str:
        """Assess overall system health"""
        health_score = 0
        
        # Success rate contribution (40%)
        if metrics.success_rate >= 0.95:
            health_score += 40
        elif metrics.success_rate >= 0.90:
            health_score += 30
        elif metrics.success_rate >= 0.80:
            health_score += 20
        else:
            health_score += 10
        
        # Performance contribution (30%)
        if metrics.average_duration_ms <= 5000:
            health_score += 30
        elif metrics.average_duration_ms <= 10000:
            health_score += 20
        elif metrics.average_duration_ms <= 30000:
            health_score += 15
        else:
            health_score += 5
        
        # Error rate contribution (20%)
        if metrics.error_rate <= 0.02:
            health_score += 20
        elif metrics.error_rate <= 0.05:
            health_score += 15
        elif metrics.error_rate <= 0.10:
            health_score += 10
        else:
            health_score += 0
        
        # Cost efficiency contribution (10%)
        if metrics.cost_per_token <= 0.00005:
            health_score += 10
        elif metrics.cost_per_token <= 0.0001:
            health_score += 7
        else:
            health_score += 3
        
        if health_score >= 85:
            return "excellent"
        elif health_score >= 70:
            return "good"
        elif health_score >= 50:
            return "fair"
        else:
            return "poor"
    
    def _extract_key_findings(self, metrics: PerformanceMetrics, 
                            errors: List[ErrorAnalysis],
                            bottlenecks: List[BottleneckAnalysis]) -> List[str]:
        """Extract key findings from analysis"""
        findings = []
        
        # Performance findings
        if metrics.success_rate < 0.9:
            findings.append(f"Success rate ({metrics.success_rate:.1%}) below target")
        
        if metrics.average_duration_ms > 15000:
            findings.append(f"High average latency ({metrics.average_duration_ms/1000:.1f}s)")
        
        # Error findings
        high_impact_errors = [e for e in errors if e.impact_level == "high"]
        if high_impact_errors:
            findings.append(f"{len(high_impact_errors)} high-impact error patterns identified")
        
        # Bottleneck findings
        critical_bottlenecks = [b for b in bottlenecks if b.severity == "critical"]
        if critical_bottlenecks:
            findings.append(f"{len(critical_bottlenecks)} critical bottlenecks found")
        
        # Cost findings
        if metrics.cost_per_token > 0.0001:
            findings.append("Token usage costs above optimal range")
        
        return findings
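

# --- Illustrative sketch, not part of the original evaluator ---
# _generate_trends_analysis above reports every trend direction as "stable".
# One way the direction could be derived from the daily series is to compare
# the mean of the earlier half of the days with the mean of the later half.
# The helper name `classify_trend_sketch` and the 5% tolerance are assumptions
# made for illustration only; tune or replace them for a real system.
def classify_trend_sketch(daily_values, tolerance=0.05):
    """Return "increasing", "decreasing", or "stable" for a {date: value} dict."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    ordered = [daily_values[day] for day in sorted(daily_values)]
    if len(ordered) < 2:
        return "stable"
    midpoint = len(ordered) // 2
    earlier = sum(ordered[:midpoint]) / midpoint
    later = sum(ordered[midpoint:]) / (len(ordered) - midpoint)
    if earlier == 0:
        # Relative change is undefined, so fall back to the sign of the later mean.
        if later == 0:
            return "stable"
        return "increasing" if later > 0 else "decreasing"
    change = (later - earlier) / abs(earlier)
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"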


def main():
    parser = argparse.ArgumentParser(description="Multi-Agent System Performance Evaluator")
    parser.add_argument("input_file", help="JSON file with execution logs")
    parser.add_argument("-o", "--output", help="Output file prefix (default: evaluation_report)")
    parser.add_argument("--format", choices=["json", "both"], default="both", 
                       help="Output format")
    parser.add_argument("--detailed", action="store_true", 
                       help="Include detailed analysis in output")
    
    args = parser.parse_args()
    
    try:
        # Load execution logs
        with open(args.input_file, 'r') as f:
            logs_data = json.load(f)
        
        # Parse logs
        evaluator = AgentEvaluator()
        logs = evaluator.parse_execution_logs(logs_data.get("execution_logs", []))
        
        if not logs:
            print("No valid execution logs found in input file", file=sys.stderr)
            sys.exit(1)
        
        # Generate evaluation report
        report = evaluator.generate_report(logs)
        
        # Prepare output
        output_data = asdict(report)
        
        # Output files
        output_prefix = args.output or "evaluation_report"
        
        if args.format in ["json", "both"]:
            with open(f"{output_prefix}.json", 'w') as f:
                json.dump(output_data, f, indent=2, default=str)
            print(f"JSON report written to {output_prefix}.json")
        
        if args.format == "both":
            # Generate separate detailed files
            
            # Performance summary
            summary_data = {
                "summary": report.summary,
                "system_metrics": asdict(report.system_metrics),
                "sla_compliance": report.sla_compliance
            }
            with open(f"{output_prefix}_summary.json", 'w') as f:
                json.dump(summary_data, f, indent=2, default=str)
            print(f"Summary report written to {output_prefix}_summary.json")
            
            # Recommendations
            recommendations_data = {
                "optimization_recommendations": [asdict(rec) for rec in report.optimization_recommendations],
                "bottleneck_analysis": [asdict(b) for b in report.bottleneck_analysis]
            }
            with open(f"{output_prefix}_recommendations.json", 'w') as f:
                json.dump(recommendations_data, f, indent=2, default=str)
            print(f"Recommendations written to {output_prefix}_recommendations.json")
            
            # Error analysis
            error_data = {
                "error_analysis": [asdict(e) for e in report.error_analysis],
                "error_summary": {
                    "total_errors": sum(e.count for e in report.error_analysis),
                    "high_impact_errors": len([e for e in report.error_analysis if e.impact_level == "high"])
                }
            }
            with open(f"{output_prefix}_errors.json", 'w') as f:
                json.dump(error_data, f, indent=2, default=str)
            print(f"Error analysis written to {output_prefix}_errors.json")
        
        # Print executive summary
        print(f"\n{'='*60}")
        print("AGENT SYSTEM EVALUATION REPORT")
        print(f"{'='*60}")
        print(f"Overall Health: {report.summary['overall_health'].upper()}")
        print(f"Total Tasks: {report.system_metrics.total_tasks}")
        print(f"Success Rate: {report.system_metrics.success_rate:.1%}")
        print(f"Average Duration: {report.system_metrics.average_duration_ms/1000:.1f}s")
        print(f"Total Cost: ${report.system_metrics.total_cost_usd:.2f}")
        print(f"Agents Analyzed: {len(report.agent_metrics)}")
        
        print("\nKey Findings:")
        for finding in report.summary['key_findings']:
            print(f"  • {finding}")
        
        print("\nTop Recommendations:")
        high_priority_recs = [r for r in report.optimization_recommendations if r.priority == "high"][:3]
        for i, rec in enumerate(high_priority_recs, 1):
            print(f"  {i}. {rec.title}")
        
        if report.summary['critical_issues'] > 0:
            print(f"\n⚠️  CRITICAL: {report.summary['critical_issues']} critical issues require immediate attention")
        
        print("\n📊 Detailed reports available in generated files")
        print(f"{'='*60}")
        
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

#!/usr/bin/env python3
"""
Agent Planner - Multi-Agent System Architecture Designer

Given a system description (goal, tasks, constraints, team size), designs a multi-agent
architecture: defines agent roles, responsibilities, capabilities needed, communication
topology, tool requirements. Generates architecture diagram (Mermaid).

Input: system requirements JSON
Output: agent architecture + role definitions + Mermaid diagram + implementation roadmap
"""

import json
import argparse
import sys
from typing import Dict, List, Any, Optional, Tuple
from dataclasses import dataclass, asdict
from enum import Enum


class AgentArchitecturePattern(Enum):
    """Supported agent architecture patterns"""
    SINGLE_AGENT = "single_agent"
    SUPERVISOR = "supervisor"
    SWARM = "swarm"
    HIERARCHICAL = "hierarchical"
    PIPELINE = "pipeline"


class CommunicationPattern(Enum):
    """Agent communication patterns"""
    DIRECT_MESSAGE = "direct_message"
    SHARED_STATE = "shared_state"
    EVENT_DRIVEN = "event_driven"
    MESSAGE_QUEUE = "message_queue"


class AgentRole(Enum):
    """Standard agent role archetypes"""
    COORDINATOR = "coordinator"
    SPECIALIST = "specialist"
    INTERFACE = "interface"
    MONITOR = "monitor"


@dataclass
class Tool:
    """Tool definition for agents"""
    name: str
    description: str
    input_schema: Dict[str, Any]
    output_schema: Dict[str, Any]
    capabilities: List[str]
    reliability: str = "high"  # high, medium, low
    latency: str = "low"       # low, medium, high


@dataclass
class AgentDefinition:
    """Complete agent definition"""
    name: str
    role: str
    archetype: AgentRole
    responsibilities: List[str]
    capabilities: List[str]
    tools: List[Tool]
    communication_interfaces: List[str]
    constraints: Dict[str, Any]
    success_criteria: List[str]
    dependencies: Optional[List[str]] = None


@dataclass
class CommunicationLink:
    """Communication link between agents"""
    from_agent: str
    to_agent: str
    pattern: CommunicationPattern
    data_format: str
    frequency: str
    criticality: str
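

# --- Illustrative sketch, not part of the original planner ---
# The module docstring promises a Mermaid architecture diagram. Assuming each
# communication link reduces to a (from_agent, to_agent, label) triple, the
# edges could be rendered as a Mermaid flowchart roughly like this. The helper
# name `links_to_mermaid_sketch` is hypothetical, and agent names containing
# double quotes are not handled.
def links_to_mermaid_sketch(links):
    """Render (from_agent, to_agent, label) triples as Mermaid flowchart text."""
    def node_id(name):
        # Mermaid node ids must not contain spaces or punctuation.
        return "".join(ch if ch.isalnum() else "_" for ch in name)

    lines = ["flowchart TD"]
    for from_agent, to_agent, label in links:
        lines.append(
            f'    {node_id(from_agent)}["{from_agent}"] -->|{label}| '
            f'{node_id(to_agent)}["{to_agent}"]'
        )
    return "\n".join(lines)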


@dataclass
class SystemRequirements:
    """Input system requirements"""
    goal: str
    description: str
    tasks: List[str]
    constraints: Dict[str, Any]
    team_size: int
    performance_requirements: Dict[str, Any]
    safety_requirements: List[str]
    integration_requirements: List[str]
    scale_requirements: Dict[str, Any]


@dataclass
class ArchitectureDesign:
    """Complete architecture design output"""
    pattern: AgentArchitecturePattern
    agents: List[AgentDefinition]
    communication_topology: List[CommunicationLink]
    shared_resources: List[Dict[str, Any]]
    guardrails: List[Dict[str, Any]]
    scaling_strategy: Dict[str, Any]
    failure_handling: Dict[str, Any]
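

# --- Illustrative sketch, not part of the original planner ---
# dataclasses.asdict() leaves Enum members in place, and json.dumps() cannot
# encode a plain Enum member without help. Passing default=str would emit the
# member's repr-like name (for example "_SketchPattern.PIPELINE"), so a small
# encoder that returns `.value` keeps the output compact. The names
# `_SketchPattern`, `_SketchDesign`, and `enum_safe_dumps_sketch` are
# hypothetical stand-ins; the imports are repeated so the sketch runs alone.
import json
from dataclasses import dataclass, asdict
from enum import Enum


class _SketchPattern(Enum):
    PIPELINE = "pipeline"


@dataclass
class _SketchDesign:
    pattern: _SketchPattern
    agent_names: list


def enum_safe_dumps_sketch(obj):
    """Serialize a dataclass instance to JSON, writing Enum members as their values."""
    def encode(value):
        if isinstance(value, Enum):
            return value.value
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(asdict(obj), default=encode, sort_keys=True)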


class AgentPlanner:
    """Multi-agent system architecture planner"""
    
    def __init__(self):
        self.common_tools = self._define_common_tools()
        self.pattern_heuristics = self._define_pattern_heuristics()
    
    def _define_common_tools(self) -> Dict[str, Tool]:
        """Define commonly used tools across agents"""
        return {
            "web_search": Tool(
                name="web_search",
                description="Search the web for information",
                input_schema={"type": "object", "properties": {"query": {"type": "string"}}},
                output_schema={"type": "object", "properties": {"results": {"type": "array"}}},
                capabilities=["research", "information_gathering"],
                reliability="high",
                latency="medium"
            ),
            "code_executor": Tool(
                name="code_executor",
                description="Execute code in various languages",
                input_schema={"type": "object", "properties": {"language": {"type": "string"}, "code": {"type": "string"}}},
                output_schema={"type": "object", "properties": {"result": {"type": "string"}, "error": {"type": "string"}}},
                capabilities=["code_execution", "testing", "automation"],
                reliability="high",
                latency="low"
            ),
            "file_manager": Tool(
                name="file_manager",
                description="Manage files and directories",
                input_schema={"type": "object", "properties": {"action": {"type": "string"}, "path": {"type": "string"}}},
                output_schema={"type": "object", "properties": {"success": {"type": "boolean"}, "content": {"type": "string"}}},
                capabilities=["file_operations", "data_management"],
                reliability="high",
                latency="low"
            ),
            "data_analyzer": Tool(
                name="data_analyzer",
                description="Analyze and process data",
                input_schema={"type": "object", "properties": {"data": {"type": "object"}, "analysis_type": {"type": "string"}}},
                output_schema={"type": "object", "properties": {"insights": {"type": "array"}, "metrics": {"type": "object"}}},
                capabilities=["data_analysis", "statistics", "visualization"],
                reliability="high",
                latency="medium"
            ),
            "api_client": Tool(
                name="api_client",
                description="Make API calls to external services",
                input_schema={"type": "object", "properties": {"url": {"type": "string"}, "method": {"type": "string"}, "data": {"type": "object"}}},
                output_schema={"type": "object", "properties": {"response": {"type": "object"}, "status": {"type": "integer"}}},
                capabilities=["integration", "external_services"],
                reliability="medium",
                latency="medium"
            )
        }
    
    def _define_pattern_heuristics(self) -> Dict[AgentArchitecturePattern, Dict[str, Any]]:
        """Define heuristics for selecting architecture patterns"""
        return {
            AgentArchitecturePattern.SINGLE_AGENT: {
                "team_size_range": (1, 1),
                "task_complexity": "simple",
                "coordination_overhead": "none",
                "suitable_for": ["simple tasks", "prototyping", "single domain"],
                "scaling_limit": "low"
            },
            AgentArchitecturePattern.SUPERVISOR: {
                "team_size_range": (2, 8),
                "task_complexity": "medium",
                "coordination_overhead": "low",
                "suitable_for": ["hierarchical tasks", "clear delegation", "quality control"],
                "scaling_limit": "medium"
            },
            AgentArchitecturePattern.SWARM: {
                "team_size_range": (3, 20),
                "task_complexity": "high",
                "coordination_overhead": "high",
                "suitable_for": ["parallel processing", "distributed problem solving", "fault tolerance"],
                "scaling_limit": "high"
            },
            AgentArchitecturePattern.HIERARCHICAL: {
                "team_size_range": (5, 50),
                "task_complexity": "very high",
                "coordination_overhead": "medium",
                "suitable_for": ["large organizations", "complex workflows", "enterprise systems"],
                "scaling_limit": "very high"
            },
            AgentArchitecturePattern.PIPELINE: {
                "team_size_range": (3, 15),
                "task_complexity": "medium",
                "coordination_overhead": "low",
                "suitable_for": ["sequential processing", "data pipelines", "assembly line tasks"],
                "scaling_limit": "medium"
            }
        }
    
    def select_architecture_pattern(self, requirements: SystemRequirements) -> AgentArchitecturePattern:
        """Select the most appropriate architecture pattern based on requirements"""
        team_size = requirements.team_size
        task_count = len(requirements.tasks)
        performance_reqs = requirements.performance_requirements
        
        # Task complexity assessment (independent of the pattern, so computed once)
        description = requirements.description.lower()
        complexity_indicators = [
            "parallel" in description,
            "sequential" in description,
            "hierarchical" in description,
            "distributed" in description,
            task_count > 5,
            len(requirements.constraints) > 3
        ]
        complexity_score = sum(complexity_indicators)

        # Score each pattern based on requirements
        pattern_scores = {}

        for pattern, heuristics in self.pattern_heuristics.items():
            score = 0

            # Team size fit
            min_size, max_size = heuristics["team_size_range"]
            if min_size <= team_size <= max_size:
                score += 3
            elif abs(team_size - min_size) <= 2 or abs(team_size - max_size) <= 2:
                score += 1
            
            if pattern == AgentArchitecturePattern.SINGLE_AGENT and complexity_score <= 2:
                score += 2
            elif pattern == AgentArchitecturePattern.SUPERVISOR and 2 <= complexity_score <= 4:
                score += 2
            elif pattern == AgentArchitecturePattern.PIPELINE and "sequential" in requirements.description.lower():
                score += 3
            elif pattern == AgentArchitecturePattern.SWARM and "parallel" in requirements.description.lower():
                score += 3
            elif pattern == AgentArchitecturePattern.HIERARCHICAL and complexity_score >= 4:
                score += 2
            
            # Performance requirements
            if performance_reqs.get("high_throughput", False) and pattern in [AgentArchitecturePattern.SWARM, AgentArchitecturePattern.PIPELINE]:
                score += 2
            if performance_reqs.get("fault_tolerance", False) and pattern == AgentArchitecturePattern.SWARM:
                score += 2
            if performance_reqs.get("low_latency", False) and pattern in [AgentArchitecturePattern.SINGLE_AGENT, AgentArchitecturePattern.PIPELINE]:
                score += 1
            
            pattern_scores[pattern] = score
        
        # Select the highest scoring pattern
        best_pattern = max(pattern_scores.items(), key=lambda x: x[1])[0]
        return best_pattern
    
    def design_agents(self, requirements: SystemRequirements, pattern: AgentArchitecturePattern) -> List[AgentDefinition]:
        """Design individual agents based on requirements and architecture pattern"""
        agents = []
        
        if pattern == AgentArchitecturePattern.SINGLE_AGENT:
            agents = self._design_single_agent(requirements)
        elif pattern == AgentArchitecturePattern.SUPERVISOR:
            agents = self._design_supervisor_agents(requirements)
        elif pattern == AgentArchitecturePattern.SWARM:
            agents = self._design_swarm_agents(requirements)
        elif pattern == AgentArchitecturePattern.HIERARCHICAL:
            agents = self._design_hierarchical_agents(requirements)
        elif pattern == AgentArchitecturePattern.PIPELINE:
            agents = self._design_pipeline_agents(requirements)
        
        return agents
    
    def _design_single_agent(self, requirements: SystemRequirements) -> List[AgentDefinition]:
        """Design a single general-purpose agent"""
        all_tools = list(self.common_tools.values())
        
        agent = AgentDefinition(
            name="universal_agent",
            role="Universal Task Handler",
            archetype=AgentRole.SPECIALIST,
            responsibilities=requirements.tasks,
            capabilities=["general_purpose", "multi_domain", "adaptable"],
            tools=all_tools,
            communication_interfaces=["direct_user_interface"],
            constraints={
                "max_concurrent_tasks": 1,
                "memory_limit": "high",
                "response_time": "fast"
            },
            success_criteria=["complete all assigned tasks", "maintain quality standards", "respond within time limits"],
            dependencies=[]
        )
        
        return [agent]
    
    def _design_supervisor_agents(self, requirements: SystemRequirements) -> List[AgentDefinition]:
        """Design supervisor pattern agents"""
        agents = []
        
        # Create supervisor agent
        supervisor = AgentDefinition(
            name="supervisor_agent",
            role="Task Coordinator and Quality Controller",
            archetype=AgentRole.COORDINATOR,
            responsibilities=[
                "task_decomposition",
                "delegation",
                "progress_monitoring",
                "quality_assurance",
                "result_aggregation"
            ],
            capabilities=["planning", "coordination", "evaluation", "decision_making"],
            tools=[self.common_tools["file_manager"], self.common_tools["data_analyzer"]],
            communication_interfaces=["user_interface", "agent_messaging"],
            constraints={
                "max_concurrent_supervisions": 5,
                "decision_timeout": "30s"
            },
            success_criteria=["successful task completion", "optimal resource utilization", "quality standards met"],
            dependencies=[]
        )
        agents.append(supervisor)
        
        # Create specialist agents based on task domains
        task_domains = self._identify_task_domains(requirements.tasks)
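        # Reserve one slot for the supervisor; clamp at 0 so a team_size of 0
        # does not turn into a negative slice that silently drops a domain.
        specialist_slots = max(requirements.team_size - 1, 0)
        for domain in task_domains[:specialist_slots]:
            specialist = AgentDefinition(
                name=f"{domain}_specialist",
                role=f"{domain.title()} Specialist",
                archetype=AgentRole.SPECIALIST,
                # NOTE: this matches on the domain name itself, not on the keyword
                # list used by _identify_task_domains, so it can come back empty.
                responsibilities=[task for task in requirements.tasks if domain in task.lower()],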
                capabilities=[f"{domain}_expertise", "specialized_tools", "domain_knowledge"],
                tools=self._select_tools_for_domain(domain),
                communication_interfaces=["supervisor_messaging"],
                constraints={
                    "domain_scope": domain,
                    "task_queue_size": 10
                },
                success_criteria=[f"excel in {domain} tasks", "maintain domain expertise", "provide quality output"],
                dependencies=["supervisor_agent"]
            )
            agents.append(specialist)
        
        return agents
    
    def _design_swarm_agents(self, requirements: SystemRequirements) -> List[AgentDefinition]:
        """Design swarm pattern agents"""
        agents = []
        
        # Create peer agents with overlapping capabilities
        agent_count = min(requirements.team_size, 10)  # Reasonable swarm size
        base_capabilities = ["collaboration", "consensus", "adaptation", "peer_communication"]
        
        for i in range(agent_count):
            agent = AgentDefinition(
                name=f"swarm_agent_{i+1}",
                role=f"Collaborative Worker #{i+1}",
                archetype=AgentRole.SPECIALIST,
                responsibilities=requirements.tasks,  # All agents can handle all tasks
                capabilities=base_capabilities + [f"specialization_{i%3}"],  # Some specialization
                tools=list(self.common_tools.values()),
                communication_interfaces=["peer_messaging", "broadcast", "consensus_protocol"],
                constraints={
                    "peer_discovery_timeout": "10s",
                    "consensus_threshold": 0.6,
                    "max_retries": 3
                },
                success_criteria=["contribute to group goals", "maintain peer relationships", "adapt to failures"],
                dependencies=[f"swarm_agent_{j+1}" for j in range(agent_count) if j != i]
            )
            agents.append(agent)
        
        return agents
    
    def _design_hierarchical_agents(self, requirements: SystemRequirements) -> List[AgentDefinition]:
        """Design hierarchical pattern agents"""
        agents = []
        
        # Create management hierarchy
        # Reasonable hierarchy depth; at least 1 so team sizes below 3 do not divide by zero
        levels = max(1, min(3, requirements.team_size // 3))
        agents_per_level = requirements.team_size // levels
        
        # Top level manager
        manager = AgentDefinition(
            name="executive_manager",
            role="Executive Manager",
            archetype=AgentRole.COORDINATOR,
            responsibilities=["strategic_planning", "resource_allocation", "performance_monitoring"],
            capabilities=["leadership", "strategy", "resource_management", "oversight"],
            tools=[self.common_tools["data_analyzer"], self.common_tools["file_manager"]],
            communication_interfaces=["executive_dashboard", "management_messaging"],
            constraints={"management_span": 5, "decision_authority": "high"},
            success_criteria=["achieve system goals", "optimize resource usage", "maintain quality"],
            dependencies=[]
        )
        agents.append(manager)
        
        # Middle managers
        for i in range(agents_per_level - 1):
            middle_manager = AgentDefinition(
                name=f"team_manager_{i+1}",
                role=f"Team Manager #{i+1}",
                archetype=AgentRole.COORDINATOR,
                responsibilities=["team_coordination", "task_distribution", "progress_tracking"],
                capabilities=["team_management", "coordination", "reporting"],
                tools=[self.common_tools["file_manager"]],
                communication_interfaces=["management_messaging", "team_messaging"],
                constraints={"team_size": 3, "reporting_frequency": "hourly"},
                success_criteria=["team performance", "task completion", "team satisfaction"],
                dependencies=["executive_manager"]
            )
            agents.append(middle_manager)
        
        # Workers
        remaining_agents = requirements.team_size - len(agents)
        for i in range(remaining_agents):
            worker = AgentDefinition(
                name=f"worker_agent_{i+1}",
                role=f"Task Worker #{i+1}",
                archetype=AgentRole.SPECIALIST,
                responsibilities=["task_execution", "result_delivery", "status_reporting"],
                capabilities=["task_execution", "specialized_skills", "reliability"],
                tools=self._select_diverse_tools(),
                communication_interfaces=["team_messaging"],
                constraints={"task_focus": "single", "reporting_interval": "30min"},
                success_criteria=["complete assigned tasks", "maintain quality", "meet deadlines"],
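                # Assign three workers per manager, clamped to the managers that
                # actually exist; with no middle managers, report to the executive.
                dependencies=(
                    [f"team_manager_{min(i // 3 + 1, agents_per_level - 1)}"]
                    if agents_per_level > 1 else ["executive_manager"]
                )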
            )
            agents.append(worker)
        
        return agents
    
    def _design_pipeline_agents(self, requirements: SystemRequirements) -> List[AgentDefinition]:
        """Design pipeline pattern agents"""
        agents = []
        
        # Create sequential processing stages
        pipeline_stages = self._identify_pipeline_stages(requirements.tasks)
        
        for i, stage in enumerate(pipeline_stages):
            agent = AgentDefinition(
                name=f"pipeline_stage_{i+1}_{stage}",
                role=f"Pipeline Stage {i+1}: {stage.title()}",
                archetype=AgentRole.SPECIALIST,
                responsibilities=[f"process_{stage}", f"validate_{stage}_output", "handoff_to_next_stage"],
                capabilities=[f"{stage}_processing", "quality_control", "data_transformation"],
                tools=self._select_tools_for_stage(stage),
                communication_interfaces=["pipeline_queue", "stage_messaging"],
                constraints={
                    "processing_order": i + 1,
                    "batch_size": 10,
                    "stage_timeout": "5min"
                },
                success_criteria=[f"successfully process {stage}", "maintain data integrity", "meet throughput targets"],
                dependencies=[f"pipeline_stage_{i}_{pipeline_stages[i-1]}"] if i > 0 else []
            )
            agents.append(agent)
        
        return agents
    
    def _identify_task_domains(self, tasks: List[str]) -> List[str]:
        """Identify distinct domains from task list"""
        domains = []
        domain_keywords = {
            "research": ["research", "search", "find", "investigate", "analyze"],
            "development": ["code", "build", "develop", "implement", "program"],
            "data": ["data", "process", "analyze", "calculate", "compute"],
            "communication": ["write", "send", "message", "communicate", "report"],
            "file": ["file", "document", "save", "load", "manage"]
        }
        
        for domain, keywords in domain_keywords.items():
            if any(keyword in " ".join(tasks).lower() for keyword in keywords):
                domains.append(domain)
        
        return domains[:5]  # Limit to 5 domains
    
    def _identify_pipeline_stages(self, tasks: List[str]) -> List[str]:
        """Identify pipeline stages from task list"""
        # Common pipeline patterns
        common_stages = ["input", "process", "transform", "validate", "output"]
        
        # Try to infer stages from tasks
        stages = []
        task_text = " ".join(tasks).lower()
        
        if "collect" in task_text or "gather" in task_text:
            stages.append("collection")
        if "process" in task_text or "transform" in task_text:
            stages.append("processing")
        if "analyze" in task_text or "evaluate" in task_text:
            stages.append("analysis")
        if "validate" in task_text or "check" in task_text:
            stages.append("validation")
        if "output" in task_text or "deliver" in task_text or "report" in task_text:
            stages.append("output")
        
        # Default to common stages if none identified
        return stages if stages else common_stages[:min(5, len(tasks))]
    
    def _select_tools_for_domain(self, domain: str) -> List[Tool]:
        """Select appropriate tools for a specific domain"""
        domain_tools = {
            "research": [self.common_tools["web_search"], self.common_tools["data_analyzer"]],
            "development": [self.common_tools["code_executor"], self.common_tools["file_manager"]],
            "data": [self.common_tools["data_analyzer"], self.common_tools["file_manager"]],
            "communication": [self.common_tools["api_client"], self.common_tools["file_manager"]],
            "file": [self.common_tools["file_manager"]]
        }
        
        return domain_tools.get(domain, [self.common_tools["api_client"]])
    
    def _select_tools_for_stage(self, stage: str) -> List[Tool]:
        """Select appropriate tools for a pipeline stage"""
        stage_tools = {
            "input": [self.common_tools["api_client"], self.common_tools["file_manager"]],
            "collection": [self.common_tools["web_search"], self.common_tools["api_client"]],
            "process": [self.common_tools["code_executor"], self.common_tools["data_analyzer"]],
            "processing": [self.common_tools["data_analyzer"], self.common_tools["code_executor"]],
            "transform": [self.common_tools["data_analyzer"], self.common_tools["code_executor"]],
            "analysis": [self.common_tools["data_analyzer"]],
            "validate": [self.common_tools["data_analyzer"]],
            "validation": [self.common_tools["data_analyzer"]],
            "output": [self.common_tools["file_manager"], self.common_tools["api_client"]]
        }
        
        return stage_tools.get(stage, [self.common_tools["file_manager"]])
    
    def _select_diverse_tools(self) -> List[Tool]:
        """Select a diverse set of tools for general purpose agents"""
        return [
            self.common_tools["file_manager"],
            self.common_tools["code_executor"],
            self.common_tools["data_analyzer"]
        ]
    
    def design_communication_topology(self, agents: List[AgentDefinition], pattern: AgentArchitecturePattern) -> List[CommunicationLink]:
        """Design communication links between agents"""
        links = []
        
        if pattern == AgentArchitecturePattern.SINGLE_AGENT:
            # No inter-agent communication needed
            return []
        
        elif pattern == AgentArchitecturePattern.SUPERVISOR:
            supervisor = next(agent for agent in agents if agent.archetype == AgentRole.COORDINATOR)
            specialists = [agent for agent in agents if agent.archetype == AgentRole.SPECIALIST]
            
            for specialist in specialists:
                # Bidirectional communication with supervisor
                links.append(CommunicationLink(
                    from_agent=supervisor.name,
                    to_agent=specialist.name,
                    pattern=CommunicationPattern.DIRECT_MESSAGE,
                    data_format="json",
                    frequency="on_demand",
                    criticality="high"
                ))
                links.append(CommunicationLink(
                    from_agent=specialist.name,
                    to_agent=supervisor.name,
                    pattern=CommunicationPattern.DIRECT_MESSAGE,
                    data_format="json",
                    frequency="on_completion",
                    criticality="high"
                ))
        
        elif pattern == AgentArchitecturePattern.SWARM:
            # All-to-all communication for swarm
            for i, agent1 in enumerate(agents):
                for j, agent2 in enumerate(agents):
                    if i != j:
                        links.append(CommunicationLink(
                            from_agent=agent1.name,
                            to_agent=agent2.name,
                            pattern=CommunicationPattern.EVENT_DRIVEN,
                            data_format="json",
                            frequency="periodic",
                            criticality="medium"
                        ))
        
        elif pattern == AgentArchitecturePattern.HIERARCHICAL:
            # Hierarchical communication based on dependencies
            for agent in agents:
                if agent.dependencies:
                    for dependency in agent.dependencies:
                        links.append(CommunicationLink(
                            from_agent=dependency,
                            to_agent=agent.name,
                            pattern=CommunicationPattern.DIRECT_MESSAGE,
                            data_format="json",
                            frequency="scheduled",
                            criticality="high"
                        ))
                        links.append(CommunicationLink(
                            from_agent=agent.name,
                            to_agent=dependency,
                            pattern=CommunicationPattern.DIRECT_MESSAGE,
                            data_format="json",
                            frequency="on_completion",
                            criticality="high"
                        ))
        
        elif pattern == AgentArchitecturePattern.PIPELINE:
            # Sequential pipeline communication
            for i in range(len(agents) - 1):
                links.append(CommunicationLink(
                    from_agent=agents[i].name,
                    to_agent=agents[i + 1].name,
                    pattern=CommunicationPattern.MESSAGE_QUEUE,
                    data_format="json",
                    frequency="continuous",
                    criticality="high"
                ))
        
        return links
    
    def generate_mermaid_diagram(self, design: ArchitectureDesign) -> str:
        """Generate Mermaid diagram for the architecture"""
        diagram = ["graph TD"]
        
        # Add agent nodes
        for agent in design.agents:
            node_style = self._get_node_style(agent.archetype)
            # Quote the label so characters such as "#" or ":" in a role do not break Mermaid parsing
            diagram.append(f'    {agent.name}["{agent.role}"]{node_style}')
        
        # Add communication links
        for link in design.communication_topology:
            arrow_style = self._get_arrow_style(link.pattern, link.criticality)
            diagram.append(f"    {link.from_agent} {arrow_style} {link.to_agent}")
        
        # Add styling
        diagram.extend([
            "",
            "    classDef coordinator fill:#e1f5fe,stroke:#01579b,stroke-width:2px",
            "    classDef specialist fill:#f3e5f5,stroke:#4a148c,stroke-width:2px",
            "    classDef interface fill:#e8f5e8,stroke:#1b5e20,stroke-width:2px",
            "    classDef monitor fill:#fff3e0,stroke:#e65100,stroke-width:2px"
        ])
        
        # Apply classes to nodes
        for agent in design.agents:
            class_name = agent.archetype.value
            diagram.append(f"    class {agent.name} {class_name}")
        
        return "\n".join(diagram)
    
    def _get_node_style(self, archetype: AgentRole) -> str:
        """Get node styling based on archetype"""
        styles = {
            AgentRole.COORDINATOR: ":::coordinator",
            AgentRole.SPECIALIST: ":::specialist", 
            AgentRole.INTERFACE: ":::interface",
            AgentRole.MONITOR: ":::monitor"
        }
        return styles.get(archetype, "")
    
    def _get_arrow_style(self, pattern: CommunicationPattern, criticality: str) -> str:
        """Get arrow styling based on communication pattern and criticality"""
        base_arrows = {
            CommunicationPattern.DIRECT_MESSAGE: "-->",
            CommunicationPattern.SHARED_STATE: "-.->",
            CommunicationPattern.EVENT_DRIVEN: "===>",
            CommunicationPattern.MESSAGE_QUEUE: "==="
        }
        
        arrow = base_arrows.get(pattern, "-->")
        
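        # Modify for criticality. Mermaid has no "..>" or "::>" link syntax, so
        # lower criticality is shown by downgrading a solid arrow to a dotted
        # one; every other arrow style is returned unchanged.
        if criticality != "high" and arrow == "-->":
            return "-.->"
        return arrow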
    
    def generate_implementation_roadmap(self, design: ArchitectureDesign, requirements: SystemRequirements) -> Dict[str, Any]:
        """Generate implementation roadmap"""
        phases = []
        
        # Phase 1: Core Infrastructure
        phases.append({
            "phase": 1,
            "name": "Core Infrastructure",
            "duration": "2-3 weeks",
            "tasks": [
                "Set up development environment",
                "Implement basic agent framework",
                "Create communication infrastructure",
                "Set up monitoring and logging",
                "Implement basic tools"
            ],
            "deliverables": [
                "Agent runtime framework",
                "Communication layer",
                "Basic monitoring dashboard"
            ]
        })
        
        # Phase 2: Agent Implementation
        phases.append({
            "phase": 2,
            "name": "Agent Implementation",
            "duration": "3-4 weeks",
            "tasks": [
                "Implement individual agent logic",
                "Create agent-specific tools",
                "Implement communication protocols",
                "Add error handling and recovery",
                "Create agent configuration system"
            ],
            "deliverables": [
                "Functional agent implementations",
                "Tool integration",
                "Configuration management"
            ]
        })
        
        # Phase 3: Integration and Testing
        phases.append({
            "phase": 3,
            "name": "Integration and Testing",
            "duration": "2-3 weeks",
            "tasks": [
                "Integrate all agents",
                "End-to-end testing",
                "Performance optimization",
                "Security implementation",
                "Documentation creation"
            ],
            "deliverables": [
                "Integrated system",
                "Test suite",
                "Performance benchmarks",
                "Security audit report"
            ]
        })
        
        # Phase 4: Deployment and Monitoring
        phases.append({
            "phase": 4,
            "name": "Deployment and Monitoring",
            "duration": "1-2 weeks",
            "tasks": [
                "Production deployment",
                "Monitoring setup",
                "Alerting configuration",
                "User training",
                "Go-live support"
            ],
            "deliverables": [
                "Production system",
                "Monitoring dashboard",
                "Operational runbooks",
                "Training materials"
            ]
        })
        
        return {
            "total_duration": "8-12 weeks",
            "phases": phases,
            "critical_path": [
                "Agent framework implementation",
                "Communication layer development", 
                "Integration testing",
                "Production deployment"
            ],
            "risks": [
                {
                    "risk": "Communication complexity",
                    "impact": "high",
                    "mitigation": "Start with simple protocols, iterate"
                },
                {
                    "risk": "Agent coordination failures",
                    "impact": "medium",
                    "mitigation": "Implement robust error handling and fallbacks"
                },
                {
                    "risk": "Performance bottlenecks",
                    "impact": "medium",
                    "mitigation": "Early performance testing and optimization"
                }
            ],
            "success_criteria": requirements.safety_requirements + [
                "All agents operational",
                "Communication working reliably",
                "Performance targets met",
                "Error rate below 1%"
            ]
        }
    
    def plan_system(self, requirements: SystemRequirements) -> Tuple[ArchitectureDesign, str, Dict[str, Any]]:
        """Main planning function"""
        # Select architecture pattern
        pattern = self.select_architecture_pattern(requirements)
        
        # Design agents
        agents = self.design_agents(requirements, pattern)
        
        # Design communication topology
        communication_topology = self.design_communication_topology(agents, pattern)
        
        # Create complete design
        design = ArchitectureDesign(
            pattern=pattern,
            agents=agents,
            communication_topology=communication_topology,
            shared_resources=[
                {"type": "message_queue", "capacity": 1000},
                {"type": "shared_memory", "size": "1GB"},
                {"type": "event_store", "retention": "30 days"}
            ],
            guardrails=[
                {"type": "input_validation", "rules": "strict_schema_enforcement"},
                {"type": "rate_limiting", "limit": "100_requests_per_minute"},
                {"type": "output_filtering", "rules": "content_safety_check"}
            ],
            scaling_strategy={
                "horizontal_scaling": True,
                "auto_scaling_triggers": ["cpu > 80%", "queue_depth > 100"],
                "max_instances_per_agent": 5
            },
            failure_handling={
                "retry_policy": "exponential_backoff",
                "circuit_breaker": True,
                "fallback_strategies": ["graceful_degradation", "human_escalation"]
            }
        )
        
        # Generate Mermaid diagram
        mermaid_diagram = self.generate_mermaid_diagram(design)
        
        # Generate implementation roadmap
        roadmap = self.generate_implementation_roadmap(design, requirements)
        
        return design, mermaid_diagram, roadmap


def main():
    parser = argparse.ArgumentParser(description="Multi-Agent System Architecture Planner")
    parser.add_argument("input_file", help="JSON file with system requirements")
    parser.add_argument("-o", "--output", help="Output file prefix (default: agent_architecture)")
    parser.add_argument("--format", choices=["json", "yaml", "both"], default="both",
                       help="Output format; 'yaml' is accepted but no YAML writer is implemented, so it writes no files")
    
    args = parser.parse_args()
    
    try:
        # Load requirements
        with open(args.input_file, 'r') as f:
            requirements_data = json.load(f)
        
        requirements = SystemRequirements(**requirements_data)
        
        # Plan the system
        planner = AgentPlanner()
        design, mermaid_diagram, roadmap = planner.plan_system(requirements)
        
        # Prepare output
        output_data = {
            "architecture_design": asdict(design),
            "mermaid_diagram": mermaid_diagram,
            "implementation_roadmap": roadmap,
            "metadata": {
                "generated_by": "agent_planner.py",
                "requirements_file": args.input_file,
                "architecture_pattern": design.pattern.value,
                "agent_count": len(design.agents)
            }
        }
        
        # Output files
        output_prefix = args.output or "agent_architecture"
        
        if args.format in ["json", "both"]:
            with open(f"{output_prefix}.json", 'w') as f:
                json.dump(output_data, f, indent=2, default=str)
            print(f"JSON output written to {output_prefix}.json")
        
        if args.format == "both":
            # Also create separate files for key components
            with open(f"{output_prefix}_diagram.mmd", 'w') as f:
                f.write(mermaid_diagram)
            print(f"Mermaid diagram written to {output_prefix}_diagram.mmd")
            
            with open(f"{output_prefix}_roadmap.json", 'w') as f:
                json.dump(roadmap, f, indent=2)
            print(f"Implementation roadmap written to {output_prefix}_roadmap.json")
        
        # Print summary
        print("\nArchitecture Summary:")
        print(f"Pattern: {design.pattern.value}")
        print(f"Agents: {len(design.agents)}")
        print(f"Communication Links: {len(design.communication_topology)}")
        print(f"Estimated Duration: {roadmap['total_duration']}")
        
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()
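
The output branching in `main()` can be summarized with a small sketch. The helper name `planned_outputs` is hypothetical and not part of the script; it only mirrors the `--format` checks above to show which files each choice produces. As written above, `yaml` matches neither branch, so it writes nothing.

```python
def planned_outputs(fmt, prefix="agent_architecture"):
    """Hypothetical helper: list the files main() writes for a --format choice."""
    files = []
    # The combined JSON document is written for "json" and "both".
    if fmt in ("json", "both"):
        files.append(f"{prefix}.json")
    # The separate diagram and roadmap files are written only for "both".
    if fmt == "both":
        files.append(f"{prefix}_diagram.mmd")
        files.append(f"{prefix}_roadmap.json")
    return files


for choice in ("json", "yaml", "both"):
    print(choice, "->", planned_outputs(choice))
```

Passing `-o` changes only the prefix; the suffixes stay the same.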

{
  "execution_logs": [
    {
      "task_id": "task_001",
      "agent_id": "research_agent_1",
      "task_type": "web_research",
      "task_description": "Research recent developments in artificial intelligence",
      "start_time": "2024-01-15T09:00:00Z",
      "end_time": "2024-01-15T09:02:34Z",
      "duration_ms": 154000,
      "status": "success",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 2300,
          "success": true,
          "parameters": {
            "query": "artificial intelligence developments 2024",
            "limit": 10
          }
        },
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 2100,
          "success": true,
          "parameters": {
            "query": "machine learning breakthroughs recent",
            "limit": 5
          }
        },
        {
          "type": "analysis",
          "description": "Synthesize search results",
          "duration_ms": 149600,
          "success": true
        }
      ],
      "results": {
        "summary": "Found 15 relevant sources covering recent AI developments including GPT-4 improvements, autonomous vehicle progress, and medical AI applications.",
        "sources_found": 15,
        "quality_score": 0.92
      },
      "tokens_used": {
        "input_tokens": 1250,
        "output_tokens": 2800,
        "total_tokens": 4050
      },
      "cost_usd": 0.081,
      "error_details": null,
      "tools_used": ["web_search"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_123",
        "session_id": "session_abc",
        "request_priority": "normal"
      }
    },
    {
      "task_id": "task_002",
      "agent_id": "data_agent_1",
      "task_type": "data_analysis",
      "task_description": "Analyze sales performance data for Q4 2023",
      "start_time": "2024-01-15T09:05:00Z",
      "end_time": "2024-01-15T09:07:45Z",
      "duration_ms": 165000,
      "status": "success",
      "actions": [
        {
          "type": "data_ingestion",
          "description": "Load Q4 sales data",
          "duration_ms": 5000,
          "success": true
        },
        {
          "type": "tool_call",
          "tool_name": "data_analyzer",
          "duration_ms": 155000,
          "success": true,
          "parameters": {
            "analysis_type": "descriptive",
            "target_column": "revenue"
          }
        },
        {
          "type": "visualization",
          "description": "Generate charts and graphs",
          "duration_ms": 5000,
          "success": true
        }
      ],
      "results": {
        "insights": [
          "Revenue increased by 15% compared to Q3",
          "December was the strongest month",
          "Product category A led growth"
        ],
        "charts_generated": 4,
        "quality_score": 0.88
      },
      "tokens_used": {
        "input_tokens": 3200,
        "output_tokens": 1800,
        "total_tokens": 5000
      },
      "cost_usd": 0.095,
      "error_details": null,
      "tools_used": ["data_analyzer"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_456",
        "session_id": "session_def",
        "request_priority": "high"
      }
    },
    {
      "task_id": "task_003",
      "agent_id": "document_agent_1",
      "task_type": "document_processing",
      "task_description": "Extract key information from research paper PDF",
      "start_time": "2024-01-15T09:10:00Z",
      "end_time": "2024-01-15T09:12:20Z",
      "duration_ms": 140000,
      "status": "partial",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "document_processor",
          "duration_ms": 135000,
          "success": true,
          "parameters": {
            "document_url": "https://example.com/research.pdf",
            "processing_mode": "key_points"
          }
        },
        {
          "type": "validation",
          "description": "Validate extracted content",
          "duration_ms": 5000,
          "success": false,
          "error": "Content validation failed - missing abstract"
        }
      ],
      "results": {
        "extracted_content": "Partial content extracted successfully",
        "pages_processed": 12,
        "validation_issues": ["Missing abstract section"],
        "quality_score": 0.65
      },
      "tokens_used": {
        "input_tokens": 5400,
        "output_tokens": 3200,
        "total_tokens": 8600
      },
      "cost_usd": 0.172,
      "error_details": {
        "error_type": "validation_error",
        "error_message": "Document structure validation failed",
        "affected_section": "abstract"
      },
      "tools_used": ["document_processor"],
      "retry_count": 1,
      "metadata": {
        "user_id": "user_789",
        "session_id": "session_ghi",
        "request_priority": "normal"
      }
    },
    {
      "task_id": "task_004",
      "agent_id": "communication_agent_1",
      "task_type": "notification",
      "task_description": "Send completion notification to project stakeholders",
      "start_time": "2024-01-15T09:15:00Z",
      "end_time": "2024-01-15T09:15:08Z",
      "duration_ms": 8000,
      "status": "success",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "notification_sender",
          "duration_ms": 7500,
          "success": true,
          "parameters": {
            "recipients": ["manager@example.com", "team@example.com"],
            "message": "Project analysis completed successfully",
            "channel": "email"
          }
        }
      ],
      "results": {
        "notifications_sent": 2,
        "delivery_confirmations": 2,
        "quality_score": 1.0
      },
      "tokens_used": {
        "input_tokens": 200,
        "output_tokens": 150,
        "total_tokens": 350
      },
      "cost_usd": 0.007,
      "error_details": null,
      "tools_used": ["notification_sender"],
      "retry_count": 0,
      "metadata": {
        "user_id": "system",
        "session_id": "session_jkl",
        "request_priority": "normal"
      }
    },
    {
      "task_id": "task_005",
      "agent_id": "research_agent_2",
      "task_type": "web_research",
      "task_description": "Research competitive landscape analysis",
      "start_time": "2024-01-15T09:20:00Z",
      "end_time": "2024-01-15T09:25:30Z",
      "duration_ms": 330000,
      "status": "failure",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 2800,
          "success": true,
          "parameters": {
            "query": "competitive analysis software industry",
            "limit": 15
          }
        },
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 30000,
          "success": false,
          "error": "Rate limit exceeded"
        },
        {
          "type": "retry",
          "description": "Wait and retry search",
          "duration_ms": 60000,
          "success": false
        },
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 30000,
          "success": false,
          "error": "Service timeout"
        }
      ],
      "results": {
        "partial_results": "Initial search completed, subsequent searches failed",
        "sources_found": 8,
        "quality_score": 0.3
      },
      "tokens_used": {
        "input_tokens": 800,
        "output_tokens": 400,
        "total_tokens": 1200
      },
      "cost_usd": 0.024,
      "error_details": {
        "error_type": "service_timeout",
        "error_message": "Web search service exceeded timeout limit",
        "retry_attempts": 2
      },
      "tools_used": ["web_search"],
      "retry_count": 2,
      "metadata": {
        "user_id": "user_101",
        "session_id": "session_mno",
        "request_priority": "high"
      }
    },
    {
      "task_id": "task_006",
      "agent_id": "scheduler_agent_1",
      "task_type": "task_scheduling",
      "task_description": "Schedule weekly report generation",
      "start_time": "2024-01-15T09:30:00Z",
      "end_time": "2024-01-15T09:30:15Z",
      "duration_ms": 15000,
      "status": "success",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "task_scheduler",
          "duration_ms": 12000,
          "success": true,
          "parameters": {
            "task_definition": {
              "action": "generate_report",
              "parameters": {"report_type": "weekly_summary"}
            },
            "schedule": {
              "type": "recurring",
              "recurrence_pattern": "weekly"
            }
          }
        },
        {
          "type": "validation",
          "description": "Verify schedule creation",
          "duration_ms": 3000,
          "success": true
        }
      ],
      "results": {
        "task_scheduled": true,
        "next_execution": "2024-01-22T09:30:00Z",
        "schedule_id": "sched_789",
        "quality_score": 1.0
      },
      "tokens_used": {
        "input_tokens": 300,
        "output_tokens": 200,
        "total_tokens": 500
      },
      "cost_usd": 0.01,
      "error_details": null,
      "tools_used": ["task_scheduler"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_202",
        "session_id": "session_pqr",
        "request_priority": "low"
      }
    },
    {
      "task_id": "task_007",
      "agent_id": "data_agent_2",
      "task_type": "data_analysis",
      "task_description": "Analyze customer satisfaction survey results",
      "start_time": "2024-01-15T10:00:00Z",
      "end_time": "2024-01-15T10:04:25Z",
      "duration_ms": 265000,
      "status": "timeout",
      "actions": [
        {
          "type": "data_ingestion",
          "description": "Load survey response data",
          "duration_ms": 15000,
          "success": true
        },
        {
          "type": "tool_call",
          "tool_name": "data_analyzer",
          "duration_ms": 250000,
          "success": false,
          "error": "Operation timeout after 250 seconds"
        }
      ],
      "results": {
        "partial_analysis": "Data loaded but analysis incomplete",
        "records_processed": 5000,
        "total_records": 15000,
        "quality_score": 0.2
      },
      "tokens_used": {
        "input_tokens": 8000,
        "output_tokens": 1000,
        "total_tokens": 9000
      },
      "cost_usd": 0.18,
      "error_details": {
        "error_type": "timeout",
        "error_message": "Data analysis operation exceeded maximum allowed time",
        "timeout_limit_ms": 250000
      },
      "tools_used": ["data_analyzer"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_303",
        "session_id": "session_stu",
        "request_priority": "normal"
      }
    },
    {
      "task_id": "task_008",
      "agent_id": "research_agent_1",
      "task_type": "web_research",
      "task_description": "Research industry best practices for remote work",
      "start_time": "2024-01-15T10:30:00Z",
      "end_time": "2024-01-15T10:33:15Z",
      "duration_ms": 195000,
      "status": "success",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 2200,
          "success": true,
          "parameters": {
            "query": "remote work best practices 2024",
            "limit": 12
          }
        },
        {
          "type": "tool_call",
          "tool_name": "web_search",
          "duration_ms": 2400,
          "success": true,
          "parameters": {
            "query": "hybrid work policies companies",
            "limit": 8
          }
        },
        {
          "type": "content_synthesis",
          "description": "Synthesize findings from multiple sources",
          "duration_ms": 190400,
          "success": true
        }
      ],
      "results": {
        "comprehensive_report": "Detailed analysis of remote work best practices with industry examples",
        "sources_analyzed": 20,
        "key_insights": 8,
        "quality_score": 0.94
      },
      "tokens_used": {
        "input_tokens": 2800,
        "output_tokens": 4200,
        "total_tokens": 7000
      },
      "cost_usd": 0.14,
      "error_details": null,
      "tools_used": ["web_search"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_404",
        "session_id": "session_vwx",
        "request_priority": "normal"
      }
    },
    {
      "task_id": "task_009",
      "agent_id": "document_agent_2",
      "task_type": "document_processing",
      "task_description": "Process and summarize quarterly financial report",
      "start_time": "2024-01-15T11:00:00Z",
      "end_time": "2024-01-15T11:02:30Z",
      "duration_ms": 150000,
      "status": "success",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "document_processor",
          "duration_ms": 145000,
          "success": true,
          "parameters": {
            "document_url": "https://example.com/q4-financial-report.pdf",
            "processing_mode": "summary",
            "output_format": "json"
          }
        },
        {
          "type": "quality_check",
          "description": "Validate summary completeness",
          "duration_ms": 5000,
          "success": true
        }
      ],
      "results": {
        "executive_summary": "Q4 revenue grew 12% YoY with strong performance in all segments",
        "key_metrics_extracted": 15,
        "summary_length": 500,
        "quality_score": 0.91
      },
      "tokens_used": {
        "input_tokens": 6500,
        "output_tokens": 2200,
        "total_tokens": 8700
      },
      "cost_usd": 0.174,
      "error_details": null,
      "tools_used": ["document_processor"],
      "retry_count": 0,
      "metadata": {
        "user_id": "user_505",
        "session_id": "session_yzA",
        "request_priority": "high"
      }
    },
    {
      "task_id": "task_010",
      "agent_id": "communication_agent_2",
      "task_type": "notification",
      "task_description": "Send urgent system maintenance notification",
      "start_time": "2024-01-15T11:30:00Z",
      "end_time": "2024-01-15T11:30:45Z",
      "duration_ms": 45000,
      "status": "failure",
      "actions": [
        {
          "type": "tool_call",
          "tool_name": "notification_sender",
          "duration_ms": 30000,
          "success": false,
          "error": "Authentication failed - invalid API key",
          "parameters": {
            "recipients": ["all-users@example.com"],
            "message": "Scheduled maintenance tonight 11 PM - 2 AM",
            "channel": "email",
            "priority": "urgent"
          }
        },
        {
          "type": "retry",
          "description": "Retry with backup credentials",
          "duration_ms": 15000,
          "success": false,
          "error": "Backup authentication also failed"
        }
      ],
      "results": {
        "notifications_sent": 0,
        "delivery_failures": 1,
        "quality_score": 0.0
      },
      "tokens_used": {
        "input_tokens": 150,
        "output_tokens": 50,
        "total_tokens": 200
      },
      "cost_usd": 0.004,
      "error_details": {
        "error_type": "authentication_error",
        "error_message": "Failed to authenticate with notification service",
        "retry_attempts": 1
      },
      "tools_used": ["notification_sender"],
      "retry_count": 1,
      "metadata": {
        "user_id": "system",
        "session_id": "session_BcD",
        "request_priority": "urgent"
      }
    }
  ]
}
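
A log file in this shape can be reduced to headline metrics with a few lines of Python. The following is a minimal sketch, not the skill's own evaluator: the function name `summarize_logs` is hypothetical, and it assumes that only `"status": "success"` counts as a success (so `partial`, `failure`, and `timeout` do not). The inline sample is a trimmed copy of three records from the file above.

```python
from collections import Counter
from statistics import mean, median


def summarize_logs(log_data):
    """Summarize a dict shaped like the sample: {"execution_logs": [...]}."""
    logs = log_data["execution_logs"]
    if not logs:
        raise ValueError("execution_logs is empty")

    # Assumption: only status == "success" counts; "partial" is not a success.
    successes = sum(1 for entry in logs if entry["status"] == "success")
    durations = [entry["duration_ms"] for entry in logs]

    # error_details is null (None in Python) for clean runs, so skip those.
    error_types = Counter(
        entry["error_details"]["error_type"]
        for entry in logs
        if entry.get("error_details")
    )

    return {
        "task_count": len(logs),
        "success_rate": successes / len(logs),
        "avg_cost_usd": mean(entry["cost_usd"] for entry in logs),
        "median_duration_ms": median(durations),
        "max_duration_ms": max(durations),
        "error_types": dict(error_types),
    }


# Trimmed copies of task_004, task_007 and task_010 from the sample above.
sample = {
    "execution_logs": [
        {"task_id": "task_004", "status": "success", "duration_ms": 8000,
         "cost_usd": 0.007, "error_details": None},
        {"task_id": "task_007", "status": "timeout", "duration_ms": 265000,
         "cost_usd": 0.18, "error_details": {"error_type": "timeout"}},
        {"task_id": "task_010", "status": "failure", "duration_ms": 45000,
         "cost_usd": 0.004,
         "error_details": {"error_type": "authentication_error"}},
    ]
}

summary = summarize_logs(sample)
print(summary)
```

Applied to the full ten-record sample under the same assumption, six tasks have status `success`, giving a success rate of 0.6, and the `cost_usd` values sum to 0.887.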

{
  "goal": "Build a comprehensive research and analysis platform that can gather information from multiple sources, analyze data, and generate detailed reports",
  "description": "The system needs to handle complex research tasks involving web searches, data analysis, document processing, and collaborative report generation. It should be able to coordinate multiple specialists working in parallel while maintaining quality control and ensuring comprehensive coverage of research topics.",
  "tasks": [
    "Conduct multi-source web research on specified topics",
    "Analyze and synthesize information from various sources",
    "Perform data processing and statistical analysis",
    "Generate visualizations and charts from data",
    "Create comprehensive written reports",
    "Fact-check and validate information accuracy",
    "Coordinate parallel research streams",
    "Handle real-time information updates",
    "Manage research project timelines",
    "Provide interactive research assistance"
  ],
  "constraints": {
    "max_response_time": 30000,
    "budget_per_task": 1.0,
    "quality_threshold": 0.9,
    "concurrent_tasks": 10,
    "data_retention_days": 90,
    "security_level": "standard",
    "compliance_requirements": ["GDPR", "data_minimization"]
  },
  "team_size": 6,
  "performance_requirements": {
    "high_throughput": true,
    "fault_tolerance": true,
    "low_latency": false,
    "scalability": "medium",
    "availability": 0.99
  },
  "safety_requirements": [
    "Input validation and sanitization",
    "Output content filtering",
    "Rate limiting for external APIs",
    "Error handling and graceful degradation",
    "Human oversight for critical decisions",
    "Audit logging for all operations"
  ],
  "integration_requirements": [
    "REST API endpoints for external systems",
    "Webhook support for real-time updates",
    "Database integration for data persistence",
    "File storage for documents and media",
    "Email notifications for important events",
    "Dashboard for monitoring and control"
  ],
  "scale_requirements": {
    "initial_users": 50,
    "peak_concurrent_users": 200,
    "data_volume_gb": 100,
    "requests_per_hour": 1000,
    "geographic_regions": ["US", "EU"],
    "growth_projection": "50% per year"
  }
}
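
Because `main()` builds the requirements object with `SystemRequirements(**requirements_data)`, a missing or unexpected top-level key would surface as a `TypeError` at construction time. The sketch below shows one way to check a file's keys beforehand. `RequirementsSketch` and `check_keys` are hypothetical names; the real `SystemRequirements` definition is not shown here, so the field list is an assumption inferred from the top-level keys of the sample file above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List


@dataclass
class RequirementsSketch:
    """Hypothetical stand-in; fields mirror the sample file's top-level keys."""
    goal: str
    description: str
    tasks: List[str]
    constraints: Dict[str, Any]
    team_size: int
    performance_requirements: Dict[str, Any]
    safety_requirements: List[str]
    integration_requirements: List[str]
    scale_requirements: Dict[str, Any]


def check_keys(data, cls=RequirementsSketch):
    """Return (missing, unexpected) key lists relative to the dataclass fields."""
    expected = {f.name for f in fields(cls)}
    provided = set(data)
    return sorted(expected - provided), sorted(provided - expected)


# An intentionally incomplete candidate with one key that is not a field.
candidate = {
    "goal": "Build a research platform",
    "tasks": ["Conduct multi-source web research on specified topics"],
    "team_size": 6,
    "budget": 1.0,  # not a field above, so it is reported as unexpected
}

missing, unexpected = check_keys(candidate)
print("missing:", missing)
print("unexpected:", unexpected)
```

Reporting both lists up front gives a clearer message than the bare `TypeError` that a dataclass constructor raises on the first mismatch.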

{
  "tools": [
    {
      "name": "web_search",
      "purpose": "Search the web for information on specified topics with customizable filters and result limits",
      "category": "search",
      "inputs": [
        {
          "name": "query",
          "type": "string",
          "description": "Search query string to find relevant information",
          "required": true,
          "min_length": 1,
          "max_length": 500,
          "examples": ["artificial intelligence trends", "climate change impact", "python programming tutorial"]
        },
        {
          "name": "limit",
          "type": "integer",
          "description": "Maximum number of search results to return",
          "required": false,
          "default": 10,
          "minimum": 1,
          "maximum": 100
        },
        {
          "name": "language",
          "type": "string",
          "description": "Language code for search results",
          "required": false,
          "default": "en",
          "enum": ["en", "es", "fr", "de", "it", "pt", "zh", "ja"]
        },
        {
          "name": "time_range",
          "type": "string",
          "description": "Time range filter for search results",
          "required": false,
          "enum": ["any", "day", "week", "month", "year"]
        }
      ],
      "outputs": [
        {
          "name": "results",
          "type": "array",
          "description": "Array of search result objects",
          "items": {
            "type": "object",
            "properties": {
              "title": {"type": "string"},
              "url": {"type": "string"},
              "snippet": {"type": "string"},
              "relevance_score": {"type": "number"}
            }
          }
        },
        {
          "name": "total_found",
          "type": "integer",
          "description": "Total number of results available"
        }
      ],
      "error_conditions": [
        "Invalid query format",
        "Network timeout",
        "API rate limit exceeded",
        "No results found",
        "Service unavailable"
      ],
      "side_effects": [
        "Logs search query for analytics",
        "May cache results temporarily"
      ],
      "idempotent": true,
      "rate_limits": {
        "requests_per_minute": 60,
        "requests_per_hour": 1000,
        "burst_limit": 10
      },
      "dependencies": [
        "search_api_service",
        "content_filter_service"
      ],
      "examples": [
        {
          "description": "Basic web search",
          "input": {
            "query": "machine learning algorithms",
            "limit": 5
          },
          "expected_output": {
            "results": [
              {
                "title": "Introduction to Machine Learning Algorithms",
                "url": "https://example.com/ml-intro",
                "snippet": "Machine learning algorithms are computational methods...",
                "relevance_score": 0.95
              }
            ],
            "total_found": 1250
          }
        }
      ],
      "security_requirements": [
        "Query sanitization",
        "Rate limiting by user",
        "Content filtering"
      ]
    },
    {
      "name": "data_analyzer",
      "purpose": "Analyze structured data and generate statistical insights, trends, and visualizations",
      "category": "data",
      "inputs": [
        {
          "name": "data",
          "type": "object",
          "description": "Structured data to analyze in JSON format",
          "required": true,
          "properties": {
            "columns": {"type": "array"},
            "rows": {"type": "array"}
          }
        },
        {
          "name": "analysis_type",
          "type": "string",
          "description": "Type of analysis to perform",
          "required": true,
          "enum": ["descriptive", "correlation", "trend", "distribution", "outlier_detection"]
        },
        {
          "name": "target_column",
          "type": "string",
          "description": "Primary column to focus analysis on",
          "required": false
        },
        {
          "name": "include_visualization",
          "type": "boolean",
          "description": "Whether to generate visualization data",
          "required": false,
          "default": true
        }
      ],
      "outputs": [
        {
          "name": "insights",
          "type": "array",
          "description": "Array of analytical insights and findings"
        },
        {
          "name": "statistics",
          "type": "object",
          "description": "Statistical measures and metrics"
        },
        {
          "name": "visualization_data",
          "type": "object",
          "description": "Data formatted for visualization creation"
        }
      ],
      "error_conditions": [
        "Invalid data format",
        "Insufficient data points",
        "Missing required columns",
        "Data type mismatch",
        "Analysis timeout"
      ],
      "side_effects": [
        "May create temporary analysis files",
        "Logs analysis parameters for optimization"
      ],
      "idempotent": true,
      "rate_limits": {
        "requests_per_minute": 30,
        "requests_per_hour": 500,
        "burst_limit": 5
      },
      "dependencies": [
        "statistics_engine",
        "visualization_service"
      ],
      "examples": [
        {
          "description": "Basic descriptive analysis",
          "input": {
            "data": {
              "columns": ["age", "salary", "department"],
              "rows": [
                [25, 50000, "engineering"],
                [30, 60000, "engineering"],
                [28, 55000, "marketing"]
              ]
            },
            "analysis_type": "descriptive",
            "target_column": "salary"
          },
          "expected_output": {
            "insights": [
              "Average salary is $55,000",
              "Salary range: $50,000 - $60,000",
              "Engineering and marketing departments have the same average salary ($55,000)"
            ],
            "statistics": {
              "mean": 55000,
              "median": 55000,
              "std_dev": 5000
            }
          }
        }
      ],
      "security_requirements": [
        "Data anonymization",
        "Access control validation"
      ]
    },
    {
      "name": "document_processor",
      "purpose": "Process and extract information from various document formats including PDFs, Word docs, and plain text",
      "category": "file",
      "inputs": [
        {
          "name": "document_url",
          "type": "string",
          "description": "URL or path to the document to process",
          "required": true,
          "pattern": "^(https?://|file://|/)"
        },
        {
          "name": "processing_mode",
          "type": "string",
          "description": "How to process the document",
          "required": false,
          "default": "full_text",
          "enum": ["full_text", "summary", "key_points", "metadata_only"]
        },
        {
          "name": "output_format",
          "type": "string",
          "description": "Desired output format",
          "required": false,
          "default": "json",
          "enum": ["json", "markdown", "plain_text"]
        },
        {
          "name": "language_detection",
          "type": "boolean",
          "description": "Whether to detect document language",
          "required": false,
          "default": true
        }
      ],
      "outputs": [
        {
          "name": "content",
          "type": "string",
          "description": "Extracted and processed document content"
        },
        {
          "name": "metadata",
          "type": "object",
          "description": "Document metadata including author, creation date, etc."
        },
        {
          "name": "language",
          "type": "string",
          "description": "Detected language of the document"
        },
        {
          "name": "word_count",
          "type": "integer",
          "description": "Total word count in the document"
        }
      ],
      "error_conditions": [
        "Document not found",
        "Unsupported file format",
        "Document corrupted or unreadable",
        "Access permission denied",
        "Document too large"
      ],
      "side_effects": [
        "May download and cache documents temporarily",
        "Creates processing logs for debugging"
      ],
      "idempotent": true,
      "rate_limits": {
        "requests_per_minute": 20,
        "requests_per_hour": 300,
        "burst_limit": 3
      },
      "dependencies": [
        "document_parser_service",
        "language_detection_service",
        "file_storage_service"
      ],
      "examples": [
        {
          "description": "Process PDF document for full text extraction",
          "input": {
            "document_url": "https://example.com/research-paper.pdf",
            "processing_mode": "full_text",
            "output_format": "markdown"
          },
          "expected_output": {
            "content": "# Research Paper Title\n\nAbstract: This paper discusses...",
            "metadata": {
              "author": "Dr. Smith",
              "creation_date": "2024-01-15",
              "pages": 15
            },
            "language": "en",
            "word_count": 3500
          }
        }
      ],
      "security_requirements": [
        "URL validation",
        "File type verification",
        "Malware scanning",
        "Access control enforcement"
      ]
    },
    {
      "name": "notification_sender",
      "purpose": "Send notifications via multiple channels including email, SMS, and webhooks",
      "category": "communication",
      "inputs": [
        {
          "name": "recipients",
          "type": "array",
          "description": "List of recipient identifiers",
          "required": true,
          "min_items": 1,
          "max_items": 100,
          "items": {
            "type": "string",
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$|^\\+?[1-9]\\d{1,14}$"
          }
        },
        {
          "name": "message",
          "type": "string",
          "description": "Message content to send",
          "required": true,
          "min_length": 1,
          "max_length": 10000
        },
        {
          "name": "channel",
          "type": "string",
          "description": "Communication channel to use",
          "required": false,
          "default": "email",
          "enum": ["email", "sms", "webhook", "push"]
        },
        {
          "name": "priority",
          "type": "string",
          "description": "Message priority level",
          "required": false,
          "default": "normal",
          "enum": ["low", "normal", "high", "urgent"]
        },
        {
          "name": "template_id",
          "type": "string",
          "description": "Optional template ID for formatting",
          "required": false
        }
      ],
      "outputs": [
        {
          "name": "delivery_status",
          "type": "object",
          "description": "Status of message delivery to each recipient"
        },
        {
          "name": "message_id",
          "type": "string",
          "description": "Unique identifier for the sent message"
        },
        {
          "name": "delivery_timestamp",
          "type": "string",
          "description": "ISO 8601 timestamp when message was sent"
        }
      ],
      "error_conditions": [
        "Invalid recipient format",
        "Message too long",
        "Channel service unavailable",
        "Authentication failure",
        "Rate limit exceeded for channel"
      ],
      "side_effects": [
        "Sends actual notifications to recipients",
        "Logs delivery attempts and results",
        "Updates delivery statistics"
      ],
      "idempotent": false,
      "rate_limits": {
        "requests_per_minute": 100,
        "requests_per_hour": 2000,
        "burst_limit": 20
      },
      "dependencies": [
        "email_service",
        "sms_service",
        "webhook_service"
      ],
      "examples": [
        {
          "description": "Send email notification",
          "input": {
            "recipients": ["user@example.com"],
            "message": "Your report has been completed and is ready for review.",
            "channel": "email",
            "priority": "normal"
          },
          "expected_output": {
            "delivery_status": {
              "user@example.com": "delivered"
            },
            "message_id": "msg_12345",
            "delivery_timestamp": "2024-01-15T10:30:00Z"
          }
        }
      ],
      "security_requirements": [
        "Recipient validation",
        "Message content filtering",
        "Rate limiting per user",
        "Delivery confirmation"
      ]
    },
    {
      "name": "task_scheduler",
      "purpose": "Schedule and manage delayed or recurring tasks within the agent system",
      "category": "compute",
      "inputs": [
        {
          "name": "task_definition",
          "type": "object",
          "description": "Definition of the task to be scheduled",
          "required": true,
          "properties": {
            "action": {"type": "string"},
            "parameters": {"type": "object"},
            "retry_policy": {"type": "object"}
          }
        },
        {
          "name": "schedule",
          "type": "object",
          "description": "Scheduling parameters for the task",
          "required": true,
          "properties": {
            "type": {"type": "string", "enum": ["once", "recurring"]},
            "execute_at": {"type": "string"},
            "recurrence_pattern": {"type": "string"}
          }
        },
        {
          "name": "priority",
          "type": "integer",
          "description": "Task priority (1-10, higher is more urgent)",
          "required": false,
          "default": 5,
          "minimum": 1,
          "maximum": 10
        }
      ],
      "outputs": [
        {
          "name": "task_id",
          "type": "string",
          "description": "Unique identifier for the scheduled task"
        },
        {
          "name": "next_execution",
          "type": "string",
          "description": "ISO 8601 timestamp of next scheduled execution"
        },
        {
          "name": "status",
          "type": "string",
          "description": "Current status of the scheduled task"
        }
      ],
      "error_conditions": [
        "Invalid schedule format",
        "Past execution time specified",
        "Task queue full",
        "Invalid task definition",
        "Scheduling service unavailable"
      ],
      "side_effects": [
        "Creates scheduled tasks in the system",
        "May consume system resources for task storage",
        "Updates scheduling metrics"
      ],
      "idempotent": false,
      "rate_limits": {
        "requests_per_minute": 50,
        "requests_per_hour": 1000,
        "burst_limit": 10
      },
      "dependencies": [
        "task_scheduler_service",
        "task_executor_service"
      ],
      "examples": [
        {
          "description": "Schedule a one-time report generation",
          "input": {
            "task_definition": {
              "action": "generate_report",
              "parameters": {
                "report_type": "monthly_summary",
                "recipients": ["manager@example.com"]
              }
            },
            "schedule": {
              "type": "once",
              "execute_at": "2024-02-01T09:00:00Z"
            },
            "priority": 7
          },
          "expected_output": {
            "task_id": "task_67890",
            "next_execution": "2024-02-01T09:00:00Z",
            "status": "scheduled"
          }
        }
      ],
      "security_requirements": [
        "Task definition validation",
        "User authorization for scheduling",
        "Resource limit enforcement"
      ]
    }
  ]
}
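
The input constraints declared for `notification_sender` in the sample above can be checked before a call is dispatched. The following is a minimal sketch and not part of the skill itself: the function name `validate_notification_input` and most of its error strings are hypothetical, and only the recipient pattern, item and length limits, and enum values are taken from the sample.

```python
import re

# Recipient pattern copied from the sample spec: an email address or an
# E.164-style phone number.
RECIPIENT_RE = re.compile(
    r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$|^\+?[1-9]\d{1,14}$"
)
CHANNELS = {"email", "sms", "webhook", "push"}
PRIORITIES = {"low", "normal", "high", "urgent"}


def validate_notification_input(payload: dict) -> list:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    errors = []

    recipients = payload.get("recipients")
    if not isinstance(recipients, list) or not 1 <= len(recipients) <= 100:
        errors.append("recipients must be a list of 1 to 100 items")
    else:
        for recipient in recipients:
            if not isinstance(recipient, str) or not RECIPIENT_RE.fullmatch(recipient):
                errors.append(f"Invalid recipient format: {recipient!r}")

    message = payload.get("message")
    if not isinstance(message, str) or len(message) < 1:
        errors.append("message is required and must be a non-empty string")
    elif len(message) > 10000:
        errors.append("Message too long")

    # Optional fields fall back to the defaults declared in the spec.
    if payload.get("channel", "email") not in CHANNELS:
        errors.append("channel must be one of: " + ", ".join(sorted(CHANNELS)))
    if payload.get("priority", "normal") not in PRIORITIES:
        errors.append("priority must be one of: " + ", ".join(sorted(PRIORITIES)))

    return errors


if __name__ == "__main__":
    example = {
        "recipients": ["user@example.com"],
        "message": "Your report has been completed and is ready for review.",
        "channel": "email",
        "priority": "normal",
    }
    print(validate_notification_input(example))
```

Returning a list of problems rather than raising on the first one lets a caller report every violated constraint at once. Note that the recipient pattern as written accepts only email addresses and phone numbers, so a webhook URL would be rejected by this check.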

Related skills

Setup Matt Pocock Skills: Scaffold the per-repo configuration that Matt Pocock’s engineering agent skills rely on so they understand the issue tracker, triage labels, and domain documentation la (462k, 185k)

Lark Skill Maker: Quickly turn any Lark/Feishu OpenAPI call or multi-step workflow into a reusable agent skill with its own SKILL.md. (379k, 15.8k)

Caveman: Slash token usage by roughly 75% while keeping every technical detail intact when working with Claude Code, Cursor or similar agents. (378k, 92.5k)

Lark Apps: Connect Claude, Cursor or custom agents directly to Lark (Feishu) for messaging, document automation, approval workflows and enterprise data access. (375k)

Running Claude Code Via Litellm Copilot: Run Claude Code at a fraction of the cost by routing requests through LiteLLM to the GitHub Copilot Chat API. (270k, 72)

Codex Pet: Generate a complete Codex Pet spritesheet and metadata from one reference image without needing an OpenAI key or Codex Pro. (246k, 8)

How it compares

Use agent-designer to quantify runtime agent performance from logs; use prompt-engineering when outputs are wrong but telemetry is not yet collected.

FAQ

What input does agent-designer require?

agent-designer expects JSON execution logs listing tasks, actions taken, results, elapsed time, and tokens used. The Python evaluator aggregates those fields into success, cost, and latency metrics.
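
As a rough illustration of that aggregation, the sketch below computes a success rate, an average cost per task, and two latency percentiles from a list of log entries. It is not the skill's evaluator: the field names `success`, `elapsed_seconds`, and `tokens_used`, the `usd_per_1k_tokens` price parameter, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(log_entries, usd_per_1k_tokens):
    """Aggregate hypothetical log entries into headline metrics."""
    if not log_entries:
        raise ValueError("log_entries must not be empty")

    count = len(log_entries)
    successes = sum(1 for entry in log_entries if entry["success"])
    total_tokens = sum(entry["tokens_used"] for entry in log_entries)
    latencies = [entry["elapsed_seconds"] for entry in log_entries]

    return {
        "success_rate": successes / count,
        "avg_cost_per_task": total_tokens / 1000 * usd_per_1k_tokens / count,
        "latency_p50": percentile(latencies, 50),
        "latency_p95": percentile(latencies, 95),
    }


if __name__ == "__main__":
    # Made-up entries, used only to exercise the function.
    sample = [
        {"success": True, "elapsed_seconds": 1.0, "tokens_used": 1000},
        {"success": False, "elapsed_seconds": 3.0, "tokens_used": 3000},
        {"success": True, "elapsed_seconds": 2.0, "tokens_used": 2000},
        {"success": True, "elapsed_seconds": 4.0, "tokens_used": 2000},
    ]
    print(summarize(sample, usd_per_1k_tokens=0.5))
```

Nearest-rank percentiles are used here because they stay defined for very small samples; an interpolating method would give slightly different values on the same data.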

What does agent-designer output?

agent-designer returns a performance report with task success rate, average cost per task, latency distribution, error patterns, tool efficiency notes, and prioritized optimization recommendations.

Is Agent Designer safe to install?

skills.sh reports 3 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

AI & Agent Building, agents, automation