Multi-Agent Architectures¶
✨ Bit: A multi-agent system should exist because specialization creates value, not because multiple LLMs sound impressive on a slide. If one agent with good tools solves your problem, stop there.
★ TL;DR¶
- What: Systems where multiple specialized agents coordinate through explicit patterns (supervisor, debate, fan-out) to solve tasks too complex or broad for a single agent
- Why: Enables specialization, parallel execution, and verification — but only when the coordination cost is justified
- Key point: Multi-agent is an architecture trade-off, not an automatic upgrade. Start with one agent; add more only when a specific bottleneck appears.
★ Overview¶
Definition¶
A multi-agent architecture is a system in which multiple agents — each with separate roles, system prompts, tool access, and/or memory — collaborate through an explicit coordination pattern (supervisor, pipeline, debate, or swarm) to achieve a shared goal.
Scope¶
Covers: Common multi-agent patterns with architecture diagrams and code, framework comparison (LangGraph vs CrewAI vs ADK), design decisions, and operational risks. For single-agent fundamentals, see AI Agents. For protocol-level interoperability (MCP, A2A), see Agentic Protocols.
When Multi-Agent Helps vs Hurts¶
| ✅ Helps When | ❌ Hurts When |
|---|---|
| Task naturally decomposes into specialized sub-tasks | Task is simple enough for one agent with tools |
| Different sub-tasks need different tools/models | Latency budget is tight (each agent adds 1-3s) |
| Verification is as important as generation | State sharing between agents is weak |
| Parallel execution yields real speedup | Evaluation and debugging infra isn't mature |
| You need adversarial review (red team, code review) | You're adding agents because it sounds impressive |
Prerequisites¶
- AI Agents — agent loop, tool use, memory
- Agentic Protocols & Frameworks — MCP, A2A, ADK
- Function Calling and Structured Output
★ Deep Dive¶
The 6 Multi-Agent Patterns¶
Pattern 1: Supervisor (Manager → Workers)¶
```
         ┌──────────────────┐
         │    SUPERVISOR    │
         │    (planner/     │
         │     router)      │
         └────────┬─────────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker A │ │ Worker B │ │ Worker C │
│ (research│ │  (code)  │ │ (review) │
│  agent)  │ │          │ │          │
└──────────┘ └──────────┘ └──────────┘
```
How: Supervisor receives the task, breaks it down, delegates to specialized workers, collects results, and synthesizes the final answer.
Best for: Complex workflows with clear sub-tasks.
Risk: Supervisor becomes bottleneck; workers can't self-correct.
Pattern 2: Sequential Pipeline¶
```
[Input] → [Agent A: Research] → [Agent B: Draft] → [Agent C: Review] → [Output]
                                       ▲                   │
                                       └── (if rejected) ──┘
```
How: Each agent processes the output of the previous one.
Best for: Workflows with natural sequential stages (research → write → review).
Risk: Errors compound through the pipeline. Feedback loops can create cycles.
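The control flow above can be sketched without any framework. In this sketch, `call_llm` is a hypothetical stub standing in for a real model call; the revision cap is the important detail, since it bounds the feedback cycle:

```python
# Framework-free pipeline sketch. call_llm is a stub standing in for a
# real model call (e.g. an OpenAI client); here it just echoes its inputs.
def call_llm(system: str, user: str) -> str:
    return f"[{system}] {user}"

def pipeline(task: str, max_revisions: int = 2) -> str:
    research = call_llm("You are a researcher.", task)
    draft = call_llm("You are a writer.", research)
    # Bounded feedback loop: the reviewer may reject the draft, but only
    # max_revisions times, so the review cycle cannot run forever.
    for _ in range(max_revisions):
        review = call_llm("Reply APPROVED or give feedback.", draft)
        if "APPROVED" in review:
            break
        draft = call_llm("Revise the draft per the feedback.", f"{draft}\n{review}")
    return draft
```

A real implementation would also validate each stage's output before passing it on, since errors compound down the pipeline.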
Pattern 3: Debate / Adversarial¶
```
┌──────────┐          ┌───────────┐
│ Agent A  │ ◄──────► │  Agent B  │
│ (propose)│          │ (critique)│
└────┬─────┘          └─────┬─────┘
     │                      │
     └──────────┬───────────┘
                ▼
       ┌──────────────────┐
       │      JUDGE       │
       │ (final decision) │
       └──────────────────┘
```
How: Two agents argue opposing positions. A judge evaluates.
Best for: High-stakes decisions, red-teaming, fact verification.
Risk: Debate can be performative if both agents are equally wrong.
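A minimal sketch of the flow (`call_llm` is a hypothetical stub for a real model call); the key design choice is that the judge rules on the full transcript, not just the last exchange:

```python
# Debate sketch: two agents argue over a fixed number of rounds, then a
# judge rules on the whole transcript. call_llm is a stub for a real model.
def call_llm(system: str, user: str) -> str:
    return f"[{system}] {user[:80]}"

def debate(claim: str, rounds: int = 2) -> str:
    transcript = [f"CLAIM: {claim}"]
    for r in range(rounds):
        pro = call_llm("Defend the claim.", "\n".join(transcript))
        con = call_llm("Attack the claim.", "\n".join(transcript))
        transcript += [f"PRO {r}: {pro}", f"CON {r}: {con}"]
    # The judge sees every round, so later rounds can't hide earlier concessions
    return call_llm("You are the judge. Give a verdict with reasons.",
                    "\n".join(transcript))
```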
Pattern 4: Fan-Out / Fan-In (Parallel)¶
```
          ┌──────────┐
          │ Planner  │
          └────┬─────┘
    ┌──────────┼──────────┐
    ▼          ▼          ▼
[Agent 1]  [Agent 2]  [Agent 3]   ← Run in parallel
    │          │          │
    └──────────┼──────────┘
          ┌────┴──────┐
          │ Aggregator│
          └───────────┘
```
How: Multiple agents work on the same/similar tasks in parallel. Results aggregated.
Best for: Research, brainstorming, web search, diverse perspective generation.
Risk: Aggregation is hard. Which result do you trust?
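The speedup comes from concurrency. A sketch with `asyncio`, where `agent` stands in for a real async model call:

```python
import asyncio

# Fan-out/fan-in sketch. agent() is a stand-in for a real async LLM call.
async def agent(name: str, task: str) -> str:
    await asyncio.sleep(0)  # placeholder for real network latency
    return f"{name}: findings on {task}"

async def fan_out(task: str, n: int = 3) -> str:
    # Fan-out: all agents run concurrently, so wall-clock time is roughly
    # one agent's latency rather than n times it.
    results = await asyncio.gather(*(agent(f"agent-{i}", task) for i in range(n)))
    # Fan-in: naive join; real aggregators rank, dedupe, or vote on results.
    return "\n".join(results)

print(asyncio.run(fan_out("KV-cache optimization")))
```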
Pattern 5: Hierarchical (Multi-Level Supervisors)¶
```
          ┌─────────────────┐
          │    CEO Agent    │
          │ (orchestrator)  │
          └────────┬────────┘
        ┌──────────┴──────────┐
        ▼                     ▼
   ┌──────────┐          ┌──────────┐
   │ Manager A│          │ Manager B│
   │   (eng)  │          │  (data)  │
   └────┬─────┘          └────┬─────┘
   ┌────┴────┐           ┌────┴────┐
   ▼         ▼           ▼         ▼
[Worker]  [Worker]    [Worker]  [Worker]
```
How: Mirrors organizational hierarchy. Managers delegate to workers.
Best for: Very complex projects with multiple workstreams.
Risk: Communication overhead, latency multiplication at each level.
Pattern 6: Swarm (Peer-to-Peer)¶
```
[Agent A] ◄──► [Agent B]
    ▲              ▲
    │              │
    ▼              ▼
[Agent C] ◄──► [Agent D]
```
How: No central coordinator. Agents communicate peer-to-peer via handoffs.
Best for: Customer service (transfer between specialized agents).
Risk: Hard to track state. Conversation can "get lost" between agents.
Framework: OpenAI Swarm (experimental; since superseded by the OpenAI Agents SDK).
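Stripped to its core, a handoff is an agent returning either an answer or the name of a peer. A sketch with hypothetical `triage` and `refunds` agents; the hop limit is what keeps the conversation from getting lost:

```python
# Swarm handoff sketch: each agent returns ("answer", text) or
# ("handoff", peer_name). Agent names and routing logic are hypothetical.
def triage(msg: str) -> tuple[str, str]:
    if "refund" in msg:
        return ("handoff", "refunds")
    return ("answer", "General support response")

def refunds(msg: str) -> tuple[str, str]:
    return ("answer", "Refund initiated")

AGENTS = {"triage": triage, "refunds": refunds}

def run_swarm(msg: str, start: str = "triage", max_hops: int = 5) -> str:
    current = start
    for _ in range(max_hops):  # bound hops so handoffs can't ping-pong forever
        kind, value = AGENTS[current](msg)
        if kind == "answer":
            return value
        current = value  # handoff: control moves to the named peer
    return "ESCALATE: too many handoffs"

print(run_swarm("I want a refund"))  # prints: Refund initiated
```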
Pattern Summary¶
| Pattern | Coordination | Parallelism | Complexity | Best When |
|---|---|---|---|---|
| Supervisor | Central | Workers parallel | Medium | Clear sub-task decomposition |
| Pipeline | Sequential | None | Low | Natural stage-by-stage flow |
| Debate | Peer | Two in parallel | Medium | Verification, red-teaming |
| Fan-out | Central | High | Medium | Research, exploration |
| Hierarchical | Multi-level | Per-level | High | Enterprise, multi-workstream |
| Swarm | Distributed | Per-pair | High | Customer service, handoffs |
Design Decisions¶
| Decision | Options | Trade-off |
|---|---|---|
| Role granularity | Broad (3-4 agents) vs Narrow (10+ agents) | Fewer = less coordination; More = better specialization |
| State management | Centralized (shared dict/DB) vs Distributed (per-agent) | Central = simpler but bottleneck; Distributed = scales but consistency hard |
| Communication | Direct messages vs Shared workspace vs Message queue | Direct = fast but coupling; Shared = visible but collision risk |
| Arbitration | Supervisor decides vs Voting vs Judge agent vs Human | Auto = fast but risky; Human = safe but slow |
| Tool permissions | All agents share tools vs Least privilege | Shared = flexible; Restricted = safe (principle of least privilege) |
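The arbitration row is the cheapest to prototype. A voting sketch, assuming each agent has already returned its candidate answer as a string:

```python
from collections import Counter

# Voting arbitration: pick the most common answer across agents.
# Ties resolve to the first answer seen (Counter preserves insertion order).
def majority_vote(candidates: list[str]) -> str:
    return Counter(candidates).most_common(1)[0][0]

print(majority_vote(["42", "42", "41"]))  # prints: 42
```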
★ Code & Implementation¶
Supervisor Pattern with LangGraph¶
```python
# pip install langgraph>=0.2 langchain-openai>=0.2 langchain-core>=0.3
# ⚠️ Last tested: 2026-04 | Requires: langgraph>=0.2
from typing import TypedDict, Annotated

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


# 1. Define shared state
class TeamState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    research_output: str
    draft_output: str
    review_output: str
    current_agent: str


# 2. Define specialized agents
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)


def supervisor(state: TeamState) -> dict:
    """Supervisor decides which agent to call next."""
    system = """You are a project supervisor. Based on the current state, decide the next step:
- If no research exists: route to 'researcher'
- If research exists but no draft: route to 'writer'
- If draft exists but no review: route to 'reviewer'
- If review exists: route to 'done'
Respond with ONLY the agent name: researcher, writer, reviewer, or done"""
    context = f"""Task: {state['task']}
Research: {'Done' if state.get('research_output') else 'Not started'}
Draft: {'Done' if state.get('draft_output') else 'Not started'}
Review: {'Done' if state.get('review_output') else 'Not started'}"""
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=context),
    ])
    next_agent = response.content.strip().lower()
    # Guard: if the model replies with anything unexpected, terminate
    # instead of crashing on an unknown route.
    if next_agent not in {"researcher", "writer", "reviewer", "done"}:
        next_agent = "done"
    return {"current_agent": next_agent}


def researcher(state: TeamState) -> dict:
    """Research agent gathers information."""
    system = "You are a research specialist. Provide thorough research findings."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Research this topic: {state['task']}"),
    ])
    return {"research_output": response.content}


def writer(state: TeamState) -> dict:
    """Writer agent drafts content based on research."""
    system = "You are a technical writer. Create a clear, well-structured draft."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Based on this research, write a draft:\n{state['research_output']}"),
    ])
    return {"draft_output": response.content}


def reviewer(state: TeamState) -> dict:
    """Reviewer agent provides feedback."""
    system = "You are a senior editor. Review for accuracy, clarity, and completeness."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Review this draft:\n{state['draft_output']}"),
    ])
    return {"review_output": response.content}


# 3. Build the graph
def route_to_agent(state: TeamState) -> str:
    return state.get("current_agent", "done")


graph = StateGraph(TeamState)
graph.add_node("supervisor", supervisor)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", route_to_agent, {
    "researcher": "researcher",
    "writer": "writer",
    "reviewer": "reviewer",
    "done": END,
})
# After each worker, go back to the supervisor for the next routing decision
graph.add_edge("researcher", "supervisor")
graph.add_edge("writer", "supervisor")
graph.add_edge("reviewer", "supervisor")
app = graph.compile()

# 4. Run the multi-agent system
result = app.invoke({
    "task": "Explain how KV-cache optimization works in LLM serving",
    "messages": [],
    "research_output": "",
    "draft_output": "",
    "review_output": "",
    "current_agent": "",
})
print(f"Research: {result['research_output'][:200]}...")
print(f"Draft: {result['draft_output'][:200]}...")
print(f"Review: {result['review_output'][:200]}...")
# Expected output: 3 stages executed sequentially
# - Research: detailed technical findings
# - Draft: structured document based on research
# - Review: feedback with suggestions
```
CrewAI Multi-Agent Team¶
```python
# pip install crewai>=0.80
# ⚠️ Last tested: 2026-04 | Requires: crewai>=0.80
from crewai import Agent, Task, Crew, Process

# 1. Define specialized agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information on the given topic",
    backstory="Expert researcher with deep knowledge of AI/ML systems.",
    verbose=True,
    allow_delegation=False,
)
writer = Agent(
    role="Technical Writer",
    goal="Transform research into clear, actionable documentation",
    backstory="Experienced tech writer who specializes in making complex topics accessible.",
    verbose=True,
    allow_delegation=False,
)
reviewer = Agent(
    role="Quality Reviewer",
    goal="Ensure accuracy, completeness, and clarity of the final output",
    backstory="Senior engineer who reviews technical content for production readiness.",
    verbose=True,
    allow_delegation=False,
)

# 2. Define tasks
research_task = Task(
    description="Research KV-cache optimization techniques for LLM serving: PagedAttention, prefix caching, and memory management strategies.",
    expected_output="A detailed research summary with key techniques, trade-offs, and current state-of-the-art.",
    agent=researcher,
)
writing_task = Task(
    description="Write a technical guide based on the research findings.",
    expected_output="A structured document with introduction, techniques, code examples, and best practices.",
    agent=writer,
)
review_task = Task(
    description="Review the technical guide for accuracy, completeness, and clarity.",
    expected_output="Review with specific suggestions for improvement and a quality score.",
    agent=reviewer,
)

# 3. Create and run the crew
crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, writing_task, review_task],
    process=Process.sequential,  # or Process.hierarchical
    verbose=True,
)
result = crew.kickoff()
print(result)
# Expected output: sequential execution through all 3 agents
# Total time: ~30-60 seconds (3 LLM calls)
```
Framework Comparison (April 2026)¶
| Aspect | LangGraph | CrewAI | Google ADK |
|---|---|---|---|
| Architecture | Graph-based (nodes + edges) | Role-based teams | Hierarchical + graph |
| State management | Explicit typed state | Implicit task context | Session-based |
| Flexibility | Maximum — build any pattern | Medium — opinionated framework | High — Google ecosystem |
| Multi-agent | ✅ Any pattern (supervisor, swarm, etc.) | ✅ Sequential or hierarchical | ✅ Sub-agents + delegation |
| Protocol support | MCP via langchain-mcp | MCP via plugins | A2A native, MCP support |
| Learning curve | High (graph concepts) | Low (intuitive roles/tasks) | Medium |
| Production use | ✅ Widely adopted | ⚠️ Growing | ✅ Google-backed |
| Best for | Custom, complex workflows | Quick prototyping, business automation | Google Cloud integration |
DECISION GUIDE:
"I need maximum control and custom patterns" → LangGraph
"I want to prototype quickly with roles/tasks" → CrewAI
"I'm in the Google Cloud ecosystem" → Google ADK
"I need inter-company agent communication" → A2A protocol (any framework)
◆ Quick Reference¶
MULTI-AGENT DESIGN CHECKLIST:
□ Can a single agent with tools solve this? (If yes → don't use multi-agent)
□ What specific bottleneck justifies adding agents?
□ What's the coordination pattern? (Supervisor / Pipeline / Debate / Fan-out)
□ How do agents share state? (Shared dict / Message passing / Workspace)
□ What's the failure path? (Max retries, human escalation, fallback)
□ How do you observe and debug? (Tracing per-agent, trajectory logging)
□ What's the cost per request? (N agents × LLM cost per agent)
MULTI-AGENT COST MODEL:
Single agent: 1 × (LLM call + tools) = ~$0.03
Supervisor + 3 workers: 4-7 × LLM call = ~$0.12-$0.21
Debate (2 rounds): 5-7 × LLM call = ~$0.15-$0.21
Rule of thumb: Multi-agent costs 3-7× more than single-agent.
Make sure the quality improvement justifies this.
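The same arithmetic as a sanity check; the $0.03/call figure is this page's illustrative number, not a universal constant:

```python
# Back-of-envelope cost model using the page's illustrative $0.03/call.
def request_cost(llm_calls: int, cost_per_call: float = 0.03) -> float:
    return round(llm_calls * cost_per_call, 4)

single = request_cost(1)   # 0.03 — one agent, one call
team = request_cost(7)     # 0.21 — supervisor + 3 workers, worst case
print(f"multiplier: {team / single:.0f}x")  # prints: multiplier: 7x
```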
◆ Production Failure Modes¶
| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Token explosion | Costs spike, context windows overflow | Each agent adds messages to shared context; N agents × M turns = huge context | Summarize between agents, limit message passing to structured data only |
| Cascading failures | One agent error causes all downstream agents to produce garbage | No error handling between agents, errors propagate as corrupted context | Add error boundaries, validate inter-agent output with schemas |
| State inconsistency | Agents contradict each other, use stale information | Distributed state without consistency guarantees | Use centralized state store, version state updates |
| Role collapse | All agents behave identically despite different system prompts | System prompts too similar, shared tools dominate behavior | Make roles concrete with tool restrictions, test role differentiation |
| Infinite delegation | Supervisor keeps routing back to the same agent, never finishes | No progress detection, weak stop conditions | Max iterations per agent (3-5), progress metrics, force termination |
| Coordination overhead > value | Multi-agent is slower and more expensive than single-agent with same quality | Task doesn't benefit from specialization | Benchmark single-agent baseline first; justify each additional agent |
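The infinite-delegation row deserves an explicit guard. A framework-free sketch of a supervisor loop with hard caps; `route` and `step` are placeholders for your routing and agent-execution functions:

```python
# Supervisor loop with progress guards. route(state) names the next agent
# (or "done"); step(agent, state) runs that agent and returns new state.
def supervised_run(route, step, state, max_iters=10, per_agent_cap=3):
    calls = {}
    for _ in range(max_iters):
        agent = route(state)
        if agent == "done":
            return state
        calls[agent] = calls.get(agent, 0) + 1
        if calls[agent] > per_agent_cap:
            # Same agent routed too often: likely no progress is being made.
            # Fail loudly (or escalate to a human) instead of burning tokens.
            state["error"] = f"stuck on {agent}"
            return state
        state = step(agent, state)
    state["error"] = "max iterations exceeded"
    return state
```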
○ Gotchas & Common Mistakes¶
- ⚠️ Multi-agent ≠ better: The most common mistake is assuming more agents = more capability. Often one agent with good tools beats three agents with bad coordination.
- ⚠️ Debugging is 5× harder: Each agent adds a layer of opacity. Invest in trajectory logging and per-agent tracing before scaling up.
- ⚠️ Context sharing is the hardest part: How agents share information (full messages vs summaries vs structured data) determines whether multi-agent works or fails.
- ⚠️ Cost multiplier is real: 4 agents × 3 turns each = 12 LLM calls per request. At $0.03/call, that's $0.36/request vs $0.09 for single-agent.
- ⚠️ Start with 2 agents, not 7: The jump from 1→2 agents teaches you more about coordination than the jump from 5→7.
○ Interview Angles¶
- Q: When would you choose multi-agent over single-agent?
- A: I'd choose multi-agent when three conditions are met: (1) the task has natural decomposition boundaries where different sub-tasks benefit from different tool access, system prompts, or contexts — for example, a research agent with web search and a coding agent with a sandbox; (2) the quality improvement from specialization is measurable and significant, not incremental; and (3) the latency and cost multiplier (3-7× more expensive) is acceptable for the use case. I'd always benchmark a single-agent baseline first. If one agent with well-designed tools achieves 80%+ of the quality, the coordination overhead of multi-agent isn't justified. The exception is adversarial review: a critic agent that challenges the primary agent's output catches errors that self-review misses.
- Q: Design a multi-agent system for automated code review.
- A: I'd use a pipeline pattern with 3 agents. First, a Code Analyzer agent with access to static analysis tools (linting, complexity metrics, type checking) processes the diff and produces a structured analysis. Second, a Logic Reviewer agent with access to the codebase context (via RAG over the repo) evaluates correctness, identifies potential bugs, and checks for security issues. Third, a Summary Agent synthesizes both analyses into a human-readable review with actionable suggestions, severity levels, and specific line references. State management: each agent writes to a shared ReviewState dict with typed fields (analysis, logic_issues, suggestions). I'd add a max_cost guard ($0.50/review), trajectory logging via LangSmith, and a confidence score; if any agent is < 70% confident, flag for human review instead of auto-approving.
◆ Hands-On Exercises¶
Exercise 1: Build a Research + Writing Team¶
Goal: Build a 2-agent system where one researches and one writes
Time: 60 minutes
Steps:
1. Create two agents in LangGraph: Researcher (with a web search tool) and Writer
2. Implement the supervisor pattern that routes: research → write → done
3. Give them a topic: "Compare vLLM vs TGI for LLM serving in production"
4. Measure: total cost, latency, and output quality vs a single-agent baseline
Expected Output: A structured document produced by the team, plus comparison metrics showing when multi-agent adds value
Exercise 2: Debate Pattern for Fact Verification¶
Goal: Build a debate system that validates claims through adversarial review
Time: 45 minutes
Steps:
1. Create Agent A (claim defender) and Agent B (claim challenger)
2. Start with a claim: "Speculative decoding always reduces latency for LLM serving"
3. Run 2 rounds of debate (each agent responds to the other)
4. Add a Judge agent that reads the debate and renders a verdict
Expected Output: A 2-round debate transcript plus a judge verdict explaining the nuanced truth (speculative decoding only helps when the draft model is fast and the acceptance rate is high)
★ Connections¶
| Relationship | Topics |
|---|---|
| Builds on | AI Agents, Agentic Protocols, Function Calling |
| Leads to | Agent Evaluation, AI System Design, Enterprise automation |
| Compare with | Single-agent (simpler, cheaper), deterministic orchestration (Airflow/Prefect — no LLM reasoning), microservices (code not agents) |
| Cross-domain | Distributed systems, organizational design (Conway's Law), multi-player game AI |
★ Recommended Resources¶
| Type | Resource | Why |
|---|---|---|
| 📄 Paper | Anthropic — "Building Effective Agents" (2025) | Industry reference for when to use multi-agent vs single-agent patterns |
| 🔧 Hands-on | LangGraph Multi-Agent Tutorial | Step-by-step supervisor and swarm pattern implementation |
| 🔧 Hands-on | CrewAI Documentation | Easiest framework to prototype multi-agent teams quickly |
| 🎥 Video | Harrison Chase — "Multi-Agent Architectures" (LangChain) | LangGraph creator explaining when and how to use multi-agent patterns |
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 7 (Agents) | Practical treatment of agent systems including multi-agent coordination |
| 📄 Paper | Wu et al. "AutoGen: Enabling Next-Gen LLM Applications" (2023) | Microsoft's multi-agent conversation framework and design patterns |
| 🔧 Hands-on | Google ADK Multi-Agent Documentation | Hierarchical agent teams with sub-agent delegation |
★ Sources¶
- Anthropic "Building Effective Agents" Guide (2025)
- LangGraph Multi-Agent Documentation — https://langchain-ai.github.io/langgraph/
- CrewAI Documentation — https://docs.crewai.com/
- Wu et al. "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation" (2023)
- Google ADK Documentation — https://google.github.io/adk-docs/
- AI Agents