Skip to content

Multi-Agent Architectures

Bit: A multi-agent system should exist because specialization creates value, not because multiple LLMs sound impressive on a slide. If one agent with good tools solves your problem, stop there.


★ TL;DR

  • What: Systems where multiple specialized agents coordinate through explicit patterns (supervisor, debate, fan-out) to solve tasks too complex or broad for a single agent
  • Why: Enables specialization, parallel execution, and verification — but only when the coordination cost is justified
  • Key point: Multi-agent is an architecture trade-off, not an automatic upgrade. Start with one agent; add more only when a specific bottleneck appears.

★ Overview

Definition

A multi-agent architecture is a system in which multiple agents — each with separate roles, system prompts, tool access, and/or memory — collaborate through an explicit coordination pattern (supervisor, pipeline, debate, or swarm) to achieve a shared goal.

Scope

Covers: Common multi-agent patterns with architecture diagrams and code, framework comparison (LangGraph vs CrewAI vs ADK), design decisions, and operational risks. For single-agent fundamentals, see AI Agents. For protocol-level interoperability (MCP, A2A), see Agentic Protocols.

When Multi-Agent Helps vs Hurts

✅ Helps When ❌ Hurts When
Task naturally decomposes into specialized sub-tasks Task is simple enough for one agent with tools
Different sub-tasks need different tools/models Latency budget is tight (each agent adds 1-3s)
Verification is as important as generation State sharing between agents is weak
Parallel execution yields real speedup Evaluation and debugging infra isn't mature
You need adversarial review (red team, code review) You're adding agents because it sounds impressive

Prerequisites


★ Deep Dive

The 6 Multi-Agent Patterns

Pattern 1: Supervisor (Manager → Workers)

                    ┌──────────────────┐
                    │   SUPERVISOR     │
                    │   (planner/      │
                    │    router)       │
                    └────────┬─────────┘
                ┌────────────┼────────────┐
                ▼            ▼            ▼
         ┌──────────┐ ┌──────────┐ ┌──────────┐
         │ Worker A │ │ Worker B │ │ Worker C │
         │ (research│ │  (code)  │ │ (review) │
         │  agent)  │ │          │ │          │
         └──────────┘ └──────────┘ └──────────┘

How: Supervisor receives task, breaks it down, delegates to
     specialized workers, collects results, synthesizes final answer.
Best for: Complex workflows with clear sub-tasks.
Risk: Supervisor becomes bottleneck; workers can't self-correct.

Pattern 2: Sequential Pipeline

  [Input] → [Agent A: Research] → [Agent B: Draft] → [Agent C: Review] → [Output]
                                                    (if rejected)
                                                    ◄─────┘ Back to Agent B

How: Each agent processes the output of the previous one.
Best for: Workflows with natural sequential stages (research → write → review).
Risk: Errors compound through the pipeline. Feedback loops can create cycles.

Pattern 3: Debate / Adversarial

         ┌──────────┐         ┌──────────┐
         │ Agent A  │ ◄─────► │ Agent B  │
         │ (propose)│         │ (critique)│
         └──────────┘         └──────────┘
                    │         │
                    ▼         ▼
               ┌──────────────────┐
               │     JUDGE        │
               │ (final decision) │
               └──────────────────┘

How: Two agents argue opposing positions. A judge evaluates.
Best for: High-stakes decisions, red-teaming, fact verification.
Risk: Debate can be performative if both agents are equally wrong.

Pattern 4: Fan-Out / Fan-In (Parallel)

                    ┌──────────┐
                    │ Planner  │
                    └────┬─────┘
                ┌────────┼────────┐
                ▼        ▼        ▼
           [Agent 1] [Agent 2] [Agent 3]   ← Run in parallel
                ▼        ▼        ▼
                └────────┼────────┘
                    ┌────┴─────┐
                    │ Aggregator│
                    └──────────┘

How: Multiple agents work on the same/similar tasks in parallel. Results aggregated.
Best for: Research, brainstorming, web search, diverse perspective generation.
Risk: Aggregation is hard. Which result do you trust?

Pattern 5: Hierarchical (Multi-Level Supervisors)

                 ┌─────────────────┐
                 │  CEO Agent      │
                 │  (orchestrator) │
                 └────────┬────────┘
                ┌─────────┼──────────┐
                ▼                    ▼
         ┌──────────┐         ┌──────────┐
         │ Manager A│         │ Manager B│
         │ (eng)    │         │ (data)   │
         └────┬─────┘         └────┬─────┘
         ┌────┼────┐          ┌────┼────┐
         ▼         ▼          ▼         ▼
     [Worker]  [Worker]   [Worker]  [Worker]

How: Mirrors organizational hierarchy. Managers delegate to workers.
Best for: Very complex projects with multiple workstreams.
Risk: Communication overhead, latency multiplication at each level.

Pattern 6: Swarm (Peer-to-Peer)

     [Agent A] ◄──► [Agent B]
         ▲               ▲
         │               │
         ▼               ▼
     [Agent C] ◄──► [Agent D]

How: No central coordinator. Agents communicate peer-to-peer via handoffs.
Best for: Customer service (transfer between specialized agents).
Risk: Hard to track state. Conversation can "get lost" between agents.
Framework: OpenAI Swarm (experimental).

Pattern Summary

Pattern Coordination Parallelism Complexity Best When
Supervisor Central Workers parallel Medium Clear sub-task decomposition
Pipeline Sequential None Low Natural stage-by-stage flow
Debate Peer Two in parallel Medium Verification, red-teaming
Fan-out Central High Medium Research, exploration
Hierarchical Multi-level Per-level High Enterprise, multi-workstream
Swarm Distributed Per-pair High Customer service, handoffs

Design Decisions

Decision Options Trade-off
Role granularity Broad (3-4 agents) vs Narrow (10+ agents) Fewer = less coordination; More = better specialization
State management Centralized (shared dict/DB) vs Distributed (per-agent) Central = simpler but bottleneck; Distributed = scales but consistency hard
Communication Direct messages vs Shared workspace vs Message queue Direct = fast but coupling; Shared = visible but collision risk
Arbitration Supervisor decides vs Voting vs Judge agent vs Human Auto = fast but risky; Human = safe but slow
Tool permissions All agents share tools vs Least privilege Shared = flexible; Restricted = safe (principle of least privilege)

★ Code & Implementation

Supervisor Pattern with LangGraph

# pip install langgraph>=0.2 langchain-openai>=0.2 langchain-core>=0.3
# ⚠️ Last tested: 2026-04 | Requires: langgraph>=0.2

from typing import TypedDict, Annotated, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# 1. Define shared state
class TeamState(TypedDict):
    messages: Annotated[list, add_messages]
    task: str
    research_output: str
    draft_output: str
    review_output: str
    current_agent: str

# 2. Define specialized agents
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

def supervisor(state: TeamState) -> dict:
    """Supervisor decides which agent to call next."""
    system = """You are a project supervisor. Based on the current state, decide the next step:
    - If no research exists: route to 'researcher'
    - If research exists but no draft: route to 'writer'
    - If draft exists but no review: route to 'reviewer'
    - If review exists: route to 'done'

    Respond with ONLY the agent name: researcher, writer, reviewer, or done"""

    context = f"""Task: {state['task']}
Research: {'Done' if state.get('research_output') else 'Not started'}
Draft: {'Done' if state.get('draft_output') else 'Not started'}
Review: {'Done' if state.get('review_output') else 'Not started'}"""

    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=context),
    ])
    next_agent = response.content.strip().lower()
    return {"current_agent": next_agent}

def researcher(state: TeamState) -> dict:
    """Research agent gathers information."""
    system = "You are a research specialist. Provide thorough research findings."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Research this topic: {state['task']}"),
    ])
    return {"research_output": response.content}

def writer(state: TeamState) -> dict:
    """Writer agent drafts content based on research."""
    system = "You are a technical writer. Create a clear, well-structured draft."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Based on this research, write a draft:\n{state['research_output']}"),
    ])
    return {"draft_output": response.content}

def reviewer(state: TeamState) -> dict:
    """Reviewer agent provides feedback."""
    system = "You are a senior editor. Review for accuracy, clarity, and completeness."
    response = llm.invoke([
        SystemMessage(content=system),
        HumanMessage(content=f"Review this draft:\n{state['draft_output']}"),
    ])
    return {"review_output": response.content}

# 3. Build the graph
def route_to_agent(state: TeamState) -> str:
    return state.get("current_agent", "done")

graph = StateGraph(TeamState)
graph.add_node("supervisor", supervisor)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)

graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", route_to_agent, {
    "researcher": "researcher",
    "writer": "writer",
    "reviewer": "reviewer",
    "done": END,
})
# After each worker, go back to supervisor for next routing decision
graph.add_edge("researcher", "supervisor")
graph.add_edge("writer", "supervisor")
graph.add_edge("reviewer", "supervisor")

app = graph.compile()

# 4. Run the multi-agent system
result = app.invoke({
    "task": "Explain how KV-cache optimization works in LLM serving",
    "messages": [],
    "research_output": "",
    "draft_output": "",
    "review_output": "",
    "current_agent": "",
})

print(f"Research: {result['research_output'][:200]}...")
print(f"Draft: {result['draft_output'][:200]}...")
print(f"Review: {result['review_output'][:200]}...")

# Expected output: 3 stages executed sequentially
# - Research: detailed technical findings
# - Draft: structured document based on research
# - Review: feedback with suggestions

CrewAI Multi-Agent Team

# pip install crewai>=0.80
# ⚠️ Last tested: 2026-04 | Requires: crewai>=0.80

from crewai import Agent, Task, Crew, Process

# 1. Define specialized agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, accurate information on the given topic",
    backstory="Expert researcher with deep knowledge of AI/ML systems.",
    verbose=True,
    allow_delegation=False,
)

writer = Agent(
    role="Technical Writer",
    goal="Transform research into clear, actionable documentation",
    backstory="Experienced tech writer who specializes in making complex topics accessible.",
    verbose=True,
    allow_delegation=False,
)

reviewer = Agent(
    role="Quality Reviewer",
    goal="Ensure accuracy, completeness, and clarity of the final output",
    backstory="Senior engineer who reviews technical content for production readiness.",
    verbose=True,
    allow_delegation=False,
)

# 2. Define tasks
research_task = Task(
    description="Research KV-cache optimization techniques for LLM serving: PagedAttention, prefix caching, and memory management strategies.",
    expected_output="A detailed research summary with key techniques, trade-offs, and current state-of-the-art.",
    agent=researcher,
)

writing_task = Task(
    description="Write a technical guide based on the research findings.",
    expected_output="A structured document with introduction, techniques, code examples, and best practices.",
    agent=writer,
)

review_task = Task(
    description="Review the technical guide for accuracy, completeness, and clarity.",
    expected_output="Review with specific suggestions for improvement and a quality score.",
    agent=reviewer,
)

# 3. Create and run the crew
crew = Crew(
    agents=[researcher, writer, reviewer],
    tasks=[research_task, writing_task, review_task],
    process=Process.sequential,  # or Process.hierarchical
    verbose=True,
)

result = crew.kickoff()
print(result)

# Expected output: Sequential execution through all 3 agents
# Total time: ~30-60 seconds (3 LLM calls)

Framework Comparison (April 2026)

Aspect LangGraph CrewAI Google ADK
Architecture Graph-based (nodes + edges) Role-based teams Hierarchical + graph
State management Explicit typed state Implicit task context Session-based
Flexibility Maximum — build any pattern Medium — opinionated framework High — Google ecosystem
Multi-agent ✅ Any pattern (supervisor, swarm, etc.) ✅ Sequential or hierarchical ✅ Sub-agents + delegation
Protocol support MCP via langchain-mcp MCP via plugins A2A native, MCP support
Learning curve High (graph concepts) Low (intuitive roles/tasks) Medium
Production use ✅ Widely adopted ⚠️ Growing ✅ Google-backed
Best for Custom, complex workflows Quick prototyping, business automation Google Cloud integration
DECISION GUIDE:
  "I need maximum control and custom patterns"     → LangGraph
  "I want to prototype quickly with roles/tasks"   → CrewAI
  "I'm in the Google Cloud ecosystem"              → Google ADK
  "I need inter-company agent communication"       → A2A protocol (any framework)

◆ Quick Reference

MULTI-AGENT DESIGN CHECKLIST:
  □ Can a single agent with tools solve this? (If yes → don't use multi-agent)
  □ What specific bottleneck justifies adding agents?
  □ What's the coordination pattern? (Supervisor / Pipeline / Debate / Fan-out)
  □ How do agents share state? (Shared dict / Message passing / Workspace)
  □ What's the failure path? (Max retries, human escalation, fallback)
  □ How do you observe and debug? (Tracing per-agent, trajectory logging)
  □ What's the cost per request? (N agents × LLM cost per agent)

MULTI-AGENT COST MODEL:
  Single agent:    1 × (LLM call + tools) = ~$0.03
  Supervisor + 3 workers: 4-7 × LLM call = ~$0.12-$0.21
  Debate (2 rounds): 5-7 × LLM call = ~$0.15-$0.21

  Rule of thumb: Multi-agent costs 3-7× more than single-agent.
  Make sure the quality improvement justifies this.

◆ Production Failure Modes

Failure Symptoms Root Cause Mitigation
Token explosion Costs spike, context windows overflow Each agent adds messages to shared context; N agents × M turns = huge context Summarize between agents, limit message passing to structured data only
Cascading failures One agent error causes all downstream agents to produce garbage No error handling between agents, errors propagate as corrupted context Add error boundaries, validate inter-agent output with schemas
State inconsistency Agents contradict each other, use stale information Distributed state without consistency guarantees Use centralized state store, version state updates
Role collapse All agents behave identically despite different system prompts System prompts too similar, shared tools dominate behavior Make roles concrete with tool restrictions, test role differentiation
Infinite delegation Supervisor keeps routing back to the same agent, never finishes No progress detection, weak stop conditions Max iterations per agent (3-5), progress metrics, force termination
Coordination overhead > value Multi-agent is slower and more expensive than single-agent with same quality Task doesn't benefit from specialization Benchmark single-agent baseline first; justify each additional agent

○ Gotchas & Common Mistakes

  • ⚠️ Multi-agent ≠ better: The most common mistake is assuming more agents = more capability. Often one agent with good tools beats three agents with bad coordination.
  • ⚠️ Debugging is 5× harder: Each agent adds a layer of opacity. Invest in trajectory logging and per-agent tracing before scaling up.
  • ⚠️ Context sharing is the hardest part: How agents share information (full messages vs summaries vs structured data) determines whether multi-agent works or fails.
  • ⚠️ Cost multiplier is real: 4 agents × 3 turns each = 12 LLM calls per request. At $0.03/call, that's $0.36/request vs $0.09 for single-agent.
  • ⚠️ Start with 2 agents, not 7: The jump from 1→2 agents teaches you more about coordination than the jump from 5→7.

○ Interview Angles

  • Q: When would you choose multi-agent over single-agent?
  • A: I'd choose multi-agent when three conditions are met: (1) the task has natural decomposition boundaries where different sub-tasks benefit from different tool access, system prompts, or contexts — for example, a research agent with web search and a coding agent with a sandbox; (2) the quality improvement from specialization is measurable and significant, not incremental; and (3) the latency and cost multiplier (3-7× more expensive) is acceptable for the use case. I'd always benchmark a single-agent baseline first. If one agent with well-designed tools achieves 80%+ of the quality, the coordination overhead of multi-agent isn't justified. The exception is adversarial review: having a critic agent that challenges the primary agent's output catches errors that self-review misses.

  • Q: Design a multi-agent system for automated code review.

  • A: I'd use a pipeline pattern with 3 agents. First, a Code Analyzer agent with access to static analysis tools (linting, complexity metrics, type checking) processes the diff and produces a structured analysis. Second, a Logic Reviewer agent with access to the codebase context (via RAG over the repo) evaluates correctness, identifies potential bugs, and checks for security issues. Third, a Summary Agent synthesizes both analyses into a human-readable review with actionable suggestions, severity levels, and specific line references. State management: each agent writes to a shared ReviewState dict with typed fields (analysis, logic_issues, suggestions). I'd add a max_cost guard ($0.50/review), trajectory logging via LangSmith, and a confidence score—if any agent is < 70% confident, flag for human review instead of auto-approving.

◆ Hands-On Exercises

Exercise 1: Build a Research + Writing Team

Goal: Build a 2-agent system where one researches and one writes Time: 60 minutes Steps: 1. Create two agents in LangGraph: Researcher (with web search tool) and Writer 2. Implement the supervisor pattern that routes: research → write → done 3. Give them a topic: "Compare vLLM vs TGI for LLM serving in production" 4. Measure: total cost, latency, and output quality vs single-agent baseline Expected Output: Structured document produced by the team, comparison metrics showing when multi-agent adds value

Exercise 2: Debate Pattern for Fact Verification

Goal: Build a debate system that validates claims through adversarial review Time: 45 minutes Steps: 1. Create Agent A (claim defender) and Agent B (claim challenger) 2. Start with a claim: "Speculative decoding always reduces latency for LLM serving" 3. Run 2 rounds of debate (each agent responds to the other) 4. Add a Judge agent that reads the debate and renders a verdict Expected Output: 2-round debate transcript + judge verdict explaining nuanced truth (speculative decoding only helps when draft model is fast and acceptance rate is high)


★ Connections

Relationship Topics
Builds on AI Agents, Agentic Protocols, Function Calling
Leads to Agent Evaluation, AI System Design, Enterprise automation
Compare with Single-agent (simpler, cheaper), deterministic orchestration (Airflow/Prefect — no LLM reasoning), microservices (code not agents)
Cross-domain Distributed systems, organizational design (Conway's Law), multi-player game AI

Type Resource Why
📄 Paper Anthropic — "Building Effective Agents" (2025) Industry reference for when to use multi-agent vs single-agent patterns
🔧 Hands-on LangGraph Multi-Agent Tutorial Step-by-step supervisor and swarm pattern implementation
🔧 Hands-on CrewAI Documentation Easiest framework to prototype multi-agent teams quickly
🎥 Video Harrison Chase — "Multi-Agent Architectures" (LangChain) LangGraph creator explaining when and how to use multi-agent patterns
📘 Book "AI Engineering" by Chip Huyen (2025), Ch 7 (Agents) Practical treatment of agent systems including multi-agent coordination
📄 Paper Wu et al. "AutoGen: Enabling Next-Gen LLM Applications" (2023) Microsoft's multi-agent conversation framework and design patterns
🔧 Hands-on Google ADK Multi-Agent Documentation Hierarchical agent teams with sub-agent delegation

★ Sources

  • Anthropic "Building Effective Agents" Guide (2025)
  • LangGraph Multi-Agent Documentation — https://langchain-ai.github.io/langgraph/
  • CrewAI Documentation — https://docs.crewai.com/
  • Wu et al. "AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation" (2023)
  • Google ADK Documentation — https://google.github.io/adk-docs/
  • AI Agents