AI Agents
✨ Bit: 2024 was the "year of the chatbot"; 2025-2026 is the "year of the agent." The difference? Chatbots answer. Agents do.
★ TL;DR
- What: AI systems that autonomously plan, reason, use tools, and take multi-step actions to achieve goals
- Why: The biggest paradigm shift in GenAI since ChatGPT. Moves AI from "answer questions" to "complete tasks"
- Key point: An agent = LLM + Planning + Memory + Tools. The LLM is the brain, not the whole system.
★ Overview
Definition
An AI Agent is a system where an LLM acts as a reasoning engine that can: (1) understand goals, (2) break them into sub-tasks, (3) decide which tools to use, (4) execute actions, (5) observe results, and (6) iterate until the goal is achieved — with minimal human intervention.
Scope
Covers: Agent architecture, tool use, planning patterns, multi-agent systems, and frameworks. For the underlying LLM, see Llms Overview. For RAG as a tool agents use, see Rag. For specialized coordination patterns, see Multi-Agent Architectures. For tracing and scoring agent runs, see Agent Evaluation & Observability.
Significance
- Defining trend of 2025-2026: Every major AI company is building agent capabilities
- Frontier models are now positioned for agent use; Anthropic, for example, launched Claude Opus 4 with an emphasis on coding and long-running agentic workflows
- Enterprise adoption is scaling: customer support, code generation, data analysis
- Widely expected to change knowledge work more fundamentally than chatbots did
Prerequisites
- Llms Overview — the brain of the agent
- Prompt Engineering — how to instruct agents
- Rag — agents often use RAG as a tool
★ Deep Dive
Agent Architecture
┌───────────────────────────────────────────────────┐
│                     AI AGENT                      │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │               LLM (the brain)               │  │
│  │  - Understands goals                        │  │
│  │  - Reasons about next steps                 │  │
│  │  - Generates tool calls                     │  │
│  └─────────────────────────────────────────────┘  │
│        │                │                │        │
│        ▼                ▼                ▼        │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │ PLANNING  │    │ MEMORY    │    │ TOOLS     │  │
│  │           │    │           │    │           │  │
│  │ - Task    │    │ - Short   │    │ - Search  │  │
│  │   decomp  │    │   (conv)  │    │ - Code    │  │
│  │ - Step    │    │ - Long    │    │ - APIs    │  │
│  │   by      │    │   (RAG/   │    │ - Files   │  │
│  │   step    │    │   DB)     │    │ - Browse  │  │
│  └───────────┘    └───────────┘    └───────────┘  │
│                                                   │
│  ┌─────────────────────────────────────────────┐  │
│  │         OBSERVATION & FEEDBACK LOOP         │  │
│  │  Tool result → Reason → Next action → ...   │  │
│  └─────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────┘
The Agent Loop (ReAct Pattern)
User Goal: "Research the top 3 competitors and create a comparison table"
THINK: I need to search for competitors first.
ACT: tool_call: web_search("top competitors for [product]")
OBSERVE: [search results returned]
THINK: I found 5 competitors. Let me get details on top 3.
ACT: tool_call: web_search("competitor A features pricing")
OBSERVE: [detailed results]
THINK: I have enough data. Let me create the table.
ACT: tool_call: create_document("comparison_table.md", content)
OBSERVE: [file created successfully]
THINK: Task complete. Let me present the results.
ACT: respond_to_user(summary + table)
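The trace above can be reduced to a small control loop. The sketch below is illustrative only: `fake_llm`, `SCRIPT`, and the two tool stubs are hypothetical stand-ins (a real agent would call a model API and real integrations), and the `max_steps` cap is an assumed safeguard rather than part of the ReAct pattern itself.

```python
from typing import Callable, Optional

# Hypothetical tool stubs standing in for real search and file-writing integrations.
def web_search(query: str) -> str:
    return f"[search results for: {query}]"

def create_document(name: str) -> str:
    return f"[file {name} created]"

TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "create_document": create_document,
}

# Scripted stand-in for the model: (thought, tool name, argument).
# A tool name of None means "respond to the user with the argument".
SCRIPT = [
    ("I need to search for competitors first.", "web_search", "top competitors"),
    ("I have enough data. Let me create the table.", "create_document", "comparison_table.md"),
    ("Task complete. Let me present the results.", None, "Here is the comparison table."),
]

def fake_llm(history: list[str]) -> tuple[str, Optional[str], str]:
    """Pick the next scripted step based on how many observations exist so far."""
    observations = sum(1 for line in history if line.startswith("OBSERVE"))
    return SCRIPT[min(observations, len(SCRIPT) - 1)]

def run_react(max_steps: int = 10) -> tuple[str, list[str]]:
    """Alternate THINK / ACT / OBSERVE until the model answers or the cap is hit."""
    trace: list[str] = []
    for _ in range(max_steps):
        thought, tool, argument = fake_llm(trace)
        trace.append(f"THINK: {thought}")
        if tool is None:
            return argument, trace
        trace.append(f"ACT: {tool}({argument!r})")
        trace.append(f"OBSERVE: {TOOLS[tool](argument)}")
    return "Stopped: step limit reached.", trace

answer, trace = run_react()
print("\n".join(trace))
print(answer)
```

Because the model sees the full trace on every turn, each decision can react to the latest observation; the step cap keeps a confused model from running indefinitely.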
Core Components
1. Tool Use / Function Calling
// LLM receives tool definitions
// (parameters are written as JSON Schema, the shape most function-calling APIs expect)
{
  "tools": [
    {
      "name": "search_web",
      "description": "Search the internet for information",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"]
      }
    },
    {
      "name": "run_python",
      "description": "Execute Python code",
      "parameters": {
        "type": "object",
        "properties": {
          "code": {"type": "string", "description": "Python code to run"}
        },
        "required": ["code"]
      }
    }
  ]
}

// LLM outputs a tool call (instead of text)
{
  "tool_call": {
    "name": "search_web",
    "arguments": {"query": "latest AI trends 2026"}
  }
}

// System executes tool, returns result to LLM
// LLM decides: respond to user, or call another tool
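The "system executes tool" step is ordinary application code. A minimal sketch of that dispatcher follows, assuming the generic `tool_call` shape shown above (real provider APIs wrap the call differently); `REGISTRY`, `dispatch`, and the `search_web` stub are hypothetical names introduced for illustration.

```python
import json
from typing import Any, Callable

def search_web(query: str) -> str:
    """Hypothetical stub; a real tool would call a search API."""
    return f"Results for: {query}"

# Registry: tool name -> (callable, required argument names).
REGISTRY: dict[str, tuple[Callable[..., str], set[str]]] = {
    "search_web": (search_web, {"query"}),
}

def dispatch(raw_model_output: str) -> dict[str, Any]:
    """Parse a model-emitted tool call, validate it, run it, and return a result record."""
    try:
        call = json.loads(raw_model_output)["tool_call"]
        name, arguments = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"malformed tool call: {exc!r}"}
    if name not in REGISTRY:
        return {"ok": False, "error": f"unknown tool: {name}"}
    func, required = REGISTRY[name]
    if not isinstance(arguments, dict) or set(arguments) != required:
        return {"ok": False, "error": f"{name} expects arguments {sorted(required)}"}
    return {"ok": True, "tool": name, "result": func(**arguments)}

print(dispatch('{"tool_call": {"name": "search_web", "arguments": {"query": "latest AI trends 2026"}}}'))
```

Returning an error record instead of raising lets the loop feed the failure back to the model as an observation, so it can correct the call on the next turn.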
2. Planning Strategies
| Strategy | How | When |
|---|---|---|
| ReAct | Think → Act → Observe loop | General-purpose agent tasks |
| Plan-and-Execute | Create full plan first, then execute | Complex multi-step tasks |
| Tree of Thoughts | Explore multiple reasoning paths | Hard reasoning problems |
| Reflexion | Self-reflect on failures, retry | Tasks needing self-correction |
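To contrast Plan-and-Execute with the ReAct loop, here is a minimal sketch in which the plan is produced once and then run without further model calls. `fake_planner` and the tool stubs are hypothetical; a real planner would be a single LLM call returning structured steps.

```python
from typing import Callable

# Hypothetical tool stubs; real tools would hit a search API and the filesystem.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"[results: {query}]",
    "write": lambda filename: f"[wrote {filename}]",
}

def fake_planner(goal: str) -> list[tuple[str, str]]:
    """Stand-in for one LLM call that returns the whole plan as (tool, argument) steps."""
    return [
        ("search", f"top competitors for {goal}"),
        ("search", f"pricing of {goal} competitors"),
        ("write", "comparison_table.md"),
    ]

def plan_and_execute(goal: str) -> list[str]:
    plan = fake_planner(goal)             # planning happens once, up front
    results = []
    for tool_name, argument in plan:      # execution makes no further model calls
        results.append(TOOLS[tool_name](argument))
    return results

results = plan_and_execute("note-taking apps")
print(results)
```

The trade-off is visible in the structure: fewer model calls than ReAct, but the plan cannot adapt to surprising tool output unless a replanning step is added.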
3. Memory Systems
| Type | Implementation | Use |
|---|---|---|
| Short-term | Conversation history in context | Current task state |
| Long-term | Vector DB / embeddings | Past interactions, knowledge |
| Episodic | Stored successful strategies | Learn from past tasks |
| Working | Scratchpad / notes during task | Complex reasoning steps |
Agent Types
| Type | Description | Example |
|---|---|---|
| Single Agent | One LLM with tools | ChatGPT with web browsing |
| Multi-Agent | Multiple specialized agents collaborating | CrewAI, AutoGen |
| Hierarchical | Manager agent delegates to worker agents | Complex workflows |
| Competitive | Agents debate/challenge each other | Red team / verification |
Popular Frameworks (2025-2026)
| Framework | Strengths | Use Case |
|---|---|---|
| LangGraph (LangChain) | Flexible graph-based workflows, stateful | Complex custom agents |
| CrewAI | Multi-agent, role-based collaboration | Team of specialized agents |
| AutoGen (Microsoft) | Multi-agent conversations | Research, code generation |
| OpenAI Assistants API | Managed, easy to start | Simple agents with tools |
| Anthropic Tool Use | Strong coding agents | Developer tools |
| Semantic Kernel | Enterprise .NET/Python | Enterprise integration |
◆ Code & Implementation
Simple Agent with LangGraph
# pip install "langgraph>=0.3" "langchain-openai>=0.3"
# ⚠️ Last tested: 2026-04 | Requires: langgraph>=0.3
# ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable.
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool


@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Placeholder: swap in a real search API call here
    return f"Results for: {query}"


@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # ⚠️ SECURITY: eval() used for demo only. In production, use a safe
    # expression parser like `simpleeval` or `numexpr`. Never eval() untrusted input.
    return str(eval(expression))


# Create LLM with tools
tools = [search_web, calculator]
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)


# Define the agent logic
def agent_node(state: MessagesState):
    # Call the model on the running message history; it may answer or request tools
    response = llm.invoke(state["messages"])
    return {"messages": [response]}


def should_continue(state: MessagesState):
    # Route to the tool node if the last model message requested any tool calls
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END


# Build graph: ToolNode handles ToolMessage creation automatically
graph = StateGraph(MessagesState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))  # Wraps tool results in ToolMessage
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")  # After tools, back to agent
app = graph.compile()

# recursion_limit caps the number of graph steps, so a stuck agent
# raises an error instead of looping indefinitely
result = app.invoke(
    {"messages": [("user", "What is 25 * 47?")]},
    config={"recursion_limit": 10},
)
print(result["messages"][-1].content)
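The security note in `calculator` suggests replacing `eval()` with a safe parser. As one possible approach using only the standard library, the sketch below walks the expression's syntax tree and allows nothing but numeric literals and basic arithmetic. `safe_eval` is a hypothetical helper, not part of LangGraph, and exponentiation is deliberately left out because huge powers can exhaust CPU and memory.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expression: str):
    """Evaluate +, -, *, /, % over numeric literals; raise ValueError on anything else."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))

print(safe_eval("25 * 47"))  # 1175
```

Inside the tool, `return str(safe_eval(expression))` would take the place of the `eval()` call; function calls, attribute access, and names all raise `ValueError` before anything runs.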
◆ Strengths vs Limitations
| ✅ Strengths | ❌ Limitations |
|---|---|
| Can complete complex multi-step tasks | Unreliable — can get stuck in loops |
| Adapts approach based on observations | Expensive (many LLM calls per task) |
| Can use any tool via function calling | Security risk (executing code, API calls) |
| Handles ambiguous, open-ended goals | Hard to debug and test |
| Multi-agent enables specialization | Latency (multiple reasoning steps) |
◆ Agent Memory
MEMORY TYPES:
┌────────────────────────────────────────────────────────┐
│  SHORT-TERM (Working Memory)                           │
│  = Conversation context window                         │
│  The messages in the current session.                  │
│  Lost when session ends.                               │
├────────────────────────────────────────────────────────┤
│  LONG-TERM (Persistent Memory)                         │
│  = External storage (vector DB, key-value store)       │
│  Facts, preferences, past interactions.                │
│  Persists across sessions.                             │
│  "You told me last week you prefer Python."            │
├────────────────────────────────────────────────────────┤
│  EPISODIC (Experience Memory)                          │
│  = Records of past task executions                     │
│  "Last time I solved this type of problem, I..."       │
│  Enables learning from experience.                     │
├────────────────────────────────────────────────────────┤
│  PROCEDURAL (How-to Memory)                            │
│  = Tool usage patterns, successful strategies          │
│  "When user asks for data analysis, use Python tool    │
│   first, then visualization tool."                     │
└────────────────────────────────────────────────────────┘
IMPLEMENTATION:
Short-term → Sliding window on conversation history
Long-term → Vector DB (Chroma, Pinecone) + retrieval
Episodic → Summarize and store past task outcomes
Procedural → Fine-tuned behaviors or prompt templates
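The first two rows of that mapping can be sketched in a few lines. This is a toy illustration, not a production design: `AgentMemory` is a hypothetical class, and word overlap stands in for the embedding similarity a vector DB such as Chroma or Pinecone would compute.

```python
from collections import deque

class AgentMemory:
    """Short-term sliding window plus a naive long-term fact store."""

    def __init__(self, window: int = 4):
        # deque(maxlen=...) drops the oldest message automatically
        self.short_term: deque = deque(maxlen=window)
        self.long_term: list[str] = []

    def add_message(self, message: str) -> None:
        self.short_term.append(message)

    def remember(self, fact: str) -> None:
        self.long_term.append(fact)

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return up to k stored facts ranked by word overlap with the query."""
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(fact.lower().split())), fact)
            for fact in self.long_term
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in ranked[:k] if score > 0]

memory = AgentMemory(window=2)
for message in ["hello", "I need a script", "make it fast"]:
    memory.add_message(message)
memory.remember("The user prefers Python for scripting")
memory.remember("The user lives in Berlin")

print(list(memory.short_term))
print(memory.recall("which language does the user prefer, Python or Go?"))
```

In an agent, the recalled facts would be prepended to the prompt alongside the short-term window, which is what lets a statement like "You told me last week you prefer Python" survive across sessions.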
◆ Framework Comparison (March 2026)
| Framework | By | Orchestration | Multi-Agent | Best For |
|---|---|---|---|---|
| LangGraph | LangChain | Graph-based stateful | ✅ | Complex workflows with state |
| CrewAI | Community | Role-based teams | ✅ | Business process automation |
| AutoGen | Microsoft | Chat-based | ✅ | Research, conversational agents |
| ADK | Google | Hierarchical + graph | ✅ | Google ecosystem, production |
| Semantic Kernel | Microsoft | Plugin-based | ⚠️ Basic | Enterprise .NET/Python |
| Mastra | Community | TypeScript-first | ✅ | JS/TS developers |
For protocols connecting agents (MCP, A2A), see Agentic Protocols.
○ Gotchas & Common Mistakes
- ⚠️ Agent ≠ Chatbot with tools: A chatbot uses tools reactively. An agent plans proactively. Don't call everything an "agent."
- ⚠️ Infinite loops: Agents can get stuck retrying failed actions. Always set max iterations.
- ⚠️ Tool description quality: Agents are only as good as their tool descriptions. Bad descriptions = wrong tool choices.
- ⚠️ Over-engineering: Most problems don't need agents. Start with simple chains; add agent complexity only when needed.
- ⚠️ Security: Agents executing code or calling APIs can cause real damage. Always sandbox and add human-in-the-loop for critical actions.
- ⚠️ Memory overflow: Long-term memory without curation becomes noisy. Implement summarization and relevance filtering.
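The infinite-loop gotcha is usually handled with a small guard that runs before every tool call. A minimal sketch follows, assuming tool arguments are JSON-serializable; `LoopGuard` and its default thresholds are hypothetical choices, not values from any framework.

```python
import json
from collections import Counter
from typing import Optional

class LoopGuard:
    """Stops a run after too many steps or too many identical tool calls."""

    def __init__(self, max_iterations: int = 10, max_repeats: int = 2):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.seen: Counter = Counter()

    def check(self, tool: str, arguments: dict) -> Optional[str]:
        """Return a stop reason, or None if the proposed action may run."""
        self.iterations += 1
        if self.iterations > self.max_iterations:
            return "max iterations reached"
        # Serialize with sorted keys so argument order cannot hide a repeat
        key = (tool, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return f"repeated action: {tool}"
        return None

guard = LoopGuard(max_iterations=5, max_repeats=2)
for _ in range(3):
    reason = guard.check("search_web", {"query": "same query"})
print(reason)  # repeated action: search_web
```

When `check` returns a reason, the agent loop can stop and hand off to a human or return a partial answer instead of spending more tokens on the same failing action.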
○ Interview Angles
- Q: What makes an AI agent different from a chatbot?
  A: A chatbot responds to messages. An agent sets goals, plans multi-step approaches, uses tools, observes results, and iterates. Agents are autonomous; chatbots are reactive.
- Q: How would you prevent an AI agent from getting stuck in a loop?
  A: Max iteration limits, self-reflection prompts ("Am I making progress?"), fallback to human, diverse retry strategies (try different tools/approaches), and logging for debugging.
- Q: What's the ReAct pattern?
  A: Reason + Act. The agent alternates between thinking (reasoning about what to do) and acting (calling tools). After each action, it observes the result and reasons about next steps. Because every step can react to actual tool output, this interleaving is often more robust than planning everything upfront, at the cost of more LLM calls.
- Q: How does agent memory work?
  A: Four types: short-term (conversation context), long-term (vector DB storing facts/preferences across sessions), episodic (summaries of past task executions for learning), and procedural (learned strategies and tool patterns). In practice, most production agents use short-term + simple long-term memory with vector retrieval.
★ Connections
| Relationship | Topics |
|---|---|
| Builds on | Llms Overview, Rag, Prompt Engineering, Function Calling And Structured Output |
| Leads to | Agentic Protocols (MCP/A2A/ADK), Code Generation (coding agents) |
| Compare with | Simple chains (no autonomy), Chatbots (reactive only) |
| Cross-domain | Robotics (embodied agents), Game AI, Control theory |
◆ Production Failure Modes
| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Infinite loops | Agent repeats the same action indefinitely, burning tokens | No exit condition, tool output doesn't resolve query | Max iteration limits, loop detection, exponential backoff |
| Tool misuse | Agent calls wrong tool or passes wrong arguments | Ambiguous tool descriptions, too many tools | Clear docstrings, reduce active tool set per task, few-shot examples |
| State corruption | Agent "forgets" previous observations, contradicts itself | Message history exceeds context window | Sliding window with summarization, explicit state tracking |
| Cascading failures | One tool error causes entire workflow to crash | No error handling in tool execution layer | Try/catch per tool call, graceful degradation, fallback tools |
| Cost explosion | Agent runs up large bills on a single query | Unconstrained reasoning depth | Token budgets per run, model routing (send only reasoning-heavy steps to the expensive model) |
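Two of the mitigations in the table, per-tool error handling and token budgets, can be sketched as follows. The helper names (`call_tool_safely`, `TokenBudget`, `BudgetExceeded`) are hypothetical, and the four-characters-per-token estimate is a rough assumption; a real system would use the provider's reported token counts.

```python
from typing import Callable, Optional

class BudgetExceeded(RuntimeError):
    """Raised when a run spends more tokens than it was allotted."""

class TokenBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def charge(self, text: str) -> None:
        # Crude estimate (assumption): roughly 4 characters per token
        self.used += max(1, len(text) // 4)
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")

def call_tool_safely(
    tool: Callable[[str], str],
    argument: str,
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Run a tool; on failure try the fallback, else return the error as an observation."""
    last_error = "TOOL_ERROR: no tool available"
    for candidate in (tool, fallback):
        if candidate is None:
            continue
        try:
            return candidate(argument)
        except Exception as exc:  # a failing tool should not crash the whole run
            last_error = f"TOOL_ERROR: {type(exc).__name__}: {exc}"
    return last_error

def flaky_search(query: str) -> str:
    raise TimeoutError("upstream timed out")

def cached_search(query: str) -> str:
    return f"cached results for {query}"

print(call_tool_safely(flaky_search, "agents"))
print(call_tool_safely(flaky_search, "agents", fallback=cached_search))
```

Returning the error text as an observation gives the model a chance to pick a different tool, while `TokenBudget.charge` would be called on every prompt and completion so that a runaway run is cut off at a known cost.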
◆ Hands-On Exercises
Exercise 1: Build an Agent with Guardrails
Goal: Create a LangGraph agent with loop detection and cost limits
Time: 45 minutes
Steps:
1. Build a 3-tool agent (search, calculator, code interpreter)
2. Add max_iterations=10 limit
3. Add duplicate-action detection
4. Add token counting per run
5. Test with adversarial queries designed to trigger loops
Expected Output: Agent that gracefully terminates rather than looping, with cost logged
Exercise 2: Compare Agent Architectures
Goal: Build the same workflow as ReAct vs Plan-and-Execute and compare
Time: 30 minutes
Steps:
1. Implement a ReAct loop for a research task
2. Implement a Plan-and-Execute version for the same task
3. Run 5 queries on both
4. Compare cost, latency, and answer quality
Expected Output: Comparison table with cost/latency/quality per architecture
★ Recommended Resources
| Type | Resource | Why |
|---|---|---|
| 📄 Paper | Anthropic, "Building Effective Agents" (December 2024) | Industry reference for agent design patterns |
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 6 (RAG and Agents) | Practical treatment of agent loops, tools, and memory |
| 🔧 Hands-on | LangGraph Documentation | Build production agent workflows with state management |
| 🎥 Video | Harrison Chase — "What Are AI Agents?" | LangChain creator explaining agent architectures |
★ Sources
- LangGraph documentation — https://langchain-ai.github.io/langgraph/
- Anthropic, "Building Effective Agents" guide (December 2024)
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)
- CrewAI documentation — https://docs.crewai.com
- AutoGen documentation — https://microsoft.github.io/autogen/
- Google ADK documentation — https://google.github.io/adk-docs/
- deeplearning.ai, "Agent Memory: Building Memory-Aware Agents" (2025)