
AI Agents

Bit: 2024 was "year of the chatbot." 2025-2026 is "year of the agent." The difference? Chatbots answer. Agents do.


★ TL;DR

  • What: AI systems that autonomously plan, reason, use tools, and take multi-step actions to achieve goals
  • Why: The biggest paradigm shift in GenAI since ChatGPT. Moves AI from "answer questions" to "complete tasks"
  • Key point: An agent = LLM + Planning + Memory + Tools. The LLM is the brain, not the whole system.

★ Overview

Definition

An AI Agent is a system where an LLM acts as a reasoning engine that can: (1) understand goals, (2) break them into sub-tasks, (3) decide which tools to use, (4) execute actions, (5) observe results, and (6) iterate until the goal is achieved — with minimal human intervention.

Scope

Covers: Agent architecture, tool use, planning patterns, multi-agent systems, and frameworks. For the underlying LLM, see LLMs Overview. For RAG as a tool agents use, see RAG. For specialized coordination patterns, see Multi-Agent Architectures. For tracing and scoring agent runs, see Agent Evaluation & Observability.

Significance

  • Defining trend of 2025-2026: Every major AI company is building agent capabilities
  • Claude Opus 4 was specifically designed for agentic workflows
  • Enterprise adoption is scaling: customer support, code generation, data analysis
  • Projected to transform knowledge work more fundamentally than chatbots did

Prerequisites


★ Deep Dive

Agent Architecture

┌─────────────────────────────────────────────────────┐
│                    AI AGENT                          │
│                                                     │
│  ┌───────────────────────────────────────────┐      │
│  │           LLM (the brain)                 │      │
│  │  - Understands goals                      │      │
│  │  - Reasons about next steps               │      │
│  │  - Generates tool calls                   │      │
│  └───────────────────────────────────────────┘      │
│       │              │              │                │
│       ▼              ▼              ▼                │
│  ┌─────────┐   ┌──────────┐   ┌──────────┐         │
│  │ PLANNING│   │  MEMORY  │   │  TOOLS   │         │
│  │         │   │          │   │          │         │
│  │ - Task  │   │ - Short  │   │ - Search │         │
│  │   decomp│   │   (conv) │   │ - Code   │         │
│  │ - Step  │   │ - Long   │   │ - APIs   │         │
│  │   by    │   │   (RAG/  │   │ - Files  │         │
│  │   step  │   │   DB)    │   │ - Browse │         │
│  └─────────┘   └──────────┘   └──────────┘         │
│                                                     │
│  ┌───────────────────────────────────────────┐      │
│  │         OBSERVATION & FEEDBACK LOOP       │      │
│  │  Tool result → Reason → Next action → ... │      │
│  └───────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────┘

The Agent Loop (ReAct Pattern)

User Goal: "Research the top 3 competitors and create a comparison table"

THINK: I need to search for competitors first.
ACT:   tool_call: web_search("top competitors for [product]")
OBSERVE: [search results returned]

THINK: I found 5 competitors. Let me get details on top 3.
ACT:   tool_call: web_search("competitor A features pricing")
OBSERVE: [detailed results]

THINK: I have enough data. Let me create the table.
ACT:   tool_call: create_document("comparison_table.md", content)
OBSERVE: [file created successfully]

THINK: Task complete. Let me present the results.
ACT:   respond_to_user(summary + table)
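
The trace above can be sketched as a small control loop. This is a minimal illustration, not a production agent: `fake_llm` is a scripted stand-in for a real model call, and `web_search`, `create_document`, and `run_agent` are hypothetical names chosen to mirror the trace.

```python
from typing import Callable, Optional

# Hypothetical tools: stand-ins for real search and file-writing integrations.
def web_search(query: str) -> str:
    return f"[search results for: {query}]"

def create_document(name: str) -> str:
    return f"[file created: {name}]"

TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": web_search,
    "create_document": create_document,
}

# Scripted stand-in for the LLM: each entry is (thought, tool name, argument).
# A tool name of None means the model has decided to answer the user.
SCRIPT: list[tuple[str, Optional[str], str]] = [
    ("I need to search for competitors first.", "web_search", "top competitors"),
    ("I have enough data. Let me create the table.", "create_document", "comparison_table.md"),
    ("Task complete.", None, "Here is the comparison table."),
]

def fake_llm(history: list[str]) -> tuple[str, Optional[str], str]:
    """Pick the next scripted step from the number of observations so far."""
    step = sum(1 for line in history if line.startswith("OBSERVE"))
    return SCRIPT[min(step, len(SCRIPT) - 1)]

def run_agent(goal: str, max_steps: int = 5) -> tuple[str, list[str]]:
    """Think, act, observe, and repeat until the model answers or steps run out."""
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        thought, tool_name, arg = fake_llm(history)
        history.append(f"THINK: {thought}")
        if tool_name is None:  # no tool requested: return the final answer
            return arg, history
        history.append(f"ACT: {tool_name}({arg!r})")
        history.append(f"OBSERVE: {TOOLS[tool_name](arg)}")
    return "Stopped: step limit reached.", history

answer, trace = run_agent("Research competitors and create a comparison table")
print("\n".join(trace))
print(answer)
```

The `max_steps` bound is the important design choice: the loop always terminates even if the model never decides it is done.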

Core Components

1. Tool Use / Function Calling

// LLM receives tool definitions
{
  "tools": [
    {
      "name": "search_web",
      "description": "Search the internet for information",
      "parameters": {
        "query": {"type": "string", "description": "Search query"}
      }
    },
    {
      "name": "run_python",
      "description": "Execute Python code",
      "parameters": {
        "code": {"type": "string", "description": "Python code to run"}
      }
    }
  ]
}

// LLM outputs a tool call (instead of text)
{
  "tool_call": {
    "name": "search_web",
    "arguments": {"query": "latest AI trends 2026"}
  }
}

// System executes tool, returns result to LLM
// LLM decides: respond to user, or call another tool
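
On the system side, the "executes tool, returns result" step is usually a lookup-and-dispatch. The sketch below assumes the simplified `tool_call` shape shown above (real provider APIs use their own envelope); `REGISTRY` and `execute_tool_call` are hypothetical names, and both tools are placeholders.

```python
import json
from typing import Any, Callable

def search_web(query: str) -> str:
    # Placeholder: a real implementation would call a search API.
    return f"Results for: {query}"

def run_python(code: str) -> str:
    # Placeholder: real code execution belongs in a sandbox.
    return "execution disabled in this sketch"

REGISTRY: dict[str, Callable[..., str]] = {
    "search_web": search_web,
    "run_python": run_python,
}

def execute_tool_call(raw: str) -> dict[str, Any]:
    """Parse a model-emitted tool call and run the matching function.

    Errors are returned as data so the model can read them and recover.
    """
    try:
        call = json.loads(raw)["tool_call"]
        name, args = call["name"], call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"malformed tool call: {exc}"}
    func = REGISTRY.get(name)
    if func is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "tool": name, "result": func(**args)}
    except TypeError as exc:  # wrong or missing argument names
        return {"ok": False, "error": f"bad arguments for {name}: {exc}"}

model_output = '{"tool_call": {"name": "search_web", "arguments": {"query": "latest AI trends 2026"}}}'
print(execute_tool_call(model_output))
```

Returning errors as structured results, rather than raising, lets the next LLM turn see what went wrong and choose a different tool or argument.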

2. Planning Strategies

Strategy | How | When
ReAct | Think → Act → Observe loop | General-purpose agent tasks
Plan-and-Execute | Create full plan first, then execute | Complex multi-step tasks
Tree of Thoughts | Explore multiple reasoning paths | Hard reasoning problems
Reflexion | Self-reflect on failures, retry | Tasks needing self-correction
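
To make the contrast with ReAct concrete, here is a rough sketch of Plan-and-Execute: the whole plan is produced before any action runs. The `plan` and `execute_step` functions are hypothetical and hard-coded; in a real agent both would be backed by LLM calls and tools.

```python
from typing import Callable

def plan(goal: str) -> list[str]:
    """Hypothetical planner. A real agent would ask the LLM for this list."""
    return [
        f"search: {goal}",
        "extract: pricing and features",
        "write: comparison table",
    ]

def execute_step(step: str, context: list[str]) -> str:
    """Hypothetical executor: dispatches on the 'verb: detail' prefix."""
    verb, _, detail = step.partition(": ")
    handlers: dict[str, Callable[[str], str]] = {
        "search": lambda d: f"found sources for '{d}'",
        "extract": lambda d: f"extracted {d} from {len(context)} prior result(s)",
        "write": lambda d: f"wrote {d} using {len(context)} prior result(s)",
    }
    return handlers[verb](detail)

def plan_and_execute(goal: str) -> list[str]:
    steps = plan(goal)  # the full plan is fixed before any action runs
    results: list[str] = []
    for step in steps:
        # Each step sees earlier results but cannot change the remaining plan.
        results.append(execute_step(step, results))
    return results

for line in plan_and_execute("top 3 competitors"):
    print(line)
```

A ReAct agent would instead choose each next step only after seeing the previous observation, which costs more model calls but adapts to surprises.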

3. Memory Systems

Type | Implementation | Use
Short-term | Conversation history in context | Current task state
Long-term | Vector DB / embeddings | Past interactions, knowledge
Episodic | Stored successful strategies | Learn from past tasks
Working | Scratchpad / notes during task | Complex reasoning steps

Agent Types

Type | Description | Example
Single Agent | One LLM with tools | ChatGPT with web browsing
Multi-Agent | Multiple specialized agents collaborating | CrewAI, AutoGen
Hierarchical | Manager agent delegates to worker agents | Complex workflows
Competitive | Agents debate/challenge each other | Red team / verification

Framework | Strengths | Use Case
LangGraph (LangChain) | Flexible graph-based workflows, stateful | Complex custom agents
CrewAI | Multi-agent, role-based collaboration | Team of specialized agents
AutoGen (Microsoft) | Multi-agent conversations | Research, code generation
OpenAI Assistants API | Managed, easy to start | Simple agents with tools
Anthropic Tool Use | Strong coding agents | Developer tools
Semantic Kernel | Enterprise .NET/Python | Enterprise integration
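
The hierarchical type listed under Agent Types can be illustrated without any framework. In this sketch the worker functions and the fixed assignment list are assumptions for illustration; a real manager agent would ask an LLM to produce the assignments, and each worker would wrap its own model and tools.

```python
from typing import Callable

# Hypothetical worker agents, reduced to plain functions.
def researcher(task: str) -> str:
    return f"research notes on {task}"

def writer(task: str) -> str:
    return f"draft covering {task}"

WORKERS: dict[str, Callable[[str], str]] = {
    "research": researcher,
    "write": writer,
}

def manager(goal: str) -> list[str]:
    """Split the goal into (role, task) assignments and collect worker output."""
    # Hard-coded here; an LLM would normally generate these assignments.
    assignments = [("research", goal), ("write", f"summary of {goal}")]
    return [WORKERS[role](task) for role, task in assignments]

print(manager("competitor pricing"))
```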

◆ Code & Implementation

Simple Agent with LangGraph

# pip install "langgraph>=0.3" "langchain-openai>=0.3"   (quotes stop the shell treating >= as a redirect)
# ⚠️ Last tested: 2026-04 | Requires: langgraph>=0.3

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Implementation here
    return f"Results for: {query}"

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    # ⚠️ SECURITY: eval() used for demo only. In production, use a safe
    # expression parser like `simpleeval` or `numexpr`. Never eval() untrusted input.
    return str(eval(expression))

# Create LLM with tools
tools = [search_web, calculator]
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)

# Define the agent logic
def agent_node(state: MessagesState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

def should_continue(state: MessagesState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

# Build graph — ToolNode handles ToolMessage creation automatically
graph = StateGraph(MessagesState)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))  # Correctly wraps results in ToolMessage
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
graph.add_edge("tools", "agent")  # After tools, back to agent

app = graph.compile()
result = app.invoke({"messages": [("user", "What is 25 * 47?")]})
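
The `calculator` tool above relies on `eval()` for brevity. As one possible replacement, the sketch below walks the expression with the standard-library `ast` module and allows only numeric literals and basic arithmetic. It is a swapped-in technique, not the `simpleeval` library named in the comment, and `safe_eval` is a hypothetical helper name. Exponentiation is left out on purpose because inputs like `9**9**9` can exhaust CPU and memory.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expression: str) -> float:
    """Evaluate arithmetic on numeric literals only; reject everything else."""
    def _eval(node: ast.AST) -> float:
        # Only plain int/float literals are allowed (bools and strings are not).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")
    return _eval(ast.parse(expression, mode="eval").body)

print(safe_eval("25 * 47"))
```

Inside the tool, `return str(safe_eval(expression))` would take the place of the `eval()` call.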

◆ Strengths vs Limitations

✅ Strengths | ❌ Limitations
Can complete complex multi-step tasks | Unreliable: can get stuck in loops
Adapts approach based on observations | Expensive (many LLM calls per task)
Can use any tool via function calling | Security risk (executing code, API calls)
Handles ambiguous, open-ended goals | Hard to debug and test
Multi-agent enables specialization | Latency (multiple reasoning steps)

◆ Agent Memory

MEMORY TYPES:
  ┌──────────────────────────────────────────────────────┐
  │  SHORT-TERM (Working Memory)                         │
  │  = Conversation context window                       │
  │  The messages in the current session.                │
  │  Lost when session ends.                             │
  ├──────────────────────────────────────────────────────┤
  │  LONG-TERM (Persistent Memory)                       │
  │  = External storage (vector DB, key-value store)     │
  │  Facts, preferences, past interactions.              │
  │  Persists across sessions.                           │
  │  "You told me last week you prefer Python."          │
  ├──────────────────────────────────────────────────────┤
  │  EPISODIC (Experience Memory)                        │
  │  = Records of past task executions                   │
  │  "Last time I solved this type of problem, I..."     │
  │  Enables learning from experience.                   │
  ├──────────────────────────────────────────────────────┤
  │  PROCEDURAL (How-to Memory)                          │
  │  = Tool usage patterns, successful strategies        │
  │  "When user asks for data analysis, use Python tool  │
  │   first, then visualization tool."                   │
  └──────────────────────────────────────────────────────┘

IMPLEMENTATION:
  Short-term  → Sliding window on conversation history
  Long-term   → Vector DB (Chroma, Pinecone) + retrieval
  Episodic    → Summarize and store past task outcomes
  Procedural  → Fine-tuned behaviors or prompt templates
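
A toy version of the first two rows of that mapping is sketched below. Both class names are hypothetical, and word overlap stands in for the embedding similarity a vector DB such as Chroma or Pinecone would compute; it is meant only to show the shape of the two interfaces.

```python
from collections import deque

class ShortTermMemory:
    """Sliding window: keeps only the most recent `max_messages` messages."""

    def __init__(self, max_messages: int = 4) -> None:
        self.messages: deque[str] = deque(maxlen=max_messages)

    def add(self, message: str) -> None:
        self.messages.append(message)  # the oldest message is dropped when full

class LongTermMemory:
    """Toy persistent store. Word overlap stands in for embedding similarity."""

    def __init__(self) -> None:
        self.facts: list[str] = []

    def store(self, fact: str) -> None:
        self.facts.append(fact)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(f.lower().split())), f) for f in self.facts]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Facts with no overlap at all are filtered out as irrelevant.
        return [fact for score, fact in ranked[:k] if score > 0]

short = ShortTermMemory(max_messages=2)
for msg in ["hi", "I prefer Python", "what is 2+2?"]:
    short.add(msg)

long_term = LongTermMemory()
long_term.store("user prefers Python")
long_term.store("user works in fintech")

print(list(short.messages))
print(long_term.retrieve("does the user like python"))
```

The relevance filter in `retrieve` is a small instance of the curation that keeps long-term memory from becoming noisy.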

◆ Framework Comparison (March 2026)

Framework | By | Orchestration | Multi-Agent | Best For
LangGraph | LangChain | Graph-based stateful | | Complex workflows with state
CrewAI | Community | Role-based teams | | Business process automation
AutoGen | Microsoft | Chat-based | | Research, conversational agents
ADK | Google | Hierarchical + graph | | Google ecosystem, production
Semantic Kernel | Microsoft | Plugin-based | ⚠️ Basic | Enterprise .NET/Python
Mastra | Community | TypeScript-first | | JS/TS developers

For protocols connecting agents (MCP, A2A), see Agentic Protocols.


○ Gotchas & Common Mistakes

  • ⚠️ Agent ≠ Chatbot with tools: A chatbot uses tools reactively. An agent plans proactively. Don't call everything an "agent."
  • ⚠️ Infinite loops: Agents can get stuck retrying failed actions. Always set max iterations.
  • ⚠️ Tool description quality: Agents are only as good as their tool descriptions. Bad descriptions = wrong tool choices.
  • ⚠️ Over-engineering: Most problems don't need agents. Start with simple chains; add agent complexity only when needed.
  • ⚠️ Security: Agents executing code or calling APIs can cause real damage. Always sandbox and add human-in-the-loop for critical actions.
  • ⚠️ Memory overflow: Long-term memory without curation becomes noisy. Implement summarization and relevance filtering.
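
For the infinite-loop gotcha, a guard that combines an iteration cap with duplicate-action detection might look like the following. `LoopGuard` and its parameters are hypothetical names; the agent loop is assumed to call `check` before executing each tool call.

```python
from typing import Optional

class LoopGuard:
    """Stops a run after too many steps or too many identical actions."""

    def __init__(self, max_iterations: int = 10, max_repeats: int = 2) -> None:
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen: dict[tuple[str, str], int] = {}

    def check(self, tool_name: str, arguments: str) -> Optional[str]:
        """Return a stop reason, or None if the action may proceed."""
        self.steps += 1
        if self.steps > self.max_iterations:
            return "max iterations reached"
        key = (tool_name, arguments)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"repeated action: {tool_name}({arguments})"
        return None

guard = LoopGuard(max_iterations=10, max_repeats=2)
reasons = [guard.check("web_search", "competitor pricing") for _ in range(3)]
print(reasons)
```

LangGraph also accepts a `recursion_limit` entry in the config passed to `invoke`, which caps graph steps at the framework level; the guard above adds the duplicate-action check on top.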

○ Interview Angles

  • Q: What makes an AI agent different from a chatbot?
  • A: A chatbot responds to messages. An agent sets goals, plans multi-step approaches, uses tools, observes results, and iterates. Agents are autonomous; chatbots are reactive.

  • Q: How would you prevent an AI agent from getting stuck in a loop?

  • A: Max iteration limits, self-reflection prompts ("Am I making progress?"), fallback to human, diverse retry strategies (try different tools/approaches), and logging for debugging.

  • Q: What's the ReAct pattern?

  • A: Reason + Act. The agent alternates between thinking (reasoning about what to do) and acting (calling tools). After each action, it observes the result and reasons about next steps. Because each step reacts to the latest observation, this interleaving is often more robust than planning everything upfront, at the cost of more LLM calls.

  • Q: How does agent memory work?

  • A: Four types: short-term (conversation context), long-term (vector DB storing facts/preferences across sessions), episodic (summaries of past task executions for learning), and procedural (learned strategies and tool patterns). In practice, most production agents use short-term + simple long-term memory with vector retrieval.

★ Connections

Relationship | Topics
Builds on | LLMs Overview, RAG, Prompt Engineering, Function Calling And Structured Output
Leads to | Agentic Protocols (MCP/A2A/ADK), Code Generation (coding agents)
Compare with | Simple chains (no autonomy), Chatbots (reactive only)
Cross-domain | Robotics (embodied agents), Game AI, Control theory

◆ Production Failure Modes

Failure | Symptoms | Root Cause | Mitigation
Infinite loops | Agent repeats the same action indefinitely, burning tokens | No exit condition, tool output doesn't resolve query | Max iteration limits, loop detection, exponential backoff
Tool misuse | Agent calls wrong tool or passes wrong arguments | Ambiguous tool descriptions, too many tools | Clear docstrings, reduce active tool set per task, few-shot examples
State corruption | Agent "forgets" previous observations, contradicts itself | Message history exceeds context window | Sliding window with summarization, explicit state tracking
Cascading failures | One tool error causes entire workflow to crash | No error handling in tool execution layer | Try/except per tool call, graceful degradation, fallback tools
Cost explosion | Agent runs up large bills on a single query | Unconstrained reasoning depth | Token budgets per run, model routing (expensive model for reasoning steps only)
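
Two of the mitigations in the table, per-tool error handling with a fallback and a per-run token budget, can be sketched as follows. All names here (`TokenBudget`, `BudgetExceeded`, `call_with_fallback`, and the two search functions) are hypothetical, and token counts are assumed to be supplied by the caller from the model's usage metadata.

```python
from typing import Callable

class BudgetExceeded(Exception):
    """Raised when a run spends more tokens than its budget allows."""

class TokenBudget:
    """Tracks token spend for one run; counts are supplied by the caller."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"used {self.used} of {self.limit} tokens")

def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    arg: str,
) -> str:
    """Try the primary tool; on any error, degrade to the fallback."""
    try:
        return primary(arg)
    except Exception as exc:
        return fallback(arg) + f" (fallback used after: {exc})"

def flaky_search(query: str) -> str:
    raise TimeoutError("search API timed out")

def cached_search(query: str) -> str:
    return f"cached results for '{query}'"

print(call_with_fallback(flaky_search, cached_search, "agent frameworks"))

budget = TokenBudget(limit=1000)
budget.charge(600)
try:
    budget.charge(600)
except BudgetExceeded as exc:
    print(f"run stopped: {exc}")
```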

◆ Hands-On Exercises

Exercise 1: Build an Agent with Guardrails

Goal: Create a LangGraph agent with loop detection and cost limits
Time: 45 minutes
Steps:
  1. Build a 3-tool agent (search, calculator, code interpreter)
  2. Add max_iterations=10 limit
  3. Add duplicate-action detection
  4. Add token counting per run
  5. Test with adversarial queries designed to trigger loops
Expected Output: Agent that gracefully terminates rather than looping, with cost logged

Exercise 2: Compare Agent Architectures

Goal: Build the same workflow as ReAct vs Plan-and-Execute and compare
Time: 30 minutes
Steps:
  1. Implement a ReAct loop for a research task
  2. Implement a Plan-then-Execute version for the same task
  3. Run 5 queries on both
  4. Compare cost, latency, and answer quality
Expected Output: Comparison table with cost/latency/quality per architecture


Type | Resource | Why
📄 Paper | Anthropic, "Building Effective Agents" (2024) | Industry reference for agent design patterns
📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 6 (RAG and Agents) | Practical treatment of agent loops, tools, and memory
🔧 Hands-on | LangGraph Documentation | Build production agent workflows with state management
🎥 Video | Harrison Chase, "What Are AI Agents?" | LangChain creator explaining agent architectures

★ Sources

  • LangGraph documentation — https://langchain-ai.github.io/langgraph/
  • Anthropic "Building Effective Agents" guide (2024)
  • Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022)
  • CrewAI documentation — https://docs.crewai.com
  • AutoGen documentation — https://microsoft.github.io/autogen/
  • Google ADK documentation — https://google.github.io/adk-docs/
  • deeplearning.ai, "Agent Memory: Building Memory-Aware Agents" (2025)