
Function Calling, Structured Output & Tool Use

An LLM that only generates text is like a brain with no hands. Function calling gives it hands: it can now search the web, query databases, send emails, and execute code. This is what makes LLMs actually useful in production.


★ TL;DR

  • What: Mechanisms for LLMs to (1) call external functions/APIs and (2) return data in strict schemas (JSON, Pydantic)
  • Why: Every production LLM application uses these. You can't build real apps with free-text responses alone.
  • Key point: Function calling = LLM decides WHICH tool to use and WHAT arguments to pass. Structured output = LLM returns data in a guaranteed format. Together they make LLMs programmable.

★ Overview

Definition

  • Function calling / Tool use: The LLM analyzes a user request, determines that an external function should be called, and generates the function name + arguments as structured JSON. Your code then EXECUTES the function and feeds the result back.
  • Structured output: Constraining the LLM to return responses in a specific schema (JSON, XML, Pydantic model) instead of free-form text.
  • Model Context Protocol (MCP): An emerging open standard for connecting LLMs to external tools and data sources.

Scope

Covers the patterns, APIs, and protocols. For building full agents with planning loops, see AI Agents. For retrieval specifically, see RAG.

Significance

  • Every ChatGPT plugin, every Copilot action, every enterprise AI app uses function calling
  • Structured output eliminates parsing headaches and hallucinated fields
  • MCP is becoming the USB of AI — one protocol for all tool connections
  • This is what interviewers mean by "production LLM experience"

★ Deep Dive

Function Calling Flow

┌─────────────────────────────────────────────────────────┐
│                  FUNCTION CALLING FLOW                  │
│                                                         │
│  1. User: "What's the weather in Tokyo?"                │
│                                                         │
│  2. Your code → sends message + TOOL DEFINITIONS to LLM │
│     tools = [{                                          │
│       name: "get_weather",                              │
│       parameters: { location: string, unit: string }    │
│     }]                                                  │
│                                                         │
│  3. LLM → decides to call a tool (NOT execute it!)      │
│     Response: {                                         │
│       tool_call: "get_weather",                         │
│       arguments: { location: "Tokyo", unit: "celsius" } │
│     }                                                   │
│                                                         │
│  4. YOUR CODE executes the actual function              │
│     result = get_weather("Tokyo", "celsius")  → "22°C"  │
│                                                         │
│  5. Feed result back to LLM                             │
│     messages.append(tool_result: "22°C")                │
│                                                         │
│  6. LLM generates final answer                          │
│     "The weather in Tokyo is currently 22°C."           │
└─────────────────────────────────────────────────────────┘

KEY: The LLM NEVER executes code. It only decides what to call.
     YOUR code runs the function. Safety is YOUR responsibility.

★ Code & Implementation

Function Calling API (OpenAI Pattern)

# pip install openai>=1.60
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY env var
from openai import OpenAI
import json

client = OpenAI()

# 1. Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., 'Tokyo'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# 2. Send message with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"  # auto | none | required | specific function
)

# 3. Check if model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    name = call.function.name           # "get_weather"
    args = json.loads(call.function.arguments)  # {"location": "Tokyo"}

    # 4. YOUR code executes the function (stub shown for illustration;
    # swap in a real weather API call in production)
    def get_weather(location: str, unit: str = "celsius") -> str:
        return "22°C, partly cloudy"

    result = get_weather(**args)

    # 5. Feed result back
    messages = [
        {"role": "user", "content": "Weather in Tokyo?"},
        message,  # assistant's tool call
        {"role": "tool", "tool_call_id": call.id, "content": str(result)}
    ]

    # 6. Get final response
    final = client.chat.completions.create(
        model="gpt-4o", messages=messages
    )
    print(final.choices[0].message.content)
    # → "The current weather in Tokyo is 22°C and partly cloudy."

Structured Output

# ⚠️ Last tested: 2026-04
# ═══ METHOD 1: JSON Mode (basic) ═══
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 planets"}],
    response_format={"type": "json_object"}
)
# Returns valid JSON, but schema is NOT enforced

# ═══ METHOD 2: Structured Output with Schema (strict) ═══
from pydantic import BaseModel

class Planet(BaseModel):
    name: str
    diameter_km: int
    has_rings: bool

class PlanetList(BaseModel):
    planets: list[Planet]

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 planets with details"}],
    response_format=PlanetList  # Schema is STRICTLY enforced
)

planets = response.choices[0].message.parsed  # → PlanetList object
for p in planets.planets:
    print(f"{p.name}: {p.diameter_km}km, rings={p.has_rings}")

# ═══ METHOD 3: Instructor library (popular in production) ═══
import instructor

client = instructor.from_openai(OpenAI())

planets = client.chat.completions.create(
    model="gpt-4o",
    response_model=PlanetList,
    messages=[{"role": "user", "content": "List 3 planets"}]
)
# Returns validated Pydantic object directly

Model Context Protocol (MCP)

MCP = "The USB of AI" — a universal standard for connecting
      LLMs to tools, data sources, and services.

BEFORE MCP:
  Each tool needs custom integration code for each LLM
  OpenAI tools ≠ Claude tools ≠ Gemini tools
  N models × M tools = N×M integrations

WITH MCP:
  Tool implements MCP server → works with ANY MCP client
  N models + M tools = N + M integrations

  ┌────────────┐     MCP     ┌────────────────┐
  │ LLM Client │◄───────────►│ MCP Server     │
  │ (Claude,   │  Protocol   │ (Database,     │
  │  Cursor,   │             │  GitHub,       │
  │  custom)   │             │  Slack, etc.)  │
  └────────────┘             └────────────────┘

MCP CONCEPTS:
  Tools     = Functions the LLM can call
  Resources = Data the LLM can read (like files)
  Prompts   = Templated prompts for common tasks

STATUS (2026): Growing adoption. Claude Desktop, Cursor,
  and many IDE tools support MCP natively.
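
The three MCP primitives can be sketched as a toy in-process registry. This is a plain-Python illustration of the concepts only, not the real protocol: an actual MCP server speaks JSON-RPC via the official SDK, and every name below (`ToyMCPServer`, the example tool, the URIs) is made up for illustration.

```python
# Toy illustration of MCP's three primitives: tools, resources, prompts.
# A real MCP server is built with the official SDK and is discovered
# and called over the wire by an MCP client — this just shows the shape.

class ToyMCPServer:
    def __init__(self):
        self.tools = {}       # name -> callable the LLM can invoke
        self.resources = {}   # uri  -> data the LLM can read
        self.prompts = {}     # name -> prompt template for common tasks

    def tool(self, fn):
        """Register a function the LLM may call."""
        self.tools[fn.__name__] = fn
        return fn

    def call_tool(self, name, **args):
        return self.tools[name](**args)

server = ToyMCPServer()

@server.tool
def get_weather(location: str) -> str:
    return f"22°C in {location}"  # stub; a real tool would hit an API

server.resources["file://readme"] = "Project documentation text"
server.prompts["summarize"] = "Summarize the following text:\n{text}"

print(server.call_tool("get_weather", location="Tokyo"))  # → 22°C in Tokyo
```

The point of the split: tools are things the model *does*, resources are things it *reads*, prompts are reusable task templates the client can offer to users.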

Grounding Techniques

PROBLEM: LLMs hallucinate. They generate plausible but false info.
SOLUTION: Ground responses in external, trusted data.

GROUNDING METHODS (from simple to complex):

  1. SYSTEM PROMPT GROUNDING
     "Only answer based on the following context: ..."
     Simple but limited.

  2. RAG (Retrieval-Augmented Generation)
     Retrieve relevant documents → inject as context → generate
See [Retrieval-Augmented Generation (RAG)](./rag.md) for full details.

  3. FUNCTION CALLING + LIVE DATA
     LLM calls get_stock_price() → gets real-time data
     Most accurate for dynamic information.

  4. KNOWLEDGE GRAPHS
     Structured entity relationships (Company → CEO → Founded)
     Graph databases (Neo4j) + LLM reasoning.

  5. MULTI-SOURCE VERIFICATION
     Query multiple sources → cross-validate → generate
     Highest accuracy, highest latency.
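
Method 1 reduces to careful prompt construction. A minimal sketch (the helper name and the exact instruction wording are illustrative, not a standard API):

```python
def build_grounded_prompt(context: str, question: str) -> list[dict]:
    """Constrain the model to answer only from the supplied context."""
    system = (
        "Answer ONLY using the context below. "
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_grounded_prompt(
    context="Acme Corp was founded in 1999 by Jane Doe.",
    question="Who founded Acme Corp?",
)
# Pass `messages` to any chat completion API as usual.
```

RAG (method 2) is this same pattern with the context retrieved dynamically per query instead of hard-coded.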

◆ Comparison

| Feature | JSON Mode | Structured Output | Function Calling |
|---|---|---|---|
| What | Valid JSON output | Schema-enforced output | Call external functions |
| Schema guaranteed? | ❌ (valid JSON, not schema) | ✅ (100% schema compliance) | ✅ (function signature) |
| Use case | Simple extraction | Data pipelines, APIs | Tool use, agents |
| Hallucinated fields? | Possible | No | No (args validated) |

◆ Quick Reference

WHEN TO USE WHAT:
  Need LLM to call APIs/tools    → Function calling
  Need structured data extraction → Structured output (Pydantic)
  Need basic JSON response        → JSON mode
  Need tool interop standard      → MCP
  Need factual grounding          → RAG + citations

TOOL CHOICE OPTIONS:
  "auto"      → LLM decides whether to call a tool
  "required"  → LLM MUST call at least one tool
  "none"      → LLM cannot call any tools
  {name: "x"} → LLM must call specific tool
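
In the OpenAI Chat Completions API these map to the following Python values; note that forcing a specific function takes a nested dict, not a bare name:

```python
# The four tool_choice values accepted by the OpenAI Chat Completions API.
choice_auto = "auto"          # model decides whether to call a tool
choice_required = "required"  # model must call at least one tool
choice_none = "none"          # model may not call any tool
choice_specific = {           # model must call this exact function
    "type": "function",
    "function": {"name": "get_weather"},
}
```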

LIBRARIES:
  instructor    → Structured output with retries
  marvin        → AI functions with type hints
  langchain     → Tool/agent framework
  pydantic      → Schema definition

○ Gotchas & Common Mistakes

  • ⚠️ LLM doesn't execute functions: It only generates the call. YOUR code runs it. Never let the LLM run arbitrary code.
  • ⚠️ Tool descriptions matter enormously: Vague descriptions → wrong tool selection. Be specific and include examples.
  • ⚠️ Parallel tool calls: Models can request multiple tool calls at once. Handle them all before responding.
  • ⚠️ JSON mode ≠ Structured Output: JSON mode guarantees valid JSON but NOT schema compliance. Use structured output for reliable schemas.
  • ⚠️ Cost of tool calling: Each round-trip (user → tool call → result → final answer) doubles token usage.
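
The parallel-calls gotcha in practice: one response may contain several entries in `message.tool_calls`, and every one needs a matching `role: "tool"` message (with the right `tool_call_id`) before the next model call. A sketch of the dispatch loop — the `TOOL_REGISTRY` and its lambda implementations are hypothetical stand-ins for your real tools:

```python
import json

# Hypothetical local tool implementations, keyed by tool name.
TOOL_REGISTRY = {
    "get_weather": lambda location, unit="celsius": f"22 {unit} in {location}",
    "get_time": lambda location: f"09:00 in {location}",
}

def dispatch_tool_calls(tool_calls, messages):
    """Execute every requested call and append one tool message per call."""
    for call in tool_calls:
        fn = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # must match the request, or the API rejects it
            "content": str(result),
        })
    return messages
```

Only after all tool messages are appended do you send `messages` back for the final answer.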

○ Interview Angles

  • Q: How does function calling work in LLMs?
  • A: You define tools with names, descriptions, and parameter schemas. The LLM receives the user message + tool definitions, decides if a tool should be called, and generates a JSON object with the function name and arguments. YOUR code executes the function and feeds the result back to the LLM for final response generation. The LLM never actually runs the function.

  • Q: What is MCP and why does it matter?

  • A: Model Context Protocol is an open standard for connecting LLMs to external tools. Before MCP, every tool needed custom integration for each model. MCP provides a universal interface — any MCP-compatible tool works with any MCP-compatible client. It's becoming the "USB standard" for AI tool integration.

★ Connections

| Relationship | Topics |
|---|---|
| Builds on | LLMs Overview, Prompt Engineering |
| Leads to | AI Agents (agents = function calling + planning loops), RAG (retrieval as a tool) |
| Compare with | Direct prompting (text-only), Fine-tuning (embedding knowledge) |
| Cross-domain | API design, RPC protocols, software architecture |

◆ Production Failure Modes

| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Schema hallucination | Model invents function names or parameters not in schema | Ambiguous intent, schema too complex | Constrained decoding, explicit examples in system prompt |
| Argument type mismatch | Function receives string instead of int, null for required field | LLM outputs approximate types | Pydantic validation layer, strict JSON schema with type coercion |
| Infinite tool loops | Agent calls the same tool repeatedly without progress | No exit condition, tool returns don't resolve query | Max iteration limit, loop detection, escalation to human |
| Partial extraction | Structured output missing fields or has empty values | Complex input requiring multi-step reasoning | Chain-of-thought before extraction, break into sub-extractions |
| Format regression across models | Code breaks when switching LLM providers | Different models interpret schemas differently | Provider abstraction layer, model-specific schema adapters |
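
The infinite-loop mitigation amounts to two guards around the agent loop: an iteration cap and a record of (tool, arguments) pairs already tried. A minimal sketch, where `run_step` is a hypothetical callable standing in for one model/tool round-trip:

```python
def run_agent(run_step, max_iterations=10):
    """Drive tool-calling rounds with an iteration cap and loop detection."""
    seen_calls = set()
    for _ in range(max_iterations):
        step = run_step()  # one LLM round-trip (hypothetical interface)
        if step["done"]:
            return step["answer"]
        signature = (step["tool"], step["args"])  # same tool + same args = no progress
        if signature in seen_calls:
            raise RuntimeError("Loop detected: escalate to a human")
        seen_calls.add(signature)
    raise RuntimeError("Max iterations reached without an answer")
```

In production you would typically log the transcript and fall back to a clarifying question or a human handoff instead of raising.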

◆ Hands-On Exercises

Exercise 1: Build a Structured Data Extractor

Goal: Extract structured data from unstructured text using function calling
Time: 30 minutes
Steps:
  1. Define a Pydantic model for a job posting (title, company, salary_range, skills, location)
  2. Use OpenAI function calling with response_format
  3. Test on 5 real job posting descriptions
  4. Log extraction accuracy per field
Expected Output: JSON extractions with >90% field accuracy

Exercise 2: Handle Tool Calling Edge Cases

Goal: Build robust error handling for function calling failures
Time: 30 minutes
Steps:
  1. Create 3 tools (weather, calculator, search)
  2. Send 10 queries, including ambiguous ones
  3. Add validation that catches schema violations
  4. Add fallback when tool call fails
Expected Output: Error handling reduces failures from ~30% to <5%


◆ Resources

| Type | Resource | Why |
|---|---|---|
| 🔧 Hands-on | OpenAI Function Calling Guide | Best documentation for function calling implementation |
| 🔧 Hands-on | Instructor Library | Production library for structured output extraction with Pydantic |
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 6 (Agents) | Covers tool use and structured output in agent architectures |
| 🔧 Hands-on | Anthropic Tool Use Guide | Claude's approach to function calling with examples |

★ Sources

  • OpenAI Function Calling Guide — https://platform.openai.com/docs/guides/function-calling
  • OpenAI Structured Outputs — https://platform.openai.com/docs/guides/structured-outputs
  • Anthropic Tool Use — https://docs.anthropic.com/en/docs/build-with-claude/tool-use
  • Model Context Protocol — https://modelcontextprotocol.io
  • Instructor library — https://python.useinstructor.com