
Function Calling, Structured Output & Tool Use

An LLM that only generates text is like a brain with no hands. Function calling gives it hands: it can now search the web, query databases, send emails, and execute code. This is what makes LLMs actually useful in production.


★ TL;DR

  • What: Mechanisms for LLMs to (1) call external functions/APIs and (2) return data in strict schemas (JSON, Pydantic)
  • Why: Every production LLM application uses these. You can't build real apps with free-text responses alone.
  • Key point: Function calling = LLM decides WHICH tool to use and WHAT arguments to pass. Structured output = LLM returns data in a guaranteed format. Together they make LLMs programmable.

★ Overview

Definition

  • Function calling / Tool use: The LLM analyzes a user request, determines that an external function should be called, and generates the function name + arguments as structured JSON. Your code then EXECUTES the function and feeds the result back.
  • Structured output: Constraining the LLM to return responses in a specific schema (JSON, XML, Pydantic model) instead of free-form text.
  • Model Context Protocol (MCP): An emerging open standard for connecting LLMs to external tools and data sources.

Scope

Covers the patterns, APIs, and protocols. For building full agents with planning loops, see AI Agents. For retrieval specifically, see RAG.

Significance

  • Every ChatGPT plugin, every Copilot action, every enterprise AI app uses function calling
  • Structured output eliminates parsing headaches and hallucinated fields
  • MCP is becoming the USB of AI — one protocol for all tool connections
  • This is what interviewers mean by "production LLM experience"

★ Deep Dive

Function Calling Flow

┌─────────────────────────────────────────────────────────┐
│                  FUNCTION CALLING FLOW                  │
│                                                         │
│  1. User: "What's the weather in Tokyo?"                │
│                                                         │
│  2. Your code → sends message + TOOL DEFINITIONS to LLM │
│     tools = [{                                          │
│       name: "get_weather",                              │
│       parameters: { location: string, unit: string }    │
│     }]                                                  │
│                                                         │
│  3. LLM → decides to call a tool (NOT execute it!)      │
│     Response: {                                         │
│       tool_call: "get_weather",                         │
│       arguments: { location: "Tokyo", unit: "celsius" } │
│     }                                                   │
│                                                         │
│  4. YOUR CODE executes the actual function              │
│     result = get_weather("Tokyo", "celsius")  → "22°C"  │
│                                                         │
│  5. Feed result back to LLM                             │
│     messages.append(tool_result: "22°C")                │
│                                                         │
│  6. LLM generates final answer                          │
│     "The weather in Tokyo is currently 22°C."           │
└─────────────────────────────────────────────────────────┘

KEY: The LLM NEVER executes code. It only decides what to call.
     YOUR code runs the function. Safety is YOUR responsibility.

★ Code & Implementation

Function Calling API (OpenAI Pattern)

# pip install openai>=1.60
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY env var
from openai import OpenAI
import json

client = OpenAI()

# 1. Define your tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g., 'Tokyo'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# 2. Send message with tools
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"  # auto | none | required | specific function
)

# 3. Check if model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    name = call.function.name           # "get_weather"
    args = json.loads(call.function.arguments)  # {"location": "Tokyo"}

    # 4. YOUR code executes the function (stub shown for illustration;
    # swap in a real weather API call in production)
    def get_weather(location: str, unit: str = "celsius") -> str:
        return "22°C, partly cloudy"

    result = get_weather(**args)

    # 5. Feed result back
    messages = [
        {"role": "user", "content": "Weather in Tokyo?"},
        message,  # assistant's tool call
        {"role": "tool", "tool_call_id": call.id, "content": str(result)}
    ]

    # 6. Get final response
    final = client.chat.completions.create(
        model="gpt-4o", messages=messages
    )
    print(final.choices[0].message.content)
    # → "The current weather in Tokyo is 22°C and partly cloudy."

Structured Output

# ⚠️ Last tested: 2026-04
# ═══ METHOD 1: JSON Mode (basic) ═══
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 planets"}],
    response_format={"type": "json_object"}
)
# Returns valid JSON, but schema is NOT enforced

# ═══ METHOD 2: Structured Output with Schema (strict) ═══
from pydantic import BaseModel

class Planet(BaseModel):
    name: str
    diameter_km: int
    has_rings: bool

class PlanetList(BaseModel):
    planets: list[Planet]

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 planets with details"}],
    response_format=PlanetList  # Schema is STRICTLY enforced
)

planets = response.choices[0].message.parsed  # → PlanetList object
for p in planets.planets:
    print(f"{p.name}: {p.diameter_km}km, rings={p.has_rings}")

# ═══ METHOD 3: Instructor library (popular in production) ═══
import instructor

client = instructor.from_openai(OpenAI())

planets = client.chat.completions.create(
    model="gpt-4o",
    response_model=PlanetList,
    messages=[{"role": "user", "content": "List 3 planets"}]
)
# Returns validated Pydantic object directly

Model Context Protocol (MCP)

MCP = "The USB of AI" — a universal standard for connecting
      LLMs to tools, data sources, and services.

BEFORE MCP:
  Each tool needs custom integration code for each LLM
  OpenAI tools ≠ Claude tools ≠ Gemini tools
  N models × M tools = N×M integrations

WITH MCP:
  Tool implements MCP server → works with ANY MCP client
  N models + M tools = N + M integrations

  ┌────────────┐     MCP     ┌────────────────┐
  │ LLM Client │◄───────────►│ MCP Server     │
  │ (Claude,   │  Protocol   │ (Database,     │
  │  Cursor,   │             │  GitHub,       │
  │  custom)   │             │  Slack, etc.)  │
  └────────────┘             └────────────────┘

MCP CONCEPTS:
  Tools     = Functions the LLM can call
  Resources = Data the LLM can read (like files)
  Prompts   = Templated prompts for common tasks

STATUS (2026): Growing adoption. Claude Desktop, Cursor,
  and many IDE tools support MCP natively.
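
The three MCP primitives can be sketched as a toy in-process registry. This is a plain-Python illustration of the concepts only, not the real protocol: an actual MCP server speaks JSON-RPC via the official SDK, and every name below (`ToyMCPServer`, the example tool, the URIs) is made up for illustration.

```python
# Toy illustration of MCP's three primitives: tools, resources, prompts.
# A real MCP server is built with the official SDK and is discovered
# and called over the wire by an MCP client — this just shows the shape.

class ToyMCPServer:
    def __init__(self):
        self.tools = {}       # name -> callable the LLM can invoke
        self.resources = {}   # uri  -> data the LLM can read
        self.prompts = {}     # name -> prompt template for common tasks

    def tool(self, fn):
        """Register a function the LLM may call."""
        self.tools[fn.__name__] = fn
        return fn

    def call_tool(self, name, **args):
        return self.tools[name](**args)

server = ToyMCPServer()

@server.tool
def get_weather(location: str) -> str:
    return f"22°C in {location}"  # stub; a real tool would hit an API

server.resources["file://readme"] = "Project documentation text"
server.prompts["summarize"] = "Summarize the following text:\n{text}"

print(server.call_tool("get_weather", location="Tokyo"))  # → 22°C in Tokyo
```

The point of the split: tools are things the model *does*, resources are things it *reads*, prompts are reusable task templates the client can offer to users.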

Grounding Techniques

PROBLEM: LLMs hallucinate. They generate plausible but false info.
SOLUTION: Ground responses in external, trusted data.

GROUNDING METHODS (from simple to complex):

  1. SYSTEM PROMPT GROUNDING
     "Only answer based on the following context: ..."
     Simple but limited.

  2. RAG (Retrieval-Augmented Generation)
     Retrieve relevant documents → inject as context → generate
See [Retrieval-Augmented Generation (RAG)](./rag.md) for full details.

  3. FUNCTION CALLING + LIVE DATA
     LLM calls get_stock_price() → gets real-time data
     Most accurate for dynamic information.

  4. KNOWLEDGE GRAPHS
     Structured entity relationships (Company → CEO → Founded)
     Graph databases (Neo4j) + LLM reasoning.

  5. MULTI-SOURCE VERIFICATION
     Query multiple sources → cross-validate → generate
     Highest accuracy, highest latency.
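
Method 1 reduces to careful prompt construction. A minimal sketch (the helper name and the exact instruction wording are illustrative, not a standard API):

```python
def build_grounded_prompt(context: str, question: str) -> list[dict]:
    """Constrain the model to answer only from the supplied context."""
    system = (
        "Answer ONLY using the context below. "
        'If the answer is not in the context, say "I don\'t know."\n\n'
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

messages = build_grounded_prompt(
    context="Acme Corp was founded in 1999 by Jane Doe.",
    question="Who founded Acme Corp?",
)
# Pass `messages` to any chat completion API as usual.
```

RAG (method 2) is this same pattern with the context retrieved dynamically per query instead of hard-coded.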

◆ Comparison

| Feature | JSON Mode | Structured Output | Function Calling |
|---|---|---|---|
| What | Valid JSON output | Schema-enforced output | Call external functions |
| Schema guaranteed? | ❌ (valid JSON, not schema) | ✅ (100% schema compliance) | ✅ (function signature) |
| Use case | Simple extraction | Data pipelines, APIs | Tool use, agents |
| Hallucinated fields? | Possible | No | No (args validated) |

◆ Quick Reference

WHEN TO USE WHAT:
  Need LLM to call APIs/tools    → Function calling
  Need structured data extraction → Structured output (Pydantic)
  Need basic JSON response        → JSON mode
  Need tool interop standard      → MCP
  Need factual grounding          → RAG + citations

TOOL CHOICE OPTIONS:
  "auto"      → LLM decides whether to call a tool
  "required"  → LLM MUST call at least one tool
  "none"      → LLM cannot call any tools
  {name: "x"} → LLM must call specific tool
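
In the OpenAI Chat Completions API these map to the following Python values; note that forcing a specific function takes a nested dict, not a bare name:

```python
# The four tool_choice values accepted by the OpenAI Chat Completions API.
choice_auto = "auto"          # model decides whether to call a tool
choice_required = "required"  # model must call at least one tool
choice_none = "none"          # model may not call any tool
choice_specific = {           # model must call this exact function
    "type": "function",
    "function": {"name": "get_weather"},
}
```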

LIBRARIES:
  instructor    → Structured output with retries
  marvin        → AI functions with type hints
  langchain     → Tool/agent framework
  pydantic      → Schema definition

○ Gotchas & Common Mistakes

  • ⚠️ LLM doesn't execute functions: It only generates the call. YOUR code runs it. Never let the LLM run arbitrary code.
  • ⚠️ Tool descriptions matter enormously: Vague descriptions → wrong tool selection. Be specific and include examples.
  • ⚠️ Parallel tool calls: Models can request multiple tool calls at once. Handle them all before responding.
  • ⚠️ JSON mode ≠ Structured Output: JSON mode guarantees valid JSON but NOT schema compliance. Use structured output for reliable schemas.
  • ⚠️ Cost of tool calling: Each round-trip (user → tool call → result → final answer) doubles token usage.
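
The parallel-calls gotcha in practice: one response may contain several entries in `message.tool_calls`, and every one needs a matching `role: "tool"` message (with the right `tool_call_id`) before the next model call. A sketch of the dispatch loop — the `TOOL_REGISTRY` and its lambda implementations are hypothetical stand-ins for your real tools:

```python
import json

# Hypothetical local tool implementations, keyed by tool name.
TOOL_REGISTRY = {
    "get_weather": lambda location, unit="celsius": f"22 {unit} in {location}",
    "get_time": lambda location: f"09:00 in {location}",
}

def dispatch_tool_calls(tool_calls, messages):
    """Execute every requested call and append one tool message per call."""
    for call in tool_calls:
        fn = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,  # must match the request, or the API rejects it
            "content": str(result),
        })
    return messages
```

Only after all tool messages are appended do you send `messages` back for the final answer.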

○ Interview Angles

  • Q: How does function calling work in LLMs?
  • A: You define tools with names, descriptions, and parameter schemas. The LLM receives the user message + tool definitions, decides if a tool should be called, and generates a JSON object with the function name and arguments. YOUR code executes the function and feeds the result back to the LLM for final response generation. The LLM never actually runs the function.

  • Q: What is MCP and why does it matter?

  • A: Model Context Protocol is an open standard for connecting LLMs to external tools. Before MCP, every tool needed custom integration for each model. MCP provides a universal interface — any MCP-compatible tool works with any MCP-compatible client. It's becoming the "USB standard" for AI tool integration.

★ Connections

| Relationship | Topics |
|---|---|
| Builds on | LLMs Overview, Prompt Engineering |
| Leads to | AI Agents (agents = function calling + planning loops), RAG (retrieval as a tool) |
| Compare with | Direct prompting (text-only), Fine-tuning (embedding knowledge) |
| Cross-domain | API design, RPC protocols, software architecture |

◆ Production Failure Modes

| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Schema hallucination | Model invents function names or parameters not in schema | Ambiguous intent, schema too complex | Constrained decoding, explicit examples in system prompt |
| Argument type mismatch | Function receives string instead of int, null for required field | LLM outputs approximate types | Pydantic validation layer, strict JSON schema with type coercion |
| Infinite tool loops | Agent calls the same tool repeatedly without progress | No exit condition, tool returns don't resolve query | Max iteration limit, loop detection, escalation to human |
| Partial extraction | Structured output missing fields or has empty values | Complex input requiring multi-step reasoning | Chain-of-thought before extraction, break into sub-extractions |
| Format regression across models | Code breaks when switching LLM providers | Different models interpret schemas differently | Provider abstraction layer, model-specific schema adapters |
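
The infinite-loop mitigation amounts to two guards around the agent loop: an iteration cap and a record of (tool, arguments) pairs already tried. A minimal sketch, where `run_step` is a hypothetical callable standing in for one model/tool round-trip:

```python
def run_agent(run_step, max_iterations=10):
    """Drive tool-calling rounds with an iteration cap and loop detection."""
    seen_calls = set()
    for _ in range(max_iterations):
        step = run_step()  # one LLM round-trip (hypothetical interface)
        if step["done"]:
            return step["answer"]
        signature = (step["tool"], step["args"])  # same tool + same args = no progress
        if signature in seen_calls:
            raise RuntimeError("Loop detected: escalate to a human")
        seen_calls.add(signature)
    raise RuntimeError("Max iterations reached without an answer")
```

In production you would typically log the transcript and fall back to a clarifying question or a human handoff instead of raising.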

◆ Hands-On Exercises

Exercise 1: Build a Structured Data Extractor

Goal: Extract structured data from unstructured text using function calling
Time: 30 minutes
Steps:
  1. Define a Pydantic model for a job posting (title, company, salary_range, skills, location)
  2. Use OpenAI function calling with response_format
  3. Test on 5 real job posting descriptions
  4. Log extraction accuracy per field
Expected Output: JSON extractions with >90% field accuracy

Exercise 2: Handle Tool Calling Edge Cases

Goal: Build robust error handling for function calling failures
Time: 30 minutes
Steps:
  1. Create 3 tools (weather, calculator, search)
  2. Send 10 queries, including ambiguous ones
  3. Add validation that catches schema violations
  4. Add fallback when tool call fails
Expected Output: Error handling reduces failures from ~30% to <5%


◆ Resources

| Type | Resource | Why |
|---|---|---|
| 🔧 Hands-on | OpenAI Function Calling Guide | Best documentation for function calling implementation |
| 🔧 Hands-on | Instructor Library | Production library for structured output extraction with Pydantic |
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 6 (Agents) | Covers tool use and structured output in agent architectures |
| 🔧 Hands-on | Anthropic Tool Use Guide | Claude's approach to function calling with examples |

★ Sources

  • OpenAI Function Calling Guide — https://platform.openai.com/docs/guides/function-calling
  • OpenAI Structured Outputs — https://platform.openai.com/docs/guides/structured-outputs
  • Anthropic Tool Use — https://docs.anthropic.com/en/docs/build-with-claude/tool-use
  • Model Context Protocol — https://modelcontextprotocol.io
  • Instructor library — https://python.useinstructor.com