Conversational AI & Dialogue Systems¶
✨ Bit: A good chatbot answers questions. A great conversational system manages context, clarifies intent, recovers from confusion, escalates when needed, and knows when to shut up.
★ TL;DR¶
- What: The design of systems that maintain coherent, multi-turn interaction with users through text or voice
- Why: Conversation is not just generation — it is state management, turn-taking, recovery, and UX design. Getting this wrong means users abandon even if the model is brilliant.
- Key point: The hard part is almost never "make it answer" — it's "make it behave coherently over time" (memory, recovery, escalation, latency)
★ Overview¶
Definition¶
Conversational AI systems support ongoing user interaction across turns, using memory, dialogue state, tools, and policy to produce useful, coherent responses. They range from simple FAQ bots to complex multi-modal voice agents.
Scope¶
Covers: Dialogue state management, memory strategies, conversation design patterns, framework comparison, production implementation, voice-specific challenges, and evaluation. For speech-specific infrastructure (ASR/TTS/VAD), see Voice AI & Speech. For the underlying agent patterns, see AI Agents.
Significance¶
- Largest GenAI deployment surface: Customer support, internal copilots, scheduling assistants, and healthcare triage all use conversational AI
- Product-critical: Conversation quality depends as much on orchestration and UX rules as on model quality. A brilliantly smart model in a poorly designed conversation system feels broken.
- Interview staple: System design interviews for AI roles frequently ask "design a customer support chatbot" or "design a conversational assistant"
Prerequisites¶
- AI Agents — agent loop, tool use, memory
- Function Calling and Structured Output — how LLMs call tools
- Voice AI & Speech — for voice conversation patterns
- Prompt Engineering — system prompts and persona design
★ Deep Dive¶
What Makes Conversation Hard¶
Unlike one-shot generation, dialogue systems must manage:
┌──────────────────────────────────────────────────────────────────┐
│ WHY CONVERSATION IS HARD │
├──────────────────────────────────────────────────────────────────┤
│ │
│ 1. INTENT TRACKING "I want to reschedule" → which meeting? │
│ across turns with whom? what constraints? │
│ │
│ 2. AMBIGUITY "Can you make it earlier?" │
│ resolution Earlier today? Earlier in the week? │
│ │
│ 3. CONTEXT WINDOW Turn 1: user name, role, problem │
│ management Turn 15: should we still remember T1? │
│ │
│ 4. INTERRUPTION User changes topic mid-flow │
│ & correction "Actually, forget that — let's..." │
│ │
│ 5. RECOVERY ASR error, misunderstood intent │
│ & repair "No, I said NEW YORK, not Newark" │
│ │
│ 6. ESCALATION When to hand off to human │
│ decisions When to refuse, when to retry │
│ │
│ 7. LATENCY Voice: 200ms VAD → 500ms total response │
│ constraints Text: TTFT < 500ms or users click away │
│ │
└──────────────────────────────────────────────────────────────────┘
Conversation Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│ CONVERSATIONAL AI SYSTEM │
│ │
│ User Input ──► [ASR/Text] ──► [Turn Manager] ──► [Response] │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ DIALOGUE │ │ MEMORY │ │ TOOLS │ │
│ │ STATE │ │ POLICY │ │ LAYER │ │
│ │ │ │ │ │ │ │
│ │ - Intent │ │ - Short │ │ - Search │ │
│ │ - Slots │ │ term │ │ - CRM │ │
│ │ - Phase │ │ - Long │ │ - Calendar│ │
│ │ - History│ │ term │ │ - APIs │ │
│ └──────────┘ │ - Summary│ └──────────┘ │
│ └──────────┘ │
│ │ │
│ ┌──────────┐ │
│ │ SAFETY │ │
│ │ POLICY │ │
│ │ │ │
│ │ - Refusal│ │
│ │ - Escal. │ │
│ │ - PII │ │
│ └──────────┘ │
└─────────────────────────────────────────────────────────────────┘
Core Dialogue Components¶
| Component | What It Does | Implementation |
|---|---|---|
| Turn Manager | Decides how the system responds each turn — clarify, answer, use tool, or escalate | LLM with structured output or state machine |
| Dialogue State | Tracks what matters from the conversation (intent, slots, phase) | Pydantic model or typed dict |
| Memory Policy | Decides what to keep, summarize, or discard | Sliding window + periodic summary |
| Tool Layer | Connects to search, CRM, scheduling, or business systems | Function calling / MCP |
| Safety Policy | Handles refusals, escalation, PII scrubbing, and risky actions | Guardrails layer (pre/post) |
| Persona | Defines tone, vocabulary, behavior rules | System prompt + few-shot examples |
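The Turn Manager row is the piece teams most often leave implicit. Below is a minimal sketch of the decision policy it encodes, using plain dataclasses; `TurnDecision`, `TurnState`, and `decide_turn` are illustrative names, not from any framework. In production an LLM with structured output would produce the decision, with deterministic rules like these acting as guardrails:

```python
from dataclasses import dataclass, field
from typing import Literal, Optional

Action = Literal["answer", "clarify", "use_tool", "escalate"]

@dataclass
class TurnDecision:
    action: Action
    reason: str

@dataclass
class TurnState:
    intent: Optional[str] = None
    missing_slots: list = field(default_factory=list)
    failed_clarifications: int = 0
    confidence: float = 1.0

def decide_turn(state: TurnState) -> TurnDecision:
    """Deterministic guardrail policy; checked in priority order."""
    if state.failed_clarifications >= 3:
        # Repair deadlock: stop clarifying, hand off to a human
        return TurnDecision("escalate", "repair loop; hand off to a human")
    if state.intent is None or state.confidence < 0.7:
        return TurnDecision("clarify", "intent unknown or low confidence")
    if state.missing_slots:
        return TurnDecision("clarify", f"missing slots: {state.missing_slots}")
    return TurnDecision("use_tool", "all slots filled; act on the request")

print(decide_turn(TurnState()).action)                           # clarify
print(decide_turn(TurnState(intent="book",
                            missing_slots=["date"])).action)     # clarify
print(decide_turn(TurnState(intent="book")).action)              # use_tool
print(decide_turn(TurnState(failed_clarifications=3)).action)    # escalate
```

The priority ordering matters: escalation checks must come before slot-filling checks, or a stuck user keeps getting clarifying questions forever.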
Conversation Design Patterns¶
| Pattern | Architecture | Best For | Limitation |
|---|---|---|---|
| Stateless RAG chat | Query → retrieve → generate. No turn memory. | FAQ, documentation search | No continuity across turns |
| Context-window memory | Append all messages to context | Short interactions (< 10 turns) | Expensive, fills context window |
| Summarized memory | Periodically summarize old turns, keep recent ones | Longer sessions (10-50 turns) | Summary drift, information loss |
| State-machine + LLM | Hard-coded flow graph with LLM for NLU/NLG in each node | Structured workflows (booking, support tickets) | Less flexible, brittle edges |
| Agentic dialogue | LLM decides when to use tools, clarify, or answer | Task completion with external systems | Harder to evaluate, more expensive |
| Hybrid (graph + agent) | LangGraph: structured flow with LLM decision nodes | Production systems needing both structure and flexibility | More complex to build/test |
Dialogue State Management¶
{
  "session_id": "abc-123",
  "user_goal": "reschedule my interview",
  "conversation_phase": "slot_filling",
  "known_slots": {
    "date": "next Tuesday",
    "role": "backend engineer",
    "company": null,
    "time_preference": null
  },
  "pending_question": "What time works best for you?",
  "last_action": "calendar_lookup",
  "turn_count": 4,
  "escalation_trigger": false,
  "confidence": 0.85
}
Key principle: State should capture the minimum information needed to continue the conversation if the context window were cleared. It's a product decision, not a dump of everything.
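The JSON above maps naturally onto a typed state object. A sketch using a stdlib dataclass (the components table suggests a Pydantic model, which would add validation on top of this); `missing_slots` is an illustrative helper, not part of any framework:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DialogueState:
    session_id: str
    user_goal: str = ""
    conversation_phase: str = "intake"
    known_slots: dict = field(default_factory=dict)
    pending_question: Optional[str] = None
    turn_count: int = 0
    confidence: float = 0.0

    def missing_slots(self, required: list) -> list:
        """Slots still needed before the system can act."""
        return [s for s in required if self.known_slots.get(s) is None]

state = DialogueState(session_id="abc-123", user_goal="reschedule my interview")
state.known_slots = {"date": "next Tuesday", "role": "backend engineer",
                     "company": None, "time_preference": None}
print(state.missing_slots(["date", "role", "company", "time_preference"]))
# ['company', 'time_preference']
```

Keeping the state this small is the point: everything here could rebuild the conversation if the context window were cleared.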
Memory Strategies¶
| Strategy | How It Works | When to Use | Risk |
|---|---|---|---|
| Full history | Keep all messages in context | < 10 turns, non-sensitive | Context overflow, cost explosion |
| Sliding window | Keep last N turns, drop older ones | Medium conversations | Loses early context |
| Summary + recent | Summarize turns 1-N, keep N+1 to now verbatim | Long conversations | Summary drift, hallucinated "memories" |
| Entity extraction | Extract key facts into structured state | Customer support, booking | Extraction errors compound |
| Hybrid | Structured state + summary + last 3 turns | Production systems | Most complex to implement |
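For contrast with the summarization class in the Code section, the sliding-window row of the table reduces to a few lines. A sketch, treating a turn as one user/assistant pair:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the last `max_turns` turns. Older turns are dropped
    silently, which is exactly the 'loses early context' risk above."""

    def __init__(self, max_turns: int = 6):
        self.turns = deque(maxlen=max_turns)  # deque evicts oldest automatically

    def add_turn(self, user_msg: str, ai_msg: str):
        self.turns.append((user_msg, ai_msg))

    def context(self) -> list:
        return list(self.turns)

mem = SlidingWindowMemory(max_turns=3)
for i in range(5):
    mem.add_turn(f"user message {i}", f"reply {i}")
print(len(mem.context()))    # 3
print(mem.context()[0][0])   # user message 2  (turns 0 and 1 were evicted)
```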
Voice vs Text Conversations¶
| Dimension | Text | Voice |
|---|---|---|
| Latency tolerance | 2-3 seconds acceptable | > 500ms feels laggy, > 1s is broken |
| Input errors | Typos (minor) | ASR errors ("New York" → "Newark") — critical |
| Turn-taking | Explicit (user hits send) | Implicit (VAD detects end-of-speech) |
| Interruption | User can edit before sending | User talks over the bot — must handle |
| Repair | User re-types | "No, I said..." — bot must recover gracefully |
| Output length | Long responses OK | Keep responses < 30 seconds of speech |
| Emotional cues | Limited to text tone | Tone, pace, volume detectable |
Critical voice latency thresholds (as of 2026):
- VAD detection: < 200ms (from when the user stops speaking)
- Total response time: < 500ms (VAD trigger → first audio output)
- Human-like pause: a 300-500ms delay feels natural; < 200ms feels robotic
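The VAD threshold can be sketched as a frame-based endpoint detector: declare end-of-turn after enough consecutive silent frames. The frame size and class name here are illustrative; a real VAD classifies speech with an audio model rather than receiving a boolean per frame:

```python
class EndpointDetector:
    """Declare end-of-turn after N consecutive silent frames
    (e.g. 10 x 20ms frames = the 200ms VAD threshold above)."""

    def __init__(self, frame_ms: int = 20, min_silence_ms: int = 200):
        self.needed = min_silence_ms // frame_ms
        self.silent_frames = 0

    def feed(self, is_speech: bool) -> bool:
        """Feed one VAD frame; returns True when the turn has ended."""
        self.silent_frames = 0 if is_speech else self.silent_frames + 1
        return self.silent_frames >= self.needed

det = EndpointDetector()
frames = [True] * 30 + [False] * 10       # 600ms of speech, then 200ms silence
results = [det.feed(f) for f in frames]
print(results[-1])        # True: end-of-turn fired on the 10th silent frame
print(any(results[:-1]))  # False: no premature endpointing during speech
```

Tuning `min_silence_ms` is the trade-off named in the thresholds: too low and the bot barges in mid-sentence, too high and every response feels laggy.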
Framework Comparison (April 2026)¶
| Framework | Type | Multi-turn | Tool Use | Best For |
|---|---|---|---|---|
| LangGraph | Graph-based agent | ✅ State persistence | ✅ Function calling | Custom conversation flows with complex state |
| Rasa | Open-source NLU + dialogue | ✅ Tracker store | ✅ Custom actions | Enterprise on-prem, privacy-sensitive |
| Voiceflow | No-code conversation design | ✅ Visual builder | ✅ API integrations | Rapid prototyping, non-technical teams |
| Dialogflow CX | Google Cloud managed | ✅ Session state | ✅ Webhooks/fulfillment | Google ecosystem, voice + text |
| Amazon Lex | AWS managed | ✅ Session attributes | ✅ Lambda fulfillment | AWS ecosystem, Alexa integration |
| Chainlit/Streamlit | Python UI frameworks | ⚠️ Basic | ✅ Via LangChain | Demos, internal tools, prototyping |
★ Code & Implementation¶
Multi-Turn Conversation with LangGraph¶
# pip install langgraph>=0.2 langchain-openai>=0.2 langchain-core>=0.3
# ⚠️ Last tested: 2026-04 | Requires: langgraph>=0.2
from typing import TypedDict, Annotated, Literal

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# 1. Define conversation state
class ConversationState(TypedDict):
    messages: Annotated[list, add_messages]
    intent: str
    slots: dict
    turn_count: int

# 2. Define the conversation system prompt
SYSTEM_PROMPT = """You are a helpful scheduling assistant. Your job is to:
1. Understand what the user wants to schedule/reschedule
2. Collect required information: date, time, participants
3. Confirm details before taking action
4. Use tools when you have enough information
Always be concise. Ask ONE clarifying question at a time.
If you don't understand, say so and ask the user to rephrase."""

# 3. Create the conversation node
llm = ChatOpenAI(model="gpt-4o", temperature=0.3)

def conversation_node(state: ConversationState) -> dict:
    """Main conversation turn — LLM processes input and responds."""
    messages = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    response = llm.invoke(messages)
    return {
        "messages": [response],
        "turn_count": state.get("turn_count", 0) + 1,
    }

def should_continue(state: ConversationState) -> Literal["continue", "end"]:
    """Check if conversation should continue or end."""
    if state.get("turn_count", 0) >= 20:  # Safety limit
        return "end"
    return "continue"

# 4. Build the conversation graph
graph = StateGraph(ConversationState)
graph.add_node("chat", conversation_node)
graph.add_edge(START, "chat")
graph.add_conditional_edges("chat", should_continue, {
    "continue": END,  # Each invoke() handles one turn, so control returns to the user
    "end": END,       # In a fuller app, route this to a wrap-up or handoff node instead
})
app = graph.compile()

# 5. Run a multi-turn conversation
state = {"messages": [], "intent": "", "slots": {}, "turn_count": 0}

# Turn 1
state = app.invoke({
    **state,
    "messages": [HumanMessage(content="I need to reschedule my interview")]
})
print(f"Bot: {state['messages'][-1].content}")

# Turn 2
state = app.invoke({
    **state,
    "messages": [HumanMessage(content="Next Tuesday afternoon, anytime after 2pm")]
})
print(f"Bot: {state['messages'][-1].content}")

# Example output (LLM responses will vary):
# Bot: I'd be happy to help you reschedule your interview. Could you tell me:
#      - Which interview is this for (role/company)?
# Bot: Got it — next Tuesday after 2pm. Let me check available slots.
#      Would 2:30 PM or 3:00 PM work better for you?
Conversation Memory with Summarization¶
# pip install langchain-openai>=0.2 langchain-core>=0.3
# ⚠️ Last tested: 2026-04 | Requires: langchain-openai>=0.2
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage

class ConversationMemory:
    """Hybrid memory: summary of old turns + recent turns verbatim."""

    def __init__(self, max_recent_turns: int = 6, summarize_every: int = 10):
        self.messages: list = []
        self.summary: str = ""
        self.max_recent = max_recent_turns
        self.summarize_every = summarize_every
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

    def add_turn(self, user_msg: str, ai_msg: str):
        self.messages.append(HumanMessage(content=user_msg))
        self.messages.append(AIMessage(content=ai_msg))
        # Summarize when history gets long
        if len(self.messages) > self.summarize_every * 2:
            self._summarize_old_turns()

    def _summarize_old_turns(self):
        """Compress old turns into a summary, keep recent ones verbatim."""
        old = self.messages[:-self.max_recent * 2]
        recent = self.messages[-self.max_recent * 2:]
        summary_prompt = f"""Summarize this conversation history into key facts and decisions.
Previous summary: {self.summary}
New messages to summarize:
{chr(10).join(f'{m.type}: {m.content}' for m in old)}
Output a concise summary of all important facts, user preferences, and decisions made."""
        result = self.llm.invoke([HumanMessage(content=summary_prompt)])
        self.summary = result.content
        self.messages = recent  # Keep only recent turns

    def get_context(self) -> list:
        """Return the full context for the next LLM call."""
        context = []
        if self.summary:
            context.append(SystemMessage(
                content=f"Summary of earlier conversation:\n{self.summary}"
            ))
        context.extend(self.messages)
        return context

# Usage
memory = ConversationMemory(max_recent_turns=4, summarize_every=8)
memory.add_turn("I need to book a flight to NYC", "Sure! When do you want to fly?")
memory.add_turn("Next Friday", "One-way or round trip?")
memory.add_turn("Round trip, back on Sunday", "How many passengers?")
memory.add_turn("Just me", "Let me search for flights...")

context = memory.get_context()
print(f"Context messages: {len(context)}")
# Expected output: Context messages: 8 (4 turns × 2 messages each)
# With summarize_every=8, summarization kicks in once the history exceeds
# 8 turns (16 messages); older turns are then compressed automatically.
◆ Comparison¶
| Aspect | Stateless RAG Chat | LangGraph Conversation | Rasa | Voiceflow |
|---|---|---|---|---|
| Multi-turn state | ❌ None | ✅ Full graph state | ✅ Tracker store | ✅ Visual state |
| Learning curve | Low | Medium-High | High | Low |
| Customization | High | Very High | High | Medium |
| Voice support | ❌ | Via integration | ❌ (text-only) | ✅ Native |
| Production ready | ⚠️ | ✅ | ✅ | ✅ |
| Cost | Per-API-call | OSS + LLM costs | OSS | SaaS pricing |
| Best for | FAQ, search | Custom agents | Enterprise NLU | Rapid prototyping |
◆ Quick Reference¶
CONVERSATION DESIGN CHECKLIST:
□ Define clear conversation boundaries (what it does / doesn't do)
□ Design clarification flows for ambiguous inputs
□ Implement repair strategies for misunderstandings
□ Set up human escalation path for edge cases
□ Separate persona from policy in system prompt
□ Add PII scrubbing for sensitive conversations
□ Set max turn limits to prevent infinite loops
□ Test with adversarial inputs (off-topic, abusive, injection)
LATENCY TARGETS:
Text chatbot: TTFT < 500ms, total < 3s
Voice agent: VAD < 200ms, total response < 500ms
Streaming: First chunk < 200ms
MEMORY RULES OF THUMB:
< 10 turns: Full history in context window
10-50 turns: Summary + last 6 turns
50+ turns: Entity extraction + summary + last 4 turns
Across sessions: Long-term memory (vector DB / key-value store)
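The memory rules of thumb above can be encoded as a simple strategy selector. The thresholds come from the list; the strategy names are illustrative:

```python
def pick_memory_strategy(turn_count: int, cross_session: bool = False) -> str:
    """Map conversation length to a memory strategy per the rules of thumb."""
    if cross_session:
        return "long_term_store"       # vector DB / key-value store
    if turn_count < 10:
        return "full_history"          # everything fits in context
    if turn_count <= 50:
        return "summary_plus_recent"   # summary + last 6 turns verbatim
    return "entities_plus_summary"     # entity extraction + summary + last 4 turns

print(pick_memory_strategy(5))                        # full_history
print(pick_memory_strategy(25))                       # summary_plus_recent
print(pick_memory_strategy(80))                       # entities_plus_summary
print(pick_memory_strategy(5, cross_session=True))    # long_term_store
```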
◆ Production Failure Modes¶
| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Context overflow | Bot "forgets" early turns, gives contradictory answers | Conversation exceeds context window, old turns silently dropped | Implement summarization memory, set explicit context budget |
| Slot confusion | Bot mixes up entities ("Your flight to LA" when user said NYC) | Poor entity extraction, ambiguous references not resolved | Use structured state with explicit slot tracking, confirm before acting |
| Repair deadlock | Bot and user stuck in clarification loop ("I don't understand" × 5) | No escalation path, overly strict intent matching | Max clarification attempts (3), then offer human handoff or menu |
| Summary drift | Bot confidently states things that were never said | Summarization hallucinated facts from old turns | Validate summaries against source messages, use extractive summaries |
| Persona bleed | Bot breaks character, reveals system prompt content | Adversarial prompting, context pollution | Separate persona/policy prompts, use guardrails for prompt injection |
| Voice interruption failure | Bot keeps talking after user interrupts, or cuts off prematurely | Bad VAD tuning, no barge-in support | Tune VAD sensitivity, implement barge-in (stop TTS on new speech) |
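The mitigation in the last row (barge-in) amounts to cancelling TTS playback the moment the VAD reports user speech. A sketch with illustrative method names; a real implementation wires these callbacks into the audio pipeline:

```python
from typing import Optional

class BargeInController:
    """Stop TTS as soon as the user starts speaking over the bot."""

    def __init__(self):
        self.tts_playing = False
        self.interrupted_at_ms: Optional[int] = None

    def start_speaking(self):
        """Called when the bot begins a TTS utterance."""
        self.tts_playing = True
        self.interrupted_at_ms = None

    def on_vad_speech(self, timestamp_ms: int):
        """Called by the VAD when user speech is detected."""
        if self.tts_playing:
            self.tts_playing = False               # cancel playback immediately
            self.interrupted_at_ms = timestamp_ms  # remember where we were cut off

ctl = BargeInController()
ctl.start_speaking()
ctl.on_vad_speech(timestamp_ms=1450)
print(ctl.tts_playing)         # False: bot stopped talking
print(ctl.interrupted_at_ms)   # 1450
```

Recording where the bot was interrupted matters for repair: the next turn can acknowledge the cut-off instead of repeating the whole utterance.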
○ Gotchas & Common Mistakes¶
- ⚠️ More memory ≠ better conversations: Keeping everything amplifies confusion. Curate what to remember.
- ⚠️ Conversational polish can hide weak task completion: A friendly bot that never books the meeting is still a failure.
- ⚠️ Persona and policy should not be mixed: "Be casual and fun!" in the same prompt as "Never discuss competitor pricing" creates conflicts. Separate them.
- ⚠️ Teams under-design repair flows: The happy path gets all the attention. Misunderstandings, corrections, and "actually I meant..." are where users really judge quality.
- ⚠️ Testing with your own team ≠ testing with users: Your team knows how the bot works. Real users will ask things you never imagined.
○ Interview Angles¶
- Q: How is conversational AI different from a basic chatbot?
- A: A basic chatbot generates locally plausible replies — it answers the current message without tracking state. A conversational AI system manages dialogue state across turns (tracking intent, confirmed slots, pending questions), handles ambiguity through clarification, recovers from misunderstandings, uses tools to take real actions, and knows when to escalate to a human. The key difference is that a conversational system has explicit state management (what has been said, what's confirmed, what's pending) rather than relying purely on the LLM's context window to "remember" everything.
- Q: Design a customer support chatbot for an e-commerce company.
- A: I'd start by defining the scope: order status, returns/refunds, product questions, and escalation to human agents. The architecture would be a LangGraph-based conversation flow with: (1) an intent classifier node that routes to specialized sub-flows, (2) structured state tracking order IDs, customer info, and issue type, (3) tool integrations for order lookup, return initiation, and ticket creation, (4) a summarization memory layer for conversations > 10 turns, (5) guardrails for PII handling and policy compliance. For latency, I'd target TTFT < 500ms with streaming. For evaluation, I'd track task completion rate, turns-to-resolution, escalation rate, and CSAT scores. The critical design decision is the escalation policy — I'd implement confidence-based routing where the bot hands off proactively when confidence drops below 0.7, rather than waiting for the user to ask for a human.
- Q: What should a conversational system remember and forget?
- A: This is a product decision, not a technical one. Remember: user's stated goal, confirmed facts (slots), tool results, and explicit preferences. Forget: rejected alternatives, small talk, verbose explanations, and intermediate reasoning steps. The implementation I'd use is a hybrid: structured state for confirmed facts (a Pydantic model with intent, slots, phase), periodic summarization for conversation flow, and the last 4-6 turns verbatim for immediate context. Critical rule: never "remember" something that was said in a summary that wasn't in the original messages — that's how summary drift causes hallucinated memories.
◆ Hands-On Exercises¶
Exercise 1: Build a Multi-Turn Booking Assistant¶
Goal: Build a conversation that collects booking information across multiple turns
Time: 60 minutes
Steps:
1. Define a BookingState with slots: date, time, participants, room_type
2. Implement a LangGraph conversation that asks clarifying questions until all slots are filled
3. Add a confirmation step before "booking"
4. Test with 3 scenarios: cooperative user, ambiguous user, user who changes mind
Expected Output: Working multi-turn bot that correctly fills all slots across 3-7 turns
Exercise 2: Add Summarization Memory¶
Goal: Implement conversation memory that handles 20+ turn conversations
Time: 45 minutes
Steps:
1. Start with the ConversationMemory class from the Code section
2. Run a 25-turn simulated conversation about travel planning
3. Verify that the summary correctly preserves key facts from early turns
4. Test edge case: user corrects a fact from turn 2 in turn 20 — does the system handle it?
Expected Output: Memory system that maintains coherence over 25+ turns with < 4 messages in context
★ Connections¶
| Relationship | Topics |
|---|---|
| Builds on | AI Agents, Voice AI & Speech, Function Calling |
| Leads to | Customer support systems, AI System Design, Voice agents |
| Compare with | One-shot RAG chat, Search interfaces, Static FAQ bots |
| Cross-domain | UX writing, Conversation design (Google, Voiceflow), Contact-center operations |
★ Recommended Resources¶
| Type | Resource | Why |
|---|---|---|
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 7 (Agents) | Practical treatment of conversation state management and memory patterns |
| 🎓 Course | deeplearning.ai — "Building Agentic RAG with LlamaIndex" | Hands-on implementation of conversational retrieval with state |
| 🔧 Hands-on | LangGraph Tutorials — Customer Support Bot | Step-by-step guide to building a production conversation system |
| 🎥 Video | Google — Conversation Design Best Practices | The definitive guide to conversation UX — persona, repair, turn-taking |
| 📄 Paper | Roller et al. "Recipes for Building an Open-Domain Chatbot" (2021) | Facebook's analysis of what makes conversations work — blending, empathy, knowledge |
| 🔧 Hands-on | Rasa Open Source Documentation | Most mature open-source dialogue framework — excellent for learning NLU + dialogue management concepts |
| 📘 Book | "Designing Voice User Interfaces" by Cathy Pearl | Gold standard for voice conversation design — turn-taking, repair, persona |
★ Sources¶
- Google Conversation Design Guidelines — https://designguidelines.withgoogle.com/conversation/
- Microsoft Bot Framework Design Guidance — https://docs.microsoft.com/en-us/azure/bot-service/bot-service-design-principles
- LangGraph Documentation — https://langchain-ai.github.io/langgraph/
- Rasa Documentation — https://rasa.com/docs/
- Roller et al. "Recipes for Building an Open-Domain Chatbot" (2021)
- Voice AI & Speech
- AI Agents