AI UX Patterns¶
✨ Bit: The best AI model in the world is useless if users don't trust it, can't understand it, or give up waiting 8 seconds for a response. AI UX design is about making intelligence feel reliable, fast, and controllable.
★ TL;DR¶
- What: Design patterns for building user interfaces around AI systems — handling latency, uncertainty, trust, and error
- Why: AI behaves differently from traditional software (non-deterministic, sometimes wrong, variable latency). Generic UX patterns don't work.
- Key point: The three pillars of AI UX: make it fast (streaming), make it trustable (citations, confidence), make it controllable (edit, regenerate, undo).
★ Overview¶
Definition¶
AI UX patterns are recurring design solutions for the unique challenges of AI-powered interfaces: communicating uncertainty, managing variable latency, building user trust, and enabling human correction.
★ Deep Dive¶
The Three Pillars of AI UX¶
| Pillar 1: Speed | Pillar 2: Trust | Pillar 3: Control |
|---|---|---|
| Stream tokens | Show citations | Regenerate button |
| Skeleton loading | Confidence indicators | Edit AI output |
| Progressive rendering | Source attribution | Undo / revert |
| Optimistic updates | "I don't know" admission | Feedback (👍/👎) |
| Background prefetch | Transparent limitations | Temperature control |
| | Consistent persona | Mode switching |
Core AI UX Patterns¶
| Pattern | Problem It Solves | Example |
|---|---|---|
| Streaming response | 3-8 second wait feels slow | ChatGPT token-by-token rendering |
| Skeleton loading | User doesn't know something is happening | Shimmer animation during model inference |
| Citation cards | User can't verify AI claims | Perplexity-style inline source links |
| Confidence indicators | Not all answers are equally reliable | Color-coded confidence bars |
| Suggested prompts | Users don't know what to ask | Starter chips, autocomplete |
| Regeneration | First answer wasn't good enough | "Try again" button with different seed |
| Inline editing | AI was 90% right but needs correction | Editable responses with diff tracking |
| Progressive disclosure | Too much information at once | Summary first, expandable details |
| Guardrail messaging | AI refuses a request | Clear explanation of what's not possible and why |
| Feedback capture | Need to improve model quality | Thumbs up/down, report, correction |
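Some of these patterns need no model call at all. As a rough illustration of the "suggested prompts" row, starter chips can be as simple as prefix matching against a curated list; `STARTER_PROMPTS` and `suggest()` here are illustrative names, not from any library:

```python
# Minimal "suggested prompts" sketch: prefix-match the user's partial input
# against a curated starter list. Names are illustrative.
STARTER_PROMPTS = [
    "Summarize this document",
    "Summarize the key risks in this contract",
    "Explain this code",
    "Explain this error message",
]

def suggest(partial: str, limit: int = 3) -> list[str]:
    """Return up to `limit` starter prompts matching what the user typed so far."""
    p = partial.lower().strip()
    if not p:
        return STARTER_PROMPTS[:limit]  # empty input box: show default starter chips
    return [s for s in STARTER_PROMPTS if s.lower().startswith(p)][:limit]
```

Real products typically rank suggestions by popularity or embed them for fuzzy matching, but the UX contract is the same: never leave the user staring at an empty box.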
Anti-Patterns to Avoid¶
| Anti-Pattern | Why It Hurts | Better Alternative |
|---|---|---|
| No loading state | User thinks it's broken | Streaming + skeleton loading |
| Fake confidence | Erodes trust when wrong | Show uncertainty explicitly |
| Wall of text | Overwhelming, unreadable | Progressive disclosure, formatting |
| No attribution | "The AI said so" isn't trustworthy | Citations with source links |
| No way to correct | Users feel powerless | Edit, regenerate, and feedback buttons |
| Hiding AI involvement | Users feel deceived | Be transparent about AI-generated content |
Cognitive Load Patterns¶
AI responses are often longer and denser than traditional software output. Managing cognitive load is critical:
PROGRESSIVE DISCLOSURE HIERARCHY:
Level 1 (always visible): TL;DR summary -- 1-2 sentences, shown immediately.
Level 2 (expandable): Key points -- bullet list, expanded on click.
Level 3 (on demand): Full response -- complete text behind a "Read more" link; sources and citations appear only when the user asks "How do you know?"
Why it works: Miller's Law -- working memory holds roughly 7±2 items. Showing the full response immediately overwhelms it; chunked delivery respects that limit.
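The three-level hierarchy can be sketched as a small helper that chunks a finished response into disclosure levels. This is a sketch only: `chunk_response()` is a hypothetical helper and the sentence splitting is deliberately naive (a real implementation would use the model itself or a proper sentence tokenizer):

```python
# Sketch: split a finished AI response into three disclosure levels.
# chunk_response() is a hypothetical helper; splitting on ". " is naive.
def chunk_response(text: str) -> dict:
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    return {
        "tldr": ". ".join(sentences[:2]),   # Level 1: always visible
        "key_points": sentences[2:5],       # Level 2: expandable bullet list
        "full": text,                       # Level 3: behind "Read more"
    }

resp = ("RAG retrieves documents before generating. It grounds answers in sources. "
        "Retrieval uses embeddings. Generation cites the retrieved text. "
        "This reduces hallucination.")
levels = chunk_response(resp)
```

The UI then renders `tldr` immediately and mounts `key_points` and `full` only when the user expands them, which also keeps initial paint fast.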
| Cognitive Load Pattern | When to Use | Implementation |
|---|---|---|
| Chunked delivery | Long responses (>200 words) | TL;DR first, details expandable |
| Skeleton states | Inference >500ms | Shimmer animation matching response layout |
| Inline citations | Factual claims | Superscript [1], source panel on click |
| Structured output | Lists, tables, code | Detect and render markdown server-side |
| Response length control | Power users | Terse / Standard / Detailed toggle |
| Diff highlighting | Regenerated responses | Highlight what changed between versions |
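The "diff highlighting" row above can be implemented with the standard library alone. A word-level sketch using `difflib.SequenceMatcher` (the `diff_words` helper is illustrative):

```python
# Sketch of diff highlighting between an original and a regenerated response,
# using stdlib difflib at word granularity. diff_words() is an illustrative name.
import difflib

def diff_words(old: str, new: str) -> list[tuple[str, str]]:
    """Return (op, text) pairs where op is 'equal', 'delete', or 'insert'."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op in ("equal", "delete", "replace"):
            out.append(("equal" if op == "equal" else "delete", " ".join(a[i1:i2])))
        if op in ("insert", "replace"):
            out.append(("insert", " ".join(b[j1:j2])))
    return out

changes = diff_words("Paris is the capital of France",
                     "Paris is the largest city in France")
# The UI renders 'insert' spans highlighted and 'delete' spans struck through.
```

Showing users exactly what changed between regenerations reduces the cognitive cost of re-reading the whole answer.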
◆ Production Failure Modes¶
| Failure | Symptoms | Root Cause | Mitigation |
|---|---|---|---|
| Trust erosion | Users stop relying on AI answers | Confident wrong answers without citations | Add citations, confidence indicators, "I don't know" |
| Latency abandonment | Users leave during model inference | No streaming, no loading indicator | Stream tokens, add skeleton loading |
| Feedback fatigue | Users stop giving feedback | Too many feedback prompts, no visible impact | Make feedback easy (one click), show when it improves results |
| Cognitive overload | Users skim or ignore responses | Full answer dumped without structure | Progressive disclosure, TL;DR first, render markdown properly |
| Hallucination cascade | User acts on wrong AI output | No uncertainty signal; user trusted blindly | Confidence indicators required for factual claims; citations |
○ Interview Angles¶
- Q: How would you design the UX for an AI research assistant?
- A: Three core principles. Speed: stream responses token-by-token with a skeleton loading state. Trust: every claim gets an inline citation with a link to the source document — clicking opens the relevant passage highlighted. Control: users can regenerate, edit the response, or thumbs-down with a reason. I'd add progressive disclosure — a TL;DR summary with expandable details underneath. For uncertainty, I'd use a confidence indicator and have the AI explicitly say "I'm not sure about this" rather than hallucinating confidently.
- Q: How do you handle the trust problem with AI-generated content?
- A: Trust is built through transparency and verifiability. Three patterns: (1) Citation cards — every factual claim links to its source; users can verify. (2) Explicit uncertainty — "I'm not confident about this" is better than false confidence. (3) Graceful correction — make it trivially easy to edit, regenerate, or flag wrong answers. The key insight: users don't need AI to be perfect, they need to know when to trust it and when to double-check.
- Q: Streaming responses seem simple — what are the hard engineering tradeoffs?
- A: Three non-obvious challenges. (1) Partial markdown — streaming mid-table or mid-code-block means your frontend must handle incomplete syntax gracefully without layout breaking. (2) Cancellation — users abort early; you need to cleanly close SSE connections and stop generation to avoid wasted cost. (3) Error recovery — if the stream breaks after 50 tokens, resume or restart gracefully, not leave a half-rendered response. At scale: buffer DOM updates to batches of ~50ms to avoid 100+ React re-renders/second, and cache common prompt prefixes server-side.
◆ Hands-On Exercises¶
Exercise 1: Audit an AI Product's UX¶
Goal: Evaluate an existing AI product against the three pillars
Time: 20 minutes
Steps:
1. Choose an AI product (ChatGPT, Perplexity, Cursor, etc.)
2. Score it on Speed, Trust, and Control (1-10 each)
3. Identify 3 UX anti-patterns and suggest improvements
Expected Output: UX audit scorecard with improvement recommendations
Exercise 2: Build a Streaming AI Interface¶
Goal: Implement a streaming chat UI with confidence indicators
Time: 45 minutes
Steps:
1. Use the FastAPI streaming endpoint from the Code section below
2. Build a React frontend that renders tokens progressively using the TypeScript pattern
3. Add a confidence color indicator (green/yellow/red) using the confidence endpoint
4. Add a "Regenerate" button that clears and re-streams the response
5. Test: measure perceived speed vs. non-streaming (5-person user study)
Expected Output: Working chat UI with streaming + confidence + regenerate UX
★ Code & Implementation¶
Streaming Response with Progressive Disclosure¶
# pip install openai>=1.60 fastapi>=0.110 uvicorn>=0.29
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI
app = FastAPI()
client = OpenAI()
@app.get("/stream")
async def stream_response(question: str) -> StreamingResponse:
    """Stream LLM tokens to the client as they arrive — core AI UX pattern."""
    def token_generator():
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
            max_tokens=400,
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                # Server-Sent Events format. Caveat: SSE strips one leading
                # space after "data:" and cannot carry raw newlines, so
                # production code should JSON-encode each delta server-side
                # and decode it client-side.
                yield f"data: {delta}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(token_generator(), media_type="text/event-stream")
# Frontend consumption (JavaScript):
# const es = new EventSource(`/stream?question=What+is+RAG%3F`);
# es.onmessage = (e) => {
# if (e.data === "[DONE]") { es.close(); return; }
# document.getElementById("output").textContent += e.data;
# };
Confidence Signaling Pattern¶
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY
import json
from openai import OpenAI
client = OpenAI()
def answer_with_confidence(question: str) -> dict:
    """Return answer annotated with confidence and uncertainty signals for UI."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": (
                "Answer questions and rate your confidence. "
                "JSON only: {\"answer\": \"...\", \"confidence\": 0.0-1.0, "
                "\"uncertainty_note\": \"null or brief caveat\", \"sources_likely\": [\"...\"]}"
            )
        }, {"role": "user", "content": question}],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# UI mapping: confidence → indicator color
def confidence_color(conf: float) -> str:
    if conf >= 0.85:
        return "green"   # show normally
    if conf >= 0.6:
        return "yellow"  # show with "Verify this" note
    return "red"         # show with prominent "AI may be wrong" warning

result = answer_with_confidence("What is the population of Mars?")
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%} → {confidence_color(result['confidence'])}")
print(f"Caveat: {result.get('uncertainty_note')}")
React Streaming UI (TypeScript — DOM Ref Pattern)¶
// No extra npm packages needed: uses the browser's built-in EventSource (React 18+ assumed)
// ⚠️ Last tested: 2026-04 | Requires: React 18+, EventSource API
// Key insight: use ref + direct DOM mutation for streaming, NOT useState per token.
// useState per token = 100+ re-renders/sec = jank. Ref mutation = smooth.
import { useRef, useState, useCallback } from 'react';
export function StreamingChat() {
  const [question, setQuestion] = useState('');
  const [isStreaming, setStreaming] = useState(false);
  const outputRef = useRef<HTMLDivElement>(null);
  const esRef = useRef<EventSource | null>(null);

  const handleAsk = useCallback(() => {
    if (!question.trim() || isStreaming) return;
    setStreaming(true);
    if (outputRef.current) outputRef.current.textContent = '';
    // Close any previous stream
    esRef.current?.close();
    const es = new EventSource(`/stream?question=${encodeURIComponent(question)}`);
    esRef.current = es;
    es.onmessage = (e) => {
      if (e.data === '[DONE]') { es.close(); setStreaming(false); return; }
      // Direct DOM mutation: avoids re-rendering entire component per token
      if (outputRef.current) outputRef.current.textContent += e.data;
    };
    es.onerror = () => {
      es.close();
      setStreaming(false);
      if (outputRef.current) outputRef.current.textContent += ' [Stream error]';
    };
  }, [question, isStreaming]);

  const handleCancel = () => {
    esRef.current?.close();
    setStreaming(false);
  };

  return (
    <div>
      <textarea value={question} onChange={e => setQuestion(e.target.value)} rows={3} />
      <button onClick={handleAsk} disabled={isStreaming}>Ask</button>
      {isStreaming && <button onClick={handleCancel}>Stop</button>}
      <div ref={outputRef} aria-live="polite" className="ai-output" />
    </div>
  );
}
★ Connections¶
| Relationship | Topics |
|---|---|
| Builds on | Conversational AI, API Design |
| Leads to | AI product design, user research for AI, AI Product Management |
| Compare with | Traditional software UX, mobile UX patterns |
| Cross-domain | Product design, human-computer interaction, psychology |
★ Recommended Resources¶
| Type | Resource | Why |
|---|---|---|
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 1 | AI product design from an engineering perspective |
| 🔧 Hands-on | Google PAIR Guidelines | Google's AI UX design principles |
| 🔧 Hands-on | Apple Human Interface Guidelines — Machine Learning | Apple's AI UX design principles |
★ Sources¶
- Google PAIR — https://pair.withgoogle.com/
- Apple HIG: Machine Learning — https://developer.apple.com/design/human-interface-guidelines/machine-learning
- Nielsen Norman Group — AI UX Research — https://www.nngroup.com/