
AI UX Patterns

Bit: The best AI model in the world is useless if users don't trust it, can't understand it, or give up waiting 8 seconds for a response. AI UX design is about making intelligence feel reliable, fast, and controllable.


★ TL;DR

  • What: Design patterns for building user interfaces around AI systems — handling latency, uncertainty, trust, and error
  • Why: AI behaves differently from traditional software (non-deterministic, sometimes wrong, variable latency). Generic UX patterns don't work.
  • Key point: The three pillars of AI UX: make it fast (streaming), make it trustworthy (citations, confidence), make it controllable (edit, regenerate, undo).

★ Overview

Definition

AI UX patterns are recurring design solutions for the unique challenges of AI-powered interfaces: communicating uncertainty, managing variable latency, building user trust, and enabling human correction.

Prerequisites


★ Deep Dive

The Three Pillars of AI UX

PILLAR 1: SPEED                    PILLAR 2: TRUST                  PILLAR 3: CONTROL
─────────────────                  ───────────────                   ──────────────────
• Stream tokens                    • Show citations                  • Regenerate button
• Skeleton loading                 • Confidence indicators           • Edit AI output
• Progressive rendering            • Source attribution              • Undo / revert
• Optimistic updates               • "I don't know" admission        • Feedback (👍/👎)
• Background prefetch              • Transparent limitations          • Temperature control
                                   • Consistent persona               • Mode switching

Core AI UX Patterns

Pattern                | Problem It Solves                         | Example
Streaming response     | 3-8 second wait feels slow                | ChatGPT token-by-token rendering
Skeleton loading       | User doesn't know something is happening  | Shimmer animation during model inference
Citation cards         | User can't verify AI claims               | Perplexity-style inline source links
Confidence indicators  | Not all answers are equally reliable      | Color-coded confidence bars
Suggested prompts      | Users don't know what to ask              | Starter chips, autocomplete
Regeneration           | First answer wasn't good enough           | "Try again" button with different seed
Inline editing         | AI was 90% right but needs correction     | Editable responses with diff tracking
Progressive disclosure | Too much information at once              | Summary first, expandable details
Guardrail messaging    | AI refuses a request                      | Clear explanation of what's not possible and why
Feedback capture       | Need to improve model quality             | Thumbs up/down, report, correction (sketch below)
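
Of these, feedback capture is the cheapest to wire up. A minimal TypeScript sketch, assuming a hypothetical POST /feedback endpoint and payload shape (both are illustrative, not part of any specific API):

// One-click feedback capture (sketch; endpoint and payload shape are assumptions)
type Feedback = {
  responseId: string;          // which AI response is being rated
  rating: 'up' | 'down';
  reason?: string;             // optional free-text correction from the user
};

async function sendFeedback(fb: Feedback): Promise<void> {
  // Hypothetical endpoint; swap in whatever your backend actually exposes
  await fetch('/feedback', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(fb),
  });
}

// Usage idea: wire to 👍 / 👎 buttons rendered next to each response, e.g.
//   <button onClick={() => sendFeedback({ responseId, rating: 'down' })}>👎</button>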

Anti-Patterns to Avoid

Anti-Pattern          | Why It Hurts                       | Better Alternative
No loading state      | User thinks it's broken            | Streaming + skeleton loading
Fake confidence       | Erodes trust when wrong            | Show uncertainty explicitly
Wall of text          | Overwhelming, unreadable           | Progressive disclosure, formatting
No attribution        | "The AI said so" isn't trustworthy | Citations with source links
No way to correct     | Users feel powerless               | Edit, regenerate, and feedback buttons
Hiding AI involvement | Users feel deceived                | Be transparent about AI-generated content

Cognitive Load Patterns

AI responses are often longer and denser than traditional software output. Managing cognitive load is critical:

PROGRESSIVE DISCLOSURE HIERARCHY:

  Level 1 (Always visible):
    TL;DR summary -- 1-2 sentences. Show immediately.

  Level 2 (Expandable):
    Key points -- bullet list. Expand on click.

  Level 3 (On demand):
    Full response -- complete text. "Read more" link.
    Sources / citations -- only when user asks "How do you know?"

  Why it works: Miller's Law -- working memory holds 7 ± 2 items.
  Showing the full response immediately overwhelms; chunked delivery respects limits.
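
As a rough illustration, here is a minimal React/TypeScript sketch of that hierarchy. The summary / keyPoints / full / sources field names are assumptions about how the response has already been split, not a prescribed schema:

// Progressive disclosure: TL;DR always visible, deeper levels opt-in (sketch)
import { useState } from 'react';

type AiResponse = {
  summary: string;      // Level 1: 1-2 sentence TL;DR
  keyPoints: string[];  // Level 2: bullet list
  full: string;         // Level 3: complete text
  sources: string[];    // Level 3: citations, shown only on demand
};

export function DisclosedResponse({ r }: { r: AiResponse }) {
  const [showPoints, setShowPoints] = useState(false);
  const [showFull, setShowFull] = useState(false);

  return (
    <div>
      <p>{r.summary}</p>  {/* Level 1: always visible */}

      <button onClick={() => setShowPoints(v => !v)}>Key points</button>
      {showPoints && <ul>{r.keyPoints.map((p, i) => <li key={i}>{p}</li>)}</ul>}

      <button onClick={() => setShowFull(v => !v)}>Read more</button>
      {showFull && (
        <>
          <p>{r.full}</p>
          <details>
            <summary>How do you know?</summary>
            <ul>{r.sources.map((s, i) => <li key={i}>{s}</li>)}</ul>
          </details>
        </>
      )}
    </div>
  );
}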

Cognitive Load Pattern  | When to Use                 | Implementation
Chunked delivery        | Long responses (>200 words) | TL;DR first, details expandable
Skeleton states         | Inference >500ms            | Shimmer animation matching response layout
Inline citations        | Factual claims              | Superscript [1], source panel on click (sketch below)
Structured output       | Lists, tables, code         | Detect and render markdown server-side
Response length control | Power users                 | Terse / Standard / Detailed toggle
Diff highlighting       | Regenerated responses       | Highlight what changed between versions
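
For the inline-citations row, one simple approach is to have the backend emit numbered markers like [1] in the text plus a parallel source list, then render the markers as clickable superscripts. A minimal sketch, where the marker convention and the sources prop are assumptions:

// Inline citations: turn "[1]"-style markers into clickable superscripts (sketch)
import { useState, Fragment } from 'react';

type Source = { title: string; url: string };

export function CitedText({ text, sources }: { text: string; sources: Source[] }) {
  const [open, setOpen] = useState<number | null>(null);
  // Split on markers like [1], [2] ... keeping the markers in the result
  const parts = text.split(/(\[\d+\])/g);

  return (
    <div>
      <p>
        {parts.map((part, i) => {
          const m = part.match(/^\[(\d+)\]$/);
          if (!m) return <Fragment key={i}>{part}</Fragment>;
          const idx = Number(m[1]) - 1;   // marker [1] points at sources[0]
          return (
            <sup key={i} onClick={() => setOpen(idx)} style={{ cursor: 'pointer' }}>
              [{m[1]}]
            </sup>
          );
        })}
      </p>
      {open !== null && sources[open] && (
        <aside>
          Source: <a href={sources[open].url}>{sources[open].title}</a>
        </aside>
      )}
    </div>
  );
}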

◆ Production Failure Modes

Failure               | Symptoms                           | Root Cause                                    | Mitigation
Trust erosion         | Users stop relying on AI answers   | Confident wrong answers without citations     | Add citations, confidence indicators, "I don't know"
Latency abandonment   | Users leave during model inference | No streaming, no loading indicator            | Stream tokens, add skeleton loading
Feedback fatigue      | Users stop giving feedback         | Too many feedback prompts, no visible impact  | Make feedback easy (one click), show when it improves results
Cognitive overload    | Users skim or ignore responses     | Full answer dumped without structure          | Progressive disclosure, TL;DR first, render markdown properly
Hallucination cascade | User acts on wrong AI output       | No uncertainty signal; user trusted blindly   | Confidence indicators required for factual claims; citations

○ Interview Angles

  • Q: How would you design the UX for an AI research assistant?
  • A: Three core principles. Speed: stream responses token-by-token with a skeleton loading state. Trust: every claim gets an inline citation with a link to the source document — clicking opens the relevant passage highlighted. Control: users can regenerate, edit the response, or thumbs-down with a reason. I'd add progressive disclosure — a TL;DR summary with expandable details underneath. For uncertainty, I'd use a confidence indicator and have the AI explicitly say "I'm not sure about this" rather than hallucinating confidently.

  • Q: How do you handle the trust problem with AI-generated content?

  • A: Trust is built through transparency and verifiability. Three patterns: (1) Citation cards — every factual claim links to its source; users can verify. (2) Explicit uncertainty — "I'm not confident about this" is better than false confidence. (3) Graceful correction — make it trivially easy to edit, regenerate, or flag wrong answers. The key insight: users don't need AI to be perfect, they need to know when to trust it and when to double-check.

  • Q: Streaming responses seem simple — what are the hard engineering tradeoffs?

  • A: Three non-obvious challenges. (1) Partial markdown — streaming mid-table or mid-code-block means your frontend must handle incomplete syntax gracefully without layout breaking. (2) Cancellation — users abort early; you need to cleanly close SSE connections and stop generation to avoid wasted cost. (3) Error recovery — if the stream breaks after 50 tokens, resume or restart gracefully, not leave a half-rendered response. At scale: buffer DOM updates to batches of ~50ms to avoid 100+ React re-renders/second, and cache common prompt prefixes server-side.
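
The batching point from that answer is easy to sketch: accumulate incoming tokens in a buffer and flush to the DOM on a timer instead of writing once per SSE message. The 50ms figure and the element handling below are illustrative:

// Batch streamed tokens into ~50ms DOM flushes instead of one write per token (sketch)
function batchedAppender(el: HTMLElement, flushMs = 50) {
  let buffer = '';
  let timer: ReturnType<typeof setTimeout> | null = null;

  const flush = () => {
    el.textContent += buffer;   // one DOM write covers many tokens
    buffer = '';
    timer = null;
  };

  return {
    push(token: string) {
      buffer += token;
      if (timer === null) timer = setTimeout(flush, flushMs);
    },
    done() {
      if (timer !== null) clearTimeout(timer);
      flush();                  // write whatever is left immediately
    },
  };
}

// Usage with the SSE handler from the Code section below:
//   const appender = batchedAppender(outputEl);
//   es.onmessage = (e) =>
//     e.data === '[DONE]' ? appender.done() : appender.push(JSON.parse(e.data));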

◆ Hands-On Exercises

Exercise 1: Audit an AI Product's UX

Goal: Evaluate an existing AI product against the three pillars
Time: 20 minutes
Steps:
  1. Choose an AI product (ChatGPT, Perplexity, Cursor, etc.)
  2. Score it on Speed, Trust, and Control (1-10 each)
  3. Identify 3 UX anti-patterns and suggest improvements
Expected Output: UX audit scorecard with improvement recommendations

Exercise 2: Build a Streaming AI Interface

Goal: Implement a streaming chat UI with confidence indicators
Time: 45 minutes
Steps:
  1. Use the FastAPI streaming endpoint from the Code section below
  2. Build a React frontend that renders tokens progressively using the TypeScript pattern
  3. Add a confidence color indicator (green/yellow/red) using the confidence endpoint
  4. Add a "Regenerate" button that clears and re-streams the response
  5. Test: measure perceived speed vs. non-streaming (5-person user study)
Expected Output: Working chat UI with streaming + confidence + regenerate UX


★ Code & Implementation

Streaming Response with Progressive Disclosure

# pip install openai>=1.60 fastapi>=0.110 uvicorn>=0.29
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app    = FastAPI()
client = OpenAI()

@app.get("/stream")
async def stream_response(question: str) -> StreamingResponse:
    """Stream LLM tokens to the client as they arrive — core AI UX pattern."""
    def token_generator():
        stream = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": question}],
            max_tokens=400,
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                # Server-Sent Events framing; JSON-encode each token so that
                # newlines and leading spaces survive the "data:" framing
                yield f"data: {json.dumps(delta)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(token_generator(), media_type="text/event-stream")

# Frontend consumption (JavaScript):
# const es = new EventSource(`/stream?question=What+is+RAG%3F`);
# es.onmessage = (e) => {
#   if (e.data === "[DONE]") { es.close(); return; }
#   document.getElementById("output").textContent += JSON.parse(e.data);
# };

Confidence Signaling Pattern

# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY
import json
from openai import OpenAI

client = OpenAI()

def answer_with_confidence(question: str) -> dict:
    """Return answer annotated with confidence and uncertainty signals for UI."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "system",
            "content": (
                "Answer questions and rate your confidence. "
                "JSON only: {\"answer\": \"...\", \"confidence\": 0.0-1.0, "
                "\"uncertainty_note\": \"null or brief caveat\", \"sources_likely\": [\"...\"]}"
            )
        }, {"role": "user", "content": question}],
        temperature=0,
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

# UI mapping: confidence → indicator color
def confidence_color(conf: float) -> str:
    if conf >= 0.85: return "green"    # show normally
    if conf >= 0.6:  return "yellow"   # show with "Verify this" note
    return "red"                        # show with prominent "AI may be wrong" warning

result = answer_with_confidence("What is the population of Mars?")
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}{confidence_color(result['confidence'])}")
print(f"Caveat: {result.get('uncertainty_note')}")

React Streaming UI (TypeScript — DOM Ref Pattern)

// React 18+ assumed; uses the browser's built-in EventSource API (no extra packages needed)
// ⚠️ Last tested: 2026-04 | Requires: React 18+, EventSource API
// Key insight: use ref + direct DOM mutation for streaming, NOT useState per token.
// useState per token = 100+ re-renders/sec = jank. Ref mutation = smooth.

import { useRef, useState, useCallback } from 'react';

export function StreamingChat() {
  const [question, setQuestion]   = useState('');
  const [isStreaming, setStreaming] = useState(false);
  const outputRef  = useRef<HTMLDivElement>(null);
  const esRef      = useRef<EventSource | null>(null);

  const handleAsk = useCallback(() => {
    if (!question.trim() || isStreaming) return;
    setStreaming(true);
    if (outputRef.current) outputRef.current.textContent = '';

    // Close any previous stream
    esRef.current?.close();

    const es = new EventSource(`/stream?question=${encodeURIComponent(question)}`);
    esRef.current = es;

    es.onmessage = (e) => {
      if (e.data === '[DONE]') { es.close(); setStreaming(false); return; }
      // Direct DOM mutation: avoids re-rendering the entire component per token.
      // Tokens are JSON-encoded by the server, so decode before appending.
      if (outputRef.current) outputRef.current.textContent += JSON.parse(e.data);
    };

    es.onerror = () => {
      es.close();
      setStreaming(false);
      if (outputRef.current) outputRef.current.textContent += ' [Stream error]';
    };
  }, [question, isStreaming]);

  const handleCancel = () => {
    esRef.current?.close();
    setStreaming(false);
  };

  return (
    <div>
      <textarea value={question} onChange={e => setQuestion(e.target.value)} rows={3} />
      <button onClick={handleAsk} disabled={isStreaming}>Ask</button>
      {isStreaming && <button onClick={handleCancel}>Stop</button>}
      <div ref={outputRef} aria-live="polite" className="ai-output" />
    </div>
  );
}

★ Connections

Relationship | Topics
Builds on    | Conversational AI, API Design
Leads to     | AI product design, user research for AI, AI Product Management
Compare with | Traditional software UX, mobile UX patterns
Cross-domain | Product design, human-computer interaction, psychology

Type        | Resource                                              | Why
📘 Book     | "AI Engineering" by Chip Huyen (2025), Ch 1           | AI product design from an engineering perspective
🔧 Hands-on | Google PAIR Guidelines                                | Google's AI UX design principles
🔧 Hands-on | Apple Human Interface Guidelines — Machine Learning   | Apple's AI UX design principles

★ Sources

  • Google PAIR — https://pair.withgoogle.com/
  • Apple HIG: Machine Learning — https://developer.apple.com/design/human-interface-guidelines/machine-learning
  • Nielsen Norman Group — AI UX Research — https://www.nngroup.com/