Skip to content

Prompt Engineering

Bit: Prompt engineering is the art of asking the right question. Turns out, how you ask an LLM matters as much as what you ask — just like talking to humans.


★ TL;DR

  • What: Crafting inputs (prompts) to get desired outputs from LLMs without changing the model
  • Why: The cheapest, fastest way to improve LLM output. Zero training, zero infra — just better instructions.
  • Key point: Good prompting follows patterns: be specific, give examples, assign a role, think step-by-step.

★ Overview

Definition

Prompt Engineering is the practice of designing and refining inputs to LLMs to elicit specific, high-quality responses. It encompasses techniques from simple instruction formatting to complex multi-step reasoning frameworks.

Scope

Covers prompting techniques from basic to advanced. For when prompting isn't enough, see Fine Tuning and Rag.

Significance

  • First thing to try before any other technique (cheapest, fastest)
  • Skill every AI practitioner needs regardless of technical depth
  • The difference between "LLM gives garbage" and "LLM gives gold" is often just the prompt

Prerequisites


★ Deep Dive

The Prompting Hierarchy (Simplest → Most Complex)

Level 1: Zero-Shot       → "Translate this to French: Hello"
Level 2: System Prompt   → "You are a French translator. Translate: Hello"
Level 3: Few-Shot        → "Here are 3 examples. Now do this one..."
Level 4: Chain-of-Thought → "Think step by step..."
Level 5: Self-Consistency → Generate multiple answers, pick the majority
Level 6: ReAct / Tool Use → Think, Act, Observe loops (enters Agent territory)

Key Techniques

1. System Prompts (Role Assignment)

WEAK:  "Summarize this article"
STRONG: "You are an expert technical editor. Summarize the following
         article in 3 bullet points, focusing on practical implications
         for software engineers. Use precise technical language."

Why it works: Activates the model's "persona" — trained on text BY experts, so "being" an expert improves output.

2. Few-Shot Prompting

Classify the sentiment:

"This product is amazing!" → Positive
"Worst purchase ever." → Negative
"It's okay, nothing special." → Neutral

"The quality exceeded my expectations!" →

Rule of thumb: 3-5 examples is the sweet spot. More examples = more consistent, but uses context window.

3. Chain of Thought (CoT)

WEAK:  "What is 17 × 24?"
STRONG: "What is 17 × 24? Think step by step."

Model output with CoT:
  "17 × 24
   = 17 × 20 + 17 × 4
   = 340 + 68
   = 408"

Why it works: Forces the model to show intermediate reasoning, reducing errors. Especially effective for math, logic, and multi-step problems.

4. Structured Output

"Analyze this code and respond in the following JSON format:
{
  "bugs": [{"line": int, "description": string, "severity": "high|medium|low"}],
  "suggestions": [string],
  "overall_quality": int  // 1-10
}"

Why structured: Parseable by code, consistent format, forces completeness.

5. The META Framework

Element Description Example
Mission What's the overall goal? "You are a code reviewer"
Expectations What format/quality? "Be concise, cite line numbers"
Task Specific action "Review this Python function"
Artifacts Examples/reference "Here's an example review..."

Advanced Patterns

Pattern How Use Case
Self-Consistency Generate N responses, majority vote Math, factual questions
Tree of Thought Explore multiple reasoning branches Complex problem-solving
Least-to-Most Break complex problem into sub-problems Problems requiring decomposition
Generated Knowledge "First, tell me facts about X. Then, answer Y" Knowledge-intensive questions
Prompt Chaining Output of prompt A → Input of prompt B Multi-stage pipelines

The Prompting Mistake Matrix

❌ Common Mistake ✅ Better Approach
"Write good code" "Write Python 3.12 code that handles edge cases. Include type hints, docstrings, and error handling."
"Summarize this" "Summarize in 3 bullet points for a technical audience. Each bullet max 20 words."
"Be creative" "Generate 5 alternative approaches, ranked by feasibility. For each, explain trade-offs."
"Fix the bug" "Identify the root cause. Explain why it fails. Provide corrected code with comments on changes."
Dumping entire codebase Provide only the relevant function + error message + expected behavior

◆ Quick Reference

PROMPTING CHECKLIST:
□ Define ROLE      → "You are a [specific expert]"
□ Set CONTEXT      → Background info the model needs
□ State TASK       → Exactly what to do
□ Specify FORMAT   → How to structure the output
□ Give EXAMPLES    → 2-3 examples of desired output
□ Add CONSTRAINTS  → What NOT to do, length limits, etc.
□ Request REASONING → "Think step by step" / "Explain your reasoning"

TEMPERATURE GUIDE:
  0.0 → Factual, deterministic (data extraction, classification)
  0.3 → Balanced (summarization, coding)
  0.7 → Creative (writing, brainstorming)
  1.0 → Very creative (poetry, fiction)

◆ Strengths vs Limitations

✅ Strengths ❌ Limitations
Zero cost (no training/infra) Can't add new knowledge
Instant iteration Fragile — small changes = different results
Works with any model Context window limits complexity
Easy to A/B test Can't change model behavior permanently
Good starting point always Diminishing returns at some point → need RAG/fine-tuning

○ Gotchas & Common Mistakes

  • ⚠️ Prompt ≠ Programming: Prompts are probabilistic, not deterministic. Same prompt can give different results.
  • ⚠️ "Be concise" doesn't work well: Instead say "Respond in exactly 3 sentences" — be specific about constraints.
  • ⚠️ Prompt injection: Users can override your system prompt. Never trust user input in prompts for production apps.
  • ⚠️ Position matters: Important instructions at the beginning AND end of prompts are most likely followed (primacy/recency effect).
  • ⚠️ "Just prompt engineer it" is a ceiling: For domain expertise, consistent behavior, or new knowledge — prompting alone won't cut it.

○ Interview Angles

  • Q: What's the difference between zero-shot, few-shot, and chain-of-thought prompting?
  • A: Zero-shot: just instructions, no examples. Few-shot: include examples of desired input→output pairs. CoT: ask model to show reasoning steps. Each adds more guidance and typically improves quality.

  • Q: How would you handle prompt injection in a production system?

  • A: Input sanitization, separate system/user prompts, output validation, don't include raw user input in system prompts. Use the model's built-in system prompt separation. For critical apps, add a second LLM call to verify the first output makes sense.

★ Code & Implementation

Structured Prompt Builder

# pip install openai>=1.60
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY env var
from openai import OpenAI
from dataclasses import dataclass

client = OpenAI()

@dataclass
class PromptConfig:
    """Structured prompt using the META framework."""
    role: str        # Mission: what expert persona
    context: str     # Context background
    task: str        # Task: specific action
    format: str      # Expected output format
    examples: list[tuple[str, str]]  # (input, output) pairs for few-shot
    constraints: str = ""            # what NOT to do

def build_messages(user_input: str, config: PromptConfig) -> list[dict]:
    """Build few-shot messages list from a PromptConfig."""
    system = (
        f"You are {config.role}.\n\n"
        f"Context: {config.context}\n\n"
        f"Task: {config.task}\n\n"
        f"Output format: {config.format}"
    )
    if config.constraints:
        system += f"\n\nConstraints: {config.constraints}"

    messages = [{"role": "system", "content": system}]
    # Few-shot examples
    for example_input, example_output in config.examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # Actual query
    messages.append({"role": "user", "content": user_input})
    return messages

# Example: Sentiment classifier with few-shot
config = PromptConfig(
    role="an expert sentiment analyst",
    context="You are classifying customer feedback for a SaaS product.",
    task="Classify the sentiment of the user's review.",
    format='{"sentiment": "positive|negative|neutral", "confidence": 0.0-1.0, "reason": "..."}',
    examples=[
        ("This product is amazing!", '{"sentiment": "positive", "confidence": 0.97, "reason": "clear enthusiasm"}'),
        ("Worst purchase ever.",     '{"sentiment": "negative", "confidence": 0.99, "reason": "strong negative language"}'),
    ],
    constraints="Only respond with valid JSON. No extra text.",
)

messages = build_messages("The onboarding is okay but the dashboard is confusing.", config)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    temperature=0.0,   # deterministic for classification
    max_tokens=100,
)
print(response.choices[0].message.content)
# → {"sentiment": "negative", "confidence": 0.82, "reason": "mixed review, negative feature mentioned"}

Chain-of-Thought vs Direct: Side-by-Side Test

# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY

def compare_cot(question: str, model: str = "gpt-4o-mini") -> None:
    """Compare direct vs chain-of-thought prompting on a reasoning question."""
    # Direct prompt
    direct = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        temperature=0, max_tokens=100,
    )
    # CoT prompt
    cot = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{question} Think step by step."}],
        temperature=0, max_tokens=300,
    )
    print("=== DIRECT ===")
    print(direct.choices[0].message.content)
    print("\n=== CHAIN-OF-THOUGHT ===")
    print(cot.choices[0].message.content)

compare_cot("If a train travels 120km at 60km/h and then 90km at 45km/h, what is the total travel time?")
# Direct: often gives wrong answer quickly
# CoT: breaks into phases → gets 2h + 2h = 4h (correct)

★ Connections

Relationship Topics
Builds on Llms Overview
Leads to Ai Agents, Rag (prompt is key in RAG too)
Compare with Fine Tuning (permanent behavior change), Rag (adds knowledge)
Cross-domain UX writing, Human communication, Psychology (framing effects)

Type Resource Why
🔧 Hands-on Anthropic Prompt Engineering Guide Industry-best prompt engineering documentation
📘 Book "AI Engineering" by Chip Huyen (2025), Ch 5 (Prompt Engineering) Systematic treatment of prompting techniques with evaluation
🔧 Hands-on OpenAI Prompt Engineering Guide Practical tips with examples for GPT models
🎓 Course deeplearning.ai — "ChatGPT Prompt Engineering" Short, practical course on effective prompting

★ Sources

  • OpenAI Prompt Engineering Guide — https://platform.openai.com/docs/guides/prompt-engineering
  • Anthropic Prompt Engineering Guide — https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
  • Wei et al., "Chain-of-Thought Prompting" (2022)
  • Yao et al., "Tree of Thoughts" (2023)