Prompt Engineering¶
✨ Bit: Prompt engineering is the art of asking the right question. Turns out, how you ask an LLM matters as much as what you ask — just like talking to humans.
★ TL;DR¶
- What: Crafting inputs (prompts) to get desired outputs from LLMs without changing the model
- Why: The cheapest, fastest way to improve LLM output. Zero training, zero infra — just better instructions.
- Key point: Good prompting follows patterns: be specific, give examples, assign a role, think step-by-step.
★ Overview¶
Definition¶
Prompt Engineering is the practice of designing and refining inputs to LLMs to elicit specific, high-quality responses. It encompasses techniques from simple instruction formatting to complex multi-step reasoning frameworks.
Scope¶
Covers prompting techniques from basic to advanced. For when prompting isn't enough, see Fine Tuning and Rag.
Significance¶
- First thing to try before any other technique (cheapest, fastest)
- Skill every AI practitioner needs regardless of technical depth
- The difference between "LLM gives garbage" and "LLM gives gold" is often just the prompt
Prerequisites¶
- Basic understanding of Llms Overview
★ Deep Dive¶
The Prompting Hierarchy (Simplest → Most Complex)¶
Level 1: Zero-Shot → "Translate this to French: Hello"
Level 2: System Prompt → "You are a French translator. Translate: Hello"
Level 3: Few-Shot → "Here are 3 examples. Now do this one..."
Level 4: Chain-of-Thought → "Think step by step..."
Level 5: Self-Consistency → Generate multiple answers, pick the majority
Level 6: ReAct / Tool Use → Think, Act, Observe loops (enters Agent territory)
Key Techniques¶
1. System Prompts (Role Assignment)¶
WEAK: "Summarize this article"
STRONG: "You are an expert technical editor. Summarize the following
article in 3 bullet points, focusing on practical implications
for software engineers. Use precise technical language."
Why it works: Activates the model's "persona" — trained on text BY experts, so "being" an expert improves output.
2. Few-Shot Prompting¶
Classify the sentiment:
"This product is amazing!" → Positive
"Worst purchase ever." → Negative
"It's okay, nothing special." → Neutral
"The quality exceeded my expectations!" →
Rule of thumb: 3-5 examples is the sweet spot. More examples = more consistent, but uses context window.
3. Chain of Thought (CoT)¶
WEAK: "What is 17 × 24?"
STRONG: "What is 17 × 24? Think step by step."
Model output with CoT:
"17 × 24
= 17 × 20 + 17 × 4
= 340 + 68
= 408"
Why it works: Forces the model to show intermediate reasoning, reducing errors. Especially effective for math, logic, and multi-step problems.
4. Structured Output¶
"Analyze this code and respond in the following JSON format:
{
"bugs": [{"line": int, "description": string, "severity": "high|medium|low"}],
"suggestions": [string],
"overall_quality": int // 1-10
}"
Why structured: Parseable by code, consistent format, forces completeness.
5. The META Framework¶
| Element | Description | Example |
|---|---|---|
| Mission | What's the overall goal? | "You are a code reviewer" |
| Expectations | What format/quality? | "Be concise, cite line numbers" |
| Task | Specific action | "Review this Python function" |
| Artifacts | Examples/reference | "Here's an example review..." |
Advanced Patterns¶
| Pattern | How | Use Case |
|---|---|---|
| Self-Consistency | Generate N responses, majority vote | Math, factual questions |
| Tree of Thought | Explore multiple reasoning branches | Complex problem-solving |
| Least-to-Most | Break complex problem into sub-problems | Problems requiring decomposition |
| Generated Knowledge | "First, tell me facts about X. Then, answer Y" | Knowledge-intensive questions |
| Prompt Chaining | Output of prompt A → Input of prompt B | Multi-stage pipelines |
The Prompting Mistake Matrix¶
| ❌ Common Mistake | ✅ Better Approach |
|---|---|
| "Write good code" | "Write Python 3.12 code that handles edge cases. Include type hints, docstrings, and error handling." |
| "Summarize this" | "Summarize in 3 bullet points for a technical audience. Each bullet max 20 words." |
| "Be creative" | "Generate 5 alternative approaches, ranked by feasibility. For each, explain trade-offs." |
| "Fix the bug" | "Identify the root cause. Explain why it fails. Provide corrected code with comments on changes." |
| Dumping entire codebase | Provide only the relevant function + error message + expected behavior |
◆ Quick Reference¶
PROMPTING CHECKLIST:
□ Define ROLE → "You are a [specific expert]"
□ Set CONTEXT → Background info the model needs
□ State TASK → Exactly what to do
□ Specify FORMAT → How to structure the output
□ Give EXAMPLES → 2-3 examples of desired output
□ Add CONSTRAINTS → What NOT to do, length limits, etc.
□ Request REASONING → "Think step by step" / "Explain your reasoning"
TEMPERATURE GUIDE:
0.0 → Factual, deterministic (data extraction, classification)
0.3 → Balanced (summarization, coding)
0.7 → Creative (writing, brainstorming)
1.0 → Very creative (poetry, fiction)
◆ Strengths vs Limitations¶
| ✅ Strengths | ❌ Limitations |
|---|---|
| Zero cost (no training/infra) | Can't add new knowledge |
| Instant iteration | Fragile — small changes = different results |
| Works with any model | Context window limits complexity |
| Easy to A/B test | Can't change model behavior permanently |
| Good starting point always | Diminishing returns at some point → need RAG/fine-tuning |
○ Gotchas & Common Mistakes¶
- ⚠️ Prompt ≠ Programming: Prompts are probabilistic, not deterministic. Same prompt can give different results.
- ⚠️ "Be concise" doesn't work well: Instead say "Respond in exactly 3 sentences" — be specific about constraints.
- ⚠️ Prompt injection: Users can override your system prompt. Never trust user input in prompts for production apps.
- ⚠️ Position matters: Important instructions at the beginning AND end of prompts are most likely followed (primacy/recency effect).
- ⚠️ "Just prompt engineer it" is a ceiling: For domain expertise, consistent behavior, or new knowledge — prompting alone won't cut it.
○ Interview Angles¶
- Q: What's the difference between zero-shot, few-shot, and chain-of-thought prompting?
-
A: Zero-shot: just instructions, no examples. Few-shot: include examples of desired input→output pairs. CoT: ask model to show reasoning steps. Each adds more guidance and typically improves quality.
-
Q: How would you handle prompt injection in a production system?
- A: Input sanitization, separate system/user prompts, output validation, don't include raw user input in system prompts. Use the model's built-in system prompt separation. For critical apps, add a second LLM call to verify the first output makes sense.
★ Code & Implementation¶
Structured Prompt Builder¶
# pip install openai>=1.60
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY env var
from openai import OpenAI
from dataclasses import dataclass
client = OpenAI()
@dataclass
class PromptConfig:
"""Structured prompt using the META framework."""
role: str # Mission: what expert persona
context: str # Context background
task: str # Task: specific action
format: str # Expected output format
examples: list[tuple[str, str]] # (input, output) pairs for few-shot
constraints: str = "" # what NOT to do
def build_messages(user_input: str, config: PromptConfig) -> list[dict]:
"""Build few-shot messages list from a PromptConfig."""
system = (
f"You are {config.role}.\n\n"
f"Context: {config.context}\n\n"
f"Task: {config.task}\n\n"
f"Output format: {config.format}"
)
if config.constraints:
system += f"\n\nConstraints: {config.constraints}"
messages = [{"role": "system", "content": system}]
# Few-shot examples
for example_input, example_output in config.examples:
messages.append({"role": "user", "content": example_input})
messages.append({"role": "assistant", "content": example_output})
# Actual query
messages.append({"role": "user", "content": user_input})
return messages
# Example: Sentiment classifier with few-shot
config = PromptConfig(
role="an expert sentiment analyst",
context="You are classifying customer feedback for a SaaS product.",
task="Classify the sentiment of the user's review.",
format='{"sentiment": "positive|negative|neutral", "confidence": 0.0-1.0, "reason": "..."}',
examples=[
("This product is amazing!", '{"sentiment": "positive", "confidence": 0.97, "reason": "clear enthusiasm"}'),
("Worst purchase ever.", '{"sentiment": "negative", "confidence": 0.99, "reason": "strong negative language"}'),
],
constraints="Only respond with valid JSON. No extra text.",
)
messages = build_messages("The onboarding is okay but the dashboard is confusing.", config)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
temperature=0.0, # deterministic for classification
max_tokens=100,
)
print(response.choices[0].message.content)
# → {"sentiment": "negative", "confidence": 0.82, "reason": "mixed review, negative feature mentioned"}
Chain-of-Thought vs Direct: Side-by-Side Test¶
# ⚠️ Last tested: 2026-04 | Requires: openai>=1.60, OPENAI_API_KEY
def compare_cot(question: str, model: str = "gpt-4o-mini") -> None:
"""Compare direct vs chain-of-thought prompting on a reasoning question."""
# Direct prompt
direct = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": question}],
temperature=0, max_tokens=100,
)
# CoT prompt
cot = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": f"{question} Think step by step."}],
temperature=0, max_tokens=300,
)
print("=== DIRECT ===")
print(direct.choices[0].message.content)
print("\n=== CHAIN-OF-THOUGHT ===")
print(cot.choices[0].message.content)
compare_cot("If a train travels 120km at 60km/h and then 90km at 45km/h, what is the total travel time?")
# Direct: often gives wrong answer quickly
# CoT: breaks into phases → gets 2h + 2h = 4h (correct)
★ Connections¶
| Relationship | Topics |
|---|---|
| Builds on | Llms Overview |
| Leads to | Ai Agents, Rag (prompt is key in RAG too) |
| Compare with | Fine Tuning (permanent behavior change), Rag (adds knowledge) |
| Cross-domain | UX writing, Human communication, Psychology (framing effects) |
★ Recommended Resources¶
| Type | Resource | Why |
|---|---|---|
| 🔧 Hands-on | Anthropic Prompt Engineering Guide | Industry-best prompt engineering documentation |
| 📘 Book | "AI Engineering" by Chip Huyen (2025), Ch 5 (Prompt Engineering) | Systematic treatment of prompting techniques with evaluation |
| 🔧 Hands-on | OpenAI Prompt Engineering Guide | Practical tips with examples for GPT models |
| 🎓 Course | deeplearning.ai — "ChatGPT Prompt Engineering" | Short, practical course on effective prompting |
★ Sources¶
- OpenAI Prompt Engineering Guide — https://platform.openai.com/docs/guides/prompt-engineering
- Anthropic Prompt Engineering Guide — https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering
- Wei et al., "Chain-of-Thought Prompting" (2022)
- Yao et al., "Tree of Thoughts" (2023)