NLP Fundamentals¶
✨ Bit: Before GPT, NLP was a completely different world — specific models for specific tasks. Sentiment analysis? Train a model. Translation? Train a different model. Named entities? Yet another model. Then Transformers said "one model to rule them all" — and traditional NLP became a chapter in LLM history.
★ TL;DR¶
- What: The foundational techniques for understanding, processing, and generating human language — from classical methods to the Transformer revolution
- Why: GenAI REPLACED much of traditional NLP, but interviewers still ask about it. You need to know what came before to understand why GenAI is powerful.
- Key point: BERT (2018) = encoder Transformer (understand text). GPT (2018) = decoder Transformer (generate text). This encoder vs decoder split defined GenAI.
★ Overview¶
Definition¶
NLP (Natural Language Processing) is the field of AI focused on enabling computers to understand, interpret, and generate human language. Pre-2018, it used task-specific models. Post-2018, Transformers unified most NLP tasks under one architecture.
Scope¶
Covers classical NLP tasks, the BERT vs GPT paradigm, and how LLMs changed everything. For Transformer architecture, see Transformers. For modern LLM capabilities, see Llms Overview.
★ Deep Dive¶
NLP Task Landscape¶
| Task | What It Does | Classical Approach | Modern (LLM) Approach |
|---|---|---|---|
| Text Classification | Categorize text (spam/not spam) | TF-IDF + SVM/Naive Bayes | LLM zero-shot or fine-tuned |
| Sentiment Analysis | Detect emotion (positive/negative) | Lexicon-based + ML classifiers | LLM prompt: "Is this positive?" |
| Named Entity Recognition | Find names, places, dates | CRF / BiLSTM-CRF | LLM extraction + JSON output |
| Machine Translation | English → French | Seq2seq + attention | LLM prompt or specialized model |
| Summarization | Condense text | Extractive (select sentences) | LLM abstractive summarization |
| Question Answering | Answer from context | BERT fine-tuned on SQuAD | LLM + RAG |
| Text Generation | Write new text | RNNs, Markov chains | GPT, Claude, LLaMA |
| POS Tagging | Label parts of speech | HMM, CRF | Mostly replaced by LLMs |
The Pre-Transformer Era (Know This for Interviews)¶
TEXT REPRESENTATION EVOLUTION (a BoW / TF-IDF code sketch follows this overview):
BAG OF WORDS (BoW):
"the cat sat on the mat"
→ {the: 2, cat: 1, sat: 1, on: 1, mat: 1}
❌ Ignores word order
❌ No semantics
TF-IDF (Term Frequency × Inverse Document Frequency):
Weights words by importance (rare words score higher)
TF = count(word) / total_words
IDF = log(total_docs / docs_containing_word)
✅ Better than BoW for retrieval
❌ Still no semantics
WORD2VEC (2013):
Learn 300-dim vectors where similar words are nearby
"king" - "man" + "woman" ≈ "queen"
✅ Captures semantic relationships
❌ One vector per word (no context: "bank" = river or money?)
ELMo (2018):
Context-dependent embeddings using BiLSTM
"bank" gets different vectors in different contexts
✅ Context-aware
❌ Sequential processing (slow)
BERT / GPT (2018+):
Transformer-based, attention-powered
✅ Deep context, parallel processing, state-of-the-art everything
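To make the first two steps concrete, here is a minimal plain-Python sketch of BoW counts and the TF-IDF formulas above; the three-document corpus is invented purely for illustration:

```python
import math
from collections import Counter

# Tiny invented corpus, tokenized by whitespace
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the bird flew over the fence",
]
tokenized = [d.split() for d in docs]

# Bag of Words: raw counts, word order thrown away
print(Counter(tokenized[0]))
# Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})

# TF-IDF, using the formulas above:
#   TF  = count(word) / total_words
#   IDF = log(total_docs / docs_containing_word)
def tf_idf(word, doc_tokens, all_docs):
    tf = doc_tokens.count(word) / len(doc_tokens)
    docs_with_word = sum(1 for d in all_docs if word in d)
    idf = math.log(len(all_docs) / docs_with_word)
    return tf * idf

# "the" appears in every document, so its IDF (and weight) is 0;
# "cat" appears in only one document, so it gets a higher weight.
print(round(tf_idf("the", tokenized[0], tokenized), 3))  # 0.0
print(round(tf_idf("cat", tokenized[0], tokenized), 3))  # ~0.183
```

In practice you would reach for scikit-learn's TfidfVectorizer (whose IDF adds smoothing, so the numbers differ slightly), but the manual version shows exactly why rare words score higher.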
The BERT vs GPT Paradigm (Critical!)¶
THE FUNDAMENTAL SPLIT:
| | ENCODER (BERT family) | DECODER (GPT family) |
|---|---|---|
| Goal | "Understand text" | "Generate text" |
| Sees | All tokens (bidirectional) | Only past tokens (autoregressive, left-to-right) |
| Training | Predict [MASK]ed words: "The [MASK] sat on the mat" | Predict the next token: "The cat sat on the" → "mat" |
| Output | Hidden representations (good for classification, NER) | Next-token probabilities (good for generation) |
| Models | BERT (2018), RoBERTa (2019), DeBERTa (2020), ModernBERT (2024) | GPT-1/2/3/4/5 (2018-), LLaMA (2023-), Claude (2023-), Gemini (2023-) |
| Use when | Classification, search / retrieval, token-level tasks (NER), embedding generation | Text generation, chatbots, code generation, anything creative |
GPT WON the scaling war. The BERT family is still used for embeddings and classification, but LLMs now handle most NLP tasks via prompting.
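You can see the split directly with two Hugging Face pipelines. This sketch assumes the `transformers` library plus a backend such as PyTorch is installed, and uses `bert-base-uncased` and `gpt2` only because they are small, well-known checkpoints:

```python
from transformers import pipeline

# Encoder (BERT): fills in [MASK] using context from BOTH sides
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The [MASK] sat on the mat.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))

# Decoder (GPT): predicts the NEXT token, left to right only
generator = pipeline("text-generation", model="gpt2")
print(generator("The cat sat on the", max_new_tokens=5)[0]["generated_text"])
```

The encoder gives you per-token representations you can pool into embeddings or feed to a classifier head; the decoder just keeps sampling next tokens.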
Key Classical Concepts Still Relevant¶
NAMED ENTITY RECOGNITION (NER; spaCy sketch at the end of this section):
Input: "Apple Inc. was founded by Steve Jobs in Cupertino."
Output: [Apple Inc.]=ORG [Steve Jobs]=PERSON [Cupertino]=LOCATION
Modern approach: Use LLM with structured output
→ "Extract all entities as JSON: {persons: [], orgs: [], locations: []}"
SENTIMENT ANALYSIS:
Input: "This product is absolutely terrible, waste of money!"
Output: NEGATIVE (confidence: 0.95)
Modern approach: LLM prompt
→ "Rate the sentiment of this review: positive, negative, or neutral"
TEXT CLASSIFICATION:
Input: Support ticket text
Output: Category (billing, technical, shipping)
Modern approach:
1. Zero-shot: Give LLM category list, ask it to classify
2. Few-shot: Provide examples of each category
3. Fine-tuned: LoRA-tune on your labeled data for best accuracy
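For the NER example above, the classical route is still a few lines of spaCy (flagged as still relevant in the Quick Reference below). This assumes spaCy is installed and the small English model has been fetched with `python -m spacy download en_core_web_sm`:

```python
import spacy

# Small English pipeline: tokenizer, tagger, parser, NER
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple Inc. was founded by Steve Jobs in Cupertino.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# Typically: "Apple Inc." ORG, "Steve Jobs" PERSON, "Cupertino" GPE
# (spaCy labels geopolitical entities GPE rather than LOCATION)
```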
Information Extraction (Generative IE)¶
| Traditional IE | Generative IE (2025) |
|---|---|
| Pipeline of models: 1. NER model, 2. Relation extraction, 3. Event extraction, 4. Coreference resolution, 5. Temporal extraction | One LLM handles everything. Prompt: "Extract from this text: entities (person, org, loc), relations between entities, key events. Return as JSON." |
| 5 models, 5 training runs | 1 prompt, 1 model, done |
| Fragile pipeline | Robust, flexible, adaptable |
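A sketch of the one-prompt flow. It assumes the `openai` Python client with an API key in the environment; the model name is a placeholder and the JSON keys simply mirror the prompt, so treat it as a template rather than the canonical generative-IE recipe:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

text = "Apple Inc. was founded by Steve Jobs in Cupertino."
prompt = (
    "Extract from this text:\n"
    "- Entities (person, org, loc)\n"
    "- Relations between entities\n"
    "- Key events\n"
    "Return ONLY valid JSON with keys: persons, orgs, locations, relations, events.\n\n"
    f"Text: {text}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},  # ask for parseable JSON
)

extracted = json.loads(response.choices[0].message.content)
print(extracted["persons"], extracted["orgs"], extracted["locations"])
```

In production you would add schema validation and retries on parse failures; the point is that one prompt replaces the five-stage pipeline on the left of the table.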
◆ Quick Reference¶
NLP TASK → MODERN APPROACH:
Classification → Zero-shot LLM or fine-tuned classifier
Sentiment → LLM prompt or fine-tuned BERT
NER → LLM with JSON output
Translation → NLLB, GPT-5.4, or fine-tuned model
Summarization → LLM (abstractive)
Search → Embedding model (BERT/BGE) + vector DB (sketch below)
Q&A → RAG (retrieve + generate)
STILL RELEVANT IN 2026:
TF-IDF → BM25 search is still fast and effective
BERT family → Embeddings, classification, reranking
spaCy → Fast NER, POS, dependency parsing
Regex → Pattern extraction (dates, emails, IDs)
MOSTLY OBSOLETE:
Word2Vec → Replaced by contextual embeddings
BiLSTM-CRF → Replaced by Transformer models
Seq2seq+attn → Replaced by Transformers
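As a minimal version of the "Search → embedding model + vector DB" row, here is an in-memory sketch assuming the `sentence-transformers` library; the BGE checkpoint name is just one common choice, and a real system would keep the vectors in a vector DB (often combined with BM25 for initial retrieval):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Small BGE embedding model (one common choice; any embedding model works)
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

corpus = [
    "How do I reset my password?",
    "Shipping usually takes 3-5 business days.",
    "Refunds are processed within two weeks.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

query_emb = model.encode(["when will my order arrive"], normalize_embeddings=True)[0]

# With normalized vectors, cosine similarity reduces to a dot product
scores = corpus_emb @ query_emb
print(corpus[int(np.argmax(scores))])  # expected: the shipping sentence
```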
○ Interview Angles¶
- Q: What's the difference between BERT and GPT?
- A: BERT is an ENCODER that sees all tokens bidirectionally (optimized for understanding — classification, NER, embeddings). GPT is a DECODER that sees only past tokens (optimized for generation — text, code, chat). Both use Transformers, but BERT predicts masked tokens while GPT predicts the next token.
- Q: Has GenAI made traditional NLP obsolete?
- A: Mostly, yes. LLMs handle most NLP tasks via prompting, often better than task-specific models. What survives: (1) BERT-family embedding models (BGE, E5), (2) fine-tuned BERT classifiers for sub-100ms classification at scale, (3) BM25/TF-IDF for initial retrieval in RAG. The field has consolidated around "one model, many tasks."
★ Connections¶
| Relationship | Topics |
|---|---|
| Builds on | Neural Networks, Python For Ai |
| Leads to | Transformers, Embeddings, Llms Overview |
| Compare with | Rule-based NLP (regex, grammars), Classical ML (SVM, CRF) |
| Cross-domain | Linguistics, Information retrieval, Search engines |
★ Recommended Resources¶
| Type | Resource | Why |
|---|---|---|
| 📘 Book | "Speech and Language Processing" by Jurafsky & Martin (3rd ed.) | The NLP bible — freely available online |
| 🎓 Course | Stanford CS224n: NLP with Deep Learning | Best NLP course covering classical to modern methods |
| 🔧 Hands-on | spaCy Course | Practical NLP with production-ready tools |
★ Sources¶
- Jurafsky & Martin, "Speech and Language Processing" (3rd ed.) — https://web.stanford.edu/~jurafsky/slp3/
- Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers" (2018)
- spaCy documentation — https://spacy.io
- HuggingFace NLP Course — https://huggingface.co/learn/nlp-course