The Context Window

Not a course. Not a textbook. A complete interactive universe spanning neural networks, transformers, reinforcement learning, reasoning models, search systems, and production Python — where every concept becomes working code, right in your browser.

NLP

Tokens, Transformers & Truth

A 15-part series covering language models, transformers, attention, RLHF, reasoning, agents, security, and more.

15 tutorials
NLP

The Transformer Deep Cuts

A 26-part deep-dive series — transformer internals, LLaMA/Mistral/Mamba architectures, Flash Attention, LoRA, quantization, RLHF/DPO, diffusion models, CLIP, multimodal VLMs, and more.

26 tutorials · 5 arcs
Deep Learning

Neurons All the Way Down

A 9-part interactive series building neural networks from scratch — from a single neuron and backpropagation all the way to reproducing GPT-2, with code walkthroughs, interactive demos, and hands-on playgrounds.

9 tutorials · micrograd → GPT-2
Search

Finding Needles

A 26-part journey from classical information retrieval to modern agentic search — BM25, embeddings, vector databases, hybrid search, RAG, learning to rank, click models, personalization, and autonomous search agents.

26 tutorials · 7 arcs · AI-Powered Search book deep-dive
Python

Python Under the Hood

14 chapters, from data types and control flow to OOP, generators, common gotchas, and interview coding challenges.

14 chapters
Python

The Sandbox

Write and run Python code right in your browser. No setup needed — just open and start coding.

Pyodide · instant
Architectures

The Architecture Atlas

Visual reference of 51 LLM architectures — from the original Transformer to DeepSeek V4. Browse, zoom, search, and compare any two architectures side-by-side. Based on Sebastian Raschka's architecture diagrams.

51 architectures · gallery + compare
Architectures

Architecture Deltas: 2017 → 2026

The chronological journey from "Attention Is All You Need" to DeepSeek V4. 21 milestone models, 6 deep-dive panels (attention, MoE, normalization, position encodings, FFN, beyond-the-transformer), and an interactive family tree. One question per stop: what new component did this model actually introduce?

21 milestones · 6 deep dives · single page
Architectures

The Evolving Transformer

One codebase, every architecture. Watch the code evolve paper by paper — from the original Transformer through LLaMA, Mixtral, DeepSeek V3, Gemma 3, Qwen3, Kimi K2, GPT-OSS, Gemma 4, to DeepSeek V4. Each stop shows exactly what changed and why, with color-coded diffs and delta cards.

16 architectures · 55+ deltas · code evolution
Course

AI Researcher Course

End-to-end preparation for Applied AI Research Engineer roles — 10 phases covering architectures, pretraining, alignment, RL for reasoning, multimodal, and interview prep. Includes a full syllabus and progress tracker.

10 phases · 6 months · syllabus + tracker
Interview

The Gauntlet

75 LLM interview questions — from foundational concepts to senior-level traps drawn from Google DeepMind, OpenAI, Meta, and Anthropic. Tap to reveal answers, track your progress, and test yourself in random quiz mode.

75 questions · 16 topics · self-test
RL + Reasoning

Reasoning from Scratch

A 12-part series on building reasoning LLMs — from chain-of-thought prompting through reinforcement learning foundations (bandits, MDPs, policy gradients, PPO) to training your own reasoning model with GRPO on GSM8K.

12 tutorials · 4 arcs · CoT → RL → GRPO → R1-Zero
Deep Learning

Decoding DeepSeek

A 12-part series building every piece of the DeepSeek architecture from scratch — attention, KV cache, MLA, RoPE, Mixture of Experts, multi-token prediction, FP8 quantization, and more. Theory, math, and code for every module.

12 tutorials · 6 arcs · from transcript to blueprint
Inference

Beyond the Forward Pass

Deep dives into LLM inference engines, serving systems, and production optimization — paged attention, continuous batching, speculative decoding, prefix caching, GPU VRAM management, and more.

1 tutorial · vLLM deep-dive
Pre-Training

The Training Ground

Deep dives into the pre-training forward pass and transformer architecture internals — tokenization, embedding, RMSNorm, GeGLU MLP, attention mechanisms, YaRN positional embeddings, hybrid masking, and the math of FLOPs and cluster sizing.

1 tutorial · dense transformer deep-dive
Research

The Paper Trail

Every research paper referenced across The Context Window — curated with brief descriptions, direct links, and tagged by blog series. Search, filter, and browse the papers behind the concepts.

80+ papers · 10 blog series · searchable & paginated