The Context Window

I learn AI the messy way: papers, YouTube lectures, blog posts, Twitter threads. I go through all of it, then try to write up what I actually understood: from scratch, in code, with interactive demos. That's The Context Window. It's not a course or a textbook, just the clearest explanation I could write for every concept I've worked hard to understand. Neural networks, transformers, reinforcement learning, reasoning models, search systems, production Python. Written from first principles. A YouTube series covering these topics is coming soon.

A note: things here aren't perfect. This is how I currently understand these topics. I've also used AI heavily to help draft and structure the content, so there may be the occasional mistake. If you spot something off or have suggestions, I'd genuinely love to hear about it.

NLP

Tokens, Transformers & Truth

A 15-part series covering language models, transformers, attention, RLHF, reasoning, agents, security, and more.

15 tutorials

NLP

The Transformer Deep Cuts

A 26-part deep-dive series — transformer internals, LLaMA/Mistral/Mamba architectures, Flash Attention, LoRA, quantization, RLHF/DPO, diffusion models, CLIP, multimodal VLMs, and more.

26 tutorials · 5 arcs

Deep Learning

Neurons All the Way Down

A 9-part interactive series building neural networks from scratch — from a single neuron and backpropagation all the way to reproducing GPT-2, with code walkthroughs, interactive demos, and hands-on playgrounds.

9 tutorials · micrograd → GPT-2

Finding Needles

A 26-part journey from classical information retrieval to modern agentic search — BM25, embeddings, vector databases, hybrid search, RAG, learning to rank, click models, personalization, and autonomous search agents.

26 tutorials · 7 arcs · AI-Powered Search book deep-dive

Python

Python Under the Hood

14 chapters from data types and control flow to OOP, generators, gotchas, and interview coding challenges.

14 chapters

Python

The Sandbox

Write and run Python code right in your browser. No setup needed — just open and start coding.

Pyodide · instant

Architectures

The Architecture Atlas

Visual reference of 51 LLM architectures — from the original Transformer to DeepSeek V4. Browse, zoom, search, and compare any two architectures side-by-side. Based on Sebastian Raschka's architecture diagrams.

51 architectures · gallery + compare

Architectures

Architecture Deltas: 2017 → 2026

The chronological journey from Attention Is All You Need to DeepSeek V4. 21 milestone models, six deep-dive panels (attention, MoE, normalisation, position encodings, FFN, beyond-the-transformer), and an interactive family tree. One question per stop: what new component did this model actually introduce?

21 milestones · 6 deep dives · single page

Architectures

The Evolving Transformer

One codebase, every architecture. Watch the code evolve paper by paper — from the original Transformer through LLaMA, Mixtral, DeepSeek V3, Gemma 3, Qwen3, Kimi K2, GPT-OSS, Gemma 4, to DeepSeek V4. Each stop shows exactly what changed and why, with color-coded diffs and delta cards.

16 architectures · 55+ deltas · code evolution

Course

AI Researcher Course

End-to-end preparation for Applied AI Research Engineer roles — 10 phases covering architectures, pretraining, alignment, RL for reasoning, multimodal, and interview prep. Includes a full syllabus and progress tracker.

10 phases · 6 months · syllabus + tracker

Interview

The Gauntlet

75 LLM interview questions — from foundational concepts to senior-level traps drawn from Google DeepMind, OpenAI, Meta, and Anthropic. Tap to reveal answers, track your progress, and test yourself in random quiz mode.

75 questions · 16 topics · self-test

RL + Reasoning

Reasoning from Scratch

A 12-part series on building reasoning LLMs — from chain-of-thought prompting through reinforcement learning foundations (bandits, MDPs, policy gradients, PPO) to training your own reasoning model with GRPO on GSM8K.

12 tutorials · 4 arcs · CoT → RL → GRPO → R1-Zero

Deep Learning

Decoding DeepSeek

A 12-part series building every piece of the DeepSeek architecture from scratch — attention, KV cache, MLA, RoPE, Mixture of Experts, multi-token prediction, FP8 quantization, and more. Theory, math, and code for every module.

12 tutorials · 6 arcs · from transcript to blueprint

Inference

Beyond the Forward Pass

Deep dives into LLM inference engines, serving systems, and production optimization — paged attention, continuous batching, speculative decoding, prefix caching, GPU VRAM management, and more.

1 tutorial · vLLM deep-dive

Pre-Training

The Training Ground

Deep dives into the pre-training forward pass and transformer architecture internals — tokenization, embedding, RMSNorm, GeGLU MLP, attention mechanisms, YaRN positional embeddings, hybrid masking, and the math of FLOPs and cluster sizing.

1 tutorial · dense transformer deep-dive

Research

The Paper Trail

Every research paper referenced across The Context Window — curated with brief descriptions, direct links, and tagged by blog series. Search, filter, and browse the papers behind the concepts.

80+ papers · 10 blog series · searchable & paginated

The Archives

76 posts from 2018–2024 — ML fundamentals, NLP, PyTorch, backpropagation, LoRA, speculative decoding, and more. The full early writing history, organised year by year.

76 posts · 2018 – 2024 · year-wise