I learn AI the messy way: papers, YouTube lectures, blog posts, Twitter threads. I go through all of it, then try to write what I actually understood — from scratch, in code, with interactive demos. That's The Context Window. Not a course, not a textbook — just the clearest explanation I could write for every concept I've worked hard to understand. Neural networks, transformers, reinforcement learning, reasoning models, search systems, production Python. Written from first principles. YouTube series covering these topics coming soon.
A note: things here aren't perfect; this is how I understand them. I've also used AI heavily to help draft and structure the content, so there may be the occasional mistake. If you spot something off or have suggestions, I'd genuinely love to hear about it.
A 15-part series covering language models, transformers, attention, RLHF, reasoning, agents, security, and more.
NLP: A 26-part deep-dive series covering transformer internals, LLaMA/Mistral/Mamba architectures, Flash Attention, LoRA, quantization, RLHF/DPO, diffusion models, CLIP, multimodal VLMs, and more.
Deep Learning: A 9-part interactive series building neural networks from scratch, from a single neuron and backpropagation all the way to reproducing GPT-2, with code walkthroughs, interactive demos, and hands-on playgrounds.
Search: A 26-part journey from classical information retrieval to modern agentic search, covering BM25, embeddings, vector databases, hybrid search, RAG, learning to rank, click models, personalization, and autonomous search agents.
Python: 14 chapters, from data types and control flow to OOP, generators, gotchas, and interview coding challenges.
Python: Write and run Python code right in your browser. No setup needed, just open and start coding.
Architectures: Visual reference of 51 LLM architectures, from the original Transformer to DeepSeek V4. Browse, zoom, search, and compare any two architectures side-by-side. Based on Sebastian Raschka's architecture diagrams.
Architectures: The chronological journey from Attention Is All You Need to DeepSeek V4. 21 milestone models, six deep-dive panels (attention, MoE, normalisation, position encodings, FFN, beyond-the-transformer), and an interactive family tree. One question per stop: what new component did this model actually introduce?
Architectures: One codebase, every architecture. Watch the code evolve paper by paper, from the original Transformer through LLaMA, Mixtral, DeepSeek V3, Gemma 3, Qwen3, Kimi K2, GPT-OSS, Gemma 4, to DeepSeek V4. Each stop shows exactly what changed and why, with color-coded diffs and delta cards.
Course: End-to-end preparation for Applied AI Research Engineer roles across 10 phases covering architectures, pretraining, alignment, RL for reasoning, multimodal, and interview prep. Includes a full syllabus and progress tracker.
Interview: 75 LLM interview questions, from foundational concepts to senior-level traps drawn from Google DeepMind, OpenAI, Meta, and Anthropic. Tap to reveal answers, track your progress, and test yourself in random quiz mode.
RL + Reasoning: A 12-part series on building reasoning LLMs, from chain-of-thought prompting through reinforcement learning foundations (bandits, MDPs, policy gradients, PPO) to training your own reasoning model with GRPO on GSM8K.
Deep Learning: A 12-part series building every piece of the DeepSeek architecture from scratch, including attention, KV cache, MLA, RoPE, Mixture of Experts, multi-token prediction, FP8 quantization, and more. Theory, math, and code for every module.
Inference: Deep dives into LLM inference engines, serving systems, and production optimization, including paged attention, continuous batching, speculative decoding, prefix caching, GPU VRAM management, and more.
Pre-Training: Deep dives into the pre-training forward pass and transformer architecture internals, including tokenization, embedding, RMSNorm, GeGLU MLP, attention mechanisms, YaRN positional embeddings, hybrid masking, and the math of FLOPs and cluster sizing.
Research: Every research paper referenced across The Context Window, curated with brief descriptions and direct links, and tagged by blog series. Search, filter, and browse the papers behind the concepts.
Archives: 76 posts from 2018–2024 covering ML fundamentals, NLP, PyTorch, backpropagation, LoRA, speculative decoding, and more. The full early writing history, organised year by year.