Blog
Guides and posts by roadmap step
Cohort guides and posts are organized by the Roadmap. Start with the step you’re on.
Step 1: First use of AI for coding
Set up a Quarto blog, GitHub, and your AI IDE. Posts for this step are still in progress — follow the roadmap for the current task list.
Step 1
First use of AI for coding
Set up a Quarto blog and GitHub, then start using your AI IDE.
Step 2: Run a model locally
Run Qwen3 0.6B on your laptop and trace every operation from tokenization to the output token, first in pure C and then through the Hugging Face Transformers API.
Step 2
Run Qwen3 0.6B in pure C
Run Qwen3 0.6B in pure C. Tokens, BPE, chat templates, attention, KV cache — all in one C binary.
Step 2
GGUF vs SafeTensors
Model weight formats compared. GGUF, SafeTensors, pickle security, quantization, mmap, and why we start with pure C.
Step 3: Inference deep dive
Beyond running a model — inference engines, batching, quantization formats, and serving. Start with the Lesson 2a video: llama.cpp server + Cline.
Lesson 2a · Step 3
Run a Coding AI on Your Laptop
Serve Qwen2.5 Coder with llama.cpp server and wire Cline to a local OpenAI-compatible endpoint — free, private, no API key.
Step 4: Training fundamentals
Read training curves like an engineer, then train your own. Distributed training, loss spikes, and what real LLM W&B dashboards actually show you.
Step 4
Reading the Curves (LLM training)
W&B checklists, tokens vs. steps, OLMo checkpoints, Marin 32B and QK-Norm warm-start: how real LLMs learn, spike, and stabilize.