Blog

Guides and posts by roadmap step

First Break AI cohort blog — step-by-step guides that follow the roadmap. Run models locally, understand inference, and build in the open.

Cohort guides and posts are organized by Roadmap step. Start with the step you’re on.

First use of AI for coding

Set up a Quarto blog, GitHub, and your AI IDE. Posts for this step are still in progress — follow the roadmap for the current task list.

Step 1

First use of AI for coding

Set up a Quarto blog, GitHub, and your AI IDE.

See Roadmap

Run a model locally

Run Qwen3 0.6B on your laptop and trace every operation from tokenization to token output — first in pure C, then through the HuggingFace transformers API.

Step 2

Run Qwen3 0.6B in pure C

Tokens, BPE, chat templates, attention, and the KV cache, all in one C binary.

Read guide

Step 2

GGUF vs SafeTensors

Model weight formats compared. GGUF, SafeTensors, pickle security, quantization, mmap, and why we start with pure C.

Read guide

Inference deep dive

Beyond running a model — inference engines, batching, quantization formats, and serving. Start with the Lesson 2a video: llama.cpp server + Cline.

Lesson 2a · Step 3

Run a Coding AI on Your Laptop

Serve Qwen2.5 Coder with llama.cpp server and wire Cline to a local OpenAI-compatible endpoint — free, private, no API key.

Watch lesson

Training fundamentals

Read training curves like an engineer, then train your own. Distributed training, loss spikes, and what real LLM W&B dashboards actually show you.

Step 4

Reading the Curves (LLM training)

W&B checklists, tokens vs steps, OLMo checkpoints, Marin 32B, and QK-Norm warm-start: how real LLMs learn, spike, and stabilize.

Read guide