Alembic · visual-teach · EN

Local AI, done right

Eleven lessons that build a real system on your MacBook Pro M5 Max 128 GB: from @TheAhmadOsman's playbook to a quasi-frontier brain (DeepSeek-V4-Flash via DS4) plus local vision (Qwen3-VL), fully offline at $0 per token. Every number was either measured on your machine or cited from its source; every lesson ships with diagrams, flows, and real examples.

25–34 tok/s · DeepSeek q2 brain (DS4)

122 tok/s · Qwen3-VL vision (MLX)

11 lessons · EN + PT-BR

$0 per token · 100% offline

How to read this. Lessons 01–04 are the foundations (mental model, memory, bandwidth, quantization); 05–07 are the tools (engines, KV cache, DS4); 08 assembles your Mac's config; 09–11 put it to work (hands-on, vision, capstone). Each lesson is a self-contained HTML file (no network, no build) with several diagrams, real examples, and quizzes.

The eleven lessons

LESSON 01 · FOUNDATION

The mental model

Capacity × bandwidth × software stack. Why you identify the bottleneck first and choose the engine last.

capacity · bandwidth · stack

LESSON 02 · FOUNDATION

Memory math

VRAM ≈ params × bits/8. The quantization ladder and what fits in 128 GB.

VRAM · quant ladder · 128 GB
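
A minimal sketch of the rule of thumb VRAM ≈ params × bits/8, assuming decimal gigabytes and counting weights only (KV cache, activations, and OS overhead come on top). The function names, the example model sizes, and the 0.75 usable-memory fraction are illustrative assumptions, not figures from the lessons.

```python
def weight_gb(params_billion: float, bits: float) -> float:
    """Approximate weight memory in GB: params (in billions) x bits / 8."""
    return params_billion * bits / 8


def fits(params_billion: float, bits: float,
         total_gb: float = 128, usable_fraction: float = 0.75) -> bool:
    """True if the weights alone fit in the assumed usable share of memory.

    usable_fraction is a hypothetical headroom factor, not a measured limit.
    """
    return weight_gb(params_billion, bits) <= total_gb * usable_fraction


if __name__ == "__main__":
    # Hypothetical model sizes walked down the quantization ladder.
    for params in (8, 70, 300):
        for bits in (16, 8, 4, 2):
            gb = weight_gb(params, bits)
            verdict = "fits" if fits(params, bits) else "too big"
            print(f"{params:>4}B @ {bits:>2}-bit ~ {gb:7.1f} GB -> {verdict}")
```

Under these assumptions a 70B model at 4 bits needs about 35 GB for its weights, while a 300B model only drops under the assumed budget at 2 bits.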

LESSON 03 · FOUNDATION

Bandwidth = speed

Decode tracks bandwidth, prefill tracks compute. Where the M5 Max lands and why it's usable.

bandwidth · decode · prefill
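
One way to read "decode tracks bandwidth" is as a rough ceiling: each generated token has to stream the active weights from memory once, so tokens per second cannot exceed bandwidth divided by bytes read per token. The sketch below assumes that simplification; the function name and the 500 GB/s and 10B-active figures are placeholders, not M5 Max or DeepSeek specifications.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billion: float,
                         bits: float) -> float:
    """Upper bound on decode speed if every token streams the active weights once.

    Ignores KV-cache reads, compute time, and kernel overhead, so real
    throughput lands below this number.
    """
    gb_per_token = active_params_billion * bits / 8
    return bandwidth_gb_s / gb_per_token


if __name__ == "__main__":
    # Placeholder inputs: 500 GB/s memory bandwidth, 10B active params, 4-bit.
    print(f"{decode_ceiling_tok_s(500, 10, 4):.0f} tok/s ceiling")
```

For a mixture-of-experts model the relevant size is the active parameter count per token, not the total, which is why a large sparse model can decode faster than its full footprint suggests.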

LESSON 04 · FOUNDATION

Quantization & quality

Where Q2 degrades — and antirez's asymmetric recipe that preserves tool-use.

Q2 · asymmetric · tool-use

LESSON 05 · TOOLS

Inference engines

The decision guide, "DO NOT USE Ollama", and why kernels are the real work.

MLX · kernels · vLLM/SGLang

LESSON 06 · TOOLS

The KV cache

The model's working memory — and DeepSeek-V4's compressed attention that makes 1M context viable.

KV · CSA · 1M ctx
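
To see why long context strains memory, here is a sketch of the textbook KV-cache size for plain multi-head or grouped-query attention. It does not model DeepSeek-V4's compressed attention; the function name and the example dimensions are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Uncompressed KV cache size: one K and one V vector per layer, head, token."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 16-bit values.
    for seq_len in (8_192, 131_072, 1_048_576):
        gib = kv_cache_bytes(32, 8, 128, seq_len) / 2**30
        print(f"{seq_len:>9} tokens -> {gib:6.1f} GiB")
```

With these made-up dimensions the cache grows linearly from 1 GiB at 8,192 tokens to 128 GiB at roughly one million tokens, which illustrates why an uncompressed cache does not scale to that context length on a 128 GB machine.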

LESSON 07 · TOOLS

DS4 / DwarfStar

antirez's C engine for DeepSeek-V4: Metal, 2/8-bit, SSD streaming — compiled on your Mac.

DS4 · Metal · SSD streaming

LESSON 08 · THE CONFIG

Your M5 Max config

The memory budget drawn to scale, the three deployment modes, and the two-Mac pool over Thunderbolt 5.

budget · 3 modes · 2 Macs

LESSON 09 · RUNNING IT

Hands-on

Build, download, serve and plug into Claude Code/Codex — with real outputs measured this session.

build · serve · plug

LESSON 10 · RUNNING IT

Vision, carefully

Ahmad's multimodal cautions and the 4-VLM benchmark on your real images.

VLM · benchmark · on-demand

LESSON 11 · CAPSTONE

The decade setup

The full assembled stack, homelab-as-cloud, and the ds4-legal → Previdência Factory bridge.

capstone · ds4-legal · decade