The creature, end to end
Everything in this course exists to fill in one of the boxes below. Signals are color-coded throughout: voice in teal, body in clay, and the orchestrator's creature-state in amber.
Two load-bearing invariants
Two design decisions shape every page and every lab. If you remember nothing else, remember these.
The Pi 5 runs only the LeKiwi host daemon — motor I/O and camera streaming. A GPU host runs the VLA and the interaction model. The phone is the face and audio I/O.
"Phone runs everything on-device" is the north-star architecture, but it is not a Day-14 requirement. Tether first, ship the creature, then push compute toward the edge. Pretending the Pi can run SmolVLA + Moshi is the fastest way to ship nothing.
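To make the first invariant concrete, here is a minimal sketch of a deployment manifest with a check that nothing but the host daemon lands on the Pi. The component and host names are hypothetical labels, not identifiers from LeRobot or any other tool. The point is only that the placement rule is simple enough to assert.

```python
from dataclasses import dataclass


# Hypothetical component and host names, for illustration only.
@dataclass(frozen=True)
class Placement:
    component: str
    host: str  # "pi5", "gpu_host", or "phone"


# Tethered Day-14 layout following the first invariant.
TETHERED = [
    Placement("lekiwi_host_daemon", "pi5"),      # motor I/O + camera streaming
    Placement("smolvla_policy", "gpu_host"),     # the VLA
    Placement("interaction_model", "gpu_host"),  # voice / dialogue model
    Placement("face_and_audio_io", "phone"),     # face + mic/speaker
]

# The only thing allowed to run on the Pi 5.
PI_ALLOWED = {"lekiwi_host_daemon"}


def pi_violations(layout):
    """Return components placed on the Pi 5 that are not the host daemon."""
    return [p.component for p in layout if p.host == "pi5" and p.component not in PI_ALLOWED]


if __name__ == "__main__":
    print(pi_violations(TETHERED))  # []
    overloaded = TETHERED + [Placement("smolvla_policy", "pi5")]
    print(pi_violations(overloaded))  # ['smolvla_policy']
```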
LeKiwi's action vector already contains the holonomic base velocities (x, y, θ) alongside the arm joints. So a teleop demo can record driving up to the object AND grasping it as one continuous motion.
A single SmolVLA then learns approach + manipulation end-to-end. That is why Track 3's navigation needs no motion planner, no SLAM, no nav stack for the demo — the policy that grasps is the same policy that drove there.
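A minimal sketch of what "one action vector" means in practice, assuming a 6-joint arm plus the three base velocities. The key names below are modeled on LeRobot's per-joint naming but are an assumption. Read the action features from your installed version before relying on them. The episode values are invented for illustration and carry no real calibration units.

```python
# Hypothetical key names modeled on a 6-joint SO-101 arm plus a 3-DoF holonomic base.
# Check your installed LeRobot version for the real LeKiwi action feature names.
ARM_KEYS = [
    "arm_shoulder_pan.pos", "arm_shoulder_lift.pos", "arm_elbow_flex.pos",
    "arm_wrist_flex.pos", "arm_wrist_roll.pos", "arm_gripper.pos",
]
BASE_KEYS = ["x.vel", "y.vel", "theta.vel"]
ACTION_KEYS = ARM_KEYS + BASE_KEYS  # one flat action per timestep


def make_action(arm, base):
    """Pack 6 arm joint targets and 3 base velocities into one ordered action dict."""
    if len(arm) != len(ARM_KEYS) or len(base) != len(BASE_KEYS):
        raise ValueError("expected 6 arm values and 3 base values")
    return dict(zip(ACTION_KEYS, [*arm, *base]))


def to_vector(action):
    """Flatten an action dict into a fixed-order list of floats (what the policy regresses)."""
    return [float(action[k]) for k in ACTION_KEYS]


# A toy teleop episode with made-up values: drive to the object, then reach and grasp.
episode = [
    make_action([0, 0, 0, 0, 0, 0], [0.2, 0.0, 0.0]),       # approach: base driving forward
    make_action([0, 0, 0, 0, 0, 0], [0.1, 0.0, 10.0]),      # approach: turning toward the mug
    make_action([10, -20, 30, 5, 0, 0], [0.0, 0.0, 0.0]),   # reach: base stopped, arm moving
    make_action([10, -20, 30, 5, 0, 40], [0.0, 0.0, 0.0]),  # grasp: gripper closes
]
vectors = [to_vector(a) for a in episode]

if __name__ == "__main__":
    # Every timestep has the same 9-D shape, so one policy can learn both phases.
    print({len(v) for v in vectors})  # {9}
```

Because approach frames and grasp frames share the same 9-D layout, a single policy trained on such episodes has no reason to treat driving and grasping as separate problems.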
The recurring principle: fast / slow decoupling
This pattern appears on nearly every page. A fast reactive loop keeps the creature alive and conversational; a slow deliberative loop does the heavy thinking. They run async — the fast loop must never block on the slow one.
If the dialogue blocked on the policy, the creature would freeze mid-sentence every time it thought about moving — and feel dead. Decoupling is what makes it feel alive. You will implement this exact split in 6 · The Creature.
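The split can be sketched with Python's asyncio, assuming Python 3.9+ for asyncio.to_thread. The fast loop ticks at a fixed rate and only reads the latest plan. The slow loop does blocking work in a worker thread and overwrites that plan when it finishes. CreatureState, heavy_think, and the timings are hypothetical stand-ins. In the real build the slow side is more likely a request to the GPU host than a local thread.

```python
import asyncio
import time


class CreatureState:
    """Shared state (hypothetical): the slow loop writes it, the fast loop only reads it."""

    def __init__(self):
        self.latest_plan = "idle"
        self.fast_ticks = 0


def heavy_think(i, seconds):
    """Stand-in for blocking deliberation such as policy or LLM inference."""
    time.sleep(seconds)
    return f"plan-{i}"


async def slow_loop(state, think_seconds, n_plans):
    for i in range(n_plans):
        # Run the blocking work in a worker thread so the event loop stays free.
        state.latest_plan = await asyncio.to_thread(heavy_think, i, think_seconds)


async def fast_loop(state, period, duration):
    """Fixed-rate reactive loop: acts on the latest plan, never awaits the slow loop."""
    log = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        state.fast_ticks += 1
        log.append(state.latest_plan)  # e.g. keep talking / idling on the current plan
        await asyncio.sleep(period)
    return log


async def main():
    state = CreatureState()
    slow = asyncio.create_task(slow_loop(state, think_seconds=0.3, n_plans=2))
    log = await fast_loop(state, period=0.02, duration=0.8)
    await slow  # let deliberation finish before shutting down
    return state, log


if __name__ == "__main__":
    state, log = asyncio.run(main())
    # The fast loop kept ticking while each "think" blocked for 0.3 s.
    print(state.fast_ticks, log[0], state.latest_plan)
```

The design choice that matters is that the fast loop reads shared state instead of awaiting a result. Calling heavy_think directly inside an async function, without to_thread, would block the event loop and stall the fast loop for the full 0.3 s. That is the mid-sentence freeze described above.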
The learning path
Eight pages. Each one derives the theory, then hands you a runnable lab with a "done when" gate. Three of them assemble directly into the three build tracks.
- What learning a policy even means
- How modern policies output motion
- SmolVLA, dissected and deployed
- Imagining before acting
- A voice that listens while it talks
- One mind from many models
- The 14-day sprint, scheduled
Earn the intuition
Three tracks, one creature
The capability pages cluster into three build tracks. Each track owns one color across the whole course, in diagrams and badges alike.
The 14-day timeline
Week 1 builds each capability in isolation to a hard gate. Week 2 fuses them and ships. You are here: Day 0.
Full per-day runbook with commands and exit criteria lives in 7 · Plan & Demo.
The study spine
The theory backbone of ANIMA-Kiwi is ETH's Robot Learning: From Fundamentals to Foundation Models (Oier Mees, Spring 2026). Every topic page weaves in the relevant lectures and distills the key papers, then bends them toward what you actually need to build.
You don't watch the lectures and then build. You build, and the lectures explain why the thing you just ran behaves the way it does.
How to use this course
1 · Read the theory
Each page leads with a diagram or a worked equation. Derive, don't memorize. Every theory section ends with a "maps to my build" line tying it back to the creature.
2 · Run the lab
Every lab is a sequence of numbered, runnable steps with expected output. Nothing is hand-wavy: if it's on the page, you can execute it on your hardware.
3 · Check the gate
Each lab ends with a "Done when" checklist. Don't advance until it's green. Gates are what keep a 14-day sprint from quietly slipping to 40 days.
The tooling moves fast. Before running any lerobot, moshi, or pipecat command, verify the current flags with --help on your installed version: argument names and subcommands change between releases. When in doubt, trust --help over this page.
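If you want to script that check, here is a minimal sketch: run a command with --help and look for the flags you are about to use. The helper names are hypothetical, and the demo runs against the Python interpreter rather than any real lerobot, moshi, or pipecat command. The match is a crude substring test (for example, "-c" also matches inside "--config"), so treat a hit as a hint and still read the help text.

```python
import shutil
import subprocess
import sys


def help_text(cmd):
    """Return the combined output of `cmd --help`, or "" if the executable is not found."""
    if shutil.which(cmd[0]) is None:
        return ""
    result = subprocess.run(cmd + ["--help"], capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr


def check_flags(cmd, flags):
    """Map each expected flag to whether it appears in the help text (crude substring match)."""
    text = help_text(cmd)
    return {flag: flag in text for flag in flags}


if __name__ == "__main__":
    # Demo against the Python interpreter itself; swap in the command you are about to run.
    print(check_flags([sys.executable], ["-c", "-m"]))  # {'-c': True, '-m': True}
```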
- Confirm your hardware list: LeKiwi base + SO-101 arm, Raspberry Pi 5, 2× USB cameras, a phone, and a GPU host on the same network.
- Read 1 · Foundations — understand covariate shift before you record a single demo.
- Skim 7 · Plan & Demo so you know what Day 7 and Day 14 demand of you.
- Pick your single demo task now (e.g. "drive to the mug and pick it up while we chat"). Constrain it. Smaller is shippable.
Done when:
- Every hardware item is accounted for and on one network.
- You can state your one demo task in a single sentence with fixed-ish object positions.
- You understand the fast/slow split and both invariants well enough to explain them.