Inside the
machine
room.
You’ve driven the car. Now let’s look under the bonnet. Feature packs, phase gates, parallel reviews, the whole workflow that makes bigger builds possible.
Six things you
already know.
Say the answer out loud before you reveal it. You’ve met all of these in Parts 1 and 2.
You’ve already run half of the commands you’ll see in this deck. This isn’t new territory; it’s the same ground, one floor deeper.
Three decks,
one journey.
Before we dive in, here’s where each deck lives. Part 3 is the inside of the stages Part 2 named.
Think of it as a drawing scale: Part 1 is 1:200 (whole site). Part 2 is 1:50 (floor plans). Part 3 is 1:5 (details). Same building.
A product is
a stack of packs.
Trying to build a whole product at once is like laying out a whole hospital in one drawing. It doesn’t work. Ship It breaks every product into feature packs: a single feature’s worth of spec, contract, states, acceptance, and build plan, all in one folder.
What’s in a pack
__spec.md (what it does), __contracts.md (exact field names and API shapes), __states.md (every UI state), __acceptance.md (pass/fail tests), and, when planning begins, __build.md (the phase plan).
Vertical slices, not horizontal layers
Each phase inside a pack builds a vertical slice: a small end-to-end piece that works and is testable. Types + DB + backend + UI for one feature. Never “do all the backend, then all the UI”: that way lies endless integration pain.
Like issuing a single room in full: plans, sections, details, M&E, finishes schedule, the works. When it’s signed off, you move to the next room. The room is the unit of delivery, not the drawing type.
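A pack’s file list is easy to check mechanically. Here is a minimal sketch, assuming the file names are exactly as written above and that __build.md is only expected once planning has begun. The function names are made up for illustration; this is not a Ship It command.

```python
from pathlib import Path

# File names as written in this deck. __build.md appears only when planning begins.
CORE_FILES = ("__spec.md", "__contracts.md", "__states.md", "__acceptance.md")
PLANNING_FILES = ("__build.md",)


def missing_pack_files(present, planning_started=False):
    """Return the expected pack files that are not in `present`."""
    expected = CORE_FILES + (PLANNING_FILES if planning_started else ())
    present = set(present)
    return [name for name in expected if name not in present]


def audit_pack_folder(pack_dir, planning_started=False):
    """Same check, reading the file names from a real folder on disk."""
    names = [p.name for p in Path(pack_dir).iterdir() if p.is_file()]
    return missing_pack_files(names, planning_started)


# A pack that so far has only a spec and a states file:
gaps = missing_pack_files(["__spec.md", "__states.md"])
```

Here gaps names the contracts and acceptance files, which is the prompt to go and write them before planning starts.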
Before you write
a single line, audit.
Writing a spec is not the same as writing a good spec. Ship It runs five passes over every blueprint before building begins. Each pass catches different kinds of problem.
Hygiene
Terminology consistent? Names match? Cross-references resolve? Every named thing defined? Catches the silly stuff before it hardens into the build.
Navigation
A dedicated sweep just for screen-to-screen flow. Does every “tap here” lead somewhere real? Any orphaned screens? This is where UX confusion is caught.
Contract tracing
Every field in every contract traced back to the spec. If the spec says “booking has a date” and the contract doesn’t, the contract is wrong.
Flow validation
End-to-end user journeys walked through. If booking requires login but login isn’t in the flow, that’s a blocker. Found at audit, not at build.
Dry-run buildability
Can someone build this from the docs alone, without asking you questions? If the answer is “maybe”, the blueprint isn’t ready.
Result: ready to build
(not a pass, an outcome)
Only when all five passes above are clean. This discipline is the single biggest difference between projects that ship and projects that stall.
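Two of the passes above, navigation and contract tracing, come down to simple set and graph checks. A rough sketch of the idea follows, assuming you have already written the screen flow down as a mapping and the field names as lists. The screen and field names are invented for illustration, and the real skills work from the blueprint documents, not from Python data.

```python
from collections import deque


def audit_navigation(flows, start):
    """flows maps each screen to the screens its taps lead to.

    Returns (dead_links, orphans): taps that point at an undefined screen,
    and defined screens that can never be reached from `start`.
    """
    dead_links = sorted(
        (src, dst)
        for src, targets in flows.items()
        for dst in targets
        if dst not in flows
    )
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first walk over the screens that do exist
        screen = queue.popleft()
        for nxt in flows.get(screen, ()):
            if nxt in flows and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    orphans = sorted(set(flows) - seen)
    return dead_links, orphans


def trace_contract(spec_fields, contract_fields):
    """Compare the fields the spec names with the fields the contract defines."""
    spec, contract = set(spec_fields), set(contract_fields)
    return {
        "missing_from_contract": sorted(spec - contract),
        "not_in_spec": sorted(contract - spec),
    }


# Hypothetical flow: "settings" is tapped but never defined; "archive" is unreachable.
flows = {
    "home": ["trip", "settings"],
    "trip": ["packing"],
    "packing": [],
    "archive": ["trip"],
}
dead_links, orphans = audit_navigation(flows, "home")

# Hypothetical booking: the spec has a date, the contract forgot it and invented notes.
report = trace_contract(["id", "date", "guest_name"], ["id", "guest_name", "notes"])
```

Either list coming back non-empty is the kind of thing the audit is meant to surface before building begins.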
Screen inventory,
room data sheets.
Before starting any visual build, Ship It takes a deliberate pause. Two skills fire in sequence.
/screen-inventory
Produces a comprehensive list of every screen in the product: what it is, which feature pack owns it, what states it has. Once per project, at the top.
Like a room data sheet before detailed design starts.
/design-preflight
Fires before each visual phase. Enriches the builder prompt with motion specs, touch targets, glass/material choices, asset needs, and emotional tone. Closes gaps that would otherwise get winged.
Like a pre-issue checklist before any drawing set leaves your desk.
Screen inventory gives you the total picture. Design preflight makes each phase specific. Together, the design never feels improvised.
Every project
needs stuff.
Icons, images, illustrations, animations, screen mockups, copy. Collectively: assets. Ship It’s /asset-forge inventories what you need and writes locked prompts for each. The trick is picking the right tool for each type.
GPT Image 2.0
OpenAI’s latest image model. Great for atmospheric, magazine-style visuals. Best when you want something photoreal or illustrated with real style.
Gemini 2.5 / Imagen
Google’s image generation. Good when you need ten variants cheaply, or want to explore styles before committing.
Stitch (via MCP)
Claude can drive Stitch directly to generate complete screens in your design system, with layout, components, and typography applied. Best for fast visual iteration.
Icons8
Curated library of thousands of icons in matching families. Drop-in consistent assets, every time. Nothing you generate will match this for uniformity.
Inline SVG (from Claude)
For charts, architectural diagrams, custom shapes. Claude writes SVG directly. Sharper, lighter, more controllable than any raster alternative.
CSS, Framer Motion, Lottie
CSS keyframes for simple reveals. Framer Motion for complex interactive UI animation. Lottie for designer-authored JSON animations imported from After Effects.
Match the tool to the task. GPT Image 2.0 for a hero photo, never for a chart icon. Inline SVG for your itinerary’s route line, never for a person’s face. Icons8 for the button glyphs, never hand-drawn.
Seven gates.
Plus a design loop.
Every phase of every pack passes seven gates before it can be called done. For visual phases, a small between-phase design loop fires too. This is how Ship It stops Claude from shipping mess.
The seven gates (sequential)
The visual loop (after visual phases)
Runs between every visual phase. Small, tight, repeatable. Design never drifts.
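“Sequential” is the key word: a phase only counts as done when every gate passes in order, and the first failure stops the run. A minimal sketch of that shape follows. The gate names and checks are placeholders, not Ship It’s actual seven gates, and only three are shown to keep it short.

```python
def run_gates(gates, phase):
    """Run (name, check) pairs in order and stop at the first failure.

    Each check takes the phase and returns True when the gate passes.
    """
    passed = []
    for name, check in gates:
        if not check(phase):
            return {"done": False, "passed": passed, "failed_at": name}
        passed.append(name)
    return {"done": True, "passed": passed, "failed_at": None}


# Placeholder gates reading flags from a made-up phase record.
gates = [
    ("gate-1", lambda phase: phase["compiles"]),
    ("gate-2", lambda phase: phase["tests_pass"]),
    ("gate-3", lambda phase: phase["matches_contract"]),
]

phase = {"compiles": True, "tests_pass": False, "matches_contract": True}
result = run_gates(gates, phase)
```

With the failing tests above, result reports gate-2 as the stopping point and the phase is not done, even though a later gate would have passed.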
Two force
multipliers.
Hooks beyond the basics
Part 2 introduced three beginner hooks. Ship It’s starter pack adds five more worth knowing once you’re committing real work:
- credential-read-guard: blocks reads of credential files
- known-pitfalls: reminds of recorded traps before risky ops
- playbook-capture-nudge: soft reminder to capture learnings
- filing-scratch-guard: prevents writes to scratch paths
- code-quality-check: runs terminal gates after edits
Grow the set by incident. Every hook that earns a place there does so because something real went wrong once.
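To make the first hook in that list concrete, here is a rough sketch of the kind of check a credential-read guard could perform. The patterns are assumptions chosen for illustration; the deck does not show the real hook’s rules or how it is wired into Claude.

```python
import fnmatch
from pathlib import PurePosixPath

# Illustrative patterns only; a real guard would grow this list by incident.
BLOCKED_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa", "credentials.json")


def is_credential_path(path):
    """Return True when the file name looks like a credential file."""
    name = PurePosixPath(path).name
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in BLOCKED_PATTERNS)


def guard_read(path):
    """Decide whether a read may go ahead; returns (allowed, reason)."""
    if is_credential_path(path):
        return False, f"blocked: {path} looks like a credential file"
    return True, "ok"
```

The guard matches on the file name only, so config/.env is blocked wherever it lives, while an ordinary source file passes straight through.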
Parallelism: two agents, two features
/parallelism-advisor studies your plan and tells you:
- which packs can run in parallel worktrees
- which phases inside a pack are independent (the Diamond Pattern)
- a ready-made worktree setup command
- sequential vs parallel timeline comparison
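Spotting independent phases is a dependency-graph question. The sketch below assumes the Diamond Pattern means one phase fanning out to two independent phases that rejoin in a final one; the deck does not define it, so treat that reading, the phase names, and the function as illustrative, not as how /parallelism-advisor works.

```python
def parallel_waves(deps):
    """Group phases into waves; phases in the same wave can run side by side.

    deps maps each phase to the phases it depends on. Every dependency is
    assumed to appear as a key in deps as well.
    """
    remaining = {phase: set(needs) for phase, needs in deps.items()}
    waves = []
    while remaining:
        ready = sorted(p for p, needs in remaining.items() if not needs)
        if not ready:
            stuck = ", ".join(sorted(remaining))
            raise ValueError("unresolvable dependencies among: " + stuck)
        waves.append(ready)
        for phase in ready:
            del remaining[phase]
        for needs in remaining.values():
            needs.difference_update(ready)
    return waves


# A diamond: p1 fans out to p2a and p2b, which rejoin at p3.
deps = {"p1": [], "p2a": ["p1"], "p2b": ["p1"], "p3": ["p2a", "p2b"]}
waves = parallel_waves(deps)
```

Four phases collapse into three waves, because p2a and p2b share the middle one. That is the sequential-versus-parallel comparison in miniature.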
LLMs make
the same mistakes
every human does.
Just faster. Dead code, repeated logic, silent failures, drifted names, scope creep. Ship It’s whole pipeline is, in effect, a set of specialised mistake-catchers. Here’s what each one catches.
These don’t run one by one. At the end of a pack, /pr-review-toolkit:review-pr spawns its six specialists in parallel, with /review-diff and /security-review running alongside; /diff-audit follows once those three finish. Most of the team reviews your work at once, in about a minute. Then you read the reports.
Four specialists.
In parallel.
At the end of every pack, Ship It runs a 4-step review. Steps 1 to 3 fire in parallel. Step 4 runs last. The whole thing takes about as long as one specialist working alone.
/pr-review-toolkit:review-pr
Six review specialists spawn in parallel: simplifier, reviewer, silent-failure hunter, test coverage, type design, comment accuracy.
/review-diff
Spec-compliance check. Catches naming drift, invented fields, and contract mismatches.
/security-review
Injection, auth bypass, crypto, data exposure, service-account leaks, dependency vulnerabilities.
/diff-audit
Scope sanity. Only intended files changed? No accidental formatting churn? File count matches the plan? Runs after the first three.
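The timing claim is easier to believe once you see the shape: three steps start together, and the fourth waits for them. A minimal sketch using asyncio follows, with a stand-in coroutine in place of the real skills, which run inside Claude and are not Python functions.

```python
import asyncio


async def run_skill(name, seconds=0.01):
    """Stand-in for invoking a review skill; sleeps instead of reviewing."""
    await asyncio.sleep(seconds)
    return f"{name}: report ready"


async def end_of_pack_review():
    # Steps 1 to 3 start together and are awaited as a group.
    first_three = await asyncio.gather(
        run_skill("/pr-review-toolkit:review-pr"),
        run_skill("/review-diff"),
        run_skill("/security-review"),
    )
    # Step 4 runs only after the first three have finished.
    last = await run_skill("/diff-audit")
    return list(first_three) + [last]


def run_review():
    """Synchronous entry point that returns the four reports in step order."""
    return asyncio.run(end_of_pack_review())
```

Because the first three overlap, the total wait is roughly the slowest of them plus the final audit, not the sum of all four.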
Study after study finds AI-generated code ships more security holes than reviewed human code. You can’t eye-review your way out of that. You need specialists.
What would you
do if...
One scenario. Pick your answer. Reveal to check. No grade, just noticing.
Plan the next stage
with the real tools.
This is not a rebuild. Your trip planner already works. This is a planning exercise, seeing what /spec-writer and /build-pack produce when you point them at a real growth decision. Takes fifteen minutes.
Pick one slice, feed it in.
Your trip planner could grow three ways: a memory book of past trips with photos and notes, sharing a trip with a partner so two people can tick one packing list, or a simple map view of your stops. Each of those is a feature pack in its own right: one feature’s worth of spec, contracts, states, acceptance. Pick the one you actually want. Open Claude in your trip-planner workspace, paste this with your pick filled in, and answer its questions honestly. At the end you’ll have a SPECS.md that’s better than anything you’ve written for this project before. (Drop the weather clause from the paste if you skipped the stretch goal.)
My trip planner shows one trip: name and dates, a countdown, a packing checklist, itinerary stops as stations on a line, notes, and live destination weather. The next feature is [your pick: the memory book / partner sharing / the map view]. Treat it as a single feature pack. Walk me through the five-phase spec flow.
Turn the spec into the pack.
Use the SPECS.md you just wrote. Plan [your pick] as a single feature pack, kept small.
This turns the spec into the pack: the contracts (exact field names and API shapes), the states, the acceptance criteria. The detailed drawing for this one thing.
List the screens before anyone draws.
This step lists every screen the feature adds or changes, and the states each screen can be in. Inventory before drawing, same instinct as Chapter 05.
Now the phase plan.
Use the SPECS.md and the feature pack you just wrote. Plan it as a single pack, kept small, no more than 5 phases.
Its gates now have what they need: the pack files and the screen inventory. Read what it produces. Notice the screen list came first (inventory before drawing), then done-when checklists for each phase, context budgets, pitfalls flagged from KNOWN_PITFALLS. This is what a production plan looks like.
Your feature-pack plan probably came out with 3, 4, or 5 phases. The same tool produced a 15-phase plan for the finance dashboard build, about 8,400 lines of code when it was done. Same workflow, different scale. The tools adjust; the discipline doesn’t.
Three things you couldn’t do in Part 2.
Take two minutes. Type what’s genuinely new.
Ship It
is a template.
The trip planner was your practice. Now you take the same playbook and start something you actually care about. Six steps.
Remember the TfL delay-times widget, the app that tells you how your lines are running before you leave the house? It was your idea, and it’s the example this whole workshop was first built around (before the trip planner took its place in these decks). It is also exactly the right size for a first solo project: one screen, one free data feed, one person who genuinely wants it. That’s the pattern to copy. Pick something you personally care about, and keep it small. If nothing else is calling to you yet, build the widget. Two practical notes from experience: register at api.tfl.gov.uk (the activation email loves the spam folder), and pick the free 500-requests-per-minute product when it asks.
Get the playbook onto your Mac
Ship It lives in a private GitHub repository, shared by invitation. Munim invites your GitHub account (a one-time email invite; accept it and you’re in). Then ask Claude: “clone github.com/munimmoiz0-sys/ship-it into my home folder as ship-it”. It uses the GitHub login you set up in the step below.
Make a project folder
Decide the one thing you’re building (nothing too big, a tiny first feature is fine). Tell Claude: “make a new folder called my-project next to ship-it and start me in it”.
Run the starter script
One prompt does this:
Spec it first
Run /spec-writer and answer its questions honestly. It writes SPECS.md. Read what it writes before you go on; ten minutes here saves hours later.
Scaffold from the spec
Run /bootstrap-project. It reads the spec and writes your CLAUDE.md and the state/ files, and installs the working skills. Then copy in the five Tier 1 rules from templates/CLAUDE.md.starter, delete anything you’re not ready for, and keep it under 60 lines.
Plan, build, repeat
For each phase: /feature-plan, then /build-pack, then /build-team writes the code and Claude runs the 7 quality checks automatically. Ask Claude to commit when something works. End of pack: the 4-check review. Then merge and start the next pack. Same loop, forever.
Ship It doesn’t build for you; it gives you the scaffolding and the discipline. The muscle you’ve built across these three decks is the thing that ships your project.
Local commits live on your Mac only. GitHub is where you put them to back up, share, and (eventually) collaborate. Three one-time steps, then it’s one prompt away for every future project.
Start every repo private by default. You can flip it public later. Anything with an API key, a draft, or anyone else’s data stays private, always.
Static sites (decks, demos, landing pages) deploy to Cloudflare Pages in one command: wrangler pages deploy <folder> --project-name=<name>. The deploy script at scripts/deploy-decks.sh in Ship It is the pattern, copy and adapt.
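If you would rather drive the deploy from a script of your own, the command above is straightforward to assemble. A small sketch follows; the folder and project names are placeholders, and it is not a copy of scripts/deploy-decks.sh, whose contents this deck does not show.

```python
import shlex


def pages_deploy_command(folder, project_name):
    """Build the argument list for the wrangler command quoted above."""
    return ["wrangler", "pages", "deploy", folder, f"--project-name={project_name}"]


# Placeholder values; swap in your own folder and Pages project name.
command = pages_deploy_command("decks/part-3", "my-decks")
deploy_line = shlex.join(command)

# To actually deploy, pass the list to subprocess.run(command, check=True).
# That needs wrangler installed and logged in to your Cloudflare account.
```

Keeping the command as a list, not one long string, means a folder name with spaces in it still reaches wrangler as a single argument.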
You’re actually ready now.
You know what happens inside each stage. You know why the gates exist. You know which skill catches which kind of mistake. Nothing in a real project will be a surprise.
Your one next action
Pick one real project you care about. Run /spec-writer on it. Ship a single pack. Notice the difference.
For monthly maintenance, a quick ten minutes with /tidy, /prune, and /playbook-feedback keeps the repo breathing.
End of the coursework. The studio is next.
Open Part 4: The Studio →