Field notes · 46 published
Receipts, not slogans.
Field notes from running our own delivery on Ship. Every post earns its keep by pointing at code we merged or tore out. No thought leadership. No vibes. The long read lives in the book.
- Aside · 4 min read
We're changing the industry, anyway
A small team encoded its judgment into a few thousand lines of prompt, handed the typing to a cheap model, and now the work ships itself. You are supposed to feel the floor move. From inside, it feels like almost nothing — and that, it turns out, is the whole point.
- Principle · 6 min read
Human, machine, idea
Since April our changelog read like an apology — we cut the scheduler, the worker, the Inbox screen, the custom client, the expensive model. It looked like austerity. It was the opposite: we were removing glass. What is left when you remove everything removable is a human, a machine, and an idea — and that is where the truth is.
- Field note · 3 min read
The 4-column footer
The header could not hold the navigation. We rebuilt the footer as four columns — Ship, Lighthouse, Harbor Gang, Contact & legal — and watched it become a navigation surface, not a graveyard for legal links.
- Method · 7 min read
The PO-audience filter
Three writer agents produced 28 doc pages in parallel. A separate critic agent passed the output through one rule: would a founder skip past this sentence because it talks about plumbing they don't run? The agent-criticises-agent pattern is what made the result readable. The shape generalises.
- Field note · 4 min read
The sidebar nobody noticed
The new docs pages have a sticky left nav and a sticky right TOC. We did not invent the pattern — we picked it. The post is about which IA layers to invent and which to copy wholesale from a vendor twenty times your size.
- Brand · 4 min read
The cross-product ribbon
A one-line ribbon under the hero subhead on /ship and /lighthouse points at the other product. Small visible decision. Large positioning consequence. Contextual placement beats global placement when the connection is conditional.
- Method · 5 min read
Freezing the run
/lighthouse/evals is an alias of the latest run. The actual Run 1 lives at /lighthouse/evals/runs/v1 and is frozen forever. When Run 2 ships, the alias moves; v1 keeps its original numbers. Cite-friendly URLs, RFC-style versioning, pulled into a public product artifact.
- Brand · 4 min read
What we got wrong the first time
Most teams write a 'limitations' section and bury it in a footnote. We put 'What we got wrong' above 'Where Lighthouse loses' on the evals page. Same visual weight as the headline numbers. Admitting the dumb thing is the part that earns the right to publish the wins.
- Autopsy · 7 min read
Empty versus empty: the bug that made reasoning models look smart
Our first Lighthouse benchmark scored reasoning models as competitive when both sides had returned nothing. The judge counted empty-vs-empty as a tie. Here is how we caught it, what changed when we re-ran, and why the corrected numbers say Llama 3.3 got worse.
- Autopsy · 6 min read
The /docs mashup that broke the IA
One user message — 'I can't find /docs from /ship' — triggered eight person-days of IA migration. The post is about which user complaints are architectural and which are cosmetic. Architectural complaints are gifts. We were lucky we listened.
- Build in public · 6 min read
The page an agent can read
We rewrote the Lighthouse pitch page with the explicit constraint that an LLM agent reading the HTML should be able to self-configure its MCP client. Then we ran the test. Here is what changed and what the new constraint asked of the writing.
- Brand · 9 min read
Skills are not knowledge
The industry is publishing skill catalogs as the answer for AI agents. Recipe cards, instruction sheets, 'how to do X' packs. All of them rot in twenty minutes. The shape is wrong. Here is the argument for search over catalog, and the secret ingredient nobody else uses.
- Method · 6 min read
No broken links, second time
When we replaced MkDocs in April, the constraint was simple — every old URL keeps working. We applied it again this week to reshape the entire IA. The rule isn't a one-time migration discipline. It is how a site matures without burning its inbound link graph.
- Brand · 5 min read
Three deletions for every build
We counted our own field notes. Far more 'we deleted X' posts than 'we built X' posts. That ratio is the startup loop — ship a hypothesis, learn it was wrong, delete it, ship the next one. The deletion is the receipt.
- Best practice · 7 min read
Two readers, two manuals
Documentation written for a human and documentation written for an agent are different artifacts. We learned that the hard way and split them. Here is what changed when we did.
- Brand · 3 min read
Champagne over teal
We swapped the accent colour from teal to muted champagne gold this week. Small change in the diff. Bigger change in how the product reads. Notes on why colour is a positioning lever, not a vibe choice.
- Best practice · 4 min read
Boring failures are good failures
The worst failure mode for an agent isn't a crash — it's silence. The pipeline keeps running, the dashboard stays green, and the work that was supposed to happen quietly didn't. The cure is to make failure modes mundane and named, before the agent ever runs.
- Autopsy · 5 min read
The mapping table the runtime never read
We shipped an editor surface, an LLM resolver, and a PR-write flow for a config field. Twenty-four hours later we walked all of it back. The runtime didn't read the field. The story is about where config belongs.
- Manifesto · 8 min read
The dark factory is closed
The dark factory — Foxconn lights out, robots making robots — is the wrong metaphor for what AI agents do to software teams. The team isn't shrinking; it's becoming the part that decides. And that part runs on more brains, not fewer.
- Best practice · 6 min read
PO ideas belong in the epic, not the ticket
When the agent that writes the ticket and the agent that builds it are different processes, scope and motivation have to live somewhere both of them can read. We tried chat. We tried fat tickets. The thing that actually works is putting it in the project description and keeping the tickets thin.
- Pattern · 4 min read
The cheap classifier goes first
We were paying for an LLM call every turn to detect topic shifts that the user had already announced in plain language. Adding a fifteen-line regex in front of the classifier removed the cost and made the UX more decisive at the same time. The rule generalises.
- Field note · 7 min read
The agent that finished without committing
A real intake agent did the right work, in the wrong shape — it answered the ticket but never pushed a branch. The old exit protocol assumed every agent writes code. Branchless agents had nothing to commit, so we replaced the file contract with an HTTP one.
- Best practice · 5 min read
Clarification is not failure
Most agent loops treat "I have a question" as the same outcome as "I crashed." Both stop the pipeline. Both look red on a dashboard. They are completely different things, and conflating them teaches agents to invent rather than ask.
- Product · 7 min read
We moved the front door
Ship did not stop caring about engineers. We stopped making the engineer's tool the first thing a buyer had to understand.
- Best practice · 6 min read
Policies before prompts
A good prompt asks for work. A good policy says what kind of work is allowed, what proof is required, and when the machine must stop.
- Best practice · 6 min read
The Inbox is not a backlog
A backlog stores work. An Inbox stores attention. Mixing the two is how teams turn every agent question into another queue.
- Field note · 5 min read
The book was written on Sunday
Prologue, manifesto, nine lettered sub-chapters, eight field notes. Every new passage anchored to a specific commit SHA from a real reference org. All of it keyed in between commits to the cloud console, on one Sunday, in a single session that started after midnight.
- Autopsy · 11 min read
The catalog rename and the matrix lane
21 patterns renamed across 78 files. A new six-category scheme. Five duplicates deleted. Then a multi-pattern lane with three fan-out modes. RFC-0008, in the order it actually shipped, and why the matrix execution model fell out of the rename.
- Case study · 8 min read
Wizard v2 — the art of saying no
Ten steps became three, then stayed three across seven backend rewrites in one day. A case study in keeping a flow honest while the mechanism under it moves.
- Architecture · 11 min read
Lanes as config — or how we killed the workflow artifact
A full RFC, ten commits, one repo: in a single day we retired a first-class artifact kind, introduced lanes-as-config, and made `shipctl run` the single entry point for everything a repo schedules. An autopsy of RFC-0007.
- Case study · 11 min read
Knowledge buckets and the Distiller
Eight phases in one day — a scope ladder, a dual-written articles table, an LLM-backed ingest classifier, Notion and Linear connectors, and a per-user memory bucket that the agent can actually cite. The knowledge layer Ship needed before it could grow a second brain.
- Architecture · 7 min read
We deleted the worker. The system got simpler.
Five moving parts in the morning, two by the end of the day. A worker, a Redis queue, a repo cache, and a git-sync loop — and why deleting them made the Ship Console cheaper, faster, and easier to reason about.
- Case study · 9 min read
From chat to Navigator
A chat window is a failure mode dressed as a product surface. Over two days the Ship chat became a Navigator — fewer bubbles, word-by-word reveal, typed widgets, and a turn that no longer jumps the viewport. A case study in treating a surface as part of the agent.
- Autopsy · 9 min read
Artifacts are frontmatter now — the RFC-0005 autopsy
Two files per artifact, one manifest, and two sources of truth that never agreed on a Monday. How we collapsed 61 artifacts into a single-file shape in one day — and why the cleanup commit mattered more than the migration.
- Build in public · 11 min read
Ship — the first two weeks
189 commits. 16 days. One repo. The story of how Ship, shipctl, and the Ship Console went from an extracted folder to a running cloud platform — read off the actual git log.
- Case study · 10 min read
The protocol before the product
Between the Apr 7 extraction and the Apr 19 cloud console, there is a quiet 11-day stretch that looks like nothing happened. One commit on a Sunday did the whole thing — shipctl v0.9. We wrote the protocol before we wrote the product; here is how and why.
- Origin · 6 min read
How we cut Ship out of elmundi
Ship did not begin as a greenfield repo. It began as a folder inside a product called elmundi. On Apr 7 we cut it out, and twenty commits later it was a standalone thing with its own CI, its own docs, and its own CLI. This is the prequel to every other post on this blog.
- Architecture · 4 min read
The methodology API — one endpoint, two consumers
shipctl reads from it. Every agent reads from it. The customer's repo never sees the source files. One HTTP API, deliberately small, cleanly versioned. The shape that made the rest of April land smoothly.
- Architecture · 4 min read
Multi-tracker adapters before there were customers
We shipped Linear, GitHub Issues, Notion, Jira, and Asana adapters before any of them had a real user. It is the exception to the "build for one before many" rule. Here is why we did it anyway and what kept the cost down.
- Field note · 4 min read
adopt-ship.sh and the wrong adopters
A 60-line shell script meant to make adoption frictionless. The first three people who ran it did the wrong thing in three different ways. What we learned about the gap between "easy to start" and "easy to start correctly."
- Autopsy · 4 min read
The docs-mcp-server experiment we deleted
We built a Model Context Protocol server to expose the docs to agents. It worked. Then we deleted it. Why building the right thing the wrong way is sometimes worse than not building it.
- Architecture · 4 min read
The repo refactor that gave shape to everything else
Three top-level folders — documentation, prompts, runtime. Each one had a different reader. Renaming and moving files for half a day made the next four weeks possible.
- Architecture · 4 min read
Killed MkDocs, kept the URLs
We replaced the docs runtime in one commit. Same content, same URL paths, different stack. The constraint that did most of the work was "no broken links."
- Field note · 3 min read
Translating cloud-prompts to English
The prompts that came over from elmundi were partly Ukrainian. We sat one afternoon and translated them. Three things changed; one of them was unexpected.
- Field note · 4 min read
Bunny Magic Containers, three weeks of fights
Sixteen consecutive `fix(bunny)` commits. The Magic Containers API was new, the docs were thin, and our deploy pipeline learned each lesson the same way every team learns it. Here's the shape of the fight, told off the actual commit log.
- Origin · 4 min read
What "extracted from elmundi" actually carried
One commit, six months of methodology, and the LICENSE file that did more work than any line of code. A look at what was actually inside the first import.
Field notes are the receipts
The full argument lives in the book.
Each note is a single scar — one commit, one deletion, one decision. The book strings forty of them into the operating model that comes out the other side.
