Method

Freezing the run

/lighthouse/evals is an alias of the latest run. The actual Run 1 lives at /lighthouse/evals/runs/v1 and is frozen forever. When Run 2 ships, the alias moves; v1 keeps its original numbers. Cite-friendly URLs, RFC-style versioning, pulled into a public product artifact.

Denys Kuzin··5 min read·methodlighthouseevalsiabuild-in-public

The rule we wrote down before publishing the first Lighthouse run was a sentence: we never overwrite a published run. If a number on the page is wrong tomorrow, we do not edit the page. We publish a new run and link forward. The page that carried the original number keeps carrying it.

That rule needed a URL shape, and the shape we used is the alias.

/lighthouse/evals is an alias of the latest run. The actual Run 1 lives at /lighthouse/evals/runs/v1 and is frozen. Today both routes render the same component, because v1 is the only run that exists. When Run 2 ships, only the alias updates. The /runs/v1 URL keeps the original numbers, the original autopsy, the original everything. Run 2 publishes at /runs/v2 with its own permanent URL. Run 3 at /runs/v3. The alias is always pointing at the newest one; the numbered routes never move.

The RFC pattern

The model we borrowed from is the one the IETF uses. RFC 9110 always means RFC 9110. If the working group revises the spec, they publish a new RFC with a new number. The old document is still there, still citable, still the thing a 2014 paper meant when it cited it. Nobody has to ask "which version of RFC 9110 did you mean" because there is only one. Drafts get new numbers; numbers do not get new drafts.

We borrowed that for evals. A run is a draft that became a document. Once it is published, it gets a number and stops moving. The next set of numbers is the next document.

This is a small idea. It is also the difference between a page a reader can cite and a page they have to screenshot.

Three things this gets us

The first is cite-friendly URLs. If a Substack post links to /lighthouse/evals/runs/v1 for the +10.7 point Gemini lift, that link resolves to the same numbers six months later. We did not move the page out from under the citation. If a paper references the autopsy, the autopsy is still at the URL the paper used. The reader who clicks through in 2027 gets the artefact the writer was looking at when they linked.

The second is honest version history. A reader who wants to see how the numbers moved between runs can read both. They can compare v1 to v2 directly, in their browser, by changing one segment of the URL. The page about Run 2 does not have to claim what Run 1 said; Run 1 is one click away, saying it itself. That is a different shape than a changelog at the bottom of a single page that swallowed three editions of its own content. The changelog approach asks the reader to trust the summary. The frozen-run approach gives them the source.

The third is discipline. We have already had one moment internally where someone wanted to fix a small thing on the v1 page — a copy edit, a clearer caption. The rule is the same as for the numbers. If the published artefact is wrong, the fix is the next artefact. The constraint forces every revision to be visible, and forces us to decide whether the fix is worth a new run or not worth doing at all. Most of the time it is the second thing, which is the correct answer.

What it costs

Once Run 2 publishes, every frozen run picks up a small banner: This is Run 1 — a newer run is available. No redirect. No content change. The banner is the only thing the v1 page learns after publication, and it learns it from a config table, not from an edit to the document. A reader on /runs/v1 knows they are reading the older artefact and can jump to the alias if they want the latest.

The other cost is the discipline itself, which is harder to enforce than to describe. The temptation to edit Run 1 will keep showing up — a typo, a footnote, a clearer chart. Each one is a small thing on its own and a category error in aggregate. We treat the rule the same way we treat empty versus empty in the data layer: the contract is the thing, and the contract is enforced by refusing the small convenient violation.

Generalising

The pattern is not specific to evals. Most product pages have at least one load-bearing number on them — pricing tiers, customer counts, benchmark results, version coverage — and most of those numbers are silently editable today. A page that carries a number a reader might quote is a page that should be cite-friendly first and editable second. The frozen run is one application; the principle is older than the run.

The long read

This is one field note. The full argument lives in the book.

Read the book