# The gap between a green build and a live deploy

> A passing local build proves your code compiles — nothing more. The expensive failures in building this documentation corpus all lived in the gap between "it compiles" and "it's live": a deploy-time security gate, a missing shell variable that silently broke two tools, a popular framework that did not fit the stack, and a serverless function that could not read its own files. The defense is to verify across every boundary you cross.

URL: https://biloh.com.au/docs/engineering-notes/green-build-vs-live-deploy
Category: Engineering notes | Audience: builder | Updated: 2026-06-25

A passing local build tells you your code is syntactically valid. It does **not** tell you the thing will deploy, run, or be reachable. The most expensive hours of building this very documentation system were all spent in the gap between *"it compiles"* and *"it's live"* — and every one of them had the same shape: an assumption that held in one environment and quietly broke in the next.

## A green build is a claim about syntax, not about shipping

A local production build passed cleanly, and the deploy was then rejected by a security gate that refused a known-vulnerable transitive dependency (a CVE in a markdown library). A vulnerability advisory is something the compiler never checks. The fix was a one-line patch bump; the lesson is durable: **"build passed" and "deploy succeeded" are independent claims**, and only the live deploy is authoritative. If your workflow lands on a single branch and every push deploys, treat the green deployment, not the green build, as your definition of done.
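One way to narrow this gap locally is to run a vulnerability check before pushing. The following is a minimal sketch, assuming the report shape that `npm audit --json` emits in npm 7 and later (a `metadata.vulnerabilities` object with per-severity counts). The function name `blocksDeploy` and the choice of threshold are hypothetical, and the real deploy gate's rules may well differ.

```typescript
// Severity levels in ascending order, as reported by npm audit.
const SEVERITIES = ["info", "low", "moderate", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Assumed subset of the `npm audit --json` report (npm 7+).
interface AuditReport {
  metadata: { vulnerabilities: Record<Severity, number> };
}

// Returns true when any vulnerability at or above `threshold` is present.
function blocksDeploy(report: AuditReport, threshold: Severity): boolean {
  const start = SEVERITIES.indexOf(threshold);
  return SEVERITIES.slice(start).some(
    (level) => (report.metadata.vulnerabilities[level] ?? 0) > 0,
  );
}
```

In practice the report would come from parsing the command's JSON output. npm's own `--audit-level` flag offers a similar cut-off from the command line, so a helper like this is mainly useful when the result feeds a larger check script.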

## When a tool goes silent, suspect the shell before your code

Adding one dependency failed with a cryptic argument-type error, and the test runner produced *no output at all*. It looked like a corrupted install, and an hour disappeared into deleting and reinstalling things. The actual cause was a single missing environment variable: the path to the system command interpreter. Without it, neither the package manager nor the test runner could spawn the child processes they rely on, and both failed in ways that mimicked broken code.

The principle: **when several unrelated tools fail to *spawn* subprocesses, the common cause is the shell environment, not any one tool.** Check the environment before you start deleting your dependencies.
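A preflight check along these lines can turn that hour into seconds. This is a sketch under stated assumptions: the article does not name the missing variable, so `ComSpec` (the usual command-interpreter variable on Windows) and `PATH` are guesses at what matters, and `missingSpawnVars` is a hypothetical helper.

```typescript
// Looks up a variable, ignoring case on Windows where names are case-insensitive.
function lookupVar(
  env: Record<string, string | undefined>,
  name: string,
  ignoreCase: boolean,
): string | undefined {
  if (!ignoreCase) return env[name];
  const key = Object.keys(env).find((k) => k.toLowerCase() === name.toLowerCase());
  return key === undefined ? undefined : env[key];
}

// Lists variables that tools commonly need before they can spawn a child process.
// Assumption: ComSpec and PATH on Windows, PATH elsewhere.
function missingSpawnVars(
  env: Record<string, string | undefined>,
  platform: string,
): string[] {
  const isWindows = platform === "win32";
  const required = isWindows ? ["ComSpec", "PATH"] : ["PATH"];
  return required.filter((name) => !lookupVar(env, name, isWindows));
}
```

Calling `missingSpawnVars(process.env, process.platform)` at the top of a setup script, and printing the result, gives a direct answer before any dependency gets deleted.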

## A "production" install quietly removes your build tools

Installing one new package in a shell that defaulted to production mode pruned the project's dev dependencies, including the CSS toolchain the build needs. The next build failed with *module not found* for things that had been fine an hour earlier. Installs are environment-sensitive: when you add a package in a context that might be production-mode, force dev dependencies to be included (or set the environment explicitly) so the install doesn't prune the tools your build depends on.
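The guard can be sketched as a small helper that decides which arguments to pass to the install. It relies on npm's documented behavior that the `omit` setting defaults to `dev` when `NODE_ENV` is `production`, and that `--include=dev` overrides it. The function names are hypothetical.

```typescript
// Returns true when `npm install` run with this environment would skip devDependencies.
// Covers NODE_ENV=production and an explicit npm_config_omit that lists "dev".
function installOmitsDev(env: Record<string, string | undefined>): boolean {
  const omit = (env.npm_config_omit ?? "").split(",").map((s) => s.trim());
  return env.NODE_ENV === "production" || omit.includes("dev");
}

// Builds install arguments that keep devDependencies in place regardless of mode.
function safeInstallArgs(
  pkg: string,
  env: Record<string, string | undefined>,
): string[] {
  const args = ["install", pkg];
  if (installOmitsDev(env)) args.push("--include=dev");
  return args;
}
```

Passing `--include=dev` unconditionally would also work; the conditional form is only there to make the environment-sensitivity visible.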

## Pick the engine for the stack you have, not the one the field recommends

The most-recommended documentation framework required major versions of the web framework, the UI library, *and* the CSS toolchain that the project was not running. Adopting it meant a high-risk upgrade of three foundations to ship a docs page. The right call was the boring one: a small, native library that matched the existing stack exactly. **Verify peer-dependency compatibility before you adopt the popular tool.** "Best in class" is relative to your constraints, and the migration you avoid is the bug you don't ship.
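A coarse version of that verification fits in a few lines. The sketch below compares only major versions and understands only simple peer ranges (`^18.2.0`, `18.x`, `18`, joined by `||`); anything else is treated as unknown and left for a manual look. A reported conflict is definite, while a clean result is only a hint. A full semver library would be the better tool for real use, and the package names in the usage note are placeholders.

```typescript
// Extracts the major versions a simple peer range accepts.
// Returns null for forms this sketch does not understand (">=", "~", "*", pre-releases).
function acceptedMajors(range: string): number[] | null {
  const majors: number[] = [];
  for (const part of range.split("||")) {
    const match = /^\^?(\d+)(?:\.(?:\d+|x)){0,2}$/.exec(part.trim());
    if (!match) return null;
    majors.push(Number(match[1]));
  }
  return majors;
}

// Lists peer dependencies whose installed major version is outside the accepted set.
function majorConflicts(
  peers: Record<string, string>,
  installed: Record<string, string>,
): string[] {
  return Object.entries(peers)
    .filter(([name, range]) => {
      const majors = acceptedMajors(range);
      const version = installed[name];
      // Unknown range or package not installed: not flagged, needs a manual check.
      if (majors === null || version === undefined) return false;
      return !majors.includes(Number(version.split(".")[0]));
    })
    .map(([name]) => name);
}
```

For example, with peers `{ "ui-library": "^19.0.0", "web-framework": "^14.0.0 || ^15.0.0" }` and installed versions `{ "ui-library": "18.3.1", "web-framework": "14.2.5" }`, the helper reports `ui-library` as a conflict.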

## A serverless function can't read a file you didn't bundle

A request-time route that read content files off disk worked locally and returned a 500 in production: the platform's output file-tracing can't follow a path computed at runtime, so the files were never shipped into the function. Two fixes work — declare the files for explicit inclusion, or make the route **static** so it reads at build time and serves a cached result. Runtime filesystem access is a deployment concern, not just a code concern.
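The second fix amounts to moving the disk read from request time to build time. Independent of any particular framework, the idea can be sketched as follows. The directory layout, the `.md` extension, and both function names are assumptions for illustration, not the project's actual code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Build step: read every markdown file once, while the files are still on disk.
function buildContentIndex(contentDir: string): Map<string, string> {
  const index = new Map<string, string>();
  for (const file of fs.readdirSync(contentDir)) {
    if (!file.endsWith(".md")) continue;
    const slug = path.basename(file, ".md");
    index.set(slug, fs.readFileSync(path.join(contentDir, file), "utf8"));
  }
  return index;
}

// Request step: a pure lookup that needs no filesystem access.
function getContent(index: Map<string, string>, slug: string): string | undefined {
  return index.get(slug);
}
```

Once the index exists, the original files can be absent from the deployed function without affecting lookups, which is the property the production route was missing.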

## The data you parse is rarely the type you assumed

An author wrote a date the natural way — unquoted — and the YAML parser handed back a `Date` object where the schema expected a string, failing the build. The fix was not to scold the author; it was to **coerce at the schema boundary**. Normalize inputs where they enter your system and be generous about the formats real authors will actually use. A validator that rejects the obvious, correct thing is a validator that will be worked around.
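Coercing at the boundary can be as small as one function. This sketch assumes the schema wants a `YYYY-MM-DD` string and that the parser returns a `Date` at UTC midnight for an unquoted date, which is typical YAML behavior; the function name is hypothetical, and a schema library's own coercion hook would play the same role.

```typescript
// Accepts what a YAML parser may hand back for a date field and returns "YYYY-MM-DD".
function coerceDateString(value: unknown): string {
  if (value instanceof Date && !Number.isNaN(value.getTime())) {
    // Read the UTC date so the local timezone cannot shift the day.
    return value.toISOString().slice(0, 10);
  }
  if (typeof value === "string" && /^\d{4}-\d{2}-\d{2}$/.test(value.trim())) {
    return value.trim();
  }
  throw new TypeError(`Expected a date like 2026-06-25, got ${String(value)}`);
}
```

Both the quoted and the unquoted form now produce the same string, so the author never sees the difference.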

## The thread that ties them together

Every one of these was a *boundary* failure: code → shell, local → deploy, build-time → run-time, parser → schema. A green check on one side of a boundary is not a green check on the other.

The cheap, durable defense is two-part. First, **verify across the boundary you're about to cross** — run the thing in the environment that will actually run it, and treat the live deployment as the only completion signal. Second, **make the crossing safe**: a pre-push hook that runs the full build means a broken build is rejected at the push and never reaches production; a schema that coerces author-friendly input means a real author's first draft doesn't bounce; and a one-command check (here, `npm run docs:check`) that validates the cheap things — schema, links — in seconds means the slow boundary, the full build and deploy, only runs on changes already likely to pass.
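As an illustration of how cheap such a check can be, here is a minimal internal-link validator. What `docs:check` actually runs is not shown in this article, so treat this as a hypothetical sketch: it finds markdown links that start with `/` and reports any whose path is not in a known set.

```typescript
// Collects internal link targets of the form [text](/path), dropping any #fragment.
function internalLinks(markdown: string): string[] {
  const links: string[] = [];
  const pattern = /\]\((\/[^)\s#]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Returns the internal links that do not resolve to a known page path.
function brokenLinks(markdown: string, knownPaths: Set<string>): string[] {
  return internalLinks(markdown).filter((target) => !knownPaths.has(target));
}
```

Run over every page with the set of published paths, a check like this finishes in well under a second and catches a whole class of failure before the slow build starts.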

None of this is exotic. It's the same discipline as a good test suite, applied to the seams between systems rather than the inside of one.

## Related

- [Lessons from shipping agent-facing MCP tools](/docs/engineering-notes/lessons-shipping-agent-tools)
- [Making a multi-connector MCP setup safe to act on](/docs/engineering-notes/multi-tenant-mcp-safety)
- [What is Biloh?](/docs/getting-started/what-is-biloh)