Why isn't a passing unit test on a tool's logic enough?

The thin wrapper between the logic and the agent can drop a field. Test the tool the way the agent calls it — through the wrapper and the response envelope — or the agent gets less than the logic returned.

Why did the build break when the tests passed?

The version changelog is a single-quoted string, and an apostrophe closed it. A test run (esbuild) tolerated it; the production build (tsc plus lint) did not. Always run the real build before pushing.

Lessons from shipping agent-facing MCP tools

A run of agent-facing tools surfaced the same handful of lessons more than once. None are exotic; each cost real time the first time. Here they are as patterns to copy.

1. Test the tool the way the agent calls it, not just its inner function

A tool's underlying function returned exactly the right value. Its unit test passed. The agent still got nothing, because the thin MCP wrapper between them cherry-picked fields from the response and dropped the new one. The fix was an integration test that calls the tool through its wrapper and the response envelope, the way an agent actually does. It failed immediately, for the right reason.

The unit test proves the logic. The integration test proves the consumer sees the logic. Ship both.
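The difference between the two tests can be sketched as follows. This is a minimal illustration, not the original code: `computeQuote`, `quoteToolBuggy`, `quoteToolFixed`, the `discountReason` field, and the envelope shape (a JSON payload carried as text in a `content` array) are all hypothetical stand-ins.

```typescript
// Hypothetical inner logic: it returns the new field correctly.
interface Quote {
  total: number;
  currency: string;
  discountReason: string; // the newly added field
}

function computeQuote(amount: number): Quote {
  return { total: amount - 10, currency: "USD", discountReason: "loyalty" };
}

// Assumed MCP-style envelope: the payload travels as JSON text.
interface ToolResponse {
  content: { type: "text"; text: string }[];
}

// Buggy wrapper: cherry-picks known fields and silently drops the new one.
function quoteToolBuggy(args: { amount: number }): ToolResponse {
  const q = computeQuote(args.amount);
  const picked = { total: q.total, currency: q.currency };
  return { content: [{ type: "text", text: JSON.stringify(picked) }] };
}

// Fixed wrapper: passes the whole result through.
function quoteToolFixed(args: { amount: number }): ToolResponse {
  const q = computeQuote(args.amount);
  return { content: [{ type: "text", text: JSON.stringify(q) }] };
}

// What the agent sees: the parsed envelope, not the inner return value.
function agentView(res: ToolResponse): Record<string, unknown> {
  const first = res.content[0];
  if (!first) throw new Error("empty tool response");
  return JSON.parse(first.text);
}

// Unit test on the logic: passes regardless of which wrapper ships.
const unitOk = computeQuote(100).discountReason === "loyalty";

// Integration test through the wrapper: fails for the buggy one.
const buggyOk = "discountReason" in agentView(quoteToolBuggy({ amount: 100 }));
const fixedOk = "discountReason" in agentView(quoteToolFixed({ amount: 100 }));

console.log({ unitOk, buggyOk, fixedOk });
```

The unit check passes either way; only the check that goes through the envelope tells the two wrappers apart.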

2. The value you want to return is often already in hand

A request to "return the bumped lock version after an update" looked like it needed a second read. It didn't. The database trigger that increments the version runs before the row is written, so the row the update returned already carried the new value; it was just buried in a nested field the agent didn't know to read. The fix was to surface it explicitly, with no extra round-trip.

Read the existing behaviour before adding a query. A surprising amount of "we need to fetch X" is "X is already on the object."
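A small sketch of the idea, with an in-memory map standing in for the database. The names (`updateRow`, `updateTool`, `meta.lock_version`) and the row shape are assumptions for illustration; the real schema is not shown in this article.

```typescript
interface Row {
  id: string;
  name: string;
  meta: { lock_version: number }; // nested where the agent did not look
}

// Stand-in for the table. The version bump inside updateRow plays the
// role of a trigger that runs before the row is written.
const table = new Map<string, Row>([
  ["r1", { id: "r1", name: "old", meta: { lock_version: 3 } }],
]);
let queries = 0; // counts round-trips to the fake database

function updateRow(id: string, name: string): Row {
  queries += 1;
  const current = table.get(id);
  if (!current) throw new Error(`no row ${id}`);
  const next: Row = {
    ...current,
    name,
    meta: { lock_version: current.meta.lock_version + 1 },
  };
  table.set(id, next);
  return next; // the returned row already holds the new version
}

// Tool response: lift the version to the top level, with no second read.
function updateTool(id: string, name: string) {
  const row = updateRow(id, name);
  return { id: row.id, name: row.name, lock_version: row.meta.lock_version };
}

const response = updateTool("r1", "new");
console.log(response, { queries });
```

The response carries the bumped version after a single call into the store, which is the whole point: nothing new is fetched, it is only made visible.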

3. Make implicit behaviour explicit, in-band

Two small changes removed real friction:

An opaque field name (divergenceFlags) became a self-describing one (visits_per_year_mismatches) at the response layer — without touching the value the UI consumes.
A tool that silently applied financial defaults began returning an applied_defaults block, so the agent could see what it had been given rather than infer it.

An agent acts on what the response shows it. Surfacing the implicit is often higher-leverage than new logic.
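Both changes can be sketched in a few lines. The field names `divergenceFlags`, `visits_per_year_mismatches`, and `applied_defaults` come from the text above; everything else (the input shape, the default values, the function names) is hypothetical.

```typescript
// Internal shape the UI already consumes; left untouched.
interface InternalReport {
  divergenceFlags: string[];
}

// Rename only at the response layer the agent reads.
function toAgentResponse(report: InternalReport) {
  return { visits_per_year_mismatches: report.divergenceFlags };
}

// Hypothetical financial inputs and defaults.
interface FinanceInput {
  principal: number;
  rate_pct?: number;
  term_months?: number;
}

const DEFAULTS = { rate_pct: 5, term_months: 12 };
type DefaultKey = keyof typeof DEFAULTS;

// Resolve the inputs and record which defaults were actually applied.
function withDefaults(input: FinanceInput) {
  const applied_defaults: Partial<Record<DefaultKey, number>> = {};
  const resolved = { principal: input.principal, ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as DefaultKey[]) {
    const given = input[key];
    if (given === undefined) {
      applied_defaults[key] = DEFAULTS[key]; // the agent gets told about this
    } else {
      resolved[key] = given;
    }
  }
  return { resolved, applied_defaults };
}

const report: InternalReport = { divergenceFlags: ["site-12"] };
const renamed = toAgentResponse(report);
const financed = withDefaults({ principal: 1000, rate_pct: 7 });
console.log(renamed, financed);
```

Here the caller supplied a rate but no term, so `applied_defaults` reports only `term_months`, and the internal `divergenceFlags` value is passed through unchanged under its clearer name.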

4. The build gate catches what unit tests don't — run it

The version changelog lives in a single-quoted string. An apostrophe in a description (tool's) closed the string and broke the build — twice. The test runner tolerated it; the production build (type-check plus lint) did not.
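For reference, the failure and three ways around it, assuming the changelog is a TypeScript string literal. The entry text is made up; which spelling fits depends on the project's lint rules.

```typescript
// Broken form, shown as a comment because it does not parse:
//   const entry = 'Clarify the tool's description';
// The apostrophe in "tool's" ends the string early.

// Three spellings that produce the same string.
const escaped = 'Clarify the tool\'s description';
const doubleQuoted = "Clarify the tool's description";
const template = `Clarify the tool's description`;

console.log(escaped === doubleQuoted && doubleQuoted === template);
```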

Run the real build before every push, and judge it by its exit code or full output — never by the tail of the log, because lint errors appear early and a tail hides them.
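Why the tail misleads can be shown with a simulated build result. The log lines, file name, and `BuildResult` shape below are invented for illustration; a real script would take the exit code from whatever process runner it uses.

```typescript
interface BuildResult {
  exitCode: number;
  log: string[];
}

// Simulated output: one lint error near the top, then many routine lines.
const result: BuildResult = {
  exitCode: 1,
  log: [
    "lint: checking 120 files",
    "src/changelog.ts:14:31 error Unterminated string literal",
    ...Array.from({ length: 40 }, (_, i) => `bundle: wrote chunk ${i + 1}`),
    "finished in 12.3s",
  ],
};

function tail(lines: string[], n: number): string[] {
  return lines.slice(-n);
}

// Judging by the last ten lines: nothing looks wrong.
const tailLooksClean = !tail(result.log, 10).some((line) => line.includes("error"));

// Judging by the exit code: the build failed.
const buildPassed = result.exitCode === 0;

console.log({ tailLooksClean, buildPassed });
```

The tail reads as clean while the exit code says the build failed, so the gate should key off the exit code (or scan the full log), never the last screenful.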

5. On a busy main branch, expect to rebase — and preserve the other work

When other commits land between your work and your push, the changelog and version files collide. The recovery pattern:

Rebase onto their commit.
Take their version of the changelog files, then re-apply your entry on top and bump your version past the collision.
Keep their changelog entry. Don't clobber it.

Done that way, both histories survive and the version stays monotonic. A merge that overwrites someone else's changelog entry is silent data loss.
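The re-apply step can be sketched as a small function. This assumes a changelog kept as a newest-first list of entries with three-part dotted versions; the entry shape and the names `reapplyEntry` and `bumpPatch` are hypothetical, and the git side of the rebase is not shown.

```typescript
interface ChangelogEntry {
  version: string; // assumed "major.minor.patch"
  note: string;
}

function parse(v: string): number[] {
  return v.split(".").map(Number);
}

function compare(a: string, b: string): number {
  const pa = parse(a);
  const pb = parse(b);
  for (let i = 0; i < 3; i++) {
    const d = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (d !== 0) return d;
  }
  return 0;
}

function bumpPatch(v: string): string {
  const [major = 0, minor = 0, patch = 0] = parse(v);
  return `${major}.${minor}.${patch + 1}`;
}

// theirs: the changelog as it stands on main after the other commit.
// mine: the entry from my branch, whose version may now collide.
function reapplyEntry(theirs: ChangelogEntry[], mine: ChangelogEntry): ChangelogEntry[] {
  const newest = theirs.reduce(
    (max, e) => (compare(e.version, max) > 0 ? e.version : max),
    "0.0.0",
  );
  // Keep my version only if it is already past theirs; otherwise bump past it.
  const version = compare(mine.version, newest) > 0 ? mine.version : bumpPatch(newest);
  // Their entries are kept whole; mine goes on top.
  return [{ ...mine, version }, ...theirs];
}

const theirs: ChangelogEntry[] = [
  { version: "1.4.3", note: "their change" },
  { version: "1.4.2", note: "earlier change" },
];
const merged = reapplyEntry(theirs, { version: "1.4.3", note: "my change" });
console.log(merged);
```

Both branches claimed 1.4.3 here, so the re-applied entry moves to 1.4.4 and the other entry stays exactly as it landed.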

Next steps

The safety model these tools ship inside: Making a multi-connector MCP setup safe to act on.
How agents find these tools at all: Tool discoverability for agents.