# Tool discoverability for agents

> A catalogue of ~230 tools is unusable by name-scanning. The path is: orient with a categorised map, search by intent, then measure the one honest signal — the rate at which a search returns nothing. A miss is a catalogue gap, and it should be turned into a request for the missing tool.

URL: https://biloh.com.au/docs/engineering-notes/tool-discoverability-for-agents
Category: Engineering notes | Audience: builder | Updated: 2026-06-25

A Biloh tenant exposes on the order of **230 MCP tools**. No agent should have to find the right one by reading names. The working pattern is three moves: **orient, search, measure**. The measurement is the part people skip.

## How does an agent orient?

The first call on a fresh session is `get_session_context`. Beyond confirming the tenant and persona, it returns a **`tool_map`**: every tool the persona can use, grouped by category (clients, contractors, invoices, jobs, proposals, …), each with a one-line summary. That is the floor-plan — enough to know *where* a capability lives before narrowing.

The authoritative flat list stays available as `mcp_health.tool_names`. The map is for orientation; the flat list is the source of truth.
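As a rough illustration of how the two views relate, the sketch below flattens a `tool_map` and checks it against the flat list. Only `tool_map` and `tool_names` come from the text above. The per-tool fields, the tool names, and the assumption that every mapped tool should also appear in the flat list are hypothetical.

```python
# Hypothetical payloads: only `tool_map` and `tool_names` are named in the text;
# the per-tool fields and the tool names are illustrative.
session_context = {
    "tool_map": {
        "clients": [
            {"name": "create_client", "summary": "Create a client record."},
            {"name": "list_clients", "summary": "List clients for the tenant."},
        ],
        "invoices": [
            {"name": "create_invoice", "summary": "Draft an invoice for a job."},
        ],
    }
}
mcp_health = {"tool_names": ["create_client", "list_clients", "create_invoice"]}


def flatten_tool_map(tool_map):
    """Collapse the categorised map into a sorted list of tool names."""
    return sorted(tool["name"] for tools in tool_map.values() for tool in tools)


def map_drift(tool_map, tool_names):
    """Names present in the map but absent from the flat list (assumed to be drift)."""
    return sorted(set(flatten_tool_map(tool_map)) - set(tool_names))


drift = map_drift(session_context["tool_map"], mcp_health["tool_names"])
```

An empty `drift` list would mean the map and the flat list agree. Anything else suggests the map is stale and the flat list should win.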

## How does an agent find a specific tool?

`search_tools("onboard a client for a recurring service")` returns a ranked, persona-scoped shortlist — name, summary, category. The agent searches by **what it wants to do**, not by guessing a name.

One non-obvious discipline keeps the ranker honest: **it carries no tool-name literals.** Ranking is pure lexical overlap, weighted name > category > description, with a small domain synonym map. A structural test asserts that the ranker source contains no tool names, so it can't be quietly "tuned" by hard-coding the answers to the test queries; it has to generalise. The spec checks this on a *held-out* intent set that was never used to tune the ranker.
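A minimal sketch of a ranker in this spirit, together with the structural check, might look like the following. This is not Biloh's implementation. The numeric weights, the synonym and stop-word entries, the `limit` parameter, and the helper `find_tool_literals` are all assumptions. The text only fixes the ordering name > category > description and the no-literals rule. The one-line `summary` field stands in for the description.

```python
import re

# Assumed values: the text only fixes the ordering name > category > description.
FIELD_WEIGHTS = {"name": 3.0, "category": 2.0, "summary": 1.0}
# Hypothetical domain synonyms; these are vocabulary, not tool names.
SYNONYMS = {"onboard": {"create", "add"}, "customer": {"client"}, "bill": {"invoice"}}
# Assumed stop-word list so filler words cannot produce a match on their own.
STOPWORDS = {"a", "an", "the", "for", "to", "of", "and"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(tokens):
    """Drop stop words, then add domain synonyms for the remaining tokens."""
    kept = tokens - STOPWORDS
    expanded = set(kept)
    for token in kept:
        expanded |= SYNONYMS.get(token, set())
    return expanded


def score(query, tool):
    """Weighted count of query tokens that appear in each tool field."""
    wanted = expand(tokenize(query))
    return sum(
        weight * len(wanted & tokenize(tool[field]))
        for field, weight in FIELD_WEIGHTS.items()
    )


def search_tools(query, catalogue, limit=5):
    """Return tools with a positive score, best first; ties break by name."""
    scored = [(score(query, tool), tool) for tool in catalogue]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]["name"]))
    return [tool for _, tool in hits[:limit]]


def find_tool_literals(ranker_source, tool_names):
    """Structural check: list any tool names that appear in the ranker source."""
    return sorted(name for name in tool_names if name in ranker_source)
```

In a test suite, `find_tool_literals` would be fed the text of the ranker module and the flat list of tool names, and the assertion would be that it returns an empty list. The catalogue itself lives outside the ranker module, so the ranker can only match on words, never on remembered answers.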

## What metric proves discoverability actually improved?

The honest signal is the **zero-result rate**. If `search_tools` returns nothing, the catalogue or the ranker failed to serve the agent's intent — that is the gap to close, and it is the one thing worth counting.

So a miss does three things: it sets an explicit `zero_results` flag on the response, it records a structured `tool_search` metric for aggregation, and it points the agent at the **request-a-tool** flow. That last move closes the loop:

> search miss → tool request → catalogue grows → fewer misses.
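The three effects of a miss could be sketched as below. The `zero_results` flag and the `tool_search` metric name come from the text. The response layout, the metric fields, the `next_step` pointer, and the `request_tool` action name are hypothetical, and the metric is simply logged as JSON here rather than sent to a real aggregation backend.

```python
import json
import logging

logger = logging.getLogger("tool_search")


def build_search_response(query, results):
    """Wrap search results, flag a miss, record a metric, and point at the request flow."""
    zero = len(results) == 0
    response = {"query": query, "results": results, "zero_results": zero}

    # Structured metric, one record per search, so misses can be aggregated later.
    metric = {
        "metric": "tool_search",
        "query": query,
        "result_count": len(results),
        "zero_results": zero,
    }
    logger.info(json.dumps(metric))

    if zero:
        # Hypothetical pointer to the request-a-tool flow; the action name is assumed.
        response["next_step"] = {
            "action": "request_tool",
            "hint": "No tool matched this intent. Consider filing a tool request.",
        }
    return response
```

Recording the metric on every search, not only on misses, is what makes a rate computable: the denominator is all searches.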

A high zero-result rate is not a failure to hide; it is a map of exactly what agents wanted that didn't exist yet.
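Given a stream of such metric records, the rate and the "map" of unmet intents fall out of a few lines of aggregation. The record shape below (`metric`, `query`, `zero_results` keys) is an assumption for illustration, as are the function names.

```python
from collections import Counter


def _searches(metrics):
    """Keep only records that describe a tool search."""
    return [m for m in metrics if m.get("metric") == "tool_search"]


def zero_result_rate(metrics):
    """Fraction of searches that returned nothing; 0.0 when there were no searches."""
    searches = _searches(metrics)
    if not searches:
        return 0.0
    misses = sum(1 for m in searches if m["zero_results"])
    return misses / len(searches)


def top_missed_queries(metrics, limit=10):
    """Most frequent missed intents, normalised by case and surrounding whitespace."""
    counts = Counter(
        m["query"].strip().lower() for m in _searches(metrics) if m["zero_results"]
    )
    return counts.most_common(limit)
```

The ranked list of missed queries is the useful artefact: each entry is a candidate tool request, or a hint that the synonym map is missing a term.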

## The shape to copy

1. A categorised map for orientation (`get_session_context`), a flat catalogue for truth (`mcp_health`).
2. Intent search with a ranker that has no knowledge of the test answers.
3. A zero-result signal that is both measured and turned into an actionable request.

## Next steps

- Connection basics: [Connecting Biloh over MCP](/docs/reference/mcp-overview).
- Keeping a multi-tenant connection safe: [Making a multi-connector MCP setup safe to act on](/docs/engineering-notes/multi-tenant-mcp-safety).