How does an agent find a tool without scanning the whole list?

Call get_session_context for a categorised tool_map (the floor-plan), then search_tools(query) to find a specific tool by intent. mcp_health remains the authoritative flat catalogue.

What is the right metric for tool discoverability?

The zero-result rate on intent search. A search that returns nothing means the catalogue or the ranker did not serve the agent's intent — a measurable gap.

What happens on a search miss?

The response flags zero_results and points the agent at the request-a-tool flow, turning a dead end into a logged catalogue gap that grows the catalogue over time.

Tool discoverability for agents

A Biloh tenant exposes on the order of 230 MCP tools. No agent should find the right one by reading names. The working pattern is three moves — orient, search, measure — and the measurement is the part people skip.

How does an agent orient?

The first call on a fresh session is get_session_context. Beyond confirming the tenant and persona, it returns a tool_map: every tool the persona can use, grouped by category (clients, contractors, invoices, jobs, proposals, …), each with a one-line summary. That is the floor-plan — enough to know where a capability lives before narrowing.

The authoritative flat list stays available as mcp_health.tool_names. The map is for orientation; the flat list is the source of truth.

How does an agent find a specific tool?

search_tools("onboard a client for a recurring service") returns a ranked, persona-scoped shortlist — name, summary, category. The agent searches by what it wants to do, not by guessing a name.

One non-obvious discipline kept the ranker honest: it carries no tool-name literals. Ranking is pure lexical overlap (weighted name > category > description, with a small domain synonym map). A structural test asserts the ranker source contains no tool names, so it can't be quietly "tuned" by hard-coding the answers to the test queries — it has to generalise. The spec proves this on a held-out intent set the ranker never sees.

What metric proves discoverability actually improved?

The honest signal is the zero-result rate. If search_tools returns nothing, the catalogue or the ranker failed to serve the agent's intent — that is the gap to close, and it is the one thing worth counting.

So a miss does three things: it sets an explicit zero_results flag on the response, it records a structured tool_search metric for aggregation, and it points the agent at the request-a-tool flow. That last move closes the loop:

search miss → tool request → catalogue grows → fewer misses.

A high zero-result rate is not a failure to hide; it is a map of exactly what agents wanted that didn't exist yet.

The shape to copy

A categorised map for orientation (get_session_context), a flat catalogue for truth (mcp_health).
Intent search with a ranker that has no knowledge of the test answers.
A zero-result signal that is both measured and turned into an actionable request.

Next steps

Connection basics: Connecting Biloh over MCP.
Keeping a multi-tenant connection safe: Making a multi-connector MCP setup safe to act on.