TopHat is a .NET middleware that sits in front of the Anthropic and OpenAI APIs and trims tool-result payloads before they cost you input tokens. The problem it solves is pretty common once you start building anything serious with LLMs: tool calls return big chunks of JSON, most of which the model never actually needs, and you end up paying full price for every byte of it on every turn.
Instead of just dropping data and hoping the model does not miss it, TopHat scores the items in a tool result for relevance, keeps the top N, summarizes what got cut, and quietly stands up a retrieval tool the model can use to ask for the dropped items if it actually needs them. This all happens transparently; from the application’s perspective, it is still just talking to Anthropic or OpenAI like normal.
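The score/keep/summarize/stash flow can be sketched roughly like this. Everything here is illustrative: the type and method names (ToolItem, Trimmer, TrimResult) are assumptions for the sketch, not TopHat's actual API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative stand-ins, not TopHat's real types.
public record ToolItem(string Id, string Json, double Relevance);

public record TrimResult(
    List<ToolItem> Kept,                  // top-N items left in the tool result
    string Summary,                       // one-line note about what was cut
    Dictionary<string, ToolItem> Stash);  // elided items, served via tophat_retrieve

public static class Trimmer
{
    // Score-and-keep: order by relevance, keep the top N, stash the rest.
    public static TrimResult Trim(IEnumerable<ToolItem> items, int topN)
    {
        var ordered = items.OrderByDescending(i => i.Relevance).ToList();
        var dropped = ordered.Skip(topN).ToList();
        var summary = dropped.Count == 0
            ? "No items elided."
            : $"{dropped.Count} item(s) elided: {string.Join(", ", dropped.Select(d => d.Id))}. " +
              "Call tophat_retrieve with an id to fetch one.";
        return new TrimResult(ordered.Take(topN).ToList(), summary, dropped.ToDictionary(d => d.Id));
    }

    // What the injected retrieval tool returns when the model asks for an elided item.
    public static string Retrieve(TrimResult result, string id) =>
        result.Stash.TryGetValue(id, out var item)
            ? item.Json
            : "{\"error\":\"unknown id\"}";
}
```

The key design point this sketch captures is that nothing is destroyed: every dropped item stays addressable by id, so the cost of a wrong relevance guess is one extra tool call, not lost data.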
In an 8-fixture eval against live providers, it cut Anthropic costs by ~54% and OpenAI costs by 61% on the same workload. Savings vary a lot depending on what your tools return, but the pattern is consistent: the more partially relevant data your tools spit out, the more TopHat saves you.
TopHat hooks into the HttpClient pipeline as a DelegatingHandler, which means it slots in cleanly underneath the official Anthropic and OpenAI SDKs without any application changes beyond registering the handler. The most interesting piece is the retrieval path: the handler injects a tophat_retrieve tool into the request, and when the model calls that tool, the handler intercepts the call in-band and returns the elided items without ever round-tripping back to the application.

Still deciding how I want this project to go. It may eventually become a NuGet package, but for now it's just a repository. The source is on GitHub for anyone who wants to read through the implementation or contribute.
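Registering a DelegatingHandler under a named HttpClient is the standard IHttpClientFactory pattern, so the wiring would look something like the sketch below. The handler class name (TopHatHandler) is made up for illustration, and the snippet assumes the Microsoft.Extensions.Http package.

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

public static class Wiring
{
    // Build an HttpClient whose pipeline runs the trimming handler.
    // Point the Anthropic/OpenAI SDK at this client and the trimming
    // happens below the SDK, invisibly to the application.
    public static HttpClient Build()
    {
        var services = new ServiceCollection();
        services.AddTransient<TopHatHandler>();
        services.AddHttpClient("anthropic")
                .AddHttpMessageHandler<TopHatHandler>();
        return services.BuildServiceProvider()
                       .GetRequiredService<IHttpClientFactory>()
                       .CreateClient("anthropic");
    }
}

// Hypothetical name for the trimming handler; this skeleton just forwards.
public class TopHatHandler : DelegatingHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // A real implementation would rewrite the outbound body here
        // (score, trim, summarize, inject tophat_retrieve) and answer
        // the model's tophat_retrieve calls in-band instead of
        // forwarding them upstream.
        return base.SendAsync(request, cancellationToken);
    }
}
```

Because the handler sits in the HttpClient pipeline rather than in the application code, the same registration works regardless of which SDK sits on top, as long as that SDK accepts an injected HttpClient.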