Most MCP Servers Are Bad. Here’s What We Learned Building Ours.
Building an MCP server is quick. Building one an agent can actually use takes weeks.
TL;DR
- A study of 10,831 MCP servers found that 73% had broken or repeated tool descriptions — making them unreliable for AI agents trying to pick the right tool.
- MCP quality is a UX problem, not a development problem. Building with domain expertise beats building fast.
- tchop launched its MCP server in June 2026. Here’s what we got wrong first and what fixed it.
Hundreds of MCP servers ship every week. The community is excited. We joined it last month when tchop launched its own MCP server — connecting Claude, Cursor, and Windsurf directly to a tchop organisation.
Building it forced an honest conversation: most MCP servers, including early versions of ours, are bad. Not buggy in the traditional sense. Bad at their actual job, which is helping an AI agent and the person behind it accomplish something real.
The symptom is visible immediately
Ask an agent to do something through a poorly designed MCP and you will watch it loop. It calls a tool, gets back a cryptic response, calls another tool to clarify, stalls, and eventually returns something wrong or nothing at all. A task that should take five seconds consumes hundreds of tokens and goes nowhere.
The agent is not broken. The server gave it nothing to work with.
What the data says
A study published in February 2026 analysed 10,831 MCP servers and found the pattern was consistent: 73% had tool names that repeated without variation. Thousands had wrong parameter descriptions. 3,093 tools had no description of what they return. The researchers named the pattern: “code-first, description-last.”
That pattern holds because MCP servers are mostly built by developers wrapping existing APIs. The developer knows the API. The agent has no idea.
An AI agent reads tool descriptions the way a new hire reads bad onboarding docs. It fills in the gaps with guesses. When a description says id: string, the agent does not know whether you want a UUID, a slug, an integer, or a URL. It guesses. Sometimes correctly.
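One way to catch the id: string problem before an agent does is a small description lint. The sketch below is illustrative only: the tool shape follows the name / description / inputSchema layout that MCP tools use, but the "bare description" heuristic and the getChannel example are our own assumptions, not something taken from the study.

```python
import re

# Hypothetical heuristic: a parameter description is "bare" when it is empty
# or only restates a type name. This rule is an assumption for illustration.
BARE = re.compile(
    r"^\s*(string|str|int|integer|number|id|object|bool|boolean)?\s*$",
    re.IGNORECASE,
)


def lint_tool(tool: dict) -> list[str]:
    """Return human-readable warnings for one tool definition."""
    warnings = []
    if not tool.get("description", "").strip():
        warnings.append(f"{tool['name']}: tool has no description")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for param, schema in properties.items():
        if BARE.match(schema.get("description", "")):
            warnings.append(
                f"{tool['name']}.{param}: description does not say what the value looks like"
            )
    return warnings


# Hypothetical tool definition with the kind of gaps described above.
vague = {
    "name": "getChannel",
    "description": "",
    "inputSchema": {
        "type": "object",
        "properties": {"id": {"type": "string", "description": "string"}},
    },
}

for warning in lint_tool(vague):
    print(warning)
```

A check like this only finds missing text. It cannot tell whether a description is accurate, so it complements reading agent transcripts rather than replacing it.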
In the same study, standard-compliant MCP descriptions achieved a 72% tool selection probability in competitive scenarios where multiple tools were available. The baseline was 20%. Description quality had a bigger impact on agent behaviour than model choice.
This is a UX problem, not a development problem
Anthropic published useful guidance on this: tools for agents are a new kind of software contract between a deterministic system and a non-deterministic consumer. Designing for a developer (who reads source code, asks questions, checks docs) is completely different from designing for an agent that has only the description, the schema, and whatever context it carries from the current conversation.
The right person to design an MCP tool is not the engineer who owns the codebase. It is a domain expert who understands what users actually want to accomplish. Technical fluency is not a substitute for domain fluency. Handing MCP design to whoever owns the API is one of the most common and expensive mistakes in this space.
Good tool design means:
- Field names that tell the agent what a value looks like, not just what it is called
- Descriptions that say when to use a tool, not just what it does
- Error messages that explain what went wrong and what to try next
- Return values that give the agent what it needs to proceed, not raw database objects
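As a rough illustration of the first two points, here is what a tool definition might look like when the field names and descriptions carry that information. Everything specific in it is an assumption made for the example: the tool name, the field names, and the 180-character limit are hypothetical and are not the actual tchop schema.

```python
import json

# Hypothetical tool definition. The name, fields, and limits are illustrative
# and do not describe the real tchop MCP server.
send_push_notification = {
    "name": "sendPushNotification",
    "description": (
        "Send a push notification to every member of one channel. "
        "Use this when the user wants to alert a team or audience right away. "
        "Do not use it to publish articles. "
        "Returns the notification ID and the number of recipients."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            # The field name says what the value looks like, not just "id".
            "channel_uuid": {
                "type": "string",
                "format": "uuid",
                "description": "UUID of the target channel, as returned by listChannels.",
            },
            "message_text": {
                "type": "string",
                "maxLength": 180,  # assumed limit, for illustration only
                "description": "Plain-text body shown on the lock screen.",
            },
        },
        "required": ["channel_uuid", "message_text"],
    },
}

print(json.dumps(send_push_notification, indent=2))
```

The description covers when to use the tool, when not to, and what comes back, which are the three things an agent cannot read off the schema.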
One of the sharpest insights from building the tchop MCP: every tool response is a chance to guide the agent’s next step. Return a raw channel object after createChannel and the agent has to infer what comes next. Return a structured object that includes context for the logical follow-on step and the agent follows it. The server has no memory of the conversation, so inject context into every response.
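A minimal sketch of that idea, assuming a createChannel handler that receives a raw database-style record. The next_steps field and the publishContent and inviteMembers tool names are hypothetical, chosen only to show the shape of a response that guides the next call.

```python
def create_channel_response(channel: dict) -> dict:
    """Shape a createChannel result so it points at the likely follow-on step."""
    return {
        "channel_uuid": channel["id"],
        "channel_name": channel["name"],
        "status": "created",
        # Hypothetical field: plain-language guidance the agent can act on.
        "next_steps": [
            f"Channel '{channel['name']}' is empty. To publish to it, call "
            f"publishContent with channel_uuid={channel['id']}.",
            "To add members, call inviteMembers with the same channel_uuid.",
        ],
    }


# A raw record as a database layer might return it (made-up values).
raw_channel = {
    "id": "3f2b6c1e-8d4a-4e57-9b0a-2c7d5e1f9a34",
    "name": "Hanover editorial",
    "tenant_ref": 8812,
    "row_version": 3,
}

response = create_channel_response(raw_channel)
for step in response["next_steps"]:
    print(step)
```

Internal fields such as tenant_ref are dropped because the agent cannot use them, and the identifier it will need next is repeated inside the guidance text.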
Design for the human, not the agent
Most MCP design processes collapse here. Developers think about tools. They do not think about jobs to be done.
An editor at a media company does not say “call GET /channels and then POST /content with the channel_id.” She says “publish today’s briefing to the internal newsroom.” A community manager does not ask for a list of channels. He asks to send a push notification to the Hamburg team before the 9am editorial call.
The MCP server needs to bridge that gap between an API endpoint and a real job to be done.
When we built the tchop MCP, we kept asking: what would someone in communications actually prompt Claude to do? That question changes the entire tool surface. Tools designed around API endpoints become tools designed around outcomes.
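To make the difference concrete, here is a sketch of one outcome-shaped tool that covers what would otherwise be a channel lookup followed by a content post. It runs against an in-memory stand-in rather than a real API. The publish_briefing name, the CHANNELS table, and the response fields are all assumptions for illustration.

```python
# In-memory stand-ins for a real backend (made-up data).
CHANNELS = {"internal newsroom": "3f2b6c1e-8d4a-4e57-9b0a-2c7d5e1f9a34"}
POSTS: list[dict] = []


def publish_briefing(channel_name: str, title: str, body: str) -> dict:
    """Resolve the channel by name, create the post, and publish it in one call."""
    channel_uuid = CHANNELS.get(channel_name.strip().lower())
    if channel_uuid is None:
        # Tell the agent what it can try instead of failing silently.
        return {
            "ok": False,
            "error": f"No channel named '{channel_name}'.",
            "available_channels": sorted(CHANNELS),
        }
    post = {
        "channel_uuid": channel_uuid,
        "title": title,
        "body": body,
        "state": "published",
    }
    POSTS.append(post)
    return {"ok": True, "published_to": channel_name, "post_number": len(POSTS)}


result = publish_briefing("Internal Newsroom", "Today's briefing", "Three things to know.")
print(result)
```

The agent passes the words the user actually typed ("internal newsroom") and never has to handle a channel ID, which removes one lookup call and one chance to guess wrong.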
What good looks like in practice
A few things we either got right on the first pass or had to iterate to fix:
Fewer tools beat more tools. An agent that completes a full job in one tool call beats one that has to orchestrate three partial tools. Every extra call is a new point of failure.
Test with real prompts, not synthetic ones. Not “call createChannel with name='test'” but “set up a channel for the Hanover editorial team.” Real prompts expose what the agent cannot infer from the schema alone.
Error messages matter more than happy paths. The agent will hit errors. “Invalid channel ID” is useless. “Channel ID must be a UUID from the /channels endpoint. Use listChannels first to retrieve it.” lets the agent recover on its own.
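A small sketch of a recoverable error, using Python's standard uuid module to check the ID. The validate_channel_id helper and the response shape are hypothetical. The error wording is the example message quoted above.

```python
import uuid


def validate_channel_id(value: str) -> dict:
    """Check a channel ID and, on failure, tell the agent how to recover."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return {
            "ok": False,
            "error": (
                "Channel ID must be a UUID from the /channels endpoint. "
                "Use listChannels first to retrieve it."
            ),
            # Echo the bad input so the agent can see what it sent.
            "received": value,
        }
    return {"ok": True}


print(validate_channel_id("hamburg-team"))
print(validate_channel_id("3f2b6c1e-8d4a-4e57-9b0a-2c7d5e1f9a34"))
```

The useful part is the second sentence of the message: it names the tool the agent should call next, so recovery does not depend on a guess.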
The integrations that hold up in 12 months are not the ones that shipped fastest. They are the ones that get the job done for the person who typed the prompt.
The bar is higher than most people realise
Building a functional MCP server takes a weekend. Building one that an agent can use reliably for real work takes weeks of testing, reading transcripts, watching where agents get confused, and going back to fix the gaps.
The tchop MCP is live. We are still improving it. If you try it and hit something that does not work the way it should, tell us — that feedback is how MCP servers actually get better.
Frequently asked questions about MCP server quality
What makes an MCP server bad for AI agents?
Bad MCP servers expose tools with vague or incomplete descriptions, field names that give no context on expected format, and error messages that do not explain how to recover. The agent fills in the gaps with guesses, which compounds into wrong outputs or infinite loops. The February 2026 arXiv study found most servers fall into a “code-first, description-last” pattern — built by developers who understand the API but do not design for how an agent reads.
Is MCP quality a technical problem or a design problem?
Primarily design. The protocol itself is sound. The failures happen in tool naming, description quality, schema constraints, and return value structure. These are UX decisions, not engineering decisions. The best MCP servers are designed by people who understand what end users want to accomplish, not just what the underlying API can do.
How does tool description quality affect agent behaviour?
A study of 10,831 MCP servers showed that standard-compliant descriptions achieved a 72% tool selection probability in competitive scenarios where multiple tools were available, versus a 20% baseline. Functionality accuracy had the largest measurable impact. Beyond the schema and the current conversation, the description is the only context an agent has.
What is the most common mistake when building an MCP server?
Wrapping an existing API endpoint-for-endpoint without redesigning the tool surface for agent use. REST APIs are designed for developers who understand the system. MCP tools need to be designed for agents that know only what the description says. One tool per user goal beats one tool per API endpoint almost every time.
How do I test whether my MCP server is good?
Use real user prompts, not synthetic API calls. Ask a colleague what they would type into Claude to get something done with your product, then run that prompt against the server. Watch where the agent loops, hesitates, or returns wrong results. Those are the description gaps. Fix the description before touching the code.
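If you log the tool calls an agent makes for each prompt, a few lines of code can point you at the loops. The sketch below assumes a hypothetical transcript format, a list of dicts with "tool" and "arguments" keys. Real clients log this differently, so treat the shape as a placeholder.

```python
import json
from collections import Counter


def find_repeated_calls(transcript: list[dict], threshold: int = 2) -> list[str]:
    """Flag tool calls the agent repeated with identical arguments."""
    counts = Counter(
        (call["tool"], json.dumps(call["arguments"], sort_keys=True))
        for call in transcript
    )
    return [
        f"{tool} called {n} times with {args}"
        for (tool, args), n in counts.items()
        if n >= threshold
    ]


# Made-up transcript for the prompt "set up a channel for the Hanover editorial team".
transcript = [
    {"tool": "listChannels", "arguments": {}},
    {"tool": "listChannels", "arguments": {}},
    {"tool": "listChannels", "arguments": {}},
    {"tool": "createChannel", "arguments": {"name": "Hanover editorial"}},
]

for finding in find_repeated_calls(transcript):
    print(finding)
```

A repeated identical call usually means the previous response did not tell the agent what it needed, so the description or return value of that tool is the first place to look.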