Claude Cowork said nobody replied. The reply was message 28.
An AI agent told me a thread had gone quiet. The reply was message 28 of 28, hidden by a truncated search. Why AI tools fail where they fetch reality, and the one-line rule that fixes it.
I run a daily reconciliation: an AI agent cross-checks my open tasks against my email, so that anything I'm waiting on either gets ticked off or flagged. One morning it told me a particular thread had gone quiet — no reply from the other side, still waiting. It was wrong. The reply had landed days earlier. It was the twenty-eighth message in a twenty-eight-message thread.
That gap is worth understanding, because it's not a stupid mistake. It's a subtle one, and the subtle ones are the ones that bite you.
The tool didn't lie. It got handed half the page.
When you search email through an API, a long conversation doesn't come back whole. The search results return a preview — and on this tool, the preview was the oldest few messages in the thread, not the newest. So a thread that perfectly matched my query showed up, the agent read the snippet, saw an old back-and-forth, and reasonably concluded nothing new had happened. The latest reply was there the whole time. It just wasn't in the slice the search handed back.
The data was present. The retrieval was partial. And the agent, working from a truncated view, gave me a confident answer that was exactly wrong.
The agent wasn't hallucinating. It was reasoning correctly about the wrong half of the conversation.
This is the trap with AI agents that everyone underestimates. We've trained ourselves to watch for the model making things up. We watch far less for the model reasoning impeccably over incomplete input — which is arguably more dangerous, because the output looks sound. There's good research showing that bigger, more capable models don't necessarily get more reliable; they get more fluent and more willing to answer, which means a wrong answer arrives wearing a very convincing suit.
The fix was a rule, not a smarter model
The repair had nothing to do with a better AI. It was a procedure. For reconciliation, the agent is now forbidden from deciding a thread's status off the search preview. It has to open the full thread and read the last message before it concludes anything. I also widened the lookback window to seven days, so a weekend or a few travel days can't let a reply slip through the gap.
That's it. "Never judge from the snippet, always read the latest message." A one-line rule, born from knowing how email threads actually behave and how this specific tool actually returns them. The intelligence didn't need upgrading. The workflow did.
If you've followed my AI-tooling notes, this is a recurring theme. The connectors that let an AI reach into your real systems — what the industry now standardises as the Model Context Protocol — are genuinely powerful, but each one has a personality. It paginates a certain way, truncates a certain way, orders results a certain way. You only learn those edges by getting burned once and writing down the lesson. It's the same reason I keep a fix that runs every session rather than hoping the tool behaves, and the same distinction I drew between skills and agents: the value isn't the raw capability, it's the encoded knowledge of how to use it without getting fooled.
The pattern under all of this
When an AI tool gives you a wrong answer, the instinct is to blame the model. Usually the model is fine. The problem is upstream — at the boundary where the tool fetches reality and hands it over in pieces. The model can only reason about what it was given, and what it was given is rarely the whole truth.
So the operator's job isn't to find a smarter agent. It's to know exactly where each tool quietly drops information, and to build the guardrail that forces the full picture into view. Mine cost me one missed reply and ten minutes of writing a rule. Cheap tuition, as these things go.
The agent will always sound certain. Your edge is knowing which certainties to double-check.
Sources & further reading
External
Anthropic — Introducing the Model Context Protocol
Nature (via PMC) — Larger and more instructable language models become less reliable
Related posts
Claude Cowork keeps hanging? Here's the fix that runs every session
Claude Skills vs Claude Agents: the difference that actually matters
I pointed the Meta MCP at a real ad account — here's what it found in ten minutes