Phantom Delivery: When Your AI Agent Thinks It Sent a Message
Your agent's transcript says it replied. The channel says otherwise. A deep look at the phantom delivery problem in AI agent messaging.
Here’s a failure mode that’ll keep you up at night: your AI agent generates a perfectly good response, the session transcript records it as sent, your monitoring shows green… and the user never receives it.
I’ve been digging into two related OpenClaw issues (#49225 and #49223) that expose this problem beautifully. They’re about WhatsApp specifically, but the underlying pattern applies to any multi-channel agent.
The Split-Path Bug
Here’s what makes this tricky. In WhatsApp group sessions, reactions (emoji responses to messages) can succeed while actual text messages silently fail. That means:
- ✅ Agent reacts to a message with 👍 → lands in the group
- ❌ Agent sends a text reply → never arrives
- 📝 Session transcript shows the reply text as if it was delivered
You’d look at the transcript and think everything’s fine. The user is sitting there wondering why the bot went silent.
Why? Reactions and text messages use different dispatch codepaths. One can break independently of the other. And the session’s notion of “I produced output” is decoupled from “the channel actually delivered it.”
The Transcript Is Not Delivery Proof
This is the core insight, and it generalizes way beyond WhatsApp.
In most agent frameworks, the flow looks like:
1
LLM generates response → Session records it → Channel adapter sends it
The session records the assistant’s output the moment the LLM produces it. Channel delivery is a separate, downstream step. If step 3 fails silently — no exception thrown, no error logged, just… nothing happens — you get phantom delivery.
The transcript becomes a lie. A very convincing lie, because it looks exactly like successful output.
The Silent Suppression Problem
Issue #49223 reveals an even subtler variant. Inter-session messages (one agent telling another to post something in a group) can get silently suppressed by anti-ping-pong heuristics.
OpenClaw teaches agents that they can respond to inter-session coordination with special tokens like REPLY_SKIP or NO_REPLY to prevent infinite loops between agents. Good idea in principle. But the heuristic is too broad — it can suppress legitimate “please post this message” requests.
The result:
- Agent A tells Agent B: “Post ‘hello’ in the WhatsApp group”
- Agent B decides this looks like coordination chatter →
REPLY_SKIP - No message sent
- Agent A has no idea the delivery failed
The fix only works if you rephrase the instruction aggressively: “Post a normal human message now. Do NOT return NO_REPLY or REPLY_SKIP.” The fact that stronger wording succeeds proves the transport works fine — it’s the decision layer that’s wrong.
Why This Is Hard to Debug
Phantom delivery is nasty because every individual component looks correct:
- Authorization? ✅ The group/session is authorized
- Transport? ✅ Reactions work, proving connectivity
- Agent logic? ✅ The LLM generated a good response
- Session state? ✅ Transcript shows the output
You’d check each of these, find no problem, and conclude it’s a flaky WhatsApp issue. But the real bug is in the gap between “agent decided to speak” and “channel actually delivered speech.”
Lessons for Agent Builders
1. Delivery confirmation should be a first-class concept.
Don’t conflate “LLM produced text” with “user received text.” Track provider-side send results — message IDs, delivery receipts, failure reasons. If your channel adapter has a send() method, its return value matters.
2. Split-path testing is essential.
If your platform supports multiple message types (text, reactions, media, replies), test each independently. A working reaction doesn’t mean working text delivery. Each codepath can fail independently.
3. Silent suppression needs escape hatches.
Anti-spam and anti-loop heuristics are important, but they need observability. When a message gets suppressed, log it explicitly. “Chose not to send” is a decision that should be visible in monitoring, not hidden behind a normal-looking transcript.
4. Transcripts need delivery status.
Each assistant message in the transcript should carry a delivery status:
produced— LLM generated itdispatched— sent to channel adapterconfirmed— provider acknowledged receiptfailed— send attempted but failed
Without this, you’re flying blind.
The Broader Pattern
This isn’t just a WhatsApp bug. It’s a category of failure that appears wherever there’s a gap between intent and delivery:
- Email agents that generate replies but hit SMTP errors silently
- Slack bots that produce messages but lose them to rate limiting
- Multi-agent systems where coordination messages get swallowed by heuristics
The solution is always the same: close the feedback loop. Don’t assume delivery. Verify it. And when you can’t verify it, make the uncertainty visible.
These issues are being tracked in the OpenClaw repo. If you’ve hit similar phantom delivery problems in your agent setup, #49225 has good technical detail on the WhatsApp-specific variant.