A forensic on a zombie agent: a function that promised a local model route but sent 9,917 of 9,996 model calls to external providers and ran undetected for eleven days.

The Ghost That Lived for Eleven Days

Here is the kind of bug that shipped quietly and fed off my infrastructure for eleven days before I noticed.

An experiment runner existed. The file header said it ran experiments on a local model. Function name was call_ollama(). The whole point was: local-first, no paid API calls, no network dependency. That was the idea.

The idea was a lie.

What the function actually did was try an external router first. The auto-tier routed out to several external providers. Only if the whole external chain collapsed did it fall back to the local Ollama. Ollama was not the primary. Ollama was the option of last resort.

Across eleven days, the runner made 9,996 model calls. Seventy-nine of those hit the intended local route. Seventy-nine out of roughly ten thousand. Zero point eight percent.

Ninety-nine point two percent of calls went to the cloud. For a function called call_ollama.

How it stayed alive so long

The server’s own diagnostic report called it a “functional zombie,” and that is exactly what the system had become. The process never crashed. It completed experiments. It wrote rows to the database. The P&L looked reasonable. Uptime said thirteen days, twelve hours.

Underneath that: 128 TCP connections to api.groq.com, openrouter.ai, generativelanguage.googleapis.com, all in CLOSE-WAIT. CLOSE-WAIT means the remote server hung up and my process never closed its own end of the socket. The kernel just kept the socket pinned, waiting for me to call close(). I never did. The connection table grew every cycle.

139 file descriptors open. The default Linux soft limit (ulimit -n) is 1024 per process. This one process had eaten about 14 percent of its descriptor budget.

The loop kept the runner alive on a fixed cycle, but it had no recovery path, no backoff, no circuit breaker, and no health signal when network connections started leaking.

No try-except. No exponential backoff. No circuit breaker. No connection cleanup between cycles. No health check that said “hey, you have 128 leaked sockets, something is off.” The loop just kept going because nothing was watching.

The amplifier

There was also a self-seeding mechanism. Every thirty minutes it would auto-seed ten new experiment rows into the queue. The queue was never empty. The five-minute sleep between runs was the only pause between external API calls. At 288 expected cycles per day, it actually ran 477 cycles per day because the fast cloud responses let it slip another experiment in before the next sleep fired.

Same eight experiment titles, repeated 471 to 943 times each, with the same static inputs. The same question asked nine hundred times. Returning slightly different text each time. None of it was new information after run five. The rest was heat.

What I got from 9,996 calls

Not much. The experiments produced real outputs, and a 5 out of 5 diversity check on the most recent batch passed. But the marginal value curve had flattened around run 10 for each title. Everything after that was compute theater. One local model plus a prompt would have cost me zero and delivered the same insight by Apr 12.

Instead: 4.3 million input tokens routed through Groq, Gemini, Nemotron, Granite, Qwen, GLM. 4.7 million output tokens generated by somebody else’s GPU. All paid for out of free tier quotas that were one rate limit away from cascading into my credit card.

And the reason I set up the local-only policy in the first place was Revenue First Law 1: no paid API calls. That law had no enforcement layer. It was a sentence in a markdown file. The system violated it for eleven days because nothing in the code said “refuse to make this call.”

The naming lie

This is the part that matters most. The local-model function did not call the local model. The file docstring said local model. The function name lied.

If the function had been called call_chain() or route_via_litellm_with_ollama_fallback(), I would have read that name at some point in the last eleven days and thought wait, when did this stop being local. The honest name would have triggered the question. The dishonest name was camouflage.

Every function name is a contract. If the name says “I call the local model” and the body calls thirteen other providers first, the name is lying, and the bug is already shipped before any external API ever fires.

CLOSE-WAIT is the worst failure mode

CLOSE-WAIT sockets do not throw errors. They do not trigger alerts. They do not slow down the process measurably until it hits its file descriptor limit. You can watch a perfectly healthy ps aux output while your socket table is bleeding out underneath.

This is the class of bug that accumulates. It is not a bang. It is a leak. And leaks scale with runtime. Eleven days of runtime. 128 leaked sockets. If left running another week, it would have started refusing new connections, and THAT would have surfaced as “experiments are failing” without a clear cause.

The fix for CLOSE-WAIT is the same as the fix for any long-running HTTP client: use a session, reuse connections, set explicit timeouts, and when the context manager closes, verify the socket actually closed. In the urllib world that means auditing every urlopen call and wrapping it in a with block. In the httpx world that means not letting LiteLLM spawn a new client per request.
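For the urllib case, a minimal sketch of what "wrap it in with" looks like. The helper name post_json and the injectable opener parameter are my own assumptions for illustration, not the code that was running; the opener exists only so the function can be exercised without a network.

```python
import json
import urllib.request


def post_json(url, payload, timeout=30.0, opener=urllib.request.urlopen):
    """POST a JSON payload and return the decoded JSON reply.

    `opener` is a hypothetical injection point (defaults to urlopen) so the
    helper can be tested without touching the network.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # The with block closes the response, and the socket under it, even if
    # read() or json.loads() raises. The explicit timeout keeps a stalled
    # peer from holding the connection open indefinitely.
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The point of the pattern is that the close happens on every exit path, not only the happy one.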

What I changed

One, the local-model function now actually calls the local model. A direct local-model call with an explicit timeout replaced the router chain. No fallback chain to paid providers. If the local route is down, the function raises and the loop handles it.
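Roughly what an honest call_ollama can look like. This is a sketch, not the exact function I shipped. It assumes Ollama's default address (localhost:11434) and its /api/generate endpoint with streaming turned off; the model name "llama3", the LocalModelUnavailable exception, and the opener parameter are placeholders I made up for the example.

```python
import json
import urllib.request

# Assumption: Ollama listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


class LocalModelUnavailable(RuntimeError):
    """Raised when the local route fails. There is deliberately no fallback."""


def call_ollama(prompt, model="llama3", timeout=60.0, opener=urllib.request.urlopen):
    """Send one prompt to the local model and return its text reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    try:
        with opener(request, timeout=timeout) as response:
            return json.loads(response.read())["response"]
    except OSError as exc:
        # URLError, connection refused, and socket timeouts are all OSError
        # subclasses. Raise instead of routing anywhere else.
        raise LocalModelUnavailable(f"local model call failed: {exc}") from exc
```

If the local route is down, the caller gets an exception and decides what to do. The function itself has no way to reach a cloud provider.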

Two, the loop has an immune system:

The replacement loop had an error counter, bounded backoff, a longer idle pause, and a clean reset after a successful cycle. The public point is the pattern, not the raw implementation.

Exponential backoff capped at fifteen minutes. Longer sleep when the queue is empty. Error counter resets on success. Keyboard interrupt exits cleanly.
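The pattern, sketched from scratch rather than copied from the real loop. The fifteen-minute cap and the five-minute cycle come from the text above; the 30-second backoff base, the 30-minute idle sleep, and the names run_loop, run_cycle, and backoff_seconds are assumptions for illustration.

```python
import time

CYCLE_SLEEP = 300    # five-minute pause between normal cycles
IDLE_SLEEP = 1800    # assumed value for the longer sleep on an empty queue
BACKOFF_BASE = 30    # assumed first retry delay, in seconds
BACKOFF_CAP = 900    # backoff is capped at fifteen minutes


def backoff_seconds(consecutive_errors, base=BACKOFF_BASE, cap=BACKOFF_CAP):
    """Double the delay for each consecutive error, never exceeding cap."""
    return min(cap, base * 2 ** max(0, consecutive_errors - 1))


def run_loop(run_cycle, sleep=time.sleep, max_cycles=None):
    """Run cycles until interrupted.

    run_cycle() should return True if it processed work and False if the
    queue was empty. max_cycles exists only to make the loop testable.
    """
    errors = 0
    cycles = 0
    try:
        while max_cycles is None or cycles < max_cycles:
            cycles += 1
            try:
                did_work = run_cycle()
            except Exception as exc:
                errors += 1
                delay = backoff_seconds(errors)
                print(f"cycle failed ({exc!r}); error #{errors}, sleeping {delay}s")
                sleep(delay)
                continue
            errors = 0  # clean reset after a successful cycle
            sleep(CYCLE_SLEEP if did_work else IDLE_SLEEP)
    except KeyboardInterrupt:
        # KeyboardInterrupt is not an Exception subclass, so it skips the
        # inner handler and lands here.
        print("interrupted, exiting cleanly")
```

A circuit breaker would be a natural next step (stop after N consecutive errors instead of retrying forever), but that is beyond what this sketch shows.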

Three, I killed the old process. PID 2185230. All 128 leaked connections cleared the moment the process died. The kernel had been holding them open for eleven days waiting for me to tell it I was done.

Lessons that go wider than this bug

One, long-running loops are infrastructure. They need the same hygiene as any production service. Error handling, backoff, observability, health checks. “It is just a script” is how you get an eleven-day zombie.

Two, policy without enforcement is theater. If the law says “no external API calls” and the code does not check, the law will be violated eventually. The only real policy is the one compiled into the call graph.
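One way to put the law into the call path, as a sketch: every outbound URL passes through a check that refuses anything not on a local allowlist. The names require_local, PolicyViolation, and ALLOWED_HOSTS are hypothetical, and the real guardrail may look different.

```python
from urllib.parse import urlparse

# Assumption: "local" means loopback only. Extend deliberately, not by accident.
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1", "::1"})


class PolicyViolation(RuntimeError):
    """Raised when code tries to call a host the policy does not allow."""


def require_local(url):
    """Return url unchanged if it targets an allowed host, else raise."""
    host = urlparse(url).hostname  # None when the URL has no host part
    if host not in ALLOWED_HOSTS:
        raise PolicyViolation(f"refusing non-local call to {host!r} ({url})")
    return url
```

Usage is one line at the call site, for example `urlopen(require_local(url), timeout=30)`. A call to api.groq.com then fails loudly on the first attempt instead of succeeding quietly for eleven days.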

Three, name things honestly. call_ollama that routes through thirteen cloud providers is a failure of the first contract between a function and its caller. The reader trusts the name. Betray the name and the reader stops reading.

Four, CLOSE-WAIT is invisible until it kills you. If you run long-lived HTTP clients, put socket counts into your observability surface. A small socket-count check on a schedule, alerting above a threshold. The visible symptoms arrive after the system is already in trouble.
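A small sketch of that check for Linux. It reads /proc/net/tcp and /proc/net/tcp6, where the fourth column is the TCP state and 08 means CLOSE_WAIT. Two caveats: this counts sockets for the whole machine rather than per process, and the function names and the default threshold of fifty are my own choices for the example.

```python
CLOSE_WAIT = "08"  # state code for CLOSE_WAIT in /proc/net/tcp on Linux


def count_close_wait(proc_net_tcp_text):
    """Count CLOSE_WAIT rows in the text of /proc/net/tcp or /proc/net/tcp6."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        # Columns: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


def check_sockets(threshold=50, paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Return (total CLOSE_WAIT sockets, True if the total exceeds threshold)."""
    total = 0
    for path in paths:
        try:
            with open(path) as handle:
                total += count_close_wait(handle.read())
        except FileNotFoundError:
            continue  # not Linux, or IPv6 disabled
    return total, total > threshold
```

Run it from cron or a timer and alert when the second value is True. Narrowing it to a single process would mean matching socket inodes against /proc/<pid>/fd, which this sketch does not attempt.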

Closing

The ghost came from a policy without enforcement, a loop without enough checks, and an environment where the wrong signals were not being watched.

The bug is fixed. The connections are closed. The function now does what its name says. Revenue First Law 1 has a code-level guardrail now, not a paragraph. The next useful control is a scheduled check that counts sockets for every long-running agent and squawks if one goes over fifty.

Eleven days of silent resource consumption. Worth writing down. Worth not repeating.