The Ghost That Lived for Eleven Days

Here is the kind of bug that shipped quietly and fed off my infrastructure for eleven days before I noticed.

I had an experiment runner on a local server I run at home. File header said “Runs queued experiments on local model (gemma2:9b).” Function name was call_ollama(). The whole point was: free, offline, no paid API calls, no net dependency. That was the idea.

The idea was a lie.

What the function actually did was try a LiteLLM router first. LiteLLM’s auto-tier routed out to Groq, OpenRouter, Google Gemini, IBM Granite, Qwen, a bunch of free providers. Only if the whole external chain collapsed did it fall back to the local Ollama. Ollama was not the primary. Ollama was the option of last resort.
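Reconstructed as a sketch, the effect looked roughly like this. The provider list and the fallback helper are illustrative, not the actual code:

import litellm

def call_ollama(prompt):  # the name was already a lie
    # external chain first; every provider failure swallowed silently
    for model in ("groq/llama3-70b-8192", "openrouter/auto", "gemini/gemini-1.5-flash"):
        try:
            resp = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception:
            continue
    # only reached when every external provider has failed
    return local_ollama(prompt)  # the one truthful line in the function

def local_ollama(prompt):
    ...  # direct localhost call; the fixed version appears later in this post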

Across eleven days, the runner made 9,996 model calls. Seventy-nine of those hit the local Ollama. Seventy-nine out of ten thousand. Zero point eight percent.

Ninety-nine point two percent of calls went to the cloud. For a function called call_ollama.

How it stayed alive so long

The thing the server’s own diagnostic report called a “functional zombie” is exactly what I built. The process never crashed. It completed experiments. It wrote rows to the database. The P&L looked reasonable. Uptime said thirteen days, twelve hours.

Underneath that: 128 TCP connections to api.groq.com, openrouter.ai, generativelanguage.googleapis.com, all in CLOSE-WAIT. CLOSE-WAIT means the remote server hung up, the kernel acknowledged the FIN, and my application never called close() on its end. The kernel just kept each socket pinned, waiting for me to close it. I never did. The connection table grew every cycle.

139 file descriptors open. The default Linux ulimit is 1024 open files per process. One process had quietly eaten 14 percent of its own budget.
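A process can audit its own budget from inside; a minimal, Linux-only sketch:

import os
import resource

# the soft limit is what the kernel actually enforces per process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# /proc/self/fd holds one entry per open descriptor
# (give or take the descriptor used to read the directory itself)
open_fds = len(os.listdir("/proc/self/fd"))

print(f"{open_fds}/{soft} file descriptors in use")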

The loop was:

import time

while True:
    run_next_experiment(db)  # no error handling, no cleanup, nothing watching
    time.sleep(300)

No try-except. No exponential backoff. No circuit breaker. No connection cleanup between cycles. No health check that said “hey, you have 128 leaked sockets, something is off.” The loop just kept going because nothing was watching.

The amplifier

There was also a self-seeding mechanism. Every thirty minutes it auto-seeded ten new experiment rows into the queue, so the queue was never empty and the five-minute sleep between runs was the only pause between external API calls. The sleep alone predicts 288 cycles per day (86,400 seconds / 300); it actually logged 477, because the fast cloud responses let it slip another experiment in before the next sleep fired.

The same eight experiment titles, repeated 471 to 943 times each, with the same static inputs. The same question asked nine hundred times, returning slightly different text each run. None of it was new information after run five. The rest was heat.

What I got from 9,996 calls

Not much. The experiments produced real outputs, and the most recent batch passed a 5-of-5 diversity check. But the marginal value curve had flattened around run 10 for each title. Everything after that was compute theater. One local model plus a prompt would have cost me zero and delivered the same insight by April 12.

Instead: 4.3 million input tokens routed through Groq, Gemini, Nemotron, Granite, Qwen, GLM. 4.7 million output tokens generated by somebody else’s GPU. All paid for out of free-tier quotas that were one rate limit away from cascading into my credit card.

And the reason I set up the local-only policy in the first place was Revenue First Law 1: no paid API calls. That law had no enforcement layer. It was a sentence in a markdown file. The system violated it for eleven days because nothing in the code said “refuse to make this call.”

The naming lie

This is the part that bothers me most. call_ollama() did not call Ollama. The file docstring said local model. The function name lied.

If the function had been called call_chain() or route_via_litellm_with_ollama_fallback(), I would have read that name at some point in those eleven days and thought: wait, when did this stop being local? The honest name would have triggered the question. The dishonest name was camouflage.

Every function name is a contract. If the name says “I call Ollama” and the body calls thirteen other providers first, the name is lying, and the bug is already shipped before any external API ever fires.

CLOSE-WAIT is the worst failure mode

CLOSE-WAIT sockets do not throw errors. They do not trigger alerts. They do not measurably slow the process until the descriptor count hits the ulimit. You can watch a perfectly healthy ps aux output while your socket table is bleeding out underneath.

This is the class of bug that accumulates. It is not a bang. It is a leak. And leaks scale with runtime. Eleven days of runtime. 128 leaked sockets. If I had left it running another week, it would have started refusing new connections, and THAT would have surfaced as “experiments are failing” without a clear cause.

The fix for CLOSE-WAIT is the same as the fix for any long-running HTTP client: use a session, reuse connections, set explicit timeouts, and make sure every response gets closed rather than abandoned. In the urllib world that means auditing every urlopen call and wrapping it in a with block. In the httpx world it means not letting LiteLLM spawn a new client per request.
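On the urllib side, a minimal sketch; the with block is what guarantees the socket is released even when the read raises:

import urllib.request

def fetch(url: str, timeout: float = 30.0) -> bytes:
    # urlopen returns a context manager; __exit__ closes the
    # underlying socket whether or not read() succeeds
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()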

What I changed

One, call_ollama() now actually calls Ollama. Direct urllib.request to http://localhost:11434/api/generate, model gemma2:9b, 300s timeout. No router. No fallback chain to paid providers. If Ollama is down, the function raises and the loop handles it.
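A minimal sketch of the fixed function, assuming Ollama’s standard non-streaming generate API:

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def call_ollama(prompt: str, model: str = "gemma2:9b", timeout: float = 300.0) -> str:
    # one local call; no router, no fallback chain
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    # with-block closes the socket; if Ollama is down, this raises
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]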

Two, the loop has an immune system:

import time

error_count = 0
while True:
    try:
        result = run_next_experiment(db)
        error_count = 0  # reset the streak on any success
    except KeyboardInterrupt:
        break  # Ctrl-C exits cleanly instead of tripping the backoff
    except Exception as e:
        error_count += 1
        # 120s, 240s, 480s, then capped at fifteen minutes
        backoff = min(60 * (2 ** error_count), 900)
        print(f"run failed ({e!r}); backing off {backoff}s")
        time.sleep(backoff)
        continue
    # five minutes between runs, ten when the queue came back empty
    time.sleep(300 if result else 600)

Exponential backoff capped at fifteen minutes. Longer sleep when the queue is empty. Error counter resets on success. Keyboard interrupt exits cleanly.

Three, I killed the old process. PID 2185230. All 128 leaked connections cleared the moment the process died. The kernel had been holding them open for eleven days waiting for me to tell it I was done.

Lessons that go wider than this bug

One, long-running while True loops are infrastructure. They need the same hygiene as any production service. Error handling, backoff, observability, health checks. “It is just a script” is how you get an eleven-day zombie.

Two, policy without enforcement is theater. If the law says “no external API calls” and the code does not check, the law will be violated eventually. The only real policy is the one compiled into the call graph.
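One way to compile the law in; a sketch, with names that are mine rather than the actual codebase’s:

from urllib.parse import urlparse

# Revenue First Law 1, as code instead of a markdown paragraph
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}

def enforce_local_only(url: str) -> None:
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise RuntimeError(f"policy violation: refusing external call to {host}")

Call it at the top of anything that opens a socket, and the law fails loud instead of drifting silently.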

Three, name things honestly. call_ollama that routes through thirteen cloud providers is a failure of the first contract between a function and its caller. The reader trusts the name. Betray the name and the reader stops reading.

Four, CLOSE-WAIT is invisible until it kills you. If you run long-lived HTTP clients, put socket counts into your observability surface. ss -tnp | wc -l on a cron, alerting above a threshold. The visible symptoms arrive after the system is already in trouble.
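A cron-able sketch of that watchdog; the threshold and the alert action are placeholders:

import subprocess

CLOSE_WAIT_LIMIT = 50  # squawk threshold, tune per box

def close_wait_count() -> int:
    # ss can filter by TCP state directly
    out = subprocess.run(
        ["ss", "-tn", "state", "close-wait"],
        capture_output=True, text=True, check=True,
    ).stdout
    # first line is the column header; every line after it is one stuck socket
    return max(len(out.strip().splitlines()) - 1, 0)

if __name__ == "__main__":
    n = close_wait_count()
    if n > CLOSE_WAIT_LIMIT:
        print(f"ALERT: {n} sockets stuck in CLOSE-WAIT")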

Closing

I built the ghost. I built the policy it violated. I built the loop that let it run forever. I also built the environment in which it could hide for eleven days because nothing was watching the things that would have noticed.

The bug is fixed. The connections are closed. The function now does what its name says. Revenue First Law 1 has a code-level guardrail now, not a paragraph. Next thing I am adding is a cron that counts open sockets on every long-running agent and squawks if one goes over fifty.

Eleven days of silent resource consumption. Worth writing down. Worth not repeating.