A Memory System That Survives AI Context Loss

A five-layer AI memory architecture built in markdown files, with an immune system instead of a database, tested across 30 days of multi-agent work. The problem is not retrieval. The problem is continuity across context resets.

Okay, so here’s where this started. A useful AI system needs memory across sessions. Without it, every new work session starts cold. It’s like talking to someone with amnesia: same agent, same files, same projects, but zero memory of what we’d done 30 minutes ago.

The context window resets. Everyone knows this. But knowing it and feeling it when your agent asks you what the AML engine version is for the fifth time, those are different things.

The build started from a practical failure, not a theory about AI memory architecture.

The Problem Everyone’s Solving Wrong

If you search “AI agent memory” right now, you’ll find a hundred projects building vector databases. Embeddings. Semantic search. The idea is: store everything the AI has ever seen, retrieve the relevant bits when needed. Mem0 published research showing a 26% accuracy boost with their memory layer. The ICLR 2026 workshop had an entire track on agent memory.

And all of it, every single project, is solving retrieval.

Here’s the thing though. Retrieval is layer five of the problem. It’s the least critical layer. A Claude session that remembers every fact but forgets what it was working on is less useful than one that forgets some facts but knows exactly where it left off.

The actual problem is continuity. Not “what do I know?” but “who am I, what was I doing, and what mistakes have I already made?”

That’s the gap. And it’s not a database problem. It’s an architecture problem.

What Already Exists

Before getting to the five layers, the prior art is worth a look. Three established approaches already solve parts of the problem.

Stormy AI coined “Compound Engineering”, the idea that an agents.md file gets smarter every time the AI makes a mistake. That’s functionally what my immune memory file does. GuardLoop built an open-source tool around the same concept. And the A-MEM paper from February 2025 described an agent that autonomously connects new memories to existing ones, which is what my knowledge base layer does.

So no, I didn’t invent mistake logging. I didn’t invent agent memory files.

The useful difference is the layered architecture. Most systems have one or two of these pieces. A five-layer stack gives each kind of memory failure its own layer, with a priority hierarchy that inverts what you’d expect.

Figure: The Five-Layer Memory Architecture

The Five Layers, From Bottom to Top

Layer one is identity memory. This is who the agent is, what it is called, and what the rules are. It loads first, before anything else. If an agent does not know its own name and operating constraints, nothing else it knows is trustworthy.

In my setup, this is a plain text file that loads first. Not a database. It defines principles, interaction rules, and limits before any work in the session starts.
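A minimal sketch of the "loads first" rule, assuming the identity layer lives in a single markdown file. The `IDENTITY.md` name, the `load_identity` helper, and `IdentityMissingError` are hypothetical. What it illustrates is failing closed: if the identity file is missing or empty, nothing else gets loaded.

```python
from pathlib import Path


class IdentityMissingError(RuntimeError):
    """Raised when the identity layer cannot anchor the session."""


def load_identity(path: Path) -> str:
    """Read the layer-one identity file, failing closed if it is absent or empty.

    Callers load the other layers only after this returns, so the agent never
    starts work without its name and operating constraints.
    """
    if not path.is_file():
        raise IdentityMissingError(f"identity file not found: {path}")
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        raise IdentityMissingError(f"identity file is empty: {path}")
    return text
```

At startup you would call something like `load_identity(Path("memory/IDENTITY.md"))` before touching any other layer. Raising instead of returning an empty string is deliberate: without identity, nothing else the agent knows is trustworthy.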

Layer two is mistake memory. This is the immune system. Every time the agent does something wrong, breaks something, or misunderstands something, the mistake gets documented here with a rule attached. Not the story of what happened. The rule. “Never do X because Y happened.”

This is the layer most memory projects skip. Vector databases make retrieval fast, but almost nobody keeps a systematic record of failures. Yet the failures are what prevent the agent from making the same mistake in session 47 that it made in session 12.
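Here is one way to keep entries in the "rule, not story" shape, as a sketch. The `MistakeRule` record, the one-bullet-per-rule format, and the duplicate check in `append_rule` are assumptions for illustration, not a fixed schema.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class MistakeRule:
    """A layer-two entry: the rule distilled from a mistake, not the story."""

    never: str    # the action to avoid, phrased to follow "Never"
    because: str  # the short reason: what went wrong last time
    logged: date  # when the mistake was recorded

    def to_line(self) -> str:
        # One markdown bullet per rule keeps the file cheap to load every session.
        return f"- [{self.logged.isoformat()}] Never {self.never} because {self.because}."


def append_rule(path: Path, rule: MistakeRule) -> bool:
    """Append a rule to the mistakes file; skip it if the same 'Never ...' clause exists."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if f"Never {rule.never} because" in existing:
        return False
    with path.open("a", encoding="utf-8") as f:
        f.write(rule.to_line() + "\n")
    return True
```

Forcing every entry through a `never`/`because` pair makes it hard to log a narrative by accident, which matters later when the file has to be trimmed.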

Layer three is current state memory. This is what was happening at the end of the last session. What task was in progress. What decision was pending. What file was open. Without this layer, every new session starts cold and I spend the first ten minutes re-establishing context that was active thirty minutes ago.

I keep this in a session log file. Short entries. Timestamp and a few sentences about where things were left. The agent reads this on startup and knows where to pick up.
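A sketch of that log, assuming each entry is a timestamp heading followed by a few sentences. The `## YYYY-MM-DD HH:MM` layout and both function names are hypothetical. On startup the agent only needs the newest entry, so `last_entry` returns just that.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Hypothetical convention: every entry starts with a "## YYYY-MM-DD HH:MM" heading.
ENTRY_PREFIX = "## "


def log_session_end(path: Path, note: str, when: Optional[datetime] = None) -> None:
    """Append a short 'where things were left' entry to the session log."""
    stamp = (when or datetime.now()).strftime("%Y-%m-%d %H:%M")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{ENTRY_PREFIX}{stamp}\n{note.strip()}\n\n")


def last_entry(path: Path) -> Optional[str]:
    """Return the newest entry (heading plus note), or None if the log is empty."""
    if not path.exists():
        return None
    lines = path.read_text(encoding="utf-8").splitlines()
    starts = [i for i, line in enumerate(lines) if line.startswith(ENTRY_PREFIX)]
    if not starts:
        return None
    return "\n".join(lines[starts[-1]:]).strip()
```

Appending rather than overwriting keeps the history around for later review, while startup still reads only the last few lines.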

Layer four is working knowledge: architecture decisions, component descriptions, and project context. Not just facts in a database. A living document that gets updated as the system changes.

This is the layer that most resembles a wiki. It goes stale if it is not maintained. I update it when something significant changes, not after every session.

Layer five is retrieval. This is often built first, but it works better after the other layers exist. Semantic search over files and notes. Useful for finding specific information when the other four layers do not have it. Not useful for continuity, identity, or mistake prevention.

The priority order is the opposite of what you would expect. Layer one matters most. Layer five matters least. Almost every AI memory project builds in the reverse order, starting with retrieval because retrieval is technically interesting and continuity is operationally boring.
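One way to make the priority order concrete is a startup loader that fills a context budget from layer one downward. This sketch assumes a character budget (a stand-in for a token budget) and that the layer texts have already been read from disk; `assemble_context` and the layer names are hypothetical. Retrieval is deliberately absent, because it is queried on demand rather than preloaded.

```python
from typing import List, Sequence, Tuple


def assemble_context(layers: Sequence[Tuple[str, str]], budget_chars: int) -> Tuple[str, List[str]]:
    """Build the startup context from (name, text) pairs given in priority order.

    Whole layers are added until the next one would exceed the character budget.
    From that point on every remaining layer is dropped, so a lower-priority layer
    never displaces a higher-priority one.
    """
    parts: List[str] = []
    dropped: List[str] = []
    used = 0
    for name, text in layers:
        section = f"# {name}\n{text.strip()}\n\n"
        if not dropped and used + len(section) <= budget_chars:
            parts.append(section)
            used += len(section)
        else:
            dropped.append(name)
    return "".join(parts), dropped
```

Called with identity, mistakes, state, and knowledge in that order, it returns the context plus the names of any layers that did not fit. That list is worth logging: if working knowledge keeps getting dropped, something above it has grown too long.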

Why the Inverted Priority Works

When I started building, I thought retrieval was the core problem. My agent could not find things it had seen before. So obviously I needed semantic search.

Semantic search helped the agent find things, but it did not give it continuity. It still could not tell which session mattered or what work was active. It still forgot what it had been doing. It still made the same mistakes it had made before.

Retrieval did not solve continuity. It solved a different problem.

Identity and state are what make sessions feel connected. Mistake memory is what makes the agent actually improve over time instead of cycling through the same errors. Those three layers are worth more than a hundred percent semantic recall.

The retrieval layer sits on top as an efficiency tool, not a foundation. When the agent knows who it is, what it was doing, and what not to do, the question of whether it can find a specific file becomes much less important.
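As a rough sketch of "efficiency tool, not foundation": lookups check the already-loaded layers first and only call retrieval on a miss. The `lookup` helper is hypothetical, and `keyword_search` is a plain substring match standing in for semantic search, which it is not. Swap in whatever retrieval backend you actually use.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Layer = Tuple[str, str]  # (layer name, loaded text), in priority order


def find_in_layers(term: str, layers: Sequence[Layer]) -> Optional[Tuple[str, str]]:
    """Return (layer name, first matching line) from the highest-priority layer that mentions term."""
    needle = term.lower()
    for name, text in layers:
        for line in text.splitlines():
            if needle in line.lower():
                return name, line.strip()
    return None


def lookup(term: str, layers: Sequence[Layer],
           search: Callable[[str], List[str]]) -> Tuple[str, str]:
    """Answer from layers one to four when possible; fall back to retrieval on a miss."""
    hit = find_in_layers(term, layers)
    if hit is not None:
        return hit
    results = search(term)
    return ("retrieval", results[0]) if results else ("none", "")


def keyword_search(notes: Sequence[str]) -> Callable[[str], List[str]]:
    """Stand-in for semantic search: case-insensitive substring match over notes."""
    def search(term: str) -> List[str]:
        return [note for note in notes if term.lower() in note.lower()]
    return search
```

Because the search function is injected, the retrieval layer can be replaced or removed without touching the layers underneath it.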

What Breaks This

The five-layer system works as long as the files are maintained. Files that stop being updated become worse than useless because the agent trusts them and acts on stale information.

The most common failure mode is the working knowledge file going out of date. A component changes. The file still says what it used to say. The agent reads the file and operates on the old model.

My fix for this is a staleness check. Files that have not been updated in more than seven days get flagged. The flag does not stop the agent from reading the file. It tells the agent to treat the content as potentially outdated and to verify before acting on it.
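A minimal version of that staleness check, assuming "last updated" means the file's modification time (a date stamped inside the file would work too). The function names and banner wording are hypothetical; the seven-day threshold is the one described above. A stale file is still passed to the agent, just with a warning in front of it.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional

STALE_AFTER_DAYS = 7
SECONDS_PER_DAY = 86_400


def stale_files(paths: Iterable[Path], now: Optional[float] = None,
                max_age_days: float = STALE_AFTER_DAYS) -> List[Path]:
    """Return the files whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [p for p in paths if p.stat().st_mtime < cutoff]


def with_staleness_banner(path: Path, text: str, max_age_days: float = STALE_AFTER_DAYS) -> str:
    """Prefix stale content with a warning; the agent still reads it, but verifies first."""
    return (f"[STALE: {path.name} not updated in over {max_age_days} days; "
            f"verify before acting on it]\n{text}")
```

Relying on mtime has one catch: any edit, even a trivial one, resets the clock, so it measures "touched recently" rather than "still accurate".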

The second failure mode is mistake memory getting too long to read quickly. Once the mistakes file hits a certain size, it stops being useful because the agent cannot process all of it in a single context load. I trim it periodically, keeping the rules and removing the narrative around each rule. Rules compress well. Stories do not.
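Trimming can be partly mechanical if rules and narrative are visually distinct. This sketch assumes each rule is a top-level `- ` bullet and everything else is narrative; that convention and `compress_mistakes` are hypothetical. It also collapses duplicate rules in case the same mistake was logged twice.

```python
from typing import List, Set


def compress_mistakes(markdown: str) -> str:
    """Keep only the rule bullets from a mistakes file, dropping the narrative.

    Lines starting with '- ' are treated as rules; headings and story paragraphs
    are discarded. Duplicate rules (ignoring case and spacing) keep their first
    occurrence, so the original order is preserved.
    """
    seen: Set[str] = set()
    rules: List[str] = []
    for line in markdown.splitlines():
        if line.startswith("- "):
            key = " ".join(line.split()).lower()
            if key not in seen:
                seen.add(key)
                rules.append(line.rstrip())
    return "\n".join(rules) + ("\n" if rules else "")
```

It is worth reviewing the output by hand before overwriting the file, since a rule that only made sense alongside its story may need rewording.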

Thirty Days In

After thirty days of running this system consistently, the sessions feel different from how they felt at the start.

The agent knows its constraints without being reminded. It knows where the last session ended. It does not make the mistakes it made in weeks one and two, because those mistakes are documented and loaded as rules at the start of every session.

The retrieval layer gets used maybe ten percent as much as I expected. Most of what I need is in the working knowledge file or the session log. Full semantic search is for edge cases.

The thing that worked better than I expected is the mistake memory. Writing down what went wrong and turning it into a rule feels trivial when you are doing it. Across thirty sessions, it has prevented more time loss than any other piece of the system.

The immune system is the useful part. The database is the interesting part. Build the immune system first.