I was watching my OpenClaw agent promise me something the other day. "I'll always update memory in place from now on. Never append." It sounded great. Confident. Reliable. Like talking to someone who genuinely understood the problem and was going to fix it.

Two sessions later it was appending again. Not because it lied. Not because it's broken. Because it literally cannot remember making that promise.

This is the thing nobody tells you when you start building with AI agents. They will tell you they'll do something from now on. They will sound completely sincere. And they will forget the moment the session ends or the context window compacts. Every single time.

How your agent forgets

There are two ways it happens and both are silent.

The first is obvious: session boundaries. OpenClaw agents are stateless between sessions. When a session ends and a new one starts, it's day one. The agent loads its bootstrap files (MEMORY.md, SOUL.md, USER.md, AGENTS.md, TOOLS.md), reads the system prompt, and starts fresh. Those bootstrap files are all it has. If your instruction wasn't written down in one of them, it's gone.

The second is less obvious and honestly worse: context compaction. OpenClaw's context window is finite. When the conversation gets long enough to approach the limit, compaction kicks in. Older messages get summarized into a compressed entry while recent messages stay intact. The summary gets saved to the session's JSONL history file so it persists across requests.

Before compaction runs, OpenClaw does something smart: it fires a silent agentic turn that reminds the model to flush important facts to disk. This is the "good version." But it depends on the agent actually writing useful stuff down in that moment. If the context fills up too fast, or the flush doesn't trigger in time, facts get compressed into a summary that loses the detail you needed. The compacted summary says "discussed vendor configuration" when what you actually need is the specific port number you agreed on.

The memory file trap

OpenClaw ships with a built-in memory system. MEMORY.md is a bootstrap file that gets injected into the system prompt at the start of every private session. It's meant to hold curated, durable facts. The agent also loads daily note files from a memory/ directory: today's and yesterday's logs. Anything older gets pulled in on demand through search tools.

The foundation is solid. The problem is what happens over time.

Say you write "the dog's name is John" in your memory file. A week later you rename the dog to Tom. The agent adds a new line: "dog renamed to Tom." Now your memory file says both things. John and Tom. The agent reads top to bottom, hits "the dog's name is John" and uses that. It might never even get to the update below.

This is what happens when your memory file becomes a changelog instead of a source of truth. Every update is an append. Facts pile up. Old facts shadow new ones. The agent reads the first answer that looks right and moves on. Your dog is John again.

And it gets worse. OpenClaw enforces a per-file limit of 20,000 characters (configurable via bootstrapMaxChars) and an aggregate limit of 150,000 characters across all bootstrap files. Note: those are character counts, not token counts. So as your changelog grows, the file hits the cap and OpenClaw silently truncates it during load. The newest facts at the bottom are the first to get cut. The exact opposite of what you want.
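A cheap way to catch this before truncation bites is to watch the file sizes yourself. A minimal sketch, assuming the default limits described above; the file list and the 90% warning threshold are my choices, not anything OpenClaw ships:

```python
# Warn when bootstrap files approach OpenClaw's character caps.
# Limits are the defaults described above (bootstrapMaxChars);
# the warning threshold is arbitrary.
from pathlib import Path

PER_FILE_LIMIT = 20_000      # characters, not tokens
AGGREGATE_LIMIT = 150_000
BOOTSTRAP_FILES = ["MEMORY.md", "SOUL.md", "USER.md", "AGENTS.md", "TOOLS.md"]

def check_bootstrap_sizes(root="."):
    total = 0
    warnings = []
    for name in BOOTSTRAP_FILES:
        path = Path(root) / name
        if not path.exists():
            continue
        size = len(path.read_text(encoding="utf-8"))
        total += size
        if size > PER_FILE_LIMIT * 0.9:   # warn at 90% of the cap
            warnings.append(f"{name}: {size} chars (limit {PER_FILE_LIMIT})")
    if total > AGGREGATE_LIMIT * 0.9:
        warnings.append(f"aggregate: {total} chars (limit {AGGREGATE_LIMIT})")
    return warnings
```

Run it from a cron or a pre-commit hook and you hear about the problem before OpenClaw silently cuts the bottom of the file.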

This isn't a bug

I want to be clear about this. It's not broken. It's architecture. Context windows are finite. Compaction is necessary. Stateless sessions are how these systems work. You can't fix it by asking the agent to try harder or remember better. That's like asking a goldfish to take notes.

The mistake is treating AI agents like people who learn on the job. They don't. Every session is a job interview. The agent walks in knowing only what's on its resume (the bootstrap files) and what you tell it during the conversation. Nothing else survives.

What OpenClaw gives you and where it falls short

OpenClaw provides real tools for this. MEMORY.md for long-term state. Daily notes for short-term context. A built-in memory search using a local embedding model (embeddinggemma, hybrid search with 70% vector weight and 30% text weight). Pre-compaction memory flush to save important facts before context gets compressed. Session history stored as JSONL on disk. These are solid foundations.
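That 70/30 hybrid split amounts to a simple score fusion. A sketch of the idea, assuming both scores are already normalized to the same range; this is the shape of hybrid ranking, not OpenClaw's actual implementation:

```python
# Hybrid retrieval score: a weighted blend of vector similarity and
# keyword (text) relevance, using the 70/30 split described above.
# Assumes both inputs are pre-normalized to [0, 1].

VECTOR_WEIGHT = 0.7
TEXT_WEIGHT = 0.3

def hybrid_score(vector_sim: float, text_score: float) -> float:
    return VECTOR_WEIGHT * vector_sim + TEXT_WEIGHT * text_score

def rank(candidates):
    # candidates: list of (doc_id, vector_sim, text_score)
    return sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]),
                  reverse=True)
```

The weighting means a semantically close match beats an exact keyword hit, but keyword relevance still breaks ties, which is what you want when the query is a specific identifier like a port number.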

The gap is in maintenance. OpenClaw doesn't enforce how the agent updates MEMORY.md. It doesn't prevent appending. It doesn't clean up stale facts. It doesn't stop the file from growing into a contradictory mess. The tools are there. The discipline isn't.

Here's what I added on top to close that gap.

The in-place update rule

First and most important: I enforced a strict rule in the agent's system prompt. Update in place, never append. If a fact changes, overwrite the old value with the new one. One fact, one location, always current. This single rule changed everything about how the memory file behaves over time. The file stays small, stays accurate, and most importantly stays cache-friendly (more on that below).
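The rule is easy to state and easy to mechanize. A minimal sketch, assuming one "key: value" fact per line, which is a convention I use in MEMORY.md, not something OpenClaw enforces:

```python
# "Update in place, never append": overwrite an existing fact's line
# instead of adding a new one below it. One fact, one location.
# Assumes the "key: value" line convention described above.
from pathlib import Path

def set_fact(path: str, key: str, value: str) -> None:
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines() if p.exists() else []
    prefix = f"{key}:"
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = f"{key}: {value}"   # overwrite the stale fact
            break
    else:
        lines.append(f"{key}: {value}")    # genuinely new fact
    p.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Rename the dog with `set_fact("MEMORY.md", "dog name", "Tom")` and the John line is gone, not shadowed. The file never grows unless a genuinely new fact appears.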

The MEMORY.md cleanup agent

Even with the "update in place" rule, drift happens. The agent is working fast, handling multiple tasks, and sometimes it appends instead of updating. Or a fact goes stale and nobody catches it. Over weeks of active use, the memory file accumulates contradictions and dead entries.

So I run a separate agent dedicated to memory hygiene. It reads MEMORY.md, identifies duplicates, stale facts, changelog-style entries, and contradictions, then rewrites the file to reflect only the current state.

The rules are strict: it can only compact what's already written. It cannot invent new facts. It cannot remove security rules or hard constraints. When it's unsure whether something is stale, it leaves it in. Conservative by design. After each run it commits the changes locally and reports what it removed and what it kept so you can audit it.

One thing I learned the hard way: test it on a copy first. Don't point it at your real memory file until you've seen what it does to a duplicate. Run it, diff the result, check for anything it shouldn't have touched. Then go live.
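The cleanup itself is an LLM's job, but the audit step can start deterministic. A sketch that flags the signature of changelog-style drift, the same key stated on more than one line, so you know exactly what to review in the diff; the "key: value" convention is mine:

```python
# Flag duplicate fact keys in a memory file: the signature of
# changelog-style drift, where an appended update shadows (or is
# shadowed by) an older line. Assumes "key: value" lines.
from collections import defaultdict

def find_duplicates(text: str) -> dict:
    seen = defaultdict(list)
    for n, line in enumerate(text.splitlines(), 1):
        if ":" in line and not line.lstrip().startswith("#"):
            key = line.split(":", 1)[0].strip().lower()
            seen[key].append(n)
    # keep only keys that appear more than once, with their line numbers
    return {k: nums for k, nums in seen.items() if len(nums) > 1}
```

Anything this flags is a contradiction candidate: either the cleanup agent collapses it or you do, but at least nothing silently shadows anything.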

The LCM compaction cron

Separate from the MEMORY.md cleanup, LCM (the conversation-history plugin covered in the next section) needs a scheduled compaction pass of its own. Without one, it only compacts reactively when the context gets full. I run a nightly cron at 04:00 UTC using Sonnet as the summarization model. A smarter model means higher-quality summaries, which means better recall when the agent searches compressed history later. Every morning the context starts clean.

LCM for conversation history

Memory files handle what the agent knows. But what about what happened? Conversations contain important context that doesn't always make it into MEMORY.md. A decision you made three weeks ago. A bug you discussed and resolved. A preference you stated once.

LCM (Lossless Claw) is a third-party plugin that replaces OpenClaw's built-in sliding-window compaction with a DAG-based summarization system. Instead of just compressing old messages into a flat summary, it builds a tree of summaries at different levels of detail. Every message is permanently stored in a SQLite database. Nothing gets deleted.

When the agent needs to recall something from weeks ago, it uses lcm_grep to search the compressed history and lcm_expand to drill into specific summaries and recover the original details. Two retrieval tools instead of one flat summary.
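LCM's internals aren't mine to document, but the shape of the idea is simple enough to sketch: messages are immutable, summaries record which rows they cover, and search and expand are the two retrieval paths. Schema, table names, and function names below are illustrative stand-ins, not LCM's actual code:

```python
# Sketch of a lossless summary store over immutable messages: search
# the compressed summaries (grep-style), then expand a hit back into
# the original rows. Names and schema are illustrative, not LCM's.
import sqlite3

def init(db):
    db.executescript("""
        CREATE TABLE messages (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE summaries (id INTEGER PRIMARY KEY, text TEXT,
                                first_msg INTEGER, last_msg INTEGER);
    """)

def grep(db, term):
    # Search summaries only: cheap, because they're small.
    return db.execute("SELECT id, text FROM summaries WHERE text LIKE ?",
                      (f"%{term}%",)).fetchall()

def expand(db, summary_id):
    # Recover the original messages a summary was built from.
    lo, hi = db.execute(
        "SELECT first_msg, last_msg FROM summaries WHERE id = ?",
        (summary_id,)).fetchone()
    return [t for (t,) in db.execute(
        "SELECT text FROM messages WHERE id BETWEEN ? AND ? ORDER BY id",
        (lo, hi))]
```

This is why "discussed vendor configuration" is recoverable instead of fatal: the summary is an index entry, not a replacement. Grep finds the summary, expand gets you back the port number.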

The summarization runs on a small local model through Ollama so it costs effectively nothing. No API calls, no cloud, just CPU cycles. I've tested a few options here: Nemotron Cascade 2 is my top pick since it's a mixture-of-experts model with only 3B parameters active at inference, so it actually runs faster than smaller dense models while being more capable. It competes with Claude Haiku 4.5 on summarization quality despite being a fraction of the size, which is remarkable for a model you run locally for free. Qwen 3.5 9B is a solid alternative, or Qwen 3.5 4B if you're tight on resources. All three handled LCM summarization with zero degradation in the compressed output: the summaries stayed accurate, and the agent could recall details from weeks-old conversations regardless of which model ran the compression.

Think of it this way: MEMORY.md is what the agent knows (distilled facts). LCM is what happened (compressed but recoverable history). Two different retrieval paths for two different kinds of recall.

QMD for smart retrieval

Clean memory is half the problem. The other half is retrieval. OpenClaw has a built-in memory search, but QMD takes it further. It's an optional local search engine created by Tobi Lutke that runs as a sidecar alongside OpenClaw.

QMD combines three search methods: BM25 keyword matching, vector semantic search, and LLM re-ranking. Instead of dumping your entire memory file into context, it pulls only the relevant snippets for the current conversation. The agent calls memory_search to find what it needs and memory_get to pull just those lines.

The token savings are significant. We're talking 60 to 97 percent reduction depending on your memory size. That means fewer compaction events, which means less silent destruction of facts you need.

QMD re-indexes your files every five minutes with a fifteen-second debounce. No manual reindexing. It runs entirely locally using three small GGUF models totaling about 2GB. No API keys, no cloud, no data leaving your machine.

Why this keeps costs near zero

Here's where it gets interesting. This isn't just about recall quality. It's the reason the whole setup is economically viable to run every day.

Anthropic charges for every token sent to the model. A typical session loads about 54,000 tokens of context on startup: bootstrap files, system instructions, conversation history. Without caching, you'd pay for all 54k tokens on every single message.

Prompt caching is an Anthropic API feature that OpenClaw uses transparently. There's nothing to configure. It just works. But here's the thing: the cache is keyed on exact content. If your MEMORY.md changes between messages (because you appended a new fact), the token fingerprint changes and the cache busts. You pay full price on everything again.

Because I enforce in-place updates, the file stays stable between messages. Same content, same position, same fingerprint. The cache stays hot. Reads cost roughly 10x less than fresh input tokens.

In practice, a typical exchange looks like this: 54,000 tokens of context, but only 200 or so are actually new (your message plus metadata). The rest is a cache hit. Instead of paying around $0.08 per message, you're paying under a cent. Roughly a 10x cost reduction on every exchange.

On a busy day with 50 exchanges, the difference is roughly $4.00 without caching versus about $0.40 with it. And it only works because the memory files are stable, structured, and don't bloat. If you append instead of updating in place, every new fact changes the file, busts the cache, and forces a full re-read at full price.
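The arithmetic is worth making explicit. A back-of-the-envelope sketch, deriving a per-token price from the ~$0.08 uncached message figure above rather than quoting current Anthropic list prices; the load-bearing assumption is that cache reads cost about 10% of fresh input tokens:

```python
# Back-of-the-envelope for the caching math above. The per-token price
# is implied by the ~$0.08 uncached message, not an official rate; the
# key assumption is that cache reads cost ~10% of fresh input tokens.

FRESH_PER_TOKEN = 0.08 / 54_000          # implied by ~$0.08 per uncached message
CACHE_READ_PER_TOKEN = FRESH_PER_TOKEN / 10

def message_cost(context_tokens: int, new_tokens: int, cached: bool) -> float:
    if not cached:
        return (context_tokens + new_tokens) * FRESH_PER_TOKEN
    return (context_tokens * CACHE_READ_PER_TOKEN
            + new_tokens * FRESH_PER_TOKEN)

uncached = message_cost(54_000, 200, cached=False)   # ~ $0.080
hot = message_cost(54_000, 200, cached=True)         # ~ $0.008
day_uncached, day_hot = 50 * uncached, 50 * hot      # a 50-exchange day
```

Notice that on a hot cache the 200 fresh tokens cost almost as much as the 54,000 cached ones, which is exactly why a single append that busts the cache is so expensive relative to its size.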

The context stays small too. At 54k out of a 1 million token window, you're using about 5%. Small context means efficient cache reads and plenty of headroom before compaction triggers.

This matters even if you're not on API pricing. On a Claude Max plan at $100 per month, you hit daily and weekly rate limits fast. Without these techniques, a heavy workday blows through your limit by noon and you're blocked until it resets. With stable memory, smart retrieval, and cache-friendly architecture, the same plan handles a volume of work that would otherwise be impossible. The tokens you save on context overhead are tokens you spend on actual useful output instead.

The full picture

Four things layered on top of what OpenClaw ships with: the LCM extension for lossless conversation history, the nightly compaction cron for clean context every morning, QMD for precise memory retrieval, and the in-place update discipline that keeps the prompt cache hot.

Each one does one thing. Together they take OpenClaw from a useful AI assistant to a full-context, near-zero cost, long-running system that doesn't forget and doesn't break the bank.

The deeper point

When an AI agent tells you "I'll do this from now on," it's not lying. In that moment, within that context window, it fully intends to follow through. But intention without persistence is nothing. The promise dies when the session ends or when the context compacts.

The fix is never to trust the promise. Build systems that don't require the agent to remember. Make the memory file the source of truth, keep it clean with automated compaction, use smart retrieval so the agent reads the right fact instead of the first fact, and store conversation history in a database so nothing gets lost when the context window resets.

Stop designing for an agent that learns. Design for one that forgets. Everything gets simpler after that.