Why LLMs Are Great at Greenfield and Terrible at Your Codebase

LLMs don't forget your codebase. They never knew it. Here's why that distinction matters — and why the fix isn't better prompts.

APRIL 2026  ·  5 MIN READ

You've felt this. You know you have.

You open a new project, point Claude or Copilot or Cursor at it, and the thing just works. Clean architecture, good separation, correct idioms. You're building at three or four times your normal speed and the code is honestly pretty good.

Then the project gets real. You've got forty files. There's a handler that talks to a service that depends on a config that was written three sessions ago. You ask the AI to add a feature and it rewrites something it built last Tuesday — breaks a code path it doesn't know exists — and you don't catch it because the codebase grew faster than your understanding of it.

Sound familiar?

The stranger with perfect syntax

Here's what's actually happening. Every time you start a new AI coding session, you're handing your codebase to a stranger. A very talented stranger — one who writes clean code, knows every framework, and works fast. But a stranger.

They didn't build what's already there. They don't know why the config parser handles three different date formats. They don't know that the SMS handler validates sender phone numbers because of an incident in February where an unauthorized number triggered a cascade of outbound messages. They don't know that the if/else block on line 247 exists because the LinkedIn DOM renders differently for Sales Navigator accounts.

They don't remember. They never knew.

This is different from forgetting. A developer who forgets can be reminded — "oh right, that's why we did it that way." An LLM looking at the code can see what is there but not why it's shaped that way. The why lived in the conversation that produced the code, and that conversation is gone.

The industry's answer doesn't work

The current industry answer to this is some combination of:

  • Write better prompts (include more context)
  • Use RAG to index your codebase
  • Add more comments to your code
  • Use CLAUDE.md / cursor rules / repo-level instructions

These are all variations of "remember harder." And they help, to a point. But they don't solve the actual problem.

RAG on your codebase gives the model access to your code — which it already has. The understanding of why code is shaped a certain way isn't in the code. It's in the conversation that produced it. You could theoretically RAG-link code edits to their preceding conversational exchanges, but that's exceedingly fragile and nobody's doing it well.

Comments help until they don't. Comments describe intent at the line level. They don't describe system-level invariants — "this tool must never write to the CRM directly because the CRM sync runs on a separate cadence and direct writes cause conflicts." That's architectural knowledge. It lives in someone's head or nowhere.

Better prompts are the most common advice and the least scalable. You're asking the human — whose understanding of the codebase is already lagging behind the AI's output — to provide the context the AI needs to not break things. You're making the bottleneck do more work.

A story about what happens

I'll tell you what happened to me. I was building a LinkedIn connection automator for a client outreach system. The first version was clean — one path, one handler, worked great.

Then I needed to handle a DOM variant. LinkedIn renders the connection button differently depending on whether you have Sales Navigator, whether the person is a 1st/2nd/3rd connection, whether you're on a search results page or a profile page.

Each Claude Code session added another code path. Session one built the base case. Session two added Sales Navigator handling. Session three added the search results variant. Each session saw the code and thought "I'll just add a branch here."

By the fifth session, the tool had nested conditionals four levels deep. No single session could see the full branching logic. A fix to one path would silently break another. I'd approve changes that looked right because they matched my intent — but I wasn't holding the full implementation anymore either.

The tool became unmanageable. I abandoned it and rebuilt from scratch.

The root cause

The root cause wasn't the AI being bad at code. The code was fine — each individual session wrote clean, correct code for the path it could see.

The root cause was that no orchestration layer existed. Each session reimplemented routing logic inside the executor. Instead of a dispatcher that said "here are the five LinkedIn DOM variants, here's how to detect which one you're looking at, route to the correct handler" — the detection logic and handling logic were tangled together in one growing file.
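To make the missing orchestration layer concrete, here's a minimal sketch of the dispatcher shape described above. The variant names, `Page` fields, and handler functions are all hypothetical stand-ins (the real tool would inspect a live browser DOM); the point is that detection lives in one ordered table, and each handler only knows its own path.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical page representation. In a real tool this would wrap the
# browser DOM; here it is just a bag of detectable features.
@dataclass
class Page:
    has_sales_nav_header: bool = False
    is_search_results: bool = False

def handle_default(page: Page) -> str:
    return "default"          # base-case connect flow

def handle_sales_nav(page: Page) -> str:
    return "sales_nav"        # Sales Navigator button layout

def handle_search_results(page: Page) -> str:
    return "search_results"   # search-results card layout

# The dispatcher: detection logic lives here, ordered most-specific first.
# Adding a DOM variant means adding one row to this table, not threading
# another branch through every existing handler.
ROUTES: list[tuple[Callable[[Page], bool], Callable[[Page], str]]] = [
    (lambda p: p.has_sales_nav_header, handle_sales_nav),
    (lambda p: p.is_search_results,    handle_search_results),
    (lambda p: True,                   handle_default),  # fallback
]

def dispatch(page: Page) -> str:
    for matches, handler in ROUTES:
        if matches(page):
            return handler(page)
    raise ValueError("no route matched")
```

A session asked to support a sixth variant can see the whole routing table at a glance, which is exactly the visibility the nested conditionals destroyed.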

An experienced developer might have caught this earlier. Might have refactored to a strategy pattern before it got out of hand. But here's the thing — the pace of AI-assisted development means you hit this wall faster than your instincts fire. By the time the code smells bad, you've already got three sessions of work invested in the wrong structure.

And this is just one tool. Scale this pattern across a real system — dozens of tools, multiple integration boundaries, shared state — and you've got a codebase that nobody understands. Not the AI, not you.

The actual answer

The answer isn't "remember harder." It's design your systems so that forgetting doesn't matter.

Build architecture that makes the wrong thing hard to do — not through documentation or review or memory, but through structure. Make it so that when the next AI session opens your code, the structure itself guides it toward correct changes and away from dangerous ones.

One file that does one thing can be modified without understanding the rest of the system. A config file that declares routing rules can have a new route added without touching the router. A contracts registry that says "this tool owns this data, talk to it through this interface" prevents the next session from reaching into internals it doesn't understand.
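A contracts registry can be as simple as a declarative mapping. This is a sketch under assumptions, not the production system; the tool names, interfaces, and invariant strings are hypothetical, but the invariants echo the examples above. What matters is that ownership is stated as data, so the next session reads an explicit contract instead of guessing.

```python
# Hypothetical contracts registry: which tool owns which data, and the
# only sanctioned interface for touching it. Invariants are recorded
# alongside the contract so they travel with the code, not a memory.
CONTRACTS = {
    "crm_sync": {
        "owns": ["crm_records"],
        "interface": "enqueue_update",
        "invariant": "never write to the CRM directly; sync runs on its own cadence",
    },
    "sms_handler": {
        "owns": ["outbound_sms"],
        "interface": "send_validated",
        "invariant": "sender numbers must pass validation before any send",
    },
}

def owner_of(resource: str) -> str:
    """Return the tool that owns a resource, so callers use its interface."""
    for tool, contract in CONTRACTS.items():
        if resource in contract["owns"]:
            return tool
    raise KeyError(f"no declared owner for {resource!r}")

def sanctioned_interface(resource: str) -> str:
    """The one allowed entry point for a resource, e.g. 'crm_sync.enqueue_update'."""
    tool = owner_of(resource)
    return f"{tool}.{CONTRACTS[tool]['interface']}"
```

An AI session that wants to touch `crm_records` can be pointed at `sanctioned_interface("crm_records")` and gets one answer, instead of discovering the CRM-sync cadence the hard way.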

This isn't theoretical. I've been operating a production AI system with 75+ tools for over a year. The tools that follow these patterns survive AI modification. The ones that don't become the LinkedIn automator — impressive for three days, then unmaintainable.

The next four posts in this series lay out the specific patterns, interchange formats, and meta-layer tooling that make this work. Not as theory — as stuff running in production right now.

But the principle fits in one line: stop trying to remember. Build systems that don't need you to.