The Session Problem: Living Documentation and Architectural Health
Even good architecture rots when AI sessions keep touching it. End-of-session hooks, nightly reviews, and architectural health monitors keep the system honest.
Parts 1, 2, and 3 gave you the structural patterns: single-responsibility files, config-driven behavior, dispatchers, contracts, event logs, manifest-driven dispatch.
Good. Build all of that. It will help immediately.
Then watch it rot.
The decay problem
Architecture doesn't degrade because someone makes a bad decision. It degrades because thirty small, locally-correct decisions accumulate into something nobody designed.
With AI-assisted development, this happens faster. A human developer making three changes a day has time to notice drift — "wait, this module is getting too big." An AI session making thirty changes in an afternoon doesn't notice. It optimizes for the task at hand. The task at hand is never "evaluate whether this codebase is still architecturally sound."
You can have perfect patterns on day one and a mess by day thirty. Not because the patterns were wrong, but because nothing enforced them over time.
This part is about the meta layer — the tooling that watches the architecture while the AI modifies the code.
Session hooks: capture what the AI learned
Every AI coding session learns things about your codebase that aren't written down anywhere. "The config parser accepts three date formats because the CRM exports changed in March." "The SMS handler checks sender phone because of the February incident." "The task tracker's complete command also updates the parent initiative's progress counter."
That knowledge exists in the session. When the session ends, it's gone. The next session will re-derive it, or miss it, or infer something wrong.
The fix is a session hook — a step that fires at the end of every coding session with one instruction: document what you learned about everything you touched.
This doesn't need to be fancy. It's a prompt that asks the AI to write down, for each file it modified, what non-obvious things it discovered:
- Why is the code shaped this way?
- What invariants does it maintain?
- What would break if you changed this without knowing the context?
- What does this tool depend on that isn't obvious from the imports?
These notes accumulate in per-tool knowledge files. The next session reads the knowledge file before modifying the tool. It doesn't start from zero — it starts from every previous session's understanding.
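A session hook like this can be a small script wired into whatever end-of-session mechanism your coding tool offers. The sketch below is one minimal shape, assuming a git repository and a `knowledge/` directory of per-tool notes; the paths, the note format, and the prompt questions are illustrative, not a fixed convention.

```python
"""Minimal end-of-session hook sketch: for every file touched this
session, append a dated stub to that tool's knowledge file for the
AI to fill in. Assumes a git repo and a knowledge/ directory."""
from datetime import date
from pathlib import Path
import subprocess

PROMPTS = [
    "Why is the code shaped this way?",
    "What invariants does it maintain?",
    "What would break without this context?",
    "What non-obvious dependencies does it have?",
]

def modified_files(repo: Path) -> list[str]:
    # Files touched in this session, per git's working-tree diff.
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def append_note_stub(repo: Path, tool: str) -> Path:
    # One knowledge file per tool; the session answers the prompts.
    note = repo / "knowledge" / f"{Path(tool).stem}.md"
    note.parent.mkdir(exist_ok=True)
    with note.open("a") as f:
        f.write(f"\n## Session {date.today()} — {tool}\n")
        for q in PROMPTS:
            f.write(f"- {q}\n")
    return note

def run_hook(repo: Path) -> list[Path]:
    return [append_note_stub(repo, f) for f in modified_files(repo)]
```

The stub-then-fill shape matters: the hook guarantees a note exists for every touched file, so a session can fail to write good notes but can't silently skip a file.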
The trust problem
There's a catch. The AI might write down something wrong. It might misunderstand why code is shaped a certain way and document its misunderstanding as fact. The next session reads the wrong note and makes changes based on a false premise.
This is real, and it's why session hooks aren't sufficient on their own. You need a verification layer.
Nightly reviews: cross-check notes against code
A nightly review is a separate AI session — running on a schedule, not interactively — that does one thing: it reads every knowledge file and cross-checks it against the actual code.
"The knowledge file for crm_sync.sh says it uses atomic writes via temp file and rename. Does the code actually do this?"
"The knowledge file for task_ctl.sh says the complete command updates parent initiative progress. Does the code actually do this? Does a parent initiative concept exist anywhere in the codebase?"
The review produces a patch file: corrections, additions, things that look wrong. A human reviews the patches in the morning — a five-minute scan, not a full code review. Approve, reject, or flag for deeper investigation.
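The scaffolding for this is simple: pair each knowledge file with its tool's source, hand both to a model with a verification prompt, and collect the responses into one patch file. The sketch below assumes `knowledge/` and `tools/` directories; the `review` callable stands in for whatever AI backend you schedule, and its interface here is hypothetical.

```python
"""Scaffolding sketch for the nightly review. Pairs each per-tool
knowledge file with the tool's source, asks a (stubbed) AI reviewer
to flag discrepancies, and writes one patch file for the morning scan."""
from pathlib import Path
from typing import Callable

PROMPT = (
    "Below are a tool's source and its knowledge file. "
    "Flag every claim in the notes the code does not support, "
    "and every behavior in the code the notes omit. "
    "Answer as a unified diff against the knowledge file.\n\n"
    "--- CODE ---\n{code}\n--- NOTES ---\n{notes}\n"
)

def nightly_review(repo: Path, review: Callable[[str], str]) -> Path:
    patches = []
    for note in sorted((repo / "knowledge").glob("*.md")):
        matches = list((repo / "tools").glob(note.stem + ".*"))
        if not matches:
            # Note refers to a tool that no longer exists: flag it.
            patches.append(f"# stale note, no matching tool: {note.name}")
            continue
        prompt = PROMPT.format(code=matches[0].read_text(),
                               notes=note.read_text())
        patches.append(review(prompt))
    out = repo / "review.patch"
    out.write_text("\n".join(patches))
    return out
```

Writing everything into a single patch file, rather than editing knowledge files in place, is what keeps the human in the loop: nothing the review concludes takes effect until the morning scan approves it.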
This creates a living documentation system. Not documentation that was accurate when it was written and has been drifting ever since — documentation that's verified against the code nightly and corrected when it drifts.
What this actually catches
In practice, nightly reviews catch three categories of problems:
Stale documentation. A session modified a tool's behavior but didn't update the knowledge file. The review notices the discrepancy.
Wrong documentation. A session wrote down something plausible but incorrect. The review checks it against the code and flags the mismatch.
Undocumented dependencies. A session added a dependency between two tools — tool A now reads a file that tool B produces — but didn't document it in the contracts registry. The review notices the new file access pattern and flags it for contract registration.
None of these are catastrophic individually. All of them compound into the kind of invisible decay that makes codebases unmaintainable.
Architectural health monitors
Session hooks capture knowledge. Nightly reviews verify it. But neither of them answers the bigger question: is the architecture still clean?
This is where it gets interesting, because this is tooling that barely exists. Code quality tools check style, complexity, and test coverage; almost nobody builds automated reviewers for responsibility boundaries.
An architectural health monitor periodically reviews tools against their declared responsibilities and flags violations:
Complexity creep. "This tool has four code paths that share nothing. Should it be split into a dispatcher and four handlers?"
Responsibility bleed. "This tool is registered in the contracts as owning CRM sync, but it's also writing to the task log. That's task_ctl's domain."
Contract violations. "This tool writes directly to a file owned by another tool, bypassing the declared interface."
Missing dispatchers. "This tool handles three variants inline. Per the architecture standards, it should have been split at variant two."
The monitor doesn't fix these things. It surfaces them. A weekly scan that produces a one-page report: "here's what's drifting, here's what's growing, here's what's bleeding across boundaries." The human — or a dedicated refactoring session — decides what to address.
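Some of these checks need AI judgment, but others fall out of a plain static scan against the contracts registry. The sketch below shows the mechanical half, assuming a registry shaped as a map from shared file to owning tool; the write-detection heuristic and the complexity threshold are illustrative stand-ins for whatever your standards document specifies.

```python
"""Sketch of the mechanical checks in a weekly health monitor:
complexity creep (tool too large) and contract violations (tool
writes a file another tool owns). Registry shape, the shell-redirect
write heuristic, and MAX_LINES are all illustrative assumptions."""
import re

MAX_LINES = 200  # illustrative split threshold from the standards doc

def scan_writes(source: str) -> set[str]:
    # Crude heuristic for shell tools: `> file` and `>> file` targets.
    return set(re.findall(r">{1,2}\s*(\S+)", source))

def health_report(tools: dict[str, str],
                  contracts: dict[str, str]) -> list[str]:
    """tools: tool name -> source text; contracts: file -> owning tool."""
    findings = []
    for tool, source in tools.items():
        if len(source.splitlines()) > MAX_LINES:
            findings.append(
                f"{tool}: over {MAX_LINES} lines, consider a dispatcher split")
        for target in scan_writes(source):
            owner = contracts.get(target)
            if owner and owner != tool:
                findings.append(
                    f"{tool}: writes {target}, owned by {owner} "
                    "(contract violation)")
    return findings
```

The judgment-heavy checks — responsibility bleed, "should this be a dispatcher" — are where the scheduled AI session comes in, with a report like this one as its input.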
The resorting manager
I think of this as a resorting manager. Not a code reviewer that checks whether individual functions are well-written, but an architectural reviewer that checks whether things are still where they belong.
A good resorting manager asks: "Is this tool still doing one thing? Is it still talking to other tools through declared interfaces? Has it silently absorbed responsibilities that belong elsewhere?"
In a human-only development environment, a senior architect fills this role — the person who periodically reviews the system and says "this is getting too tangled, let's refactor." In an AI-assisted environment, the pace of change outstrips any human's ability to monitor. You need automated assistance.
This is a novel concept. I haven't seen anyone building it. But it's the logical endpoint of designing for forgotten context — if you accept that neither the human nor the AI reliably tracks architectural health over time, you build tooling that does it for them.
The three layers together
These three mechanisms — session hooks, nightly reviews, and architectural health monitors — form a feedback loop:
- Session hooks capture what each session learned (real-time)
- Nightly reviews verify that captured knowledge matches reality (daily)
- Health monitors check that the architecture is still sound (weekly)
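The wiring for the loop can be trivially small — assuming each layer exists as a standalone script (the names below are illustrative), the cadences reduce to one per-session hook plus two cron entries, kept in a single reviewable place:

```python
"""Sketch of the feedback-loop cadence, assuming each layer is a
standalone script (names are hypothetical). The per-session layer is
triggered by the coding tool's hook, not by cron."""

LAYERS = [
    # (cadence, crontab schedule, command)
    ("per-session", None, "session_hook.py"),     # fired by the end-of-session hook
    ("daily", "0 3 * * *", "nightly_review.py"),  # 03:00, before the morning scan
    ("weekly", "0 6 * * 1", "health_monitor.py"), # Monday report
]

def crontab_lines() -> list[str]:
    # Only the scheduled layers become crontab entries.
    return [f"{sched} {cmd}" for _, sched, cmd in LAYERS if sched]
```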
The structural patterns from Parts 2 and 3 give you a clean starting point. The meta layer keeps it clean as the system evolves.
Without the meta layer, your architecture has a half-life. Good patterns degrade under constant modification, and the pace of AI-assisted development means the half-life is shorter than you think.
With the meta layer, the architecture is self-monitoring. Not self-healing — that's a bridge too far right now — but self-aware. It knows when it's drifting and tells you before the drift becomes debt.
In Part 5, I'll step back from the technical details and talk about what all of this means at the team and organization level. Because the gap between "we use AI coding tools" and "we design for AI collaboration" is about to become the most important competitive differentiator in software.