The Pure Kernel Pattern: Keeping Domain Logic Free of Infrastructure

Every software architecture article tells you to separate concerns. Separate your business logic from your data access. Separate your HTTP handlers from your domain models. Use dependency injection. Everyone nods. Then you look at the codebase and the domain logic imports the database client, the HTTP framework, the LLM SDK, and three cloud SDKs.

The separation exists in theory. In practice, the domain logic and the infrastructure are woven together. You cannot pull one out without pulling the other.

When I started building Planclave — the planning system that runs multi-agent review cycles — I wrote it the normal way. The planning logic, the LLM calls, the database queries, the API handlers, all in one package. It worked. Then I tried to write unit tests.

The Problem

The first sign of trouble was the test imports. To test whether the plan synthesis logic correctly merged findings from three reviewers, I needed to import the synthesis module. The synthesis module imported the LLM client. The LLM client read API keys from environment variables. The setup also needed a database connection to load the prompt templates.

To test a pure logic function — "given these three sets of findings, produce a merged set" — I needed a running database, valid API keys, and network access to an LLM provider.

That is not a unit test. That is an integration test wearing a unit test's clothes.

It got worse. When the LLM provider changed its response format, the parsing tests broke. Not because the parsing logic was wrong, but because the LLM client dependency triggered a network call during what should have been a logic-only test. Tests that should run in milliseconds took seconds. Tests that should run offline required connectivity. Tests that should be deterministic became flaky because the LLM response in the setup phase varied between runs.

If your unit tests need network access, they are not unit tests. If your domain logic imports an HTTP client, your domain logic is not pure.

The Extraction

The fix was radical extraction. I pulled every piece of logic that made a decision about plans, findings, readiness, and reconciliation into a single package: the kernel. The kernel has zero third-party dependencies; it uses only the language's standard library. No HTTP framework. No database driver. No LLM SDK. No cloud client. Nothing that touches the outside world.

The kernel contains:

  • Plan data models — plain dataclasses, no ORM decorators
  • Readiness assessment logic — pure functions over plan state
  • Finding synthesis and deduplication — pure functions over review findings
  • Reconciliation logic — matching findings to plan steps
  • Validation rules — structural checks on plan shape

What does the kernel not contain? Anything that requires I/O. The kernel does not call an LLM. It does not query a database. It does not make HTTP requests. It does not read files. It receives data, makes decisions, returns data.
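To make "receives data, makes decisions, returns data" concrete, here is a minimal sketch of what kernel code of this kind might look like. The field names, the severity scale, the deduplication key, and the readiness rule are all assumptions for illustration, not Planclave's actual models or rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    step_id: str
    severity: str  # assumed scale: "low", "medium", "high"
    summary: str


@dataclass(frozen=True)
class Plan:
    steps: tuple[str, ...]
    open_findings: tuple[Finding, ...] = ()


def merge_findings(*reviews: list[Finding]) -> list[Finding]:
    """Merge several reviewers' findings into one list.

    Two findings count as duplicates when they target the same step and
    have the same summary after trimming and lowercasing (an assumed rule).
    The first occurrence wins and first-seen order is preserved.
    """
    seen: set[tuple[str, str]] = set()
    merged: list[Finding] = []
    for review in reviews:
        for finding in review:
            key = (finding.step_id, finding.summary.strip().lower())
            if key not in seen:
                seen.add(key)
                merged.append(finding)
    return merged


def is_ready(plan: Plan) -> bool:
    """Assumed rule: ready means at least one step and no open high-severity finding."""
    return bool(plan.steps) and all(
        finding.severity != "high" for finding in plan.open_findings
    )
```

Nothing here imports anything beyond the standard library, so the functions can be exercised with plain in-memory values.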

The key constraint: the kernel's dependency list is empty. Not small. Empty. If you cannot install the kernel package and run its tests in an environment with no network, no database, and no API keys, the kernel is not pure.

Callback Injection for AI Calls

The hardest part was the planner itself. The planner needs to call an LLM to generate plan revisions. That is inherently an I/O operation. How do you keep the planning logic pure when planning requires AI?

The answer is callback injection. The kernel defines the planning workflow as a series of steps. Each step that requires external input declares what it needs as a callable — a function signature. The kernel calls the function. It does not know what the function does behind the scenes.

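# In the kernel (pure, no dependencies)
from typing import Callable

# Plan and Finding are the kernel's plain dataclasses, defined elsewhere in the package.

class Planner:
    def __init__(self, generate_revision: Callable[[str], str]):
        # The only link to the outside world: any function from prompt text to response text.
        self._generate = generate_revision

    def revise(self, plan: Plan, findings: list[Finding]) -> Plan:
        prompt = self._build_revision_prompt(plan, findings)
        response = self._generate(prompt)
        return self._parse_revision(response, plan)

# In the infrastructure layer (has dependencies)
from openai import OpenAI
from kernel import Planner

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_revision(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # content is Optional in the SDK's types; normalize None to an empty string
    return response.choices[0].message.content or ""

planner = Planner(generate_revision=generate_revision)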

The kernel builds the prompt. The kernel parses the response. The kernel decides what to do with the result. The actual LLM call lives outside the kernel, injected at runtime.

In tests, you inject a stub:

def fake_generate(prompt: str) -> str:
    return '{"steps": [...], "assumptions": [...]}'

planner = Planner(generate_revision=fake_generate)
result = planner.revise(test_plan, test_findings)
assert result.steps[0].title == "Updated step"

No network. No API keys. No flakiness. The test runs in under a millisecond.
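For readers who want to run the stub test end to end, the following is a self-contained sketch. The Step, Plan, and Finding shapes, the prompt wording, the JSON response format, and the fall-back-to-original behavior on unparseable output are all assumptions made so the example executes; the real kernel may differ on every one of them.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Step:
    title: str


@dataclass(frozen=True)
class Plan:
    steps: tuple[Step, ...]


@dataclass(frozen=True)
class Finding:
    summary: str


class Planner:
    def __init__(self, generate_revision: Callable[[str], str]):
        self._generate = generate_revision

    def revise(self, plan: Plan, findings: list[Finding]) -> Plan:
        prompt = self._build_revision_prompt(plan, findings)
        response = self._generate(prompt)
        return self._parse_revision(response, plan)

    def _build_revision_prompt(self, plan: Plan, findings: list[Finding]) -> str:
        # Hypothetical prompt layout: current steps, then the findings to address.
        steps = "\n".join(f"- {step.title}" for step in plan.steps)
        issues = "\n".join(f"- {finding.summary}" for finding in findings)
        return f"Revise this plan:\n{steps}\nAddress these findings:\n{issues}"

    def _parse_revision(self, response: str, original: Plan) -> Plan:
        # Assumed policy: keep the original plan if the response is not usable JSON.
        try:
            data = json.loads(response)
            return Plan(steps=tuple(Step(title=s["title"]) for s in data["steps"]))
        except (json.JSONDecodeError, KeyError, TypeError):
            return original


def test_revise_uses_injected_generator() -> None:
    prompts: list[str] = []

    def fake_generate(prompt: str) -> str:
        prompts.append(prompt)  # record what the kernel asked for
        return '{"steps": [{"title": "Updated step"}]}'

    planner = Planner(generate_revision=fake_generate)
    plan = Plan(steps=(Step("Original step"),))
    result = planner.revise(plan, [Finding("Step lacks rollback")])

    assert result.steps[0].title == "Updated step"
    assert "Step lacks rollback" in prompts[0]
```

Because the stub records the prompt it receives, the same test checks both halves of the kernel's work: what it asks for and what it does with the answer.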

What This Buys You

Tests that are actually tests. The kernel has comprehensive unit tests covering every decision path — readiness transitions, synthesis merge rules, validation failures, reconciliation matching. They all run without infrastructure. The full suite finishes in under two seconds. Same input, same output, every time.

Refactoring without fear. When I rewrote the LLM prompt construction to produce better plan revisions, I changed the prompt builder (kernel code) and ran the tests. If the tests passed, the logic was intact. I did not need to run a full Planclave cycle against a live LLM to verify I had not broken the synthesis logic. The tests proved it directly.

Infrastructure swappability. Planclave can run with OpenAI, Anthropic, or a local model. The kernel does not care. The kernel calls whatever function was injected. Swapping providers is a one-line change in the infrastructure layer. The planning logic does not change.
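One way the swap might be wired in the infrastructure layer is a small lookup from provider name to a function with the injected signature. This is a sketch under assumptions: GenerateFn, PROVIDERS, select_provider, and make_echo_provider are hypothetical names, and the echo functions stand in for real SDK adapters so the example runs offline.

```python
from typing import Callable

# The signature the kernel expects: prompt text in, response text out.
GenerateFn = Callable[[str], str]


def make_echo_provider(name: str) -> GenerateFn:
    """Stand-in for a real adapter; a real one would call the provider's SDK."""
    def generate(prompt: str) -> str:
        return f"[{name}] {prompt}"
    return generate


# Hypothetical registry; each value would be a real adapter in production.
PROVIDERS: dict[str, GenerateFn] = {
    "openai": make_echo_provider("openai"),
    "anthropic": make_echo_provider("anthropic"),
    "local": make_echo_provider("local"),
}


def select_provider(name: str) -> GenerateFn:
    """Return the adapter for `name`, failing loudly on an unknown provider."""
    try:
        return PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
```

With this shape, the line that changes is the name passed to select_provider; the function it returns is what gets handed to the kernel.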

Clear code boundaries. When a bug report comes in ("the planner did not address a high-severity finding"), I know where to look. The decision about which findings to include in the revision is kernel logic. The LLM's actual response is infrastructure. If the decision was wrong, the bug is in the kernel and I have a test for it. If the response was malformed, the question is whether the kernel's parser handled it correctly, and I have a test for that too.

The Cost

This is not free. The kernel has more files than a monolithic design would have. The separation between kernel and infrastructure means more interfaces, more type definitions, and more wiring code in the application layer. There is cognitive overhead in working out where a piece of logic lives.

And the discipline is ongoing. Every time a feature gets added — a new readiness rule, a new validation check, a new synthesis behavior — the temptation is to add it where it is convenient, which often means importing an infrastructure dependency. Keeping the kernel pure requires saying no to those shortcuts. Code review helps. A CI check that verifies the kernel package has no infrastructure dependencies helps more.

The kernel stays pure because the build system enforces it. Good intentions do not survive contact with a deadline. Automated checks do.

The Principle

The general principle is one of the oldest in software: separate what you decide from what you do. Decisions belong in pure logic. Actions belong in infrastructure. When you mix them, you get code that is hard to test, hard to refactor, and hard to trust.

The pure kernel pattern is not new. It is hexagonal architecture, ports and adapters, clean architecture — the same idea people have been writing about for decades. What makes it worth writing about here is that I kept ignoring it until the lack of it made the codebase unmaintainable. The extraction was painful. The tests that came after made it worth it.

If your unit tests need a database, extract the logic. If your domain models import your ORM, extract the logic. If your planning code imports your LLM client, extract the logic. Put it in a package with no dependencies. Inject the I/O. Run the tests in a vacuum.

That is the entire pattern.