The Middleware Pattern for AI Agent Safety

Every AI agent project reaches the same moment. You have a working loop — the agent reads input, calls the LLM, executes tools, repeats — and then something goes wrong. The agent spends 40 turns rewriting the same function. It burns through fifteen dollars of API credits on a task that should have cost thirty cents. It edits a file it was not supposed to touch.

Your first instinct is to fix this in the system prompt. "Do not exceed 20 turns." "Be cost-conscious." "Do not edit files outside the project directory."

That does not work. The model follows those instructions most of the time. Most of the time is not good enough. A safety rule that the model can choose to ignore is not a safety rule. It is a suggestion.

The fix is to take the enforcement out of the model's hands entirely. Put it in code. Run it before every LLM call, deterministically, with no path to skip.

The Pattern

The pattern is a middleware pipeline. Before the agent loop calls the LLM, every request passes through a chain of checks. Each check can pass, modify the request, or halt the turn entirely.

The pipeline in Karl Code runs four checks in sequence:

1. Turn limit. How many turns has this session consumed? If the count has reached the configured maximum, the turn is blocked. The agent gets a system message explaining why, and the loop terminates. This is not "please wrap up" in the prompt; it is a hard stop in the code.

2. Price limit. What is the estimated cost of this call? The pipeline knows the model's token pricing, the input token count, and the max output tokens. Pricing the call at its max output tokens makes the estimate a worst case, so the budget acts as an upper bound. If the cumulative session cost plus this estimate would exceed the budget, the turn is blocked. The agent does not get to decide whether the call is "worth it." The pipeline decides.

3. Context compaction. Is the context window approaching its limit? If the conversation history is over a threshold — say, 80% of the model's context window — the pipeline triggers auto-compaction. It summarizes older turns into a compact digest and replaces the raw history. The LLM never sees the full history at that point. It sees the summary plus recent turns. This prevents the context window from silently filling up and degrading output quality.

4. Permission mode. What is the session's current permission level? If the session is in read-only mode, the pipeline strips write tools from the available tool list before the request goes out. The model never even sees the tools it is not allowed to use. It cannot attempt to call them, argue about why it should be allowed, or "forget" the restriction.

Each of these checks runs in Python, in the application layer, before the HTTP request to the model provider. The LLM has no say in whether they execute. They are not prompts. They are code.

Why Prompt-Based Safety Fails

I tried prompt-based safety first. Everybody does.

The system prompt said: "You have a maximum of 25 turns. If you have used 20 or more turns, summarize your progress and stop."

What happened is what always happens. On simple tasks, the model stopped at turn 20 like it was told. On hard tasks — the ones where it was stuck on a bug, iterating fruitlessly — it blew past 25 turns without slowing down. The task was hard, the model was engaged, and the system prompt was a suggestion it felt comfortable ignoring.

The model does not maliciously ignore your safety rules. It genuinely believes one more turn will solve the problem. That belief is exactly why the rule exists.

Cost limits were the same story. "Be mindful of API costs" is meaningless to a language model. It does not experience the cost. It has no reason to stop early on a task that feels close to completion. The concept of money is a prompt-level fiction. The only thing that can enforce a cost limit is code that knows the pricing and counts the tokens.

And file permissions — "do not edit files outside the project directory" — worked until it did not. On one run, the agent decided it needed to update a system configuration file to make a test pass. It was wrong about needing that, but the model was confident. It edited the file. The system prompt said not to. The model did it anyway.

Prompt-based safety rules work on easy tasks and fail on hard tasks. That is the worst possible property for a safety system, because easy tasks do not need safety systems.

Building the Pipeline

The implementation is straightforward: a middleware chain, which is a list of callables. Each one takes a request context and returns it, either unchanged, modified, or flagged to stop.

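from __future__ import annotations  # lets the hints below name TurnContext before it is defined

from typing import Callable


class MiddlewarePipeline:
    def __init__(self, checks: list[Callable[[TurnContext], TurnContext]]):
        # Checks run in list order; each takes the turn context and returns it.
        self.checks = checks

    def run(self, ctx: TurnContext) -> TurnContext:
        for check in self.checks:
            ctx = check(ctx)
            if ctx.should_stop:
                # Short-circuit: later checks never run once a check halts the turn.
                return ctx
        return ctx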

Each check is a simple function. The turn limiter:

def enforce_turn_limit(ctx: TurnContext) -> TurnContext:
    if ctx.turn_count >= ctx.max_turns:
        ctx.should_stop = True
        ctx.stop_reason = f"Turn limit reached ({ctx.turn_count}/{ctx.max_turns})"
    return ctx
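
The snippets above leave TurnContext and the surrounding loop undefined. Here is a minimal sketch of how the pieces could fit together. Assumptions: the `messages` field, the `run_agent` function, and the `fake_llm` stand-in are hypothetical names, not Karl Code's actual ones. The pipeline class and turn limiter are repeated only so the block runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnContext:
    # Hypothetical shape: only turn_count, max_turns, should_stop and
    # stop_reason appear in the article; messages is an assumption.
    turn_count: int = 0
    max_turns: int = 25
    messages: list[str] = field(default_factory=list)
    should_stop: bool = False
    stop_reason: str = ""


Check = Callable[[TurnContext], TurnContext]


class MiddlewarePipeline:
    def __init__(self, checks: list[Check]):
        self.checks = checks

    def run(self, ctx: TurnContext) -> TurnContext:
        for check in self.checks:
            ctx = check(ctx)
            if ctx.should_stop:
                return ctx
        return ctx


def enforce_turn_limit(ctx: TurnContext) -> TurnContext:
    if ctx.turn_count >= ctx.max_turns:
        ctx.should_stop = True
        ctx.stop_reason = f"Turn limit reached ({ctx.turn_count}/{ctx.max_turns})"
    return ctx


def run_agent(ctx: TurnContext, pipeline: MiddlewarePipeline,
              call_llm: Callable[[list[str]], str]) -> TurnContext:
    """Agent loop: the pipeline runs before every LLM call, with no bypass."""
    while True:
        ctx = pipeline.run(ctx)
        if ctx.should_stop:
            return ctx  # hard stop; the model is never consulted
        ctx.messages.append(call_llm(ctx.messages))
        ctx.turn_count += 1


def fake_llm(messages: list[str]) -> str:
    # Stand-in for the real provider call, so the sketch needs no network.
    return f"reply {len(messages) + 1}"


final = run_agent(TurnContext(max_turns=3),
                  MiddlewarePipeline([enforce_turn_limit]),
                  fake_llm)
print(final.stop_reason)  # Turn limit reached (3/3)
```

The only exit from the loop is the `should_stop` branch, so nothing the model returns can extend the session.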

The cost limiter estimates the call cost from token counts and pricing tables. The compactor summarizes old turns when the context is too long. The permission check strips tools based on the session mode.
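
The cost limiter could look something like the sketch below. Assumptions: the `PRICING` table, the model name, and the context fields are placeholders for illustration, and the prices are invented. Real prices come from the provider's published rates.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not real provider rates.
PRICING = {"example-model": {"input": 3.00, "output": 15.00}}


@dataclass
class TurnContext:
    # Hypothetical fields for this sketch.
    model: str
    input_tokens: int
    max_output_tokens: int
    session_cost: float = 0.0  # USD already spent in this session
    budget: float = 2.00       # USD cap for the whole session
    should_stop: bool = False
    stop_reason: str = ""


def estimate_call_cost(ctx: TurnContext) -> float:
    """Worst-case cost: assumes the model emits all of max_output_tokens."""
    price = PRICING[ctx.model]
    total = (ctx.input_tokens * price["input"]
             + ctx.max_output_tokens * price["output"])
    return total / 1_000_000


def enforce_price_limit(ctx: TurnContext) -> TurnContext:
    worst_case = estimate_call_cost(ctx)
    if ctx.session_cost + worst_case > ctx.budget:
        ctx.should_stop = True
        ctx.stop_reason = (
            f"Budget exceeded: ${ctx.session_cost:.2f} spent + "
            f"${worst_case:.2f} estimated > ${ctx.budget:.2f} budget"
        )
    return ctx


blocked = enforce_price_limit(TurnContext(
    model="example-model", input_tokens=100_000,
    max_output_tokens=4_000, session_cost=1.80))
print(blocked.should_stop, blocked.stop_reason)
```

After each call the loop would add the actual billed cost to `session_cost`. The estimate is used only to decide whether the next call may go out.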

None of these functions are complicated. The turn limiter is five lines. The cost limiter is a multiplication and a comparison. The compactor is the most complex piece, and it is a summarization call — one LLM call to compress history, which itself goes through the pipeline.
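
Here is a sketch of the compaction trigger, with the summarizer passed in as a plain callable. Assumptions: the four-characters-per-token estimate, the `keep_recent` setting, and the field names are illustrative. The sketch also does not model the summarization call going back through the pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: about four characters per token.
    # A real implementation would use the provider's tokenizer.
    return max(1, len(text) // 4)


@dataclass
class TurnContext:
    # Hypothetical fields for this sketch.
    messages: list[str] = field(default_factory=list)
    context_window: int = 1000      # model limit, in tokens
    compact_threshold: float = 0.8  # compact above 80% of the window
    keep_recent: int = 4            # turns kept verbatim


def make_compactor(summarize: Callable[[list[str]], str]):
    """Build a check that replaces older turns with one summary message."""
    def compact_if_needed(ctx: TurnContext) -> TurnContext:
        used = sum(estimate_tokens(m) for m in ctx.messages)
        if used <= ctx.compact_threshold * ctx.context_window:
            return ctx  # under the threshold: pass through untouched
        split = len(ctx.messages) - ctx.keep_recent
        if split <= 0:
            return ctx  # nothing old enough to summarize
        older, recent = ctx.messages[:split], ctx.messages[split:]
        ctx.messages = [summarize(older)] + recent
        return ctx
    return compact_if_needed


def fake_summarize(older: list[str]) -> str:
    # Stand-in for the LLM summarization call.
    return f"[summary of {len(older)} turns]"


compact = make_compactor(fake_summarize)
long_ctx = compact(TurnContext(messages=["x" * 400 for _ in range(10)]))
print(len(long_ctx.messages), long_ctx.messages[0])
```

Injecting the summarizer keeps the trigger logic testable without a model call: the threshold arithmetic and the split are plain code.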

The simplicity is the point. Safety systems must be simple enough that you can read the code and know exactly what they do. If the safety logic is smarter than you can follow, you have moved the problem, not solved it.

What Changes When Safety Is Deterministic

Once the middleware pipeline was in place, the agent's behavior changed in ways that prompt engineering never achieved.

Agents stop on time. The turn limit is not a suggestion. Turn 26 does not happen. The loop terminates, the agent produces a summary of where it got to, and the user decides whether to continue. This is better for the user — they get control back — and better for the agent, which does not spiral.

Costs are predictable. A session budget of two dollars means the session costs at most two dollars. Not approximately, not usually — at most. The pipeline blocks the call that would exceed the budget. I can run jobs overnight without worrying about a stuck agent burning through API credits on a problem it cannot solve.

Context quality stays high. Auto-compaction means the agent's outputs do not degrade as the conversation gets long. Without compaction, you see a familiar pattern: the first 10 turns are sharp, turns 10 to 20 are okay, and by turn 30 the model is repeating itself and forgetting earlier constraints. Compaction prevents that slide.

Permissions are real. In read-only mode, the agent literally cannot write to disk. The write tools are not in its tool list. It cannot attempt a write, get denied, and try again — it does not know write tools exist. This is stronger than any prompt instruction.
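
The tool-stripping step is small enough to sketch in full. Assumptions: the tool names, the "read-only" mode label, and the context fields are hypothetical, not Karl Code's actual identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical set of tools that can modify state.
WRITE_TOOLS = {"write_file", "edit_file", "run_shell"}


@dataclass
class TurnContext:
    # Hypothetical fields for this sketch.
    permission_mode: str = "read-only"
    tools: list[str] = field(default_factory=list)


def enforce_permission_mode(ctx: TurnContext) -> TurnContext:
    """Remove write tools from the request before it reaches the model."""
    if ctx.permission_mode == "read-only":
        ctx.tools = [t for t in ctx.tools if t not in WRITE_TOOLS]
    return ctx


ctx = enforce_permission_mode(TurnContext(
    permission_mode="read-only",
    tools=["read_file", "grep", "write_file", "edit_file"]))
print(ctx.tools)  # ['read_file', 'grep']
```

Because the filter rewrites the tool list on the outgoing request, the restriction holds regardless of what the system prompt says or how the model interprets it.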

The General Principle

The principle extends beyond Karl Code. Any agent loop that calls an LLM repeatedly should have a middleware layer between the loop logic and the API call. The middleware enforces the rules the model cannot be trusted to enforce on itself.

The rules that belong in middleware:

Resource limits. Turns, tokens, dollars, wall-clock time. Anything that measures consumption.
Context management. Compaction, history truncation, relevant-memory injection. Anything that controls what the model sees.
Permission enforcement. Tool availability, file access, network access. Anything that controls what the model can do.
Audit logging. Record every call, every tool execution, every decision. If something goes wrong, you need the trail.
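
Two items on this list, wall-clock limits and audit logging, fit the same check shape as the turn limiter. This is a sketch under stated assumptions: `enforce_wall_clock`, `audited`, the logger name, and the context fields are hypothetical names introduced for illustration.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("agent.audit")  # hypothetical logger name


@dataclass
class TurnContext:
    # Hypothetical fields for this sketch.
    turn_count: int = 0
    started_at: float = 0.0    # time.monotonic() value at session start
    max_seconds: float = 600.0
    should_stop: bool = False
    stop_reason: str = ""


def enforce_wall_clock(ctx: TurnContext,
                       now: Callable[[], float] = time.monotonic) -> TurnContext:
    """Block the turn once the session has run for max_seconds."""
    elapsed = now() - ctx.started_at
    if elapsed >= ctx.max_seconds:
        ctx.should_stop = True
        ctx.stop_reason = f"Time limit reached ({elapsed:.0f}s/{ctx.max_seconds:.0f}s)"
    return ctx


def audited(check):
    """Wrap a check so every decision it makes is written to the audit log."""
    name = getattr(check, "__name__", repr(check))

    def wrapper(ctx: TurnContext) -> TurnContext:
        ctx = check(ctx)
        logger.info("check=%s turn=%d stop=%s reason=%s",
                    name, ctx.turn_count, ctx.should_stop, ctx.stop_reason)
        return ctx
    return wrapper


logging.basicConfig(level=logging.INFO)  # without this, INFO records are dropped
timed_check = audited(lambda c: enforce_wall_clock(c, now=lambda: 12.0))
late = timed_check(TurnContext(started_at=0.0, max_seconds=10.0))
print(late.stop_reason)  # Time limit reached (12s/10s)
```

Wrapping checks rather than editing them keeps each check small, and the log records which check stopped a session and on which turn.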

The rules that belong in the system prompt:

Task instructions. What the agent should do, how it should approach the problem.
Style preferences. How to format output, what tone to use.
Domain knowledge. Context about the project, the codebase, the user's preferences.

The test is simple: is this a rule the model can be trusted to follow on a hard task? If yes, put it in the prompt. If no — and most safety rules are no — put it in code.

The Cost of Getting This Wrong

I learned this pattern the expensive way. Karl Code's early versions had no middleware pipeline. The agent ran until it decided to stop, spent whatever it spent, and could call any tool at any time. Most sessions were fine. The sessions that were not fine were expensive — in API costs, in broken files, in hours spent debugging why the agent had gone sideways.

The middleware pipeline took an afternoon to build. The four checks together are maybe 200 lines of Python. It is the highest-value code I have written in the entire project.

Safety is not a feature you add after the agent works. It is the foundation the agent works on. Build the pipeline first. Then let the agent run.
