Write Your Intent Before You Execute: A Pattern for Reliable AI Systems
The narrator-tts pipeline processes books. A typical novel is 60 to 100 chapters, split into thousands of text segments, each rendered to an audio file. A full book run takes hours. The GPU runs hot. The disk fills up with WAV files. And eventually — not every time, but often enough — something crashes.
A CUDA out-of-memory error on segment 847. A power flicker at segment 1,200. A model that produces empty audio on a specific passage and the pipeline hangs waiting for output that never arrives.
When the process dies, the question is: what was supposed to happen?
If you cannot answer that question, you cannot recover. You do not know which segments completed, which ones were in progress when the crash hit, and which ones were never started. You can scan the output directory for files, but that tells you what exists — not what should exist. You are reconstructing intent from artifacts, which is forensic work, not recovery.
There is a better way. Write the intent down before you start.
The Pre-Flight Catalog
The pattern is simple. Before the pipeline begins any work, it writes a catalog file to disk. The catalog is a complete list of every operation that will be performed — every segment that will be rendered, with its source text, its expected output path, and its status.
{
  "book_id": "the-incubator",
  "total_segments": 3847,
  "segments": [
    {"id": "0001", "text": "Chapter One...", "output": "audio/0001.wav", "status": "pending"},
    {"id": "0002", "text": "The laboratory was...", "output": "audio/0002.wav", "status": "pending"},
    ...
  ]
}
Every segment starts as pending. When the pipeline picks up a segment to process it, the status changes to in_progress and the file is flushed to disk. When the segment finishes and the audio is verified, the status changes to completed.
The key insight is not the status tracking. The key insight is that the full catalog exists before any work begins. The pipeline writes down everything it intends to do, in order, before it does any of it.
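A minimal Python sketch of that step, assuming the text has already been split into a list of strings. The function names build_catalog and write_catalog, and the four-digit ID scheme, are illustrative choices made to match the sample above, not the pipeline's actual code.

```python
import json
from pathlib import Path


def build_catalog(book_id: str, texts: list[str], audio_dir: str = "audio") -> dict:
    """Build the full plan in memory: one pending entry per segment."""
    segments = [
        {
            "id": f"{i:04d}",
            "text": text,
            "output": f"{audio_dir}/{i:04d}.wav",
            "status": "pending",
        }
        for i, text in enumerate(texts, start=1)
    ]
    return {"book_id": book_id, "total_segments": len(segments), "segments": segments}


def write_catalog(catalog: dict, path: Path) -> None:
    """Write the whole catalog to disk before any segment is rendered."""
    path.write_text(json.dumps(catalog, indent=2), encoding="utf-8")
```

The point of the sketch is the ordering: build_catalog sees every segment before write_catalog runs, and write_catalog runs before anything touches the GPU.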
Why This Matters More Than You Think
Most pipelines discover their work as they go. They read the source text, split it into segments, and process each one. The list of segments lives in memory. If the process crashes, the list is gone. The output directory has some WAV files. The source text is still there. But the mapping between them — which segment produced which file, which segments were skipped, where the pipeline was when it died — that information existed only in RAM.
A recovery system cannot reconstruct a plan that was never written down. It can only guess from artifacts.
With a pre-flight catalog, recovery is deterministic. The pipeline restarts, reads the catalog, and finds every segment where status is not completed. Those are the segments that need reprocessing. No guessing. No scanning directories. No comparing file sizes or modification times. The catalog is the source of truth.
The status field only needs two values for recovery purposes: completed and everything else. I learned this the hard way — the intermediate states (in_progress, failed, skipped) are useful for monitoring and debugging, but for recovery, the question is binary: did this segment finish or not?
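The recovery query is short enough to show in full. This is a sketch in Python, assuming the catalog layout from the sample above; segments_to_process is a hypothetical name.

```python
import json
from pathlib import Path


def segments_to_process(catalog_path: Path) -> list[dict]:
    """Return every segment that is not completed, in catalog order."""
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    # pending, in_progress, failed, skipped: for recovery they all mean "not done".
    return [seg for seg in catalog["segments"] if seg["status"] != "completed"]
```

Because the filter is "anything other than completed", adding a new intermediate status later does not change recovery behavior.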
The Pattern in Practice
The catalog is written as a JSON file on local disk. JSON because it is human-readable and every language can parse it. Local disk because the catalog must be writable by the pipeline and readable by the recovery system, and those are usually the same process on the same machine.
The write order matters. The catalog is written before the pipeline starts processing. Not after the first segment. Not partway through the run. Splitting the text and planning the segments is what produces the catalog, and before any GPU work happens, the catalog is on disk.
If the catalog write fails, the pipeline does not start. This is non-negotiable. A pipeline that cannot record its intent has no business executing.
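One way to enforce that gate, sketched in Python. The function name write_catalog_or_abort is hypothetical, and the read-back check is an extra precaution added here, not something the rule strictly requires.

```python
import json
import sys
from pathlib import Path


def write_catalog_or_abort(catalog: dict, path: Path) -> None:
    """Persist the catalog, or exit before any rendering work begins."""
    try:
        path.write_text(json.dumps(catalog), encoding="utf-8")
        # Read it back so an unreadable or truncated file is caught now, not at recovery time.
        json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # sys.exit raises SystemExit, so nothing after this call runs.
        sys.exit(f"catalog write failed, refusing to start: {exc}")
```

Calling this as the first step of a run means a full disk or a bad path stops the pipeline while the GPU is still idle.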
The flush matters too. After every status update, the catalog file is flushed to disk — not buffered, not deferred, written synchronously. If the process crashes between a status update and a flush, the catalog on disk does not match reality. The recovery system will reprocess a segment that was actually completed, which is wasted work but not corrupted work. That is an acceptable tradeoff. Wasted work is recoverable. Silent data loss is not.
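A sketch of a synchronous status update in Python. It goes one step beyond the description above: it writes to a temporary file and renames it over the catalog, so a crash in the middle of a write cannot leave a half-written catalog behind. That temp-file step, and the names save_catalog_durably and set_status, are assumptions for illustration.

```python
import json
import os
from pathlib import Path


def save_catalog_durably(catalog: dict, path: Path) -> None:
    """Write the catalog so that the file on disk is always a complete version."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(catalog, f)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to push its cache to the device
    os.replace(tmp, path)      # swap the new file in place of the old one


def set_status(catalog: dict, path: Path, segment_id: str, status: str) -> None:
    """Update one segment's status in memory, then persist immediately."""
    for seg in catalog["segments"]:
        if seg["id"] == segment_id:
            seg["status"] = status
            break
    else:
        raise KeyError(segment_id)
    save_catalog_durably(catalog, path)
```

Note that flush alone only hands the data to the operating system; os.fsync is the call that requests it be written to the device. A stricter version would also fsync the containing directory after the rename, which this sketch leaves out.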
What Happens Without It
I ran the pipeline without a catalog for the first version. It worked fine for short tests — a chapter here, a few paragraphs there. The first full book run crashed at segment 2,100 of 3,847.
The recovery process was: scan the output directory, sort the files by name, find the gap. Segment 2,100 was partially written — the file existed but was truncated. I had to figure out that the truncation meant "in progress when crashed" rather than "completed but corrupted." Then I had to write a script to resume from segment 2,100, which meant hard-coding the start point. Then I had to verify that segments 1 through 2,099 were actually complete and not silently broken.
It took two hours to recover from a crash that should have been a thirty-second restart.
After adding the catalog, the pipeline recovers from the same crash on its own in thirty seconds. It reads the catalog, finds that segments 1 through 2,099 are completed, finds that 2,100 is in_progress (which means: reprocess it), and resumes. No human intervention.
The General Principle
The pre-flight catalog is not specific to TTS pipelines. It applies to any system that does long-running, multi-step work where a crash mid-execution would leave the state ambiguous.
Batch processing jobs. Before processing a batch, write the full list of items to disk. Each item gets a status. Recovery reads the list and picks up where it left off.
File conversion pipelines. Before converting 500 files, write the input paths and expected output paths. A crash halfway through means you re-run and the catalog tells you exactly what is left.
Multi-agent task dispatchers. Before dispatching tasks to workers, write the task list with assignments. If the dispatcher crashes, the recovery system knows what was dispatched and what was not.
The pattern has three rules:
- Write the complete plan before executing any of it. Not incrementally. Not as you discover work. All at once, before the first operation.
- Update status after each step, with synchronous writes. The catalog on disk must reflect reality as closely as possible at every moment.
- Treat the catalog as the source of truth on recovery. Do not trust the output directory. Do not trust your memory. Trust the catalog.
The catalog is a contract the pipeline writes with its future self. "Here is what I intended to do. If I do not finish, pick up where I left off."
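The three rules fit in one small loop. The following Python sketch is a generic stand-in, not the narrator-tts code: run, save, and the render callback are hypothetical names, and render is assumed to return True when a segment's output has been produced and verified.

```python
import json
import os
from pathlib import Path
from typing import Callable


def save(catalog: dict, path: Path) -> None:
    """Synchronous write: temp file, fsync, then rename over the catalog."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(catalog, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(book_id: str, texts: list[str], path: Path,
        render: Callable[[dict], bool]) -> dict:
    if path.exists():
        # Rule 3: on restart, the catalog on disk is the source of truth.
        catalog = json.loads(path.read_text(encoding="utf-8"))
    else:
        # Rule 1: write the complete plan before executing any of it.
        segments = [
            {"id": f"{i:04d}", "text": t, "output": f"audio/{i:04d}.wav",
             "status": "pending"}
            for i, t in enumerate(texts, start=1)
        ]
        catalog = {"book_id": book_id, "total_segments": len(segments),
                   "segments": segments}
        save(catalog, path)  # if this raises, no segment is ever rendered

    for seg in catalog["segments"]:
        if seg["status"] == "completed":
            continue  # finished in an earlier run
        # Rule 2: persist every status change before moving on.
        seg["status"] = "in_progress"
        save(catalog, path)
        seg["status"] = "completed" if render(seg) else "failed"
        save(catalog, path)
    return catalog
```

Running the same call again after a crash is the whole recovery procedure: completed segments are skipped, and anything left as in_progress, failed, or pending is rendered again.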
The Cost
The cost is two file writes per segment (one when it moves to in_progress, one when it moves to completed), plus the initial catalog write. For 3,847 segments, that is 7,695 disk writes over a multi-hour run. The performance impact is negligible: the GPU work for each segment takes seconds, and a synchronous file write takes milliseconds.
The real cost is the discipline. You have to write the catalog code before you need it. You have to resist the temptation to discover work incrementally because incremental discovery is simpler to code. You have to design the status schema upfront and commit to maintaining it.
That discipline pays off the first time the pipeline crashes at 2 AM and you wake up to a completed book instead of a forensic puzzle. Write your intent before you execute. It is the cheapest reliability upgrade you will ever build.