The Hidden Work That Makes AI Actually Work

AI agents are often criticized for being "out of date," especially when they struggle with modern programming languages or rapidly evolving dependencies. In practice, this is rarely a limitation of the models themselves. Today's AI systems can read current documentation, explore dependency repositories, and reason about breaking changes when properly equipped.

The real issue is behavioral, not cognitive. By default, AI agents optimize for the lowest-friction path to an answer, which almost always means relying on training data instead of invoking tools, reading live documentation, or adapting to workflow-specific constraints. This behavior is reinforced by existing tooling, which makes fallback to familiar patterns easy while treating correctness-driven workflows as optional configuration.

As a result, developers quietly perform significant hidden work (shaping prompts, chaining tools, enforcing documentation reads, and constraining agent behavior) to make AI systems function reliably with modern stacks. Until AI tooling rewards correctness over convenience and treats workflows as first-class inputs, "up-to-date AI" will remain less a model problem and more a design decision.

AI Can Read the Docs - So Why Doesn't It?

There's a persistent belief that AI struggles with modern software stacks because the models are "out of date." While that explanation is convenient, it's mostly wrong.

In practice, AI agents can work effectively with fast-moving ecosystems like Rust or with newer versions of Python dependencies. They can read current documentation. They can dig through cookbooks and examples in dependency repositories. They can reason about breaking changes and updated APIs.

They simply don't do those things by default.

What looks like a capability gap is, more often, a tooling and incentive problem. To put it simply, LLMs are optimized for the lowest-friction path available to them. Left unconstrained, they take the path of least resistance, which is to rely on their training data instead of invoking tools or reading live documentation. It's the job of the agent orchestration layer (the framework that wraps the LLM) to force the model to do the right thing at the right time. So far, those orchestration frameworks are still maturing.

Capability vs. Default Behavior

It's important to separate two ideas that are often conflated:

Capability: what an AI agent can do when properly equipped
Default behavior: what it does when unconstrained

Most critiques of AI stop at capability. In reality, the more important question is behavioral:
What path does the agent take when multiple paths are available?

Given a choice between:

using internal training data, or
invoking tools, reading live docs, and reconciling unfamiliar information

AI agents overwhelmingly choose the first option.

Not because it's better, but because it's cheaper.

Training Data Is the Path of Least Resistance

From the agent's perspective, training data has several advantages:

It's instantly accessible
It requires no tool invocation
It introduces no latency
It carries no parsing or interpretation overhead
It fits cleanly into the model's internal reasoning space

By contrast, reading current documentation requires:

calling external tools
navigating inconsistent formats
resolving contradictions
updating prior assumptions
and accepting uncertainty

When both paths are available, LLMs optimize for lowest friction, not highest correctness.

This is not laziness. It's optimization under the incentives we give them. So let's give them better incentives in our orchestration frameworks.

Why Rust and New Dependencies Expose the Problem

This behavior becomes especially visible in ecosystems that change quickly or enforce strict correctness.

Rust

Rapidly evolving idioms
Strong compiler guarantees
Low tolerance for outdated assumptions

Modern Python Dependencies

Frequent API churn
Breaking changes with familiar names
Outdated examples lingering in blogs and Q&A sites, many of which are used to train the models

In these environments, stale knowledge fails fast and loudly.

The agent isn't "bad at Rust."
Rust is simply unforgiving of assumptions that were true two years ago.

You can give your AI of choice a simple prompt to test this out for yourself.
"When was the cutoff for the data you were trained on?"

Gemini 3: As an AI model, my core training data goes up until early 2024.

GPT-5.2: My general training data has a knowledge cutoff of June 2024.

The Myth: "AI Just Isn't Good at This Yet"

This framing misses the point.

AI agents:

Can read current docs
Can explore dependency repositories
Can reason over updated APIs
Can adapt to workflow-specific constraints

But the tooling rarely requires them to do so.

As long as a cheaper fallback path remains open, agents will continue to default to whatever is fastest, even if it's wrong.

MCP and Live Documentation: Necessary, Not Sufficient

Model Context Protocol (MCP) and similar approaches are a step in the right direction. They make current, authoritative information available to the agent.

But availability is not enforcement.

Without guardrails:

MCP becomes optional
Training data remains a valid fallback
Outdated assumptions go unchallenged and lead to "hallucinations"

The result is an agent that can be correct, but is never compelled to be.

In practice, developers end up doing the enforcement manually, tweaking system prompts, custom modes, and tools in an effort to force the agent to do the right thing.
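The gap between availability and enforcement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any framework's actual API: `EnforcedSession`, `record_doc_read`, and `DocsNotReadError` are hypothetical names, and `model_call` stands in for whatever function really invokes the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


class DocsNotReadError(RuntimeError):
    """Raised when generation is attempted before the required docs were read."""


@dataclass
class EnforcedSession:
    # Hypothetical field: packages whose live docs must be read before generating.
    required_docs: set[str]
    docs_read: set[str] = field(default_factory=set)

    def record_doc_read(self, package: str) -> None:
        """Called by whatever tool wrapper actually fetches the documentation."""
        self.docs_read.add(package)

    def generate(self, prompt: str, model_call: Callable[[str], str]) -> str:
        """Refuse to call the model until every required doc has been read."""
        missing = self.required_docs - self.docs_read
        if missing:
            raise DocsNotReadError(f"read docs first: {sorted(missing)}")
        return model_call(prompt)
```

The point of the sketch is that the fallback path is closed in code rather than discouraged in a prompt: until the documentation tool has run, there is no way to reach the model at all.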

The Invisible Work Making AI "Work"

Much of the real progress in AI-assisted development isn't happening in glossy demos. It's happening through unglamorous, iterative work that rarely reaches management dashboards:

Forcing documentation reads before generation
Pinning dependency versions aggressively
Shaping prompts to invalidate stale priors
Chaining tools to surface compiler errors early
Constraining fallback behaviors
Repeatedly correcting the agent until it learns the workflow
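As one concrete instance of the list above, "chaining tools to surface compiler errors early" might look like the loop below. It is a rough sketch: `run_check`, `generate_until_clean`, `ask_agent`, and `write_code` are hypothetical names, and the checker command (for example `["cargo", "check"]` in a Rust project) is supplied by the caller.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_check(command: Sequence[str]) -> Optional[str]:
    """Run a checker command; return its error output, or None if it succeeded."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return None
    return result.stderr or result.stdout


def generate_until_clean(
    ask_agent: Callable[[Optional[str]], str],  # hypothetical: returns code, given prior errors
    write_code: Callable[[str], None],          # hypothetical: saves code where the checker reads it
    command: Sequence[str],
    max_rounds: int = 3,
) -> str:
    """Feed checker errors back to the agent until the check passes or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        code = ask_agent(feedback)
        write_code(code)
        feedback = run_check(command)
        if feedback is None:
            return code
    raise RuntimeError(f"still failing after {max_rounds} rounds:\n{feedback}")
```

The agent never gets to declare success on its own; only a passing check ends the loop.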

This is why AI often looks "plug-and-play" from the outside and anything but from the inside.

Why "Turn-Key AI" Keeps Missing the Mark

The idea of turn-key AI assumes:

No configuration
No workflow modeling
No ecosystem-specific tuning

That assumption breaks down immediately in modern software environments.

AI agents don't just need knowledge.
They need contracts:

When to trust training data
When to distrust it
When live sources are mandatory
When correctness matters more than speed

Without those contracts, the agent will always choose the shortcut. It takes months of trial and error, working closely with the agent, to get it to do the right thing. What's more, that "right thing" changes depending on the project, the dependencies, and the specific task at hand.
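To make the idea of a contract less abstract, here is one way it could be written down as data. This is only a sketch: `AgentContract`, `Source`, and every field name are invented for illustration, and a real contract would likely vary per project and per task.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    TRAINING_DATA = "training data"
    LIVE_DOCS = "live documentation"


@dataclass(frozen=True)
class AgentContract:
    # Hypothetical fields mirroring the four questions a contract has to answer.
    trust_training_for: frozenset[str] = frozenset()       # when to trust training data
    live_docs_mandatory_for: frozenset[str] = frozenset()  # when live sources are mandatory
    correctness_over_speed: bool = True                    # tie-breaker for everything else

    def source_for(self, package: str) -> Source:
        """Decide which source the agent must use for a given package."""
        if package in self.live_docs_mandatory_for:
            return Source.LIVE_DOCS
        if package in self.trust_training_for:
            return Source.TRAINING_DATA
        # Unlisted packages are distrusted whenever correctness matters more than speed.
        return Source.LIVE_DOCS if self.correctness_over_speed else Source.TRAINING_DATA
```

Under this sketch, a package listed in neither set falls back to live documentation by default, so the shortcut has to be granted explicitly.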

What Better Tooling Actually Looks Like

If companies want to sell "turn-key AI", they need to build tools that would allow a user to create those contracts in minutes, not months.

Fixing this doesn't require smarter models. It requires better defaults and smarter configuration panels. For example, we should be able to tell the AI:

Force documentation and tool reads when versions mismatch
Invalidate training priors when dependencies change
Make live sources authoritative, not advisory
Penalize undocumented assumptions
Treat workflows as first-class inputs, not afterthoughts

In short: Reward correctness over familiarity.
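The first rule in the list above could be approximated with a simple version comparison. The sketch below assumes something the article does not specify: a hand-maintained table (`known`) of the newest version of each package the model is believed to have seen in training. The function names are hypothetical, and only plain dotted versions such as `2.7.1` are handled.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version like '2.7.1'; pre-release tags are not handled."""
    return tuple(int(part) for part in text.split("."))


def packages_needing_docs(pinned: dict[str, str], known: dict[str, str]) -> list[str]:
    """Return packages whose pinned version is newer than, or absent from, `known`.

    `pinned` maps package name to the version the project actually uses.
    `known` is an assumed table of versions the model is believed to know.
    """
    needs_docs = []
    for name, version in pinned.items():
        familiar = known.get(name)
        if familiar is None or parse_version(version) > parse_version(familiar):
            needs_docs.append(name)
    return sorted(needs_docs)
```

An orchestration layer could run a check like this before every task and make a documentation read mandatory for each package it returns.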

The Real Reframe

AI agents are not failing because they lack intelligence, context, or access to information. They fail because we have designed their environments to reward familiarity over correctness and speed over verification.

When an agent defaults to training data, it is doing exactly what our tooling allows, and often encourages, it to do. Until workflows, version awareness, and source authority are treated as first-class constraints, no amount of model improvement will fix the problem.

If we want AI systems that work reliably with modern software, we don't need "smarter models." We need tools that make the right behavior unavoidable.
