Data Engineering

Even the Best Context Can't Fix Bad Data

Thiru Arunachalam, Founder & CEO, WALT
May 15, 2026

Every data engineering team at the largest global enterprises is pouring resources into context: retrieval pipelines, semantic search, agent memory, prompt scaffolding. The assumption underneath all of it: give the AI enough context and it will figure the rest out.

But context is like lipstick on a pig: it can dress up bad data, but it cannot fix it.

Context is key, and a unified data foundation is a defensible north star, provided your house is in order. Clean, current, well-defined data is the precondition for everything else; if that data is bad, context engineering merely compounds the noise.

What context actually does

Context engineering is the practice of giving an AI model the surrounding information it needs to produce a relevant, accurate response. A well-scoped retrieval pipeline over clean, well-defined data is genuinely useful and improves the outcomes you get from LLMs and AI agents.

However, LLMs and AI agents cannot reason over bad data, and context alone cannot fix that. Context takes what is there and packages it for consumption: lipstick on a pig. If the data underneath is wrong, all context does is package the wrong things more efficiently. This is the trap most data and AI teams are walking into right now: they treat context as the cure for unreliable results from LLMs and agents. You cannot 'context engineer' your way out of bad source definitions with better retrieval.
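To make the trap concrete, here is a minimal, purely illustrative retrieval sketch in Python. The documents, the conflicting definitions of "active customer", and the keyword scoring are all invented for illustration; no real vector store or LLM is involved. The point is that retrieval faithfully packages whatever the sources contain, so a contradiction at the source lands straight in the context window.

```python
# Minimal retrieval sketch (hypothetical sources and definitions).
# Retrieval packages whatever is there; it cannot reconcile the two
# conflicting definitions of "active customer" below.

from dataclasses import dataclass


@dataclass
class Doc:
    source: str
    text: str


# Hypothetical snippets pulled from two upstream systems and a wiki.
CORPUS = [
    Doc("crm", "Active customer: any account with a login in the last 90 days."),
    Doc("billing", "Active customer: any account with a paid invoice in the last 30 days."),
    Doc("wiki", "Churn is reported monthly by the finance team."),
]


def retrieve(query: str, corpus: list[Doc], k: int = 2) -> list[Doc]:
    """Score documents by naive keyword overlap and return the top k."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(q_terms & set(d.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_context(query: str) -> str:
    """Assemble the prompt context exactly as retrieved."""
    hits = retrieve(query, CORPUS)
    return "\n".join(f"[{d.source}] {d.text}" for d in hits)


if __name__ == "__main__":
    # Both conflicting definitions land in the context; the model is left
    # to guess which one the business actually means.
    print(build_context("how many active customers do we have?"))
```

Swap in a real vector store and an LLM and the dynamic is the same: the pipeline transports the inconsistency, it does not resolve it.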

The foundation metaphor for data is a trap

The deeper question is why so much enterprise data ends up in this state in the first place. Most data platforms were built on a premise that sounds reasonable until it meets reality: build a solid foundation, lock it down, and everything built on top will be trustworthy. For buildings, that logic holds. Once concrete is poured and set, you construct upward. Nothing below changes.

That analogy doesn’t work for businesses.

The business will change: the customer experiences it ships, the products it launches, the markets it enters. The foundational data layer driving decisions must be agile and reconfigurable, moving in step with the business rather than being painfully re-jigged every time strategy shifts. Yet the foundation metaphor has shaped how data platforms are built for decades. Data engineering still operates on quarterly release cycles, monolithic warehouses, and a hand-off between "the people who build the pipeline" and "the people who use the data."

How software solved the agility problem, and why the data industry stayed rigid

Software engineering cracked agility a decade ago with DevOps. Large consumer companies now ship products to tens of millions of users every week because of:

1. CI/CD pipelines: Every commit is shippable.

2. Feature flags: Deploy is decoupled from release, so risky changes can ship dark and be rolled back instantly (a minimal sketch follows this list).

3. Microservices with clear contracts: Teams ship independently without coordinating across a monolith.
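As a concrete illustration of point 2, here is a minimal feature-flag sketch (the flag name, in-memory flag store, and pricing logic are all hypothetical). The risky code path is already deployed, but it stays dark until the flag is flipped, and it can be rolled back instantly by flipping it off again.

```python
# Minimal feature-flag sketch: deploy is decoupled from release.
# In practice the flag would live in a flag service, not a module-level dict.

FLAGS = {"new_pricing_engine": False}


def flag_enabled(name: str) -> bool:
    return FLAGS.get(name, False)


def price_order(order_total: float) -> float:
    if flag_enabled("new_pricing_engine"):
        # New path is deployed but only runs once the flag is on.
        return order_total * 0.95
    # Old, known-good path remains the default.
    return order_total


print(price_order(100.0))           # 100.0 -- flag off, old behaviour
FLAGS["new_pricing_engine"] = True  # "release" without a new deploy
print(price_order(100.0))           # 95.0  -- new behaviour, instantly reversible
```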

Data engineering never underwent a similar transformation, and the reason is trust. Traditional data quality relied on locking things down: master data management, fixed schemas, contract-first ingestion, and registries that declare "this is the contract, do not break it."

In this rigid environment, schema evolution and drift are treated as exceptions. Violations of the contract are penalized (but shouldn’t be). The data platform team says no to both upstream app changes (their schemas are load-bearing) and downstream business changes (the model can't bend fast enough). The data platform itself becomes a constitution instead of a flexible, agile, reconfigurable codebase.
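A small sketch of that locked-down posture, using hypothetical field names: the registry pins one exact schema and ingestion hard-fails on any deviation, so even a harmless additive change from an upstream app becomes a blocked pipeline and a contract renegotiation.

```python
# Contract-first ingestion in the rigid style: any drift is a violation.

EXPECTED_SCHEMA = {"order_id": "string", "amount": "double", "currency": "string"}


def validate_ingest(record_schema: dict[str, str]) -> None:
    """Hard-fail on any difference from the registered contract, additive or not."""
    if record_schema != EXPECTED_SCHEMA:
        raise ValueError(
            f"Schema drift detected: expected {EXPECTED_SCHEMA}, got {record_schema}. "
            "Ingestion blocked until the contract is renegotiated."
        )


# Upstream adds a harmless new field; the pipeline still breaks.
try:
    validate_ingest({"order_id": "string", "amount": "double",
                     "currency": "string", "coupon_code": "string"})
except ValueError as err:
    print(err)
```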

That rigidity can kill businesses. A company has to keep rapidly iterating on its apps and customer experiences, and data platforms should enable that iteration rather than slow it down. Data consumers, dashboards and natural-language interfaces alike, should see the impact of business decisions in near real time.

What bad data is actually costing you

Raise your hand if you’re familiar with this vicious cycle:

Change is hard → platform team says no → consumers route around them → federation, satellites, mesh emerge as escape hatches → the foundation gets weaker, not stronger → the team doubles down on control.

The CDO becomes the "no" function, and the business loses the very flexibility that data platforms were supposed to provide. Chasing context-layer solutions is an attempt to fix a structural problem with tooling: retrieval pipelines get layered over data that was already inconsistent at the source, which only helps agents find the wrong answers faster.

Here’s another familiar scenario. A product manager asks for three new fields on a dashboard and six weeks later they're still waiting. That request traverses a queueing network: source-system review, ingestion changes, Bronze landing, Silver conformance, modeling, semantic layer updates, BI changes, governance sign-off. Each node is owned by a different team with its own backlog.

What is the TCO of the enterprise data stack for the agent era?

Data officers usually articulate total cost of ownership in three buckets: storage, compute, and software. But the most expensive part of the data platform is the data engineering effort itself, and the time-to-market the business loses while waiting on it.

A cross-functional question that should take seconds takes two weeks. A board deck stalls because three teams have three different margin numbers. An AI pilot stays permanently in POC because the data foundation was never ready for production.

The question most teams are asking is the wrong one

Most organizations are asking: how do we build a better context layer? The question they should be asking: how do we make the data we already have actually mean something?

Those are different questions with different answers.

The first leads to retrieval pipelines, prompt scaffolding, and larger context windows, all of which are essential and genuinely useful when the underlying data is trustworthy. The second leads to autonomous data engineering that supports flexibility and agility. That is the sequence the industry keeps skipping. Fix the data first, then give your AI something worth retrieving.

How autonomous agents tackle the rigidity problem

With autonomous and semi-autonomous systems in the mix, flexibility no longer has to come at the cost of quality. We now have versioned schemas, contract testing in CI, automated lineage, machine-readable data products, and runtime validation, and each of these can be accelerated further by agents.

Autonomous data engineering agents can build best-in-class medallion layers on top of any data estate, and they can run data operations too. Schema evolution stops being a crisis and becomes a managed update. A breaking change in a source system doesn't cascade into broken dashboards; it becomes an auditable event with a traceable resolution.
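As a contrast to the rigid contract check sketched earlier, here is one way versioned contracts and automated change classification might look (the schemas, field names, and policy are again hypothetical): additive drift is accepted and published as a new contract version, while genuinely breaking changes are surfaced as auditable events for an agent, or a human, to resolve.

```python
# Schema evolution as a managed update rather than a crisis.

CONTRACT_V1 = {"order_id": "string", "amount": "double", "currency": "string"}


def classify_change(current: dict[str, str], incoming: dict[str, str]) -> str:
    """Classify an incoming schema against the current contract version."""
    removed = current.keys() - incoming.keys()
    retyped = {k for k in current.keys() & incoming.keys() if current[k] != incoming[k]}
    added = incoming.keys() - current.keys()
    if removed or retyped:
        # Breaking: raise an auditable event with enough detail to trace a resolution.
        return f"BREAKING: removed={sorted(removed)}, retyped={sorted(retyped)}"
    if added:
        # Additive: publish a new contract version and backfill downstream models.
        return f"ADDITIVE: new fields {sorted(added)} -> publish contract v2"
    return "NO CHANGE"


# Additive drift becomes a managed update, not a crisis.
print(classify_change(CONTRACT_V1, {**CONTRACT_V1, "coupon_code": "string"}))

# A breaking change becomes an auditable event instead of a broken dashboard.
print(classify_change(CONTRACT_V1, {"order_id": "string", "amount": "string"}))
```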

With autonomous agents, the data layer can finally be engineered to be dynamic from day one: continuously monitored, continuously updated, continuously aligned with how the business actually defines and uses its data.

A Cambrian explosion is coming for your data platform

Where data lives and how it is pulled will no longer be the deciding factors. As the Cambrian explosion of agents hits, consumption patterns will change fundamentally.

In the next 18 months, the highest-volume consumers of your platform won't be human analysts but the agents your business users are building right now. These agents will query your platforms and source systems constantly, at machine speed.

Is your stack ready for agentic consumption? Because most organizations are not there yet.

And the gap between "we have a data warehouse" and "our data platform is agent-ready" is not a tooling gap, but a data gap.

Solving a data problem with more retrieval infrastructure is the wrong move. The architecture has to change, and so does the way you think about what a data foundation is supposed to do.

How to reframe the data foundation problem

Treat the foundation like product code, not a constitution. Ship small, evolve schemas, version contracts cheaply, and design for change. View quality as a property of how you operate, and not how locked down you are. A platform that can adapt continuously is more trustworthy than one that enforces rigidity and quietly falls behind.

Autonomous data engineering agents can build world-class medallion layers, deliver reliability rather than mere observability, optimize your estate, and reduce your bill. The real question isn't whether you have built the foundation, but whether your foundation can move at the speed of the business.

To become agent-native, fix the data first. Context alone won’t do it.

Agents are going to be your first-class consumers, and your data architecture must be built from the ground up for this pattern of consumption. Agents querying inconsistent, ungoverned, context-patched data will fail at machine speed, and a context layer or a semantic layer won't fix the underlying data problems.

The good news: you do not need to replace your stack to get there. Every topology and every warehouse you already run can be made agent-native. Becoming future-proof does not require an 18-month transformation program with a top IT consulting firm. You can catapult to agent-native on whatever you already have. Once you’re agent-native, the agents themselves can help you migrate the storage and compute underneath, if and when it actually makes sense.

Stop engineering context around bad data. Your agents are only as good as the data underneath them. Assess your data's readiness for agentic AI first, and make sure the data reasoning layer is accurate, relevant, and up to date.

Want to see what agent-ready data looks like in production? Book a demo with WALT.