2. Leakage — The Silent Killer of ML Reliability

March 1, 2026

Ashok Suthar

Your test set is your only honest signal about the future. Leakage is what silently corrupts it — from the inside.

Leakage is the only ML problem where everything looks fine — until it isn’t.

Your metrics improve. Your model looks sharp. Your team is confident. And your production system is quietly getting worse.

That gap between what you see and what is real — that is leakage.

What leakage actually is

Leakage happens when your model learns, at training time, from information it would not have at the moment of prediction.

Not because someone made a careless mistake. But because data pipelines are complex, time is easy to ignore, and the model will happily learn from anything you give it — correct or not.

The model does not know it is cheating. It just learns.

Nassim Taleb, in Fooled by Randomness, writes about a trader who has an extraordinary winning streak, not because of skill, but because market conditions happened to reward his specific strategy. He mistakes noise for signal. He mistakes luck for edge. Then conditions change, and he is exposed.

Leakage is the ML version of this. Your model has an extraordinary evaluation streak. But the streak is built on information it should never have had. When production conditions arrive — real data, real time, no leakage — the edge disappears. And you are left debugging a system that looked perfect on paper.

The root cause: broken causal order

Here is the principle that grounds everything:

In the real world, you train first. The future arrives later.

Your model learns from historical data. Then it makes predictions on new, unseen data. That is the only valid sequence.

Leakage breaks this sequence. It allows future information — data that would not exist at prediction time — to influence what the model learns during training.

The model builds its understanding on evidence it could never have in production. So in production, that evidence is missing. The model fails quietly.
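One practical way to respect this order during evaluation is to split by time instead of shuffling. The sketch below is a minimal illustration, assuming each record carries an event date; the record layout, the `chronological_split` helper, and the cutoff date are all hypothetical.

```python
from datetime import date

# Hypothetical toy records: (event_date, feature, label).
records = [
    (date(2025, 3, 14), 0.2, 0),
    (date(2025, 1, 5), 0.9, 1),
    (date(2025, 2, 20), 0.4, 0),
    (date(2025, 4, 2), 0.7, 1),
    (date(2025, 5, 11), 0.1, 0),
]


def chronological_split(rows, cutoff):
    """Put every row that happened before `cutoff` in training and every
    row on or after it in test, so the model never trains on the future."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


train, test = chronological_split(records, cutoff=date(2025, 4, 1))

# The invariant a random shuffle split does not guarantee:
assert max(r[0] for r in train) < min(r[0] for r in test)
print(len(train), len(test))  # 3 2
```

A random shuffle can put a May row in training and a March row in test, quietly handing the model the future. A cutoff makes the causal order explicit.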

What this looks like in practice

Three forms I have seen cause the most damage:

Target leakage — a feature in your training data is derived from the outcome, or only becomes known after the outcome has happened. Freakonomics taught me to always ask: is this variable actually upstream of the event, or downstream of it? Example: including a “payment reversal flag” to predict fraud, when reversals typically happen only after fraud is suspected or confirmed. The model learns a perfect signal that does not exist at decision time.
Temporal leakage — future data gets used to train a model that should only know the past. Example: using a customer’s average transaction value calculated over the full year to predict fraud in January. In January, you only know January.
Pipeline leakage — a preprocessing step is fit on the full dataset before the split, letting the model subtly see the future through aggregated statistics. Example: calculating the global average of failed login attempts across the entire dataset to fill missing values before splitting. The model quietly learns the behavior of future fraudsters during training.
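The January example can be made concrete. The sketch below, assuming transactions arrive already sorted by time and carry only a customer ID and an amount (both hypothetical simplifications), contrasts a full-period average with a point-in-time average that only looks at each customer's earlier transactions.

```python
from collections import defaultdict

# Hypothetical transactions, assumed sorted by time: (customer_id, amount).
transactions = [
    ("c1", 100.0),
    ("c1", 300.0),
    ("c2", 50.0),
    ("c1", 200.0),
    ("c2", 150.0),
]


def full_period_avg(txns):
    """Leaky: each row sees its customer's average over the whole period,
    including transactions that happen after it."""
    totals, counts = defaultdict(float), defaultdict(int)
    for customer, amount in txns:
        totals[customer] += amount
        counts[customer] += 1
    return [totals[c] / counts[c] for c, _ in txns]


def point_in_time_avg(txns):
    """Safe: each row sees only its customer's earlier transactions.
    Returns None when the customer has no history yet."""
    totals, counts = defaultdict(float), defaultdict(int)
    features = []
    for customer, amount in txns:
        if counts[customer]:
            features.append(totals[customer] / counts[customer])
        else:
            features.append(None)
        # Record this transaction only after its feature has been computed.
        totals[customer] += amount
        counts[customer] += 1
    return features


print(full_period_avg(transactions))    # [200.0, 200.0, 100.0, 200.0, 100.0]
print(point_in_time_avg(transactions))  # [None, 100.0, None, 200.0, 50.0]
```

In the leaky version, the very first row already "knows" about the 300.0 and 200.0 transactions that come later. The `None` values in the safe version are honest: at that moment there is no history yet.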
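For the imputation example, the fix is to compute the fill value from training rows only and reuse that same value on the test rows. A minimal sketch with the standard library, assuming a hypothetical failed-login column already in time order and an arbitrary split point:

```python
from statistics import mean

# Hypothetical "failed login attempts" column, in time order; None = missing.
failed_logins = [1, None, 2, 3, None, 12, None, 15]
split = 5  # assumed split: first 5 rows are training, the rest are test

train_raw, test_raw = failed_logins[:split], failed_logins[split:]

# Leaky: the fill value averages over ALL rows, including test rows.
leaky_fill = mean(v for v in failed_logins if v is not None)

# Correct: fit the fill value on training rows only...
train_fill = mean(v for v in train_raw if v is not None)


def impute(values, fill):
    """Replace missing entries with a fixed, previously fitted value."""
    return [fill if v is None else v for v in values]


# ...then apply that same training-derived value to both splits.
train_filled = impute(train_raw, train_fill)
test_filled = impute(test_raw, train_fill)

print(train_fill, leaky_fill)  # 2 6.6
```

The leaky fill value is pulled upward by the 12 and 15 that only appear in the test period, which is exactly the "future fraudsters" signal sneaking into training. In scikit-learn the same discipline usually comes from placing the imputer inside a `Pipeline`, so each `fit` call only ever sees the training portion of the data.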

Why it is dangerous specifically

Most errors in ML degrade your metrics. Leakage improves them.

That is what makes it a silent killer. There is no obvious warning. Your cross-validation looks great. Your stakeholders are happy. You ship.

Then production performance disappoints. Debugging begins. The root cause is buried somewhere in the data pipeline, weeks or months back.

By then, engineering decisions — model choice, feature selection, architecture — were all made on false evidence.

The mental model to carry forward

Every feature, every statistic, every transformation in your training pipeline must answer one question:

Would this information exist at the moment of prediction in production?

If the answer is no — or even maybe — it does not belong in training.
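One way to make this question routine is to record, for every feature, when its value becomes known relative to the prediction, and drop anything that arrives later or whose timing nobody can vouch for. The `FeatureSpec` record, the `available_offset` field, the feature names, and the lag values below are hypothetical; in a real system this metadata would have to come from your own pipeline or lineage tracking.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    # Hypothetical metadata: when the value becomes known, relative to the
    # moment of prediction. <= 0 means known at or before prediction time,
    # > 0 means only known afterwards, None means nobody is sure.
    available_offset: Optional[timedelta]


def audit(features):
    """Split features into (keep, drop): keep only what certainly
    exists at the moment of prediction."""
    keep, drop = [], []
    for f in features:
        # "No, or even maybe": unknown timing is treated as unsafe.
        if f.available_offset is not None and f.available_offset <= timedelta(0):
            keep.append(f.name)
        else:
            drop.append(f.name)
    return keep, drop


features = [
    FeatureSpec("transaction_amount", timedelta(0)),
    FeatureSpec("prior_chargebacks_30d", timedelta(days=-1)),  # e.g. yesterday's snapshot
    FeatureSpec("payment_reversal_flag", timedelta(days=14)),  # illustrative lag
    FeatureSpec("merchant_risk_score", None),  # provenance unclear
]

keep, drop = audit(features)
print(keep)  # ['transaction_amount', 'prior_chargebacks_30d']
print(drop)  # ['payment_reversal_flag', 'merchant_risk_score']
```

Treating `None` as unsafe is a deliberate choice: it forces someone to establish when a feature really becomes available before it is allowed into training.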

Always remember: “Leakage is not a data quality problem. It is a causal ordering problem. The fix is not cleaning data. It is thinking clearly about time.”

Protecting the test set from leakage gives you a clean estimate. But a single clean estimate carries its own problem — which is what we discuss next.