Backtests Didn’t Make $220 Million. Human Instinct Did.

There is a moment every volatility trader knows well. You run the backtest, the numbers look clean, the Sharpe ratio is sitting pretty, and you feel like you have found something real. Then the market does something the model never saw coming, and you watch the trade unwind in ways that no historical simulation ever prepared you for.

I have been in that seat. But I have also been on the right side of exactly those moments, not because my model was smarter, but because I understood what the model couldn’t tell me.

The COVID Volatility Lesson

I spent years trading the VIX book at JPMorgan, and the single most instructive experience of my career happened during COVID. The VIX basis, which is the relationship between VIX options and futures on one side and the S&P options complex on the other, blew out to levels nobody had ever seen. Under normal conditions, that spread stays somewhere between one and three vols. During COVID, it hit fifteen.

The closed-form formula that governs the arbitrage between those two products was screaming that the VIX complex was trading four, five, even six standard deviations too rich. Every model on the street was flagging it as a sell. And traders who listened to those models, who tried to fade the basis on the way up from six to fifteen, got destroyed, taking entire funds down with them.
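To make the "standard deviations too rich" framing concrete, here is a minimal sketch of the kind of z-score a desk model might compute on the basis. It is not the closed-form arbitrage formula itself. The function name and the calm-regime history are hypothetical, chosen only to sit inside the one-to-three-vol range mentioned above.

```python
from statistics import mean, stdev

def basis_zscore(history, current):
    """How many sample standard deviations `current` sits from the mean of `history`."""
    mu = mean(history)
    sigma = stdev(history)
    return (current - mu) / sigma

# Hypothetical calm-regime basis readings in vol points (illustrative, not real data).
calm_history = [1.0, 1.5, 2.0, 2.5, 3.0, 2.0, 1.5, 2.5]

# A basis of 6 already looks extreme against that history, and 15 looks far more so.
for level in (6.0, 15.0):
    print(f"basis {level:>4} vols -> z-score {basis_zscore(calm_history, level):.1f}")
```

A score calibrated on a calm regime says "sell" at six just as loudly as it does at fifteen. It tells you how unusual the level is, not how much further it can go.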

We made $220 million on my book that year. The next highest P&L on the street was around $120 million. The reason we outperformed had nothing to do with having a better backtest and everything to do with understanding flows, reading human psychology in real time, and recognizing that when the world is actually on fire, owning certain things as a hedge makes sense no matter what the model says.

Why Backtests Fail in Real Markets

Backtests alone simply aren’t good enough to keep up with modern markets. A backtest can’t incorporate the feeling of watching global markets seize up in March 2020. Models can’t sort through the infinite, ever-changing sentiment of those chaotic months, take in new CDC recommendations in real time or guess the moments that people might panic. That judgment can only come from experience.

None of this is to say that historical data is useless, but rather that an over-reliance on backtesting gives traders a false sense of completeness. You build a model, you run it against five or ten years of data, and the output looks authoritative. But markets are dynamic and they don’t always behave as they have in the past. The relationships that held from 2010 to 2019 don’t automatically hold up in a world shaped by zero interest rate policy, meme stock flows, and the kind of retail participation we have seen since 2020.

Retail Changed the Game

Retail traders bought the COVID dip and the tariff dip before most hedge funds did, in part because the funds were leaning on backtests, and any reasonable backtest would tell you that those trades are historically questionable at best. But these dips weren’t the same as the ones that had come before. One of the biggest market makers on the street lost over a billion dollars to retail call buying on Tesla. The game has changed, and a backtest built on the old game will not save you in the new one.

From Backtests to Adaptive AI

After going from exotics trading to a hedge fund seat to running the equity derivatives desk at BGC, I’ve come to realize that the real advantage of today’s AI tools isn’t just that they are accessible. It’s that they are built to move with the market rather than behind it.

Goldman Sachs sells a backtesting product called Marquee that lets you test something like buying a one-month straddle and hedging it daily, and it will give you your P&L and your Sharpe. That’s useful to a point, but it’s an exercise that’s firmly rooted in the past. Backtests are, by design, a rearview mirror. They function on a fixed dataset, running against a static set of conditions, all in hopes of telling you something useful about a market that has already moved on. Well-built, cutting-edge AI tools don’t suffer from the same issues. They can be tailored to take in the constant stream of new information and account for it in whatever way the trader needs most.

The Rise of Modular Market Intelligence

This modularity is the thing that people are not talking about enough. What I am building, and what other ex-traders like the Vol Signals team are building, works completely differently. You can feed it a live data stream from Cboe or any of the major exchanges, and the system can break down in real time what the entire market is long, what it is short, where gamma is concentrated, and how those positions are shifting as new information comes in.
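To give a sense of what "where gamma is concentrated" means in practice, here is a minimal sketch, not the system described above. It assumes a simplified open-interest snapshot, Black-Scholes gamma, and the common but debatable convention that dealers are long calls and short puts. The function names, row layout, and sample numbers are all hypothetical.

```python
import math
from collections import defaultdict

def bs_gamma(spot, strike, vol, t_years, rate=0.0):
    """Black-Scholes gamma, which is the same for calls and puts."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (
        vol * math.sqrt(t_years)
    )
    pdf = math.exp(-0.5 * d1 * d1) / math.sqrt(2.0 * math.pi)
    return pdf / (spot * vol * math.sqrt(t_years))

def gamma_by_strike(spot, rows, multiplier=100):
    """Signed dollar gamma per 1% spot move, aggregated by strike.

    rows: iterable of (strike, kind, open_interest, vol, t_years).
    ASSUMPTION: calls count as +gamma and puts as -gamma for dealers.
    Open interest alone does not reveal who is actually long or short.
    """
    exposure = defaultdict(float)
    for strike, kind, open_interest, vol, t_years in rows:
        sign = 1.0 if kind == "call" else -1.0
        gamma = bs_gamma(spot, strike, vol, t_years)
        exposure[strike] += sign * gamma * open_interest * multiplier * spot ** 2 * 0.01
    return dict(exposure)

# Hypothetical snapshot (illustrative numbers, not exchange data).
snapshot = [
    (5000, "call", 12000, 0.15, 30 / 365),
    (5000, "put", 20000, 0.15, 30 / 365),
    (5100, "call", 25000, 0.14, 30 / 365),
]
for strike, dollars in sorted(gamma_by_strike(5050.0, snapshot).items()):
    print(strike, f"{dollars:,.0f}")
```

The sign convention is where a trader’s judgment comes in. Swap it for whatever your flow read says and the rest of the aggregation stays the same.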

Seven years ago, firms like Element and Tudor Capital would literally email us at BNP asking for our best guess on where S&P gamma positioning stood. It was a survey. There was nothing quantitative behind it. Now a single trader can pull that same data off an exchange feed, run it through Claude or ChatGPT, and have a live dashboard with hard quantitative truth behind every number. The system updates with the market rather than waiting for the next backtest cycle.

Markets Now Require Adaptability

That adaptability is exactly what the current market environment demands. Retail participation has reshaped flow dynamics in ways that no pre-2020 backtest can fully account for. The tariff dip, the COVID V-shape, the Tesla call-buying episode — those were all moments where the market was telling us something new, something that had no clean historical analogue.

A rigid backtesting framework is not designed to absorb that kind of signal quickly. An AI system that pulls live positioning data, processes new inputs as they arrive, and runs on a modular architecture you can reconfigure as conditions change is a tool built for the market as it actually exists, not the market as it existed three years ago.

AI Still Needs Human Judgment

The key distinction, though, is that none of this works without the underlying market knowledge to guide it. I have seen people online talking about letting an AI agent run for twelve hours overnight and trusting whatever it produces. That is a recipe for disaster and a sure-fire way to make something completely useless. These models still hallucinate, but the reason they work in the hands of experienced traders is that an experienced trader knows what the output is supposed to look like.

When I first fed the Bergomi model for VIX into an AI system, it got the details wrong. But I knew it was wrong because I had spent years working with that model in a live trading environment. That background is what lets you course-correct and push the technology in the right direction instead of just accepting whatever it spits out.

The Real Edge Is Still Human

The traders who are going to lose their edge are the ones who treat backtesting as gospel rather than a starting point. New AI systems are a refinement of backtesting, not a replacement for it. They’re a way for traders to wring a few invaluable extra details out of the market before they make their next move. The traders who are going to thrive are the ones who use AI for what it’s genuinely good at (aggregating data, surfacing relationships, building modular systems that can handle complex multi-leg structures) and then bring their own judgment to bear on everything the model cannot see.

But the technology is not the edge. The moment when the arbitrage is screaming one thing and the market is telling you something else still belongs entirely to the traders themselves. Integrating these tools, making them work together to show you the fullest possible picture of the market at a given moment, is going to be foundational to any success story. But the instinct is still the real edge, and that will always be definitively human.

Ishan Malik is Managing Director, Index Volatility at BGC Group. Over nearly two decades, he has traded equity derivatives across the buy-side and sell-side, including senior roles at JPMorgan Chase & Co., BNP Paribas, and Verition Fund Management.
