How to evaluate a trading strategy: the metrics that matter

Evaluating a trading strategy is where most people go wrong, because they look at the one number that matters least: total return. A strategy with a huge return and a terrifying drawdown is unusable; a modest, steady one with a high Sharpe and small drawdown is a keeper. Worse, a great-looking backtest with only thirty trades is statistical noise dressed up as an edge. This guide walks through the metrics that actually tell you whether a strategy is worth trading — return-versus-risk, expectancy, sample size — and the red flags that expose a fake edge before it loses you money.

On this page
  1. Beyond total return
  2. Return vs risk
  3. Win rate and expectancy
  4. Sample size
  5. Red flags
  6. The evaluation checklist
  7. FAQ

Why total return is the wrong starting point

Total return ignores how much pain you endured to get it. A strategy that returns 80% but suffers a 60% drawdown is almost impossible to trade — you would abandon it at the bottom. Evaluation is about return relative to risk, not return alone.
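The arithmetic behind that claim can be sketched in a few lines. After losing a fraction d of the account, the gain needed to return to the prior peak is d / (1 - d), so deep drawdowns get disproportionately harder to recover from. The function name gain_to_recover is illustrative, not part of any tool mentioned here.

```python
def gain_to_recover(drawdown: float) -> float:
    """Fractional gain needed to climb back to the prior peak after a drawdown.

    drawdown is a fraction of the peak, e.g. 0.60 for a 60% loss.
    """
    if not 0 <= drawdown < 1:
        raise ValueError("drawdown must be in [0, 1)")
    return drawdown / (1 - drawdown)


for dd in (0.10, 0.30, 0.60):
    print(f"{dd:.0%} drawdown needs a {gain_to_recover(dd):.0%} gain to recover")
```

A 60% drawdown needs a 150% gain just to get back to even, which is why the 80% headline return says so little on its own.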

Return versus risk

The core metric is the Sharpe ratio: return in excess of the risk-free rate per unit of volatility, usually quoted on an annualized basis. As a rough rule of thumb, a Sharpe above 1 is decent, above 2 is excellent, and below 1 is questionable. Pair it with maximum drawdown (the worst peak-to-trough loss) and the recovery time. A strategy you can actually hold through its worst period is worth more than a higher-return one you would quit.

[Figure: two equity curves. One is smooth with a high Sharpe; the other reaches the same return through deep drawdowns.]
Two strategies can end at the same return; the smoother one is far more tradeable and scores a higher Sharpe.
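A minimal sketch of these measurements, using only the Python standard library. It assumes you have an equity curve as a list of account values sampled at regular intervals; the function names, the 252-periods-per-year default, and the two example curves are illustrative assumptions, and recovery time is approximated here as the longest run of periods spent below a prior peak.

```python
import math
import statistics


def period_returns(equity):
    """Simple per-period returns from a list of account values."""
    return [b / a - 1 for a, b in zip(equity, equity[1:])]


def sharpe_ratio(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualized Sharpe: mean excess return over its sample standard deviation."""
    excess = [r - risk_free_per_period for r in returns]
    vol = statistics.stdev(excess)
    if vol == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Return (worst peak-to-trough loss as a fraction, longest periods below a peak)."""
    peak = equity[0]
    worst = 0.0
    underwater = 0
    longest_underwater = 0
    for value in equity:
        if value >= peak:
            peak = value          # new high: drawdown is over
            underwater = 0
        else:
            underwater += 1
            longest_underwater = max(longest_underwater, underwater)
            worst = max(worst, (peak - value) / peak)
    return worst, longest_underwater


# Two made-up curves that both go from 100 to 110.
smooth = [100, 102, 104, 106, 108, 110]
choppy = [100, 120, 90, 115, 85, 110]

for name, curve in (("smooth", smooth), ("choppy", choppy)):
    sharpe = sharpe_ratio(period_returns(curve))
    worst, longest = max_drawdown(curve)
    print(f"{name}: Sharpe {sharpe:.2f}, max drawdown {worst:.1%}, "
          f"longest time under water {longest} periods")
```

Six data points are far too few to draw conclusions from; the curves exist only to show that identical end returns can produce very different Sharpe and drawdown figures.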

Win rate and expectancy

Win rate alone is meaningless: a 90% win rate with occasional huge losses can still lose money. What matters is expectancy, the average profit per trade: (win rate × average win) − (loss rate × average loss), where loss rate is 1 − win rate. A positive expectancy after fees and slippage is the real bar. Check yours on the win-rate profit calculator.
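The expectancy formula is short enough to sketch directly. The function below is an illustration, not the site's calculator; cost_per_trade is an assumed flat figure standing in for fees and slippage, and the example numbers are made up.

```python
def expectancy(win_rate, avg_win, avg_loss, cost_per_trade=0.0):
    """Expected profit per trade.

    avg_win and avg_loss are both positive amounts in the same unit
    (dollars, percent, or R multiples); cost_per_trade covers fees and slippage.
    """
    if not 0 <= win_rate <= 1:
        raise ValueError("win_rate must be between 0 and 1")
    loss_rate = 1 - win_rate
    return win_rate * avg_win - loss_rate * avg_loss - cost_per_trade


# High win rate, rare but large losses: negative edge.
print(f"{expectancy(0.90, avg_win=10, avg_loss=120):.2f}")   # -3.00

# Low win rate, winners three times the size of losers, 1 unit of cost: positive edge.
print(f"{expectancy(0.40, avg_win=30, avg_loss=10, cost_per_trade=1):.2f}")   # 5.00
```

The first case wins nine trades out of ten and still loses 3 units per trade on average; the second loses more often than it wins and still makes 5.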

Sample size

Thirty trades is noise

A backtest with a handful of trades tells you almost nothing — the results could easily be luck. You want hundreds of trades across multiple market regimes (bull, bear, chop) before trusting a metric. A strategy that only ever traded one bull market has not been tested; it has been flattered.
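One way to see why a small sample is noise is to put a confidence interval around an observed win rate. The sketch below uses the Wilson score interval at roughly 95% confidence (z = 1.96). It assumes trades are independent, which real trades often are not, so treat the result as optimistic. The same uncertainty applies to expectancy and Sharpe; win rate is used here only because its interval has a simple closed form.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate observed as `wins` out of `n` trades."""
    if n <= 0 or not 0 <= wins <= n:
        raise ValueError("need n > 0 and 0 <= wins <= n")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The same 60% observed win rate at two sample sizes.
for wins, n in ((18, 30), (180, 300)):
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n} wins: true win rate plausibly between {low:.0%} and {high:.0%}")
```

With 30 trades the interval runs from roughly 42% to 75%, so a coin flip cannot be ruled out. With 300 trades it narrows to roughly 54% to 65%.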

Red flags that expose a fake edge

Watch for too few trades; a perfect equity curve with no losing streaks (often a hidden martingale or look-ahead bug); results that collapse on out-of-sample data; many tuned parameters; and a strategy that is only profitable on one asset or timeframe. Each suggests overfitting or a flawed backtest rather than a real edge.
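Checking for the out-of-sample collapse means tuning on one stretch of history and measuring on a later stretch the tuning never saw. A common way to do that is a rolling walk-forward split, sketched below. The function name and the window sizes are illustrative assumptions; the backtester may expose this differently.

```python
def walk_forward_splits(n_bars, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Each test window starts right after its training window, so parameters
    are always judged on data that comes later than the data they were fit on.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window


for train, test in walk_forward_splits(n_bars=1000, train_size=500, test_size=100):
    print(f"fit on bars {train.start}-{train.stop - 1}, "
          f"evaluate on bars {test.start}-{test.stop - 1}")
```

If the metrics measured on the evaluation windows are far worse than those on the fitting windows, the parameters were probably fit to noise.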

The evaluation checklist

Run the strategy on the backtester, then ask: Is the Sharpe above 1? Is the max drawdown survivable? Is expectancy positive after costs? Are there hundreds of trades across regimes? Does it hold up on walk-forward out-of-sample data? Only a yes to all of these earns it real money, and even then, paper trade it first and start small.
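The checklist can be written as a small all-or-nothing gate. Everything below is a hypothetical sketch: the BacktestSummary fields are not the backtester's output format, and the thresholds other than Sharpe > 1 are assumptions you should set yourself (25% as a "survivable" drawdown, 200 as "hundreds" of trades, three regimes, and positive out-of-sample expectancy as a stand-in for "holds up").

```python
from dataclasses import dataclass


@dataclass
class BacktestSummary:
    sharpe: float
    max_drawdown: float                 # fraction, e.g. 0.18 for 18%
    expectancy_after_costs: float       # average profit per trade, net of costs
    n_trades: int
    regimes_covered: int                # e.g. bull, bear, chop -> 3
    oos_expectancy_after_costs: float   # same metric on walk-forward test windows


def run_checklist(s, max_tolerable_drawdown=0.25, min_trades=200, min_regimes=3):
    """Return each checklist question mapped to True or False."""
    return {
        "sharpe_above_1": s.sharpe > 1,
        "drawdown_survivable": s.max_drawdown <= max_tolerable_drawdown,
        "positive_expectancy": s.expectancy_after_costs > 0,
        "enough_trades": s.n_trades >= min_trades,
        "multiple_regimes": s.regimes_covered >= min_regimes,
        "holds_out_of_sample": s.oos_expectancy_after_costs > 0,
    }


# A good-looking backtest with only 40 trades from a single bull market.
candidate = BacktestSummary(
    sharpe=1.4,
    max_drawdown=0.18,
    expectancy_after_costs=0.35,
    n_trades=40,
    regimes_covered=1,
    oos_expectancy_after_costs=0.10,
)

checks = run_checklist(candidate)
failed = [name for name, ok in checks.items() if not ok]
print("PASS" if not failed else "FAIL: " + ", ".join(failed))
# FAIL: enough_trades, multiple_regimes
```

The gate is deliberately strict: strong Sharpe and drawdown numbers do not compensate for a thin or one-sided sample.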

Not financial advice. This content is educational. Automated and algorithmic trading carries a real risk of financial loss. Never trade money you cannot afford to lose. Review the SEC investor.gov and CFTC resources before trading.

Frequently asked questions

How do you evaluate a trading strategy?

Look beyond total return at return relative to risk. Check the Sharpe ratio (return per unit of volatility), maximum drawdown and recovery time, expectancy after fees and slippage, and sample size — hundreds of trades across bull, bear and ranging markets. A strategy you can actually hold through its worst period with a positive edge is what you are looking for.

What is the most important metric for a strategy?

There is no single metric, but the Sharpe ratio and maximum drawdown together are the most telling, because they capture return relative to the risk and pain endured to earn it. Total return alone is misleading — a high return with a deep drawdown is untradeable. Expectancy after costs then confirms the edge is real, not just well-timed.

How many trades do I need to trust a backtest?

Generally hundreds, spread across multiple market regimes. A backtest with only a few dozen trades is statistical noise that could easily be luck, and one that only covers a single bull market has been flattered rather than tested. The more trades and the more varied the conditions, the more confidence you can place in the metrics.

What are the red flags of a fake trading edge?

Too few trades, a perfect equity curve with no losing streaks (often a hidden martingale or look-ahead bug), results that collapse on out-of-sample data, many finely tuned parameters, and profitability only on one asset or timeframe. Each is a classic sign of overfitting — a strategy that fits the past but will not survive live.

Mustafa Bilgic

Algorithmic trading practitioner · Founder, AITradingBot.us

Mustafa builds and backtests automated trading systems and writes about them without the hype. Every tool on this site is free and runs entirely in your browser.