What is statistical arbitrage? Stat arb explained with code

Statistical arbitrage (“stat arb”) is a family of quantitative strategies that profit not from predicting a single asset’s direction but from the statistical relationship between assets. The classic version bets that two historically correlated instruments that have temporarily diverged will converge again. It is market-neutral by design: long the cheap leg, short the expensive one, so broad market moves largely cancel out and the spread drives the result. This guide explains the core idea, the z-score signal that drives it, and the very real risks, with working Python along the way.

On this page
  1. What stat arb is
  2. The core idea
  3. The z-score signal
  4. A stat arb in code
  5. The risks
  6. The retail reality
  7. FAQ

What statistical arbitrage is

Statistical arbitrage is a quantitative approach that exploits statistically predictable relationships between assets rather than the direction of any one asset. The best-known form is pairs trading: find two instruments whose prices normally move together, and when their spread stretches abnormally wide, bet on it snapping back. It is “arbitrage” only in a loose sense — there is no risk-free lock, just a statistical edge that holds on average.

The core idea: mean reversion of a spread

Take two correlated assets — say two exchange-listed ETFs, or BTC and ETH. Their spread (or ratio) tends to oscillate around a stable mean. When the spread diverges far from that mean, stat arb goes long the underpriced leg and short the overpriced leg, profiting as the spread reverts. Because you are long one and short the other, overall market direction largely cancels — the position is market-neutral.

[Figure: a spread oscillating around its mean, with "short spread" marked at +2σ and "long spread" marked at −2σ.]
The spread reverts to its mean; stat arb fades the ±2σ extremes and exits as it returns to the mean.
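
To see why market direction largely cancels, here is a small worked sketch with made-up numbers: equal dollar amounts long one leg and short the other. The function name `pair_pnl`, the notional, and all the returns are hypothetical, and costs are ignored.

```python
def pair_pnl(notional, ret_long, ret_short):
    """P&L of an equal-dollar pair: long one leg, short the other.

    notional  : dollars placed on EACH leg (hypothetical figure)
    ret_long  : simple return of the long leg over the holding period
    ret_short : simple return of the short leg over the same period
    Ignores fees, borrow and funding costs.
    """
    return notional * (ret_long - ret_short)


# Market rallies 5% and both legs follow it: the legs offset each other.
market_move = pair_pnl(10_000, 0.05, 0.05)

# Spread converges: the cheap leg gains 3%, the expensive leg loses 1%.
convergence = pair_pnl(10_000, 0.03, -0.01)

# Relationship breaks: the long leg falls 4% while the short leg rises 6%.
breakdown = pair_pnl(10_000, -0.04, 0.06)

print(round(market_move, 2), round(convergence, 2), round(breakdown, 2))
```

Only the difference between the two returns matters, which is why the same position that ignores a broad rally is fully exposed when the two legs move apart.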

The z-score signal

The standard trigger is the z-score of the spread: how many standard deviations it sits from its rolling mean. Enter when the z-score moves beyond roughly ±2 (short the spread above +2, go long the spread below −2) and exit as it returns toward 0. This is a disciplined, codeable form of mean reversion applied to a spread rather than a single price.
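
Written out, and assuming the log-price spread that this guide's code uses (a price ratio or a hedge-ratio-weighted difference are common alternatives), the z-score over a rolling window of n bars can be sketched as:

```latex
s_t = \ln A_t - \ln B_t, \qquad
\mu_t = \frac{1}{n}\sum_{i=0}^{n-1} s_{t-i}, \qquad
\sigma_t = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(s_{t-i}-\mu_t\right)^2}, \qquad
z_t = \frac{s_t - \mu_t}{\sigma_t}
```

Here A_t and B_t are the two prices at bar t. A reading of z_t = 2 means the spread sits two rolling standard deviations above its recent average.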

A stat arb in code

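python · statarb.py

import numpy as np
import pandas as pd

def zscore_signal(a, b, win=30):
    # a, b: price Series on the same index; win: rolling window in bars
    spread = np.log(a) - np.log(b)          # log-price spread
    mean = spread.rolling(win).mean()
    std = spread.rolling(win).std()
    z = (spread - mean) / std               # NaN until win bars of history exist
    # +1 = long A / short B ; -1 = short A / long B ; 0 = no entry signal
    sig = pd.Series(0, index=a.index)
    sig[z < -2] = 1
    sig[z > 2] = -1
    # Flags only the bars where |z| > 2; it does not hold until z returns to 0
    return sig.shift(1)   # act next bar, no look-ahead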
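
The text says to exit as the z-score returns toward 0, whereas a signal built from the |z| > 2 test alone is nonzero only on the extreme bars. A minimal sketch of a position-holding variant follows. The function name `zscore_positions`, the `entry` and `exit_` parameters, and the synthetic price series are all assumptions made for illustration, not part of the original listing.

```python
import numpy as np
import pandas as pd


def zscore_positions(a, b, win=30, entry=2.0, exit_=0.0):
    """Hold a spread position from entry until z crosses back through exit_.

    +1 = long A / short B, -1 = short A / long B, 0 = flat.
    """
    spread = np.log(a) - np.log(b)
    z = (spread - spread.rolling(win).mean()) / spread.rolling(win).std()

    pos = np.zeros(len(z))
    current = 0
    for i, zi in enumerate(z.to_numpy()):
        if np.isnan(zi):
            current = 0                      # not enough history yet
        elif current == 0:
            if zi < -entry:
                current = 1                  # spread cheap: long A / short B
            elif zi > entry:
                current = -1                 # spread rich: short A / long B
        elif current == 1 and zi >= exit_:
            current = 0                      # reverted to the mean: close
        elif current == -1 and zi <= exit_:
            current = 0                      # reverted to the mean: close
        pos[i] = current
    # Shift one bar so a signal seen at bar t is traded at bar t+1.
    return pd.Series(pos, index=a.index).shift(1).fillna(0)


# Synthetic demo: B is a random walk, A tracks B plus mean-reverting noise.
rng = np.random.default_rng(0)
n = 500
log_b = np.cumsum(rng.normal(0, 0.01, n)) + np.log(100)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.9 * noise[t - 1] + rng.normal(0, 0.01)
b = pd.Series(np.exp(log_b))
a = pd.Series(np.exp(log_b + noise))

positions = zscore_positions(a, b)
print(positions.value_counts())
```

The explicit loop is slower than a vectorised version but keeps the entry and exit rules easy to read. A position is always closed to flat before it can reverse.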

The risks

Correlation is not a contract

The deadly stat arb failure is when the relationship breaks (a merger, a regulatory shock, a delisting) and the spread keeps widening instead of reverting. The hedge then stops protecting you: the long leg can keep falling while the short leg keeps rising, so a market-neutral position can lose on both legs at once. Real stat arb also pays fees on two legs plus shorting and borrow costs, and crypto shorting carries funding and liquidation risk. Always test for cointegration, not just correlation: correlated returns do not guarantee that the price spread is stationary.

The retail reality

Professional stat arb runs hundreds of pairs at low latency with cheap borrow, edges retail rarely matches. For an individual, a single well-chosen, cointegrated pair, backtested honestly (with fees, borrow costs and next-bar execution included) and then paper traded, is a realistic learning project; a hundred-pair book is not. Mind shorting costs, and remember that borrow is often unavailable for crypto assets.
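
As a rough illustration of what an honest backtest has to subtract, the sketch below nets two-leg fees and a borrow charge out of the pair's per-bar return. The function name `net_pair_returns` and the values for `fee`, `borrow_annual` and `bars_per_year` are placeholders, not real quotes, and the tiny price series is made up.

```python
import numpy as np
import pandas as pd


def net_pair_returns(a, b, pos, fee=0.001, borrow_annual=0.05, bars_per_year=252):
    """Per-bar return per dollar on each leg, net of fees and borrow.

    pos : +1 long A / short B, -1 short A / long B, 0 flat, already lagged
          so that pos[t] is the position held over the bar ending at t.
    fee, borrow_annual, bars_per_year : placeholder values, not real quotes.
    """
    ret_a = a.pct_change().fillna(0)
    ret_b = b.pct_change().fillna(0)
    gross = pos * (ret_a - ret_b)

    # Every change in position trades BOTH legs, so fees are charged twice.
    turnover = pos.diff().abs().fillna(pos.abs())
    fees = turnover * 2 * fee

    # One leg is always short while a position is open, so borrow accrues.
    borrow = pos.abs() * borrow_annual / bars_per_year
    return gross - fees - borrow


# Tiny made-up example: enter on bar 1, hold through bar 2, exit on bar 3.
a = pd.Series([100.0, 101.0, 102.0, 102.0])
b = pd.Series([100.0, 100.0, 100.0, 101.0])
pos = pd.Series([0.0, 1.0, 1.0, 0.0])

net = net_pair_returns(a, b, pos)
print(net)
print("total net return:", net.sum())
```

Even in this toy case, the entry and exit each cost two fees, which is why small spread moves that look profitable before costs can disappear after them.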

Not financial advice. This content is educational. Automated and algorithmic trading carries a real risk of financial loss. Never trade money you cannot afford to lose. Review the SEC investor.gov and CFTC resources before trading.

Frequently asked questions

What is statistical arbitrage in simple terms?

Statistical arbitrage is a quantitative strategy that profits from the statistical relationship between assets rather than the direction of any single asset. The classic form bets that two historically correlated instruments that have temporarily diverged will converge again — going long the cheap one and short the expensive one so broad market moves cancel out.

How does a statistical arbitrage signal work?

The standard signal is the z-score of the spread between two assets: how many standard deviations the spread sits from its rolling mean. The strategy enters when the z-score reaches roughly plus or minus two and exits as it returns toward zero. It is disciplined mean reversion applied to a spread instead of a single price.

Is statistical arbitrage risk-free?

No. Despite the name, it is not risk-free arbitrage; it is a statistical edge that holds only on average. The biggest danger is the historical relationship breaking permanently (from a merger, shock or delisting), so the spread widens instead of reverting and both legs can lose. The strategy also pays fees on both legs plus shorting and borrow costs.

Can retail traders do statistical arbitrage?

On a small scale, yes, but not the way institutions do. Professionals run hundreds of pairs at low latency with cheap borrow, which retail cannot match. An individual can realistically learn from a single well-chosen, cointegrated pair, backtested honestly and paper traded first, while watching shorting and borrow costs carefully.

Mustafa Bilgic

Algorithmic trading practitioner · Founder, AITradingBot.us

Mustafa builds and backtests automated trading systems and writes about them without the hype. Every tool on this site is free and runs entirely in your browser.