Trading bot error handling: retries, timeouts and fail-safes
A backtest never throws an exception, but a live bot faces a hostile world: exchanges time out, rate-limit you, reject orders, return partial fills and occasionally go down entirely. The difference between a bot that survives and one that blows up is rarely the strategy — it is the error handling. A bot that crashes mid-trade can leave a naked position; one that retries blindly can double an order. This guide covers the error types you must handle, retry-with-backoff, partial fills, rate limits and the kill switch, all with real code.
Why error handling decides survival
Live trading is unreliable by nature. A bot that assumes every API call succeeds will eventually crash at the worst possible moment — mid-position, with a stop not yet placed. Robust error handling turns transient failures into harmless retries and turns genuine problems into a safe, controlled stop rather than an uncontrolled loss. It is the operational core of risk management.
The error types to handle
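With ccxt the failures fall into clear buckets: NetworkError (timeouts, DNS, connection resets; usually transient, safe to retry), ExchangeError (rejected order, bad params; usually a real bug, do not blindly retry), RateLimitExceeded (back off and slow down), and InsufficientFunds (a logic error in sizing). Each needs a different response. In ccxt's exception hierarchy RateLimitExceeded is a subclass of NetworkError and InsufficientFunds is a subclass of ExchangeError, so catch the specific class before its parent if you want to treat it differently.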
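The four buckets can be sketched as a small dispatch function. This is an illustration only: the classes below are simplified stand-ins that mirror the ccxt names so the snippet runs without the library, and the action labels ("retry", "slow_down", "fix_sizing", "halt") are hypothetical names chosen for this example.

```python
# Simplified stand-ins for the ccxt exception classes (assumption: in a real
# bot you would import ccxt and use ccxt.NetworkError, ccxt.ExchangeError, etc.).
class BaseError(Exception):
    pass

class NetworkError(BaseError):
    pass

class RateLimitExceeded(NetworkError):
    pass

class ExchangeError(BaseError):
    pass

class InsufficientFunds(ExchangeError):
    pass


def classify(exc):
    """Map an exception to an action label, checking specific classes first."""
    if isinstance(exc, RateLimitExceeded):
        return "slow_down"   # back off and reduce request frequency
    if isinstance(exc, InsufficientFunds):
        return "fix_sizing"  # position sizing logic is wrong
    if isinstance(exc, NetworkError):
        return "retry"       # transient, safe to retry with backoff
    if isinstance(exc, ExchangeError):
        return "halt"        # likely a real bug, do not retry blindly
    return "halt"            # unknown errors are treated as unrecoverable
```

The order of the checks matters: testing for NetworkError first would swallow RateLimitExceeded, because the subclass also matches its parent.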
Retry with exponential backoff
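```python
# retry.py
import time

import ccxt


def safe_call(fn, *args, tries=5, **kwargs):
    """Call fn, retrying transient network errors with exponential backoff."""
    for i in range(tries):
        try:
            return fn(*args, **kwargs)
        except ccxt.NetworkError as e:
            if i == tries - 1:
                break  # out of attempts: skip the pointless final sleep
            wait = 2 ** i  # 1, 2, 4, 8s between the 5 attempts
            print(f"network error, retry in {wait}s: {e}")
            time.sleep(wait)
        except ccxt.ExchangeError as e:
            print(f"exchange rejected, NOT retrying: {e}")
            raise  # a real bug: surface it
    raise RuntimeError("exhausted retries")
```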
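One common refinement, not part of the snippet above, is to cap the delay and randomize it so that several bots or threads do not all retry at the same instant. The helper below is a minimal sketch of that idea ("full jitter"); the name backoff_delay and its parameters are assumptions made for this example.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles each attempt up to `cap`; the actual delay is drawn
    uniformly between 0 and that ceiling.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling * rng()
```

A retry loop could call time.sleep(backoff_delay(i)) in place of a fixed 2 ** i wait. The rng parameter exists only so the randomness can be replaced in tests.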
Order-level safety
If a create_order call times out, you do not know whether it filled. Blindly retrying can place the order twice. Instead, re-fetch open orders and recent trades to learn the true state before acting, and, where the exchange supports it, attach a unique clientOrderId so that a duplicate submission is rejected. Reconcile state from the exchange; never assume.
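The reconcile-before-retry idea might look like the sketch below. It assumes an exchange object exposing the ccxt unified methods create_order, fetch_open_orders and fetch_closed_orders, and it assumes the exchange accepts a clientOrderId key in params and echoes it back on the order; support for both varies by exchange. The helper names find_order and place_order_once are hypothetical.

```python
import uuid


def find_order(exchange, symbol, client_id):
    """Return the order carrying client_id, or None if the exchange has no record."""
    candidates = exchange.fetch_open_orders(symbol) + exchange.fetch_closed_orders(symbol)
    for order in candidates:
        if order.get("clientOrderId") == client_id:
            return order
    return None


def place_order_once(exchange, symbol, side, amount, price,
                     timeout_errors=(TimeoutError,), client_id=None):
    """Submit a limit order; on a timeout, look it up instead of resending."""
    client_id = client_id or uuid.uuid4().hex
    params = {"clientOrderId": client_id}  # assumption: exchange accepts this key
    try:
        return exchange.create_order(symbol, "limit", side, amount, price, params)
    except timeout_errors:
        # Outcome unknown: the request may have reached the exchange anyway,
        # so query the exchange rather than sending the order a second time.
        return find_order(exchange, symbol, client_id)
```

With ccxt you would likely pass timeout_errors=(ccxt.RequestTimeout,). A return value of None means the exchange has no record of the order, and only then is resubmitting (ideally with the same client_id) a reasonable next step.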
Handling rate limits
Keep ccxt’s enableRateLimit on so it self-throttles, and still catch RateLimitExceeded to add an extra pause. Hammering an exchange gets your key temporarily banned, which can strand an open position. Slow and steady beats fast and banned, which matters for any high-frequency design.
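The "extra pause" can be sketched as a wrapper that waits longer after each rate-limit error. The RateLimitExceeded class here is a stand-in so the snippet runs without ccxt, and the function name, cooldown values and retry count are assumptions for illustration, not recommended settings.

```python
import time


class RateLimitExceeded(Exception):
    """Stand-in for ccxt.RateLimitExceeded so the sketch runs without ccxt."""


def call_with_cooldown(fn, *args, cooldown=10.0, tries=3,
                       rate_limit_errors=(RateLimitExceeded,), sleep=time.sleep):
    """Call fn; after a rate-limit error, pause for a growing cooldown and retry."""
    for attempt in range(tries):
        try:
            return fn(*args)
        except rate_limit_errors:
            if attempt == tries - 1:
                raise  # still throttled: let the caller decide what to do
            sleep(cooldown * (attempt + 1))  # 10s, then 20s, ... by default
```

In a real bot you would pass rate_limit_errors=(ccxt.RateLimitExceeded,) and create the exchange with {"enableRateLimit": True} in its config, so this wrapper only fires when the built-in throttle was not enough.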
The kill switch
Wrap the main loop in a top-level handler that, on any unrecoverable error, flattens positions (or at least cancels open orders) and stops — a software dead-man’s switch. Pair it with the max-drawdown kill switch from risk management and full logging so you can see exactly what happened.
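A top-level handler of this kind could be sketched as follows. It assumes an exchange object with the ccxt unified methods fetch_open_orders and cancel_order; step (one iteration of the strategy, returning False to stop) and flatten (a hook that closes positions) are hypothetical callables you would supply, since flattening is exchange- and product-specific.

```python
import logging

log = logging.getLogger("bot")


def cancel_open_orders(exchange, symbol):
    """Best-effort cancel of every open order; keeps going if one cancel fails."""
    cancelled = 0
    for order in exchange.fetch_open_orders(symbol):
        try:
            exchange.cancel_order(order["id"], symbol)
            cancelled += 1
        except Exception:
            log.exception("could not cancel order %s", order["id"])
    return cancelled


def run_with_kill_switch(step, exchange, symbol, flatten=None):
    """Run step() until it returns False or raises; on an error, shut down safely."""
    try:
        while step():
            pass
    except Exception:
        log.exception("unrecoverable error, triggering kill switch")
        cancel_open_orders(exchange, symbol)
        if flatten is not None:
            flatten(exchange, symbol)  # hypothetical hook that closes positions
        return False  # stopped by the kill switch
    return True  # stopped normally
```

The shutdown path itself talks to the exchange, so it can fail too. A production version would likely wrap those calls in their own retry logic and alert a human if the cleanup does not complete.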
Frequently asked questions
Why is error handling important for a trading bot?
Because live trading is unreliable: exchanges time out, rate-limit, reject orders and occasionally go down. A bot that assumes every call succeeds will eventually crash mid-trade, potentially leaving a naked position with no stop in place. Robust error handling turns transient failures into harmless retries and genuine problems into a safe, controlled stop instead of an uncontrolled loss.
How should a trading bot handle network errors?
Network errors such as timeouts and connection resets are usually transient, so the right response is to retry with exponential backoff — waiting 1, then 2, then 4 seconds and so on for a few attempts. In ccxt you catch ccxt.NetworkError specifically and retry it, while letting genuine exchange rejections halt the bot instead of being retried blindly.
What happens if an order request times out?
A timed-out order is dangerous because you do not know whether it actually filled, so blindly retrying can place it twice. The safe approach is to re-fetch open orders and recent trades to discover the true state before acting, and, where the exchange supports it, to attach a unique clientOrderId so that an accidental duplicate is rejected. Always reconcile state from the exchange rather than assuming.
What is a kill switch in a trading bot?
A kill switch is a top-level safety mechanism that, on an unrecoverable error or a breached risk limit, flattens positions or at least cancels open orders and stops the bot — a software dead-man’s switch. Combined with a maximum-drawdown limit and full logging, it ensures that when something goes badly wrong the bot fails safely instead of compounding the damage.