Trading with the Kelly criterion
How probabilistic forecasts can be turned into an optimal capital allocation using the Kelly criterion
Forecasting the direction of the next price movement is only part of the problem of trading. A proper capital allocation to that forecast is equally significant, although its importance is often underestimated. Here, I'm considering the problem of capital allocation to a single strategy, not that of optimal portfolio allocation when investing in more than one instrument, as in the framework of Modern Portfolio Theory.
A remarkable insight into this subject was given by (Kelly, 1956). A good introduction can be found on Wikipedia. Thorp wrote a very interesting technical review (Thorp, 2008), and (Cover, 1999), like Kelly in his original paper, describes the interesting connections with information theory.
Kelly addresses the problem of optimal capital allocation under a statistically favourable (in expectation) betting opportunity. Two important conclusions from his work are:
- Even in the presence of a favourable bet, over-allocation of capital will lead to ruin in the long run with probability 1.
- Knowing the probability of the possible outcomes allows us to optimally size our bet (or position) in the sense of maximizing the expected growth rate of our portfolio.
Note: While point 1 may seem counterintuitive at first, consider that if your portfolio falls by 50%, it has to grow by 100% just to return to its initial value.
Let's consider the following formulation of the Kelly criterion:
Let's define $R_t$ as the random variable describing the returns of our strategy (or returns of a price time-series) at time $t$. The portfolio value at time $t+1$ is then the random variable $P_{t+1}$ given by
$$ P_{t+1} = p_t (1+ f R_t) $$
with $p_t$ the portfolio value at $t$ and $f$ the respective fraction allocated to the strategy. Note that, $f$ can be negative, meaning that we are going short on the strategy (or instrument). If we have leverage available, $\vert f \vert$ can also be greater than 1. The portfolio value $P_{t+1}$ can also be written as
$$ P_{t+1} = e^{\Lambda}, $$
assuming, without loss of generality, $p_t = 1$. Here, $\Lambda = \mathrm{log}(1 + f R_t)$ is the random variable describing the portfolio growth rate. The Kelly criterion gives the optimal value of $f$ in the sense of maximizing the expected value of $\Lambda$. If $R_t$ follows a normal distribution $\mathcal{N} ( \mu_r, \sigma_r)$, Thorp (2008) has shown that the Kelly-optimal value of $f$ is given by
$$ f = \frac{\mu_r}{\sigma_r^2}. $$
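This closed form can be motivated by a second-order expansion of the logarithm (a sketch, valid when the per-bar returns are small):
$$ \mathrm{E}[\Lambda] = \mathrm{E}[\mathrm{log}(1 + f R_t)] \approx f\, \mathrm{E}[R_t] - \frac{f^2}{2}\, \mathrm{E}[R_t^2] \approx f \mu_r - \frac{f^2 \sigma_r^2}{2}, $$
which is maximized at $f = \mu_r / \sigma_r^2$. The same expression also makes point 1 quantitative: the expected growth rate turns negative for $f \gtrsim 2 \mu_r / \sigma_r^2$, so over-allocating by more than a factor of two turns a favourable bet into a losing one.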
Note: In order to study the long-term portfolio growth, we are actually interested in the random variable
$$ P_{t+1} = p_1 (1+ f R_1)(1+ f R_2) ... (1+ f R_t). $$
However, we are assuming that the $R_1$, $R_2$, ..., are independent. In this case, we can simplify the problem into the formulation above.
In order to appreciate the power of the Kelly criterion, we are going to conduct Monte-Carlo simulations where, at each time step, the returns of our price time series are drawn from a normal distribution $\mathcal{N} (\mu_r, \sigma_r)$. However, to simulate a more realistic and dynamic scenario, $\mu_r$ and $\sigma_r$ will themselves be modelled as stochastic processes. In particular, I am going to model $\mu_{r,t}$ via an Ornstein-Uhlenbeck process, and $\sigma_{r,t}$ via a geometric Ornstein-Uhlenbeck process. The latter, for instance, allows the time series to become heteroskedastic.
The Ornstein-Uhlenbeck process is defined as:
$$ dY_t = - \frac{(Y_t - \mu)}{\tau}dt + \sigma \sqrt{\frac{2}{\tau}} dW_t, $$
with $\mu$ and $\sigma$ the process mean and standard deviation, respectively, $\tau$ the relaxation time of the mean reversion, and $W_t$ a standard Brownian motion. The geometric version is defined as
$$ dY_t = - \frac{(Y_t - \mu)}{\tau}dt + \sigma Y_t dW_t. $$
We can numerically integrate the stochastic differential equations above using the Euler-Maruyama method.
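Concretely, with step size $\Delta t$ and i.i.d. draws $\epsilon_i \sim \mathcal{N}(0, 1)$, the update for the first process is
$$ Y_{i+1} = Y_i - \frac{(Y_i - \mu)}{\tau}\,\Delta t + \sigma \sqrt{\frac{2 \Delta t}{\tau}}\, \epsilon_i, $$
and analogously for the geometric version, with the diffusion term replaced by $\sigma Y_i \sqrt{\Delta t}\, \epsilon_i$. Both integrators are implemented in the functions below: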
import numpy as np
def Ornstein_Ulhenbeck(T, dt, mu, sigma, tau, Y0):
    # Initializations
    Y = list()
    t = np.arange(0, T, dt)
    Y.append(Y0)
    # Parameters
    N = len(t)
    sigma_term = sigma*np.sqrt(2.0/tau)*np.sqrt(dt)
    normal_draws = np.random.normal(loc=0.0, scale=1.0, size=N)
    # Integration
    for i in range(1, N):
        Y.append(Y[-1] - dt*(Y[-1]-mu)/tau + sigma_term*normal_draws[i])
    return np.array(Y)
def geometric_Ornstein_Ulhenbeck(T, dt, mu, sigma, tau, Y0):
    # Initializations
    Y = list()
    t = np.arange(0, T, dt)
    Y.append(Y0)
    # Parameters
    N = len(t)
    sigma_term = sigma*np.sqrt(dt)
    normal_draws = np.random.normal(loc=0.0, scale=1.0, size=N)
    # Integration
    for i in range(1, N):
        Y.append(Y[-1] - dt*(Y[-1]-mu)/tau + sigma_term*Y[-1]*normal_draws[i])
    return np.array(Y)
Let's simulate 1000 price bars:
Stochastic path for the mean of the returns - $\mu_{r,t}$
T = 1000
dt = 1
mu = 0
sigma = 0.002
tau = 1
path_mean = Ornstein_Ulhenbeck(T=T, dt=dt, mu=mu, sigma=sigma, tau=tau, Y0=mu)
And plotting the results:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 1, figsize=(14, 4))
axes.plot(path_mean, '-', color=(0.8,0.5,0.5,1.0))
axes.set_xlabel("Bar")
_=axes.set_ylabel("Path for returns mean")
Stochastic path for the volatility of the returns - $\sigma_{r,t}$
T = 1000
dt = 1
mu = 0.02
sigma = 0.01
tau = 10
path_std = geometric_Ornstein_Ulhenbeck(T=T, dt=dt, mu=mu, sigma=sigma, tau=tau, Y0=mu)
Plotting the results:
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 1, figsize=(14, 4))
axes.plot(path_std, '-', color=(0.8,0.5,0.5,1.0))
axes.set_xlabel("Bar")
_=axes.set_ylabel("Path for returns volatility")
Price path
From the paths generated above, we can now draw the returns $R_t \sim \mathcal{N} (\mu_{r,t}, \sigma_{r,t})$ and the resulting price series $p_t = p_1 \displaystyle\prod_{t'= 1}^t (1+r_{t'})$:
path_returns = np.random.normal(loc=path_mean, scale=path_std)
path_price = 100*np.cumprod(1+path_returns)
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
axes[0].plot(path_returns, '-', color=(0.8,0.5,0.5,1.0))
axes[0].set_xlabel("Bar")
axes[0].set_ylabel("Realized returns")
axes[0].set_title("Returns")
axes[1].plot(path_price, '-', color=(0.5,0.5,0.8,1.0))
axes[1].set_xlabel("Bar")
axes[1].set_ylabel("Realized price")
axes[1].set_title("Price")
plt.show()
The parameters above have been chosen such that the hit ratio - fraction of bars where the expected return $\mu_{r,t}$ is of the same sign as the realized return $r_t$ - varies approximately between 0.5 and 0.6. For this particular realization:
hit_ratio = np.sum(np.sign(path_returns*path_mean)>0)/len(path_returns)
print(hit_ratio)
Let's create a function that calculates the portfolio growth. It takes as parameters the initial portfolio value, the fraction of the portfolio invested at every bar (negative for short positions, positive for long positions), and the realized price returns.
I am also assuming that we are effectively bankrupt if, at any point, our portfolio drops below 1 percent of its initial value:
def calculate_portfolio_growth(portfolio_initial, fractions, price_returns):
    # Per-bar strategy returns and the compounded portfolio value
    strategy_returns = fractions*price_returns
    portfolio_growth = portfolio_initial*np.cumprod(1+strategy_returns)
    # Declare bankruptcy once the portfolio drops below 1% of its initial value
    if np.any(portfolio_growth <= 1e-2*portfolio_initial):
        bust_ind = np.min(np.where(portfolio_growth <= 1e-2*portfolio_initial)[0])
        portfolio_growth[bust_ind:] = 0
    return portfolio_growth
We are going to assume a maximum available leverage of 10 and three different allocation strategies:
- Full leverage
- Half leverage
- Optimal Kelly allocation (limited to a maximum of 10x leverage)
portfolio_initial = 1
leverage = 10
##### Strategy 1 - full leverage
fractions1 = leverage * np.sign(path_mean)
portfolio1 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions1, price_returns=path_returns)
##### Strategy 2 - half leverage
fractions2 = 0.5*leverage * np.sign(path_mean)
portfolio2 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions2, price_returns=path_returns)
##### Strategy 3 - Kelly
fractions3 = path_mean/path_std**2
fractions3[fractions3>leverage] = leverage
fractions3[fractions3<-leverage] = -leverage
portfolio3 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions3, price_returns=path_returns)
Let's now plot the performance of the different allocation strategies:
fig, axes = plt.subplots(1, 1, figsize=(7, 5))
axes.plot(portfolio1, '-', label="Full leverage", color=(0.8,0.5,0.5,1.0))
axes.plot(portfolio2, '-', label="Half leverage", color=(0.5,0.5,0.8,1.0))
axes.plot(portfolio3, '-', label="Kelly optimal", color=(0.5,0.7,0.7,1.0))
axes.legend()
axes.set_xlabel("Bar")
axes.set_ylabel("Portfolio value")
axes.set_yscale("log")
As stated before, and as we can now empirically corroborate, one of the key insights of the Kelly criterion is that, even in the presence of a favourable betting (or trading) opportunity, over-allocation will eventually lead to ruin.
In the realization above, the full leverage allocation does lead to ruin. The half leverage strategy has a positive return but underperforms the optimal Kelly allocation.
We can calculate the average absolute leverage in the Kelly allocation:
print(np.round(np.mean(np.abs(fractions3)), 2))
Thus, despite using a similar amount of leverage on average, the Kelly strategy outperforms the constant 5x-leverage strategy.
A better way to conduct this analysis is through the Sharpe ratio, which can be shown to be maximized (in expectation) by the Kelly-optimal allocation.
def calculate_sharp_ratio(portfolio):
    # If the strategy went bankrupt, keep only the bars before bankruptcy
    if np.any(portfolio == 0):
        bust_ind = np.min(np.where(portfolio == 0)[0])
        portfolio = portfolio[:bust_ind]
    # Annualized Sharpe ratio of the per-bar portfolio returns
    returns = (portfolio[1:]-portfolio[0:-1])/portfolio[0:-1]
    sharp_ratio = np.sqrt(252)*np.mean(returns)/np.std(returns)
    return sharp_ratio
In the above, we are associating each bar with a trading day and annualizing the Sharpe ratio accordingly. The results:
print("Sharp ratio, full leverage =", np.round(calculate_sharp_ratio(portfolio1),2))
print("Sharp ratio, half leverage =", np.round(calculate_sharp_ratio(portfolio2),2))
print("Sharp ratio, Kelly optimal =", np.round(calculate_sharp_ratio(portfolio3),2))
Note that an overall multiplicative factor in the leverage used does not change the Sharpe ratio. The reason why the Sharpe ratio may differ between the full- and half-leverage strategies is the possibility of going bankrupt.
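To see why, note that for a constant scaling factor $c > 0$ both the mean and the standard deviation of the per-bar strategy returns $c f R_t$ are scaled by $c$:
$$ \frac{\mathrm{E}[c f R_t]}{\mathrm{std}(c f R_t)} = \frac{c\, \mathrm{E}[f R_t]}{c\, \mathrm{std}(f R_t)} = \frac{\mathrm{E}[f R_t]}{\mathrm{std}(f R_t)}. $$
In the simulation, only the bankruptcy rule breaks this equivalence.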
While the results above corroborate our expectations about the Kelly allocation, their statistical significance is limited, because we have considered a single realization, albeit a long one with a total of 1000 bars.
I am going to perform many realizations (200) of the experiment above and check the performance statistics.
Note: The processes simulated here are essentially ergodic, meaning that $m$ realizations of $N$ bars each are equivalent, in terms of their statistical properties, to a single realization of $mN$ bars. For the sake of simplicity, I'm going to choose the former.
n_runs = 200
strategy1 = list()
strategy2 = list()
strategy3 = list()
for i in range(0, n_runs):
    # Price path
    path_mean = Ornstein_Ulhenbeck(T=1000, dt=1, mu=0, sigma=0.002, tau=1, Y0=0)
    path_std = geometric_Ornstein_Ulhenbeck(T=1000, dt=1, mu=0.02, sigma=0.01, tau=10, Y0=0.02)
    path_returns = np.random.normal(loc=path_mean, scale=path_std)
    path_price = 100*np.cumprod(1+path_returns)
    # Full leverage
    fractions1 = leverage * np.sign(path_mean)
    portfolio1 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions1, price_returns=path_returns)
    strategy1.append(calculate_sharp_ratio(portfolio1))
    # Half leverage
    fractions2 = 0.5 * leverage * np.sign(path_mean)
    portfolio2 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions2, price_returns=path_returns)
    strategy2.append(calculate_sharp_ratio(portfolio2))
    # Kelly
    fractions3 = path_mean/path_std**2
    fractions3[fractions3>leverage] = leverage
    fractions3[fractions3<-leverage] = -leverage
    portfolio3 = calculate_portfolio_growth(portfolio_initial=portfolio_initial, fractions=fractions3, price_returns=path_returns)
    strategy3.append(calculate_sharp_ratio(portfolio3))
sharp_ratio = np.array([strategy1, strategy2, strategy3]).T
strategy_labels = ["Full leverage", "Half leverage", "Kelly optimal"]
And now plotting the results:
fig, axes = plt.subplots(1, 1, figsize=(6, 4))
axes.boxplot(sharp_ratio)
axes.set_xticklabels(strategy_labels, rotation=0)
axes.set_ylabel("Sharp ratio")
plt.tight_layout()
plt.show()
print("Mean Sharp ratio, full leverage =", np.round(np.mean(sharp_ratio[:,0]),2))
print("Mean Sharp ratio, half leverage =", np.round(np.mean(sharp_ratio[:,1]),2))
print("Mean Sharp ratio, Kelly optimal =", np.round(np.mean(sharp_ratio[:,2]),2))
Now we have statistically meaningful results demonstrating the outperformance of the Kelly allocation strategy. The distribution of Sharpe ratios for the full-leverage strategy is highly left-skewed, due to the many runs in which the strategy goes bankrupt.
Note: Quantitatively, the results shown here depend strongly on the parameters chosen for the Monte-Carlo simulation. One may even wonder whether I'm somehow overfitting these parameters in order to arrive at the expected result. However, running the experiment for different sets of parameters shows that the Kelly allocation is systematically the best-performing one.
I would like to end this article with a few notes:
- I'm applying the Kelly criterion in a slightly different way than usual. Typically, one considers the statistics of the returns of a given strategy, or instrument, and then calculates the optimal allocation fraction, which does not vary from one bar to the next; the task is then to continuously rebalance the portfolio so as to maintain that constant allocation fraction. Here, I'm assuming that I have a fully probabilistic pricing model which, at each bar, outputs the full probability distribution of the next price return. Given the assumption of independence between these distributions, I'm conjecturing, but not proving, that by calculating the optimal Kelly fraction at each bar I'm still maximizing the long-run expected growth rate of my portfolio.
- I constructed the simulations above assuming the returns to be normally distributed. While this is not a terrible assumption, it may not be a good one depending on the situation. The advantage of normally distributed returns is the closed-form solution for the optimal Kelly fraction. For a generic distribution, we can instead infer the optimal Kelly fraction via Monte-Carlo simulation (see the sketch after this list).
- By construction, in the controlled experiment described here, I know the probability distribution of the returns exactly. In a real application, while we can develop fully probabilistic models in complete analogy with the ideas described here, one must also account for the uncertainty about the model itself. In practice, this may mean allocating only a fraction of the optimal Kelly fraction to reduce the risk stemming from model uncertainty.
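To illustrate the point about non-normal returns, here is a minimal sketch, with purely illustrative parameters (not taken from the simulations above), of how the optimal fraction can be estimated numerically for a Student-t return distribution by maximizing a Monte-Carlo estimate of the expected log-growth over a grid of candidate fractions:
# Monte-Carlo estimate of the Kelly fraction for non-normal returns.
# The Student-t parameters below are illustrative assumptions.
df, loc, scale = 4, 0.001, 0.01
samples = loc + scale*np.random.standard_t(df, size=200000)
# Expected log-growth over a grid of candidate fractions; outcomes that would
# wipe out the portfolio (1 + f*r <= 0) are floored at a tiny positive value,
# i.e. heavily penalized
candidate_fractions = np.linspace(0, 20, 201)
expected_log_growth = [np.mean(np.log(np.maximum(1 + f*samples, 1e-12))) for f in candidate_fractions]
f_kelly = candidate_fractions[np.argmax(expected_log_growth)]
print(np.round(f_kelly, 2))
The flooring plays the same role as the bankruptcy condition used earlier: any outcome that would wipe out the portfolio receives a very large penalty in the estimated expected log-growth.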
References:
- Kelly, J. L. (1956). A new interpretation of information rate. The Bell System Technical Journal, 35(4), 917–926. https://doi.org/10.1002/j.1538-7305.1956.tb03809.x
- Thorp, E. O. (2008). The Kelly criterion in blackjack, sports betting, and the stock market. In Handbook of asset and liability management (pp. 385–428). Elsevier.
- Cover, T. M. (1999). Elements of information theory. John Wiley & Sons.