Machine learning applied to finance and trading is often regarded either with skepticism and distrust, or as the ultimate tool for extracting profit opportunities that are otherwise hidden or hard to recognize. While I believe the latter to be true, financial machine learning does come with complex challenges that are often absent in other areas. These include a low signal-to-noise ratio, serial dependence, hidden look-ahead biases, regime shifts, and the complete lack of time-invariance of the data-generating processes, among others.

Finance and trading are not traditional sciences, because controlled experiments cannot be performed on real systems. Moreover, given the problems stated above, finding statistically meaningful systematic trading strategies is a very difficult exercise. As such, the best use of machine learning in finance is often in uncovering fundamental relations, rather than hard-to-interpret statistical relations.

It is not my intent here to describe profitable trading strategies. Instead, this page contains a collection of ideas and expositions on the use of machine learning and statistics in finance and trading, often demonstrated with controlled experiments and synthetic data.

Articles

Regression to the mean and the decline of out-of-sample performance

In this short article, I will show why the out-of-sample performance of optimized models tends to decline on average, a fundamental consequence of regression to the mean. This general shortfall of model optimization becomes more significant in environments with a low signal-to-noise ratio, which is always the case in financial machine learning models.

May 25, 2021

Feature clustering

In this article, I will show how the previously described framework of optimal probabilistic clustering can be extended to feature clustering. The problem of feature (or variable) clustering arises in several aspects of systematic trading, such as portfolio construction and feature selection, among others.

Apr 11, 2021

Optimal probabilistic clustering - Part II

In Part I of this series, I introduced an entropy-based approach to optimal clustering. Here, I will introduce the idea of entropy regularization, along with other improvements, which will allow us to deal with more complex and realistic datasets. I will also conduct experiments to assess the performance of the optimal probabilistic clustering framework.

Mar 9, 2021

Metrics for feature distance

It is often necessary to quantify the similarity, in terms of information content, between two variables (or features). Here, I describe several quantities that, by satisfying the requirements of a metric, induce a topological structure on the set of features.

Feb 21, 2021

Optimal probabilistic clustering - Part I

Clustering is a general class of unsupervised learning tasks with many applications in finance, including portfolio construction, feature selection, and regime detection. In Part I of this series, I will begin by describing how entropy-based metrics allow for optimal probabilistic clustering, as well as the quantification of cluster quality.

Feb 12, 2021

Trading with the Kelly criterion

How probabilistic forecasts can be fully leveraged to an optimal allocation using the Kelly criterion

Feb 7, 2021

Mutual information for feature selection

How information-theoretic quantities allow full-distribution and model-independent feature selection

Feb 6, 2021
