Machine learning applied to finance and trading is often regarded with either skepticism and distrust, or as the ultimate tool to squeeze otherwise
hidden or hard to recognize profit opportunities. While I believe the second to be true, financial machine learning does come with complex challenges
often not present in other areas. These include the low signal to noise ratio, serial dependence and hidden look-ahead biases, regime shifts and the complete
lack of time-invariance of the data-generating processes, among others.

Finance and trading are not traditional sciences, due to the inability to perform controlled experiments on real systems. Also, due to the problems state above,
finding statistically meaningful systematic trading strategies is a very difficult exercise. As such, often the best use of machine learning in finance
is in uncovering fundamental relations, rather than hard-to-interpret statistical relations.
It is not my intent here to describe profitable trading strategies. Instead, this page contains a collection of ideas and expositions on the use of
machine learning and statistics in finance and trading, often demonstrated with controlled experiments and synthetic data.
Articles
In this short article, I will show why the out-of-sample performance of optimized models tends to decline, on average, which is a fundamental consequence of regression to the mean. This general shortfall of model optimization becomes more significant in environments of low signal to noise ratio, which is always the case in financial machine learning models.
May 25, 2021
In this article, I will show how the previously described framework of optimal probabilistic clustering can be extended to feature clustering. The problem of feature (or variable) clustering arises in several aspects of systematic trading, like portfolio construction, feature selection, among others.
Apr 11, 2021
In Part I of this series, I introduced an entropy-based approach to optimal clustering. Here, I will introduce the idea of entropy regularization, besides other improvements, that will allow us to deal with more complex and realistic datasets. I will also conduct experiments to assess the performance of the optimal probabilistic clustering framework.
Mar 9, 2021
Often it becomes necessary to quantify the similarity, in terms of information content, between two variables (or features). Here, I describe several quantities that, by satisfying the requirements necessary to become a metric, induce a topological structure on the set of features.
Feb 21, 2021
Clustering is a general class of unsupervised learning tasks with many applications in finance, including portfolio construction, feature selection, regime detection, etc. In part I of this series, I will begin by describing how entropy-based metrics allow for optimal probabilistic clustering as well as the quantification of cluster quality.
Feb 12, 2021
How probabilistic forecasts can be fully leveraged to an optimal allocation using the Kelly criterion
Feb 7, 2021
How information-theoretic quantities allow full-distribution and model-independent feature selection
Feb 6, 2021