Bayesian Ensembles of Binary-Event Forecasts

Casey Lichtendahl

Tuesday, 2/21/17
12:00 pm – 1:30 pm

Abstract: Many firms face critical decisions that rely on forecasts of binary events, such as whether a borrower will default on a loan. In these situations, firms often gather forecasts from multiple experts or models, which raises the question of how best to aggregate them. Because linear combinations of probability forecasts are known to be underconfident, we introduce a class of aggregation rules, called Bayesian ensembles, that are non-linear in the experts' probabilities. These ensembles are generalized additive models of the experts' probabilities and have three key properties: they are coherent, i.e., consistent with the Bayesian view; they can aggregate calibrated or miscalibrated forecasts; and they tend to be more extreme, and therefore more confident, than the commonly used linear opinion pool. Empirically, we demonstrate that our ensemble can be easily fit to real data using a generalized linear model framework. We use this framework to aggregate several forecasts of loan defaults in the Fannie Mae single-family loan performance data. The forecasts come from several leading machine-learning algorithms. Our Bayesian ensemble offers an out-of-sample improvement over the linear opinion pool and over any one of the individual machine-learning algorithms considered, two of which (the random forest and extreme gradient boosted trees) are already ensembles themselves.
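
The abstract does not spell out the fitting procedure, so the sketch below shows one plausible reading of the generalized linear model framework it mentions: a logistic regression on the log-odds of the experts' probabilities, compared against a simple linear opinion pool (an average of the probabilities). The synthetic data, the simulated "experts," and the specific logistic link are illustrative assumptions for demonstration, not the paper's exact specification.

```python
# Illustrative sketch (assumptions noted above): aggregating binary-event
# forecasts with a GLM on the experts' log-odds vs. a linear opinion pool.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Simulate a latent default probability and three imperfect "expert" forecasts.
n = 5000
true_p = rng.beta(2, 8, size=n)                      # latent default probability
y = rng.binomial(1, true_p)                          # observed default (0/1)
experts = np.column_stack([
    np.clip(true_p + rng.normal(0, s, size=n), 0.01, 0.99)
    for s in (0.05, 0.10, 0.15)                      # experts with varying noise
])

# Linear opinion pool: a simple average of the experts' probabilities.
pool = experts.mean(axis=1)

# GLM-style aggregation: logistic regression on the experts' log-odds,
# which can push the combined forecast to be more extreme than the pool.
logits = np.log(experts / (1 - experts))
train, test = np.arange(n) < 4000, np.arange(n) >= 4000
glm = LogisticRegression().fit(logits[train], y[train])
ensemble = glm.predict_proba(logits[test])[:, 1]

print("log loss, linear opinion pool:", log_loss(y[test], pool[test]))
print("log loss, GLM on log-odds:   ", log_loss(y[test], ensemble))
```

In this kind of fit, coefficients on the log-odds that sum to more than one produce a combined forecast more extreme than any simple average, which is the extremizing behavior the abstract attributes to Bayesian ensembles relative to the linear opinion pool.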