Historical and Recent Trends in the Forecasting Literature

Talk for the WARN-D Friday Science Meeting

Björn Siepe

Psychological Methods Lab, University of Marburg

February 16, 2024

Introduction

Why talk about this topic?

The secret reason:

OK, but really, why should you care?

“The quiet revolution of numerical weather prediction” (Bauer, Thorpe, and Brunet 2015):

Forecasting Competitions

History

Critical statisticians

  • We should be looking for the true model!

  • Maybe you do not know how to model with ARIMA…

  • “I suspect it is more likely to depend on the skill of the analyst … these authors are more at home with simple procedures than with Box-Jenkins.” (Chatfield, cited in Hyndman 2020)

Makridakis & Hibon (1979)

  • Our empirical evidence disagrees

  • Of course we do

  • “[It] might be useful for Dr. Chatfield to read some of the psychological literature quoted in the main paper, and he can then learn a little more about biases”

M-Competitions

| Competition | Year | No. of Time Series | Insights/Novelty |
|---|---|---|---|
| M1 | 1982 | 1,001 | Simple models work well; combining forecasts works well; changed forecasting forever |
| M2 | 1993 | 29 | Not really relevant |
| M3 | 2000 | 3,003 | Somewhat simple models work well with modifications |
| M4 | 2020 | 100,000 | Combination of ML and statistics works well, pure ML does not; probabilistic prediction |
| M5 | 2021 | 42,000 | Hierarchical time series; ensembles + pure ML work well |
| M6 | 2022 | 100 | … to be continued |

Utility of Competitions

Advantages

  • Empirical evidence
  • Benchmarking
  • Methodological development and cumulative science (Fildes and Ord 2004)

Issues

Model Selection, Uncertainty & Combination

Model selection

“Typical” Workflow:

```{mermaid}
flowchart LR
    A(Use fancy models) --> B(Select best one)
    B --> C(Forecast)
    C --> D[Profit]
```

Problem?

  • Use simple models as benchmarks (see the M-competitions; Makridakis, Spiliotis, and Assimakopoulos 2018)

Selecting a single model:

  • ignores uncertainty about this selection (Kaplan 2021)
  • tends to perform poorly in many forecasting settings (Chatfield 1996)
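A minimal sketch of this workflow in R, using the forecast package from the appendix; AirPassengers is just a stand-in example series, and the train/test split is hypothetical:

```{r}
# 'Typical' workflow: fit several models, keep the single best one --
# plus a simple seasonal-naive benchmark for comparison
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))
h     <- length(test)

fc_ets    <- forecast(ets(train), h = h)         # exponential smoothing
fc_arima  <- forecast(auto.arima(train), h = h)  # automatic ARIMA
fc_snaive <- snaive(train, h = h)                # seasonal-naive benchmark

# Out-of-sample RMSE: how much do the 'fancy' models add over the benchmark?
sapply(list(ets = fc_ets, arima = fc_arima, snaive = fc_snaive),
       function(fc) accuracy(fc, test)["Test set", "RMSE"])
```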

Model combination

‘Law’ of forecast combination

‘The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. In many cases one can make dramatic performance improvements by simply averaging the forecasts.’ (Clemen 1989)
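To illustrate, a minimal sketch of the simplest possible combination, an unweighted average of two point forecasts (same stand-in series as above; the forecastHybrid package listed in the appendix automates this kind of combination):

```{r}
# Equal-weight combination of two point forecasts -- often hard to beat
library(forecast)
fc_ets   <- forecast(ets(AirPassengers), h = 12)
fc_arima <- forecast(auto.arima(AirPassengers), h = 12)
fc_comb  <- (fc_ets$mean + fc_arima$mean) / 2  # simple average
```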

Ensemble modeling

Two sources of uncertainty (Gneiting and Katzfuss 2014):

  • Uncertainty about the initial conditions
  • Uncertainty about the model itself

Met Office UK & Bauer, Thorpe, and Brunet (2015)

Relevance for psychology

Probabilistic Forecasting

Probabilistic forecasting methods

Long history of probabilistic weather forecasting

‘The probability of rain was much smaller than at other times.’ (Dalton, 1793; cited in Murphy 1998)
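What this looks like in code: a minimal sketch of a probabilistic forecast in R, i.e., a predictive distribution rather than a single point forecast (AirPassengers again as a stand-in series):

```{r}
# Probabilistic forecast: prediction intervals plus simulated sample paths
library(forecast)
fit <- ets(AirPassengers)
fc  <- forecast(fit, h = 12, level = c(50, 90))  # 50% and 90% intervals

# Approximate the full predictive distribution via simulation
paths <- replicate(1000, simulate(fit, nsim = 12, future = TRUE))
quantile(paths[12, ], probs = c(0.05, 0.5, 0.95))  # 12-step-ahead quantiles
```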

Relevance for psychology

Combining Information

Extending mixed models

Flexible Mixed Model

From

\[ \hat{y} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Z\upsilon}_{\text{random effects}} \]

to

\[ \hat{y} = \overbrace{\text{ml}_{\text{fixed}}(X)}^{\text{fixed effects}} + \overbrace{Z\upsilon}^{\text{random effects}} \]

(Kilian, Ye, and Kelava 2023)
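One way to get this structure in practice, sketched here with the gpboost package from the appendix (Sigrist 2022) rather than the authors’ own mixedML implementation; the data (y, X, person_id) are simulated purely for illustration:

```{r}
# Boosted trees for the fixed-effects part, parametric random intercepts
# for the person-level part
library(gpboost)
set.seed(1)

person_id <- rep(1:50, each = 10)          # 50 persons, 10 observations each
X <- matrix(rnorm(500 * 2), ncol = 2)
y <- sin(2 * X[, 1]) + 0.5 * X[, 2] +      # nonlinear 'fixed' part
  rnorm(50, sd = 0.8)[person_id] +         # person-level random intercepts
  rnorm(500, sd = 0.3)                     # residual noise

gp_model <- GPModel(group_data = person_id, likelihood = "gaussian")
bst <- gpboost(data = X, label = y, gp_model = gp_model,
               nrounds = 100, learning_rate = 0.05, verbose = 0)

# Predictions combine the tree ensemble and the estimated random effects
pred <- predict(bst, data = X, group_data_pred = person_id,
                predict_var = TRUE, pred_latent = TRUE)
yhat <- pred$fixed_effect + pred$random_effect_mean
```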

Relevance for Psychology

  • Improving on what we already have
  • Current combinations of ‘idiographic’ and ‘nomothetic’ forecasts: often ad hoc

Summary

Takeaways

  1. Forecasting competitions can lead to methodological improvement
  2. Model uncertainty and model combination are integral parts of forecasting
  3. Probabilistic forecasting is important for decision making
  4. Combining predictions is challenging, but lots of cool stuff in the making


Feel free to contact me:

bjoern.siepe@uni-marburg.de

Slides at bsiepe.github.io/talks

References

Aastveit, Knut Are, James Mitchell, Francesco Ravazzolo, and Herman van Dijk. 2018. “The Evolution of Forecast Density Combinations in Economics.” Tinbergen Institute Discussion Paper, Amsterdam and Rotterdam.
Bauer, Peter, Alan Thorpe, and Gilbert Brunet. 2015. “The Quiet Revolution of Numerical Weather Prediction.” Nature 525 (7567): 47–55. https://doi.org/10.1038/nature14956.
Begoli, Edmon, Tanmoy Bhattacharya, and Dimitri Kusnezov. 2019. “The Need for Uncertainty Quantification in Machine-Assisted Medical Decision Making.” Nature Machine Intelligence 1 (1): 20–23. https://doi.org/10.1038/s42256-018-0004-1.
Boylan, John E., Paul Goodwin, Maryam Mohammadipour, and Aris A. Syntetos. 2015. “Reproducibility in Forecasting Research.” International Journal of Forecasting 31 (1): 79–90. https://doi.org/10.1016/j.ijforecast.2014.05.008.
Chatfield, Chris. 1996. “Model Uncertainty and Forecast Accuracy.” Journal of Forecasting 15 (7): 495–508. https://doi.org/10.1002/(SICI)1099-131X(199612)15:7<495::AID-FOR640>3.0.CO;2-O.
Chen, Irene Y., Shalmali Joshi, Marzyeh Ghassemi, and Rajesh Ranganath. 2021. “Probabilistic Machine Learning for Healthcare.” Annual Review of Biomedical Data Science 4 (1): 393–415. https://doi.org/10.1146/annurev-biodatasci-092820-033938.
Clemen, Robert T. 1989. “Combining Forecasts: A Review and Annotated Bibliography.” International Journal of Forecasting 5 (4): 559–83.
Dorie, Vincent, George Perrett, Jennifer L. Hill, and Benjamin Goodrich. 2022. “Stan and BART for Causal Inference: Estimating Heterogeneous Treatment Effects Using the Power of Stan and the Flexibility of Machine Learning.” Entropy 24 (12): 1782. https://doi.org/10.3390/e24121782.
Fildes, Robert, and Keith Ord. 2004. “Forecasting Competitions: Their Role in Improving Forecasting Practice and Research.” In A Companion to Economic Forecasting, edited by Michael P. Clements and David F. Hendry, 1st ed. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470996430.
Gneiting, Tilmann, and Matthias Katzfuss. 2014. “Probabilistic Forecasting.” Annual Review of Statistics and Its Application 1 (1): 125–51. https://doi.org/10.1146/annurev-statistics-062713-085831.
Hastie, Trevor, Jerome H. Friedman, and Robert Tibshirani. 2017. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed., corrected at 12th printing. New York, NY: Springer.
Hyndman, Rob J. 2020. “A Brief History of Forecasting Competitions.” International Journal of Forecasting, M4 Competition, 36 (1): 7–14. https://doi.org/10.1016/j.ijforecast.2019.03.015.
Kaplan, David. 2021. “On the Quantification of Model Uncertainty: A Bayesian Perspective.” Psychometrika 86 (1): 215–38. https://doi.org/10.1007/s11336-021-09754-5.
Kilian, Pascal, Sangbeak Ye, and Augustin Kelava. 2023. “Mixed Effects in Machine Learning – A Flexible mixedML Framework to Add Random Effects to Supervised Machine Learning Regression.” Transactions on Machine Learning Research.
Koning, Alex J., Philip Hans Franses, Michèle Hibon, and H. O. Stekler. 2005. “The M3 Competition: Statistical Tests of the Results.” International Journal of Forecasting 21 (3): 397–409. https://doi.org/10.1016/j.ijforecast.2004.10.003.
Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2018. “Statistical and Machine Learning Forecasting Methods: Concerns and Ways Forward.” PLOS ONE 13 (3): e0194889. https://doi.org/10.1371/journal.pone.0194889.
Makridakis, Spyros, Evangelos Spiliotis, Vassilios Assimakopoulos, Zhi Chen, Anil Gaba, Ilia Tsetlin, and Robert L. Winkler. 2022. “The M5 Uncertainty Competition: Results, Findings and Conclusions.” International Journal of Forecasting, Special issue: M5 competition, 38 (4): 1365–85. https://doi.org/10.1016/j.ijforecast.2021.10.009.
Murphy, Allan H. 1998. “The Early History of Probability Forecasts: Some Extensions and Clarifications.” Weather and Forecasting 13 (1): 5–15. https://doi.org/10.1175/1520-0434(1998)013<0005:TEHOPF>2.0.CO;2.
Petropoulos, Fotios, Rob J. Hyndman, and Christoph Bergmeir. 2018. “Exploring the Sources of Uncertainty: Why Does Bagging for Time Series Forecasting Work?” European Journal of Operational Research 268 (2): 545–54. https://doi.org/10.1016/j.ejor.2018.01.045.
Salditt, Marie, Sarah Humberg, and Steffen Nestler. 2023. “Gradient Tree Boosting for Hierarchical Data.” Multivariate Behavioral Research, January, 1–27. https://doi.org/10.1080/00273171.2022.2146638.
Sigrist, Fabio. 2022. “Gaussian Process Boosting.” Journal of Machine Learning Research 23: 1–46.
Strobl, Carolin, and Friedrich Leisch. 2022. “Against the ‘One Method Fits All Data Sets’ Philosophy for Comparison Studies in Methodological Research.” Biometrical Journal, advance online publication. https://doi.org/10.1002/bimj.202200104.
Wang, Xiaoqian, Rob J. Hyndman, Feng Li, and Yanfei Kang. 2023. “Forecast Combinations: An over 50-Year Review.” International Journal of Forecasting 39 (4): 1518–47. https://doi.org/10.1016/j.ijforecast.2022.11.005.
Wörtwein, Torsten, Nicholas Allen, Lisa B. Sheeber, Randy P. Auerbach, Jeffrey F. Cohn, and Louis-Philippe Morency. 2023. “Neural Mixed Effects for Nonlinear Personalized Predictions.” In Proceedings of the International Conference on Multimodal Interaction, 445–54. https://doi.org/10.1145/3577190.3614115.
Yao, Yuling, Gregor Pirš, Aki Vehtari, and Andrew Gelman. 2022. “Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful.” Bayesian Analysis 17 (4): 1043–71. https://doi.org/10.1214/21-BA1287.
Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13 (3): 917–1007. https://doi.org/10.1214/17-BA1091.

Appendix

Interesting packages

```{r}
#| eval: false
library(forecast)        # OG package for time series forecasting
library(stan4bart)       # full-on probabilistic, multilevel, model combination
library(forecastHybrid)  # for model combination
library(Mcomp)           # data from the M-competitions
library(gpboost)         # xgboost-like, but probabilistic and multilevel
```

```{python}
#| eval: false
import sktime      # time series forecasting with an sklearn-like interface
import skpro       # probabilistic forecasting
import gluonts     # probabilistic forecasting, including deep learning
import ngboost     # xgboost-like, but probabilistic!
import mapie       # model-agnostic conformal prediction
import mlforecast  # machine learning forecasting
```