Historical and Recent Trends in the Forecasting Literature

Talk for the WARN-D Friday Science Meeting

Björn Siepe

Psychological Methods Lab, University of Marburg

February 16, 2024

Introduction

Why talk about this topic?

The secret reason:

OK, but really, why should you care?

“The quiet revolution of numerical weather prediction” (Bauer, Thorpe, and Brunet 2015):

Forecasting Competitions

History

Critical statisticians

  • We should be looking for the true model!

  • Maybe you do not know how to model with ARIMA…

  • “I suspect it is more likely to depend on the skill of the analyst … these authors are more at home with simple procedures than with Box-Jenkins.” (Chatfield, cited in Hyndman 2020)

Makridakis & Hibon (1979)

  • Our empirical evidence disagrees

  • Of course we do

  • “[It] might be useful for Dr. Chatfield to read some of the psychological literature quoted in the main paper, and he can then learn a little more about biases”

M-Competitions

| Competition | Year | No. of Time Series | Insights/Novelty |
|---|---|---|---|
| M1 | 1982 | 1,001 | Simple models work well; combining forecasts works well; changed forecasting forever |
| M2 | 1993 | 29 | Not really relevant |
| M3 | 2000 | 3,003 | Somewhat simple models work well with modifications |
| M4 | 2020 | 100,000 | Combination of ML and statistics works well, pure ML does not; probabilistic prediction |
| M5 | 2021 | 42,000 | Hierarchical time series; ensembles + pure ML work well |
| M6 | 2022 | 100 | … to be continued |

Utility of Competitions

Advantages

  • Empirical evidence
  • Benchmarking
  • Methodological development and cumulative science (Fildes and Ord 2004)

Issues

Model Selection, Uncertainty & Combination

Model selection

“Typical” Workflow:

```{mermaid}
flowchart LR
    A(Use fancy models) --> B(Select best one)
    B --> C(Forecast)
    C --> D[Profit]
```

Problem?

  • Use simple models as benchmarks (see the M-competitions; Makridakis, Spiliotis, and Assimakopoulos 2018)

Selecting a single model:

  • ignores uncertainty about this selection (Kaplan 2021)
  • tends to perform poorly in many forecasting settings (Chatfield 1996)
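A minimal sketch of this workflow in R, using the forecast package from the appendix; AirPassengers is just a stand-in example series, and the train/test split is hypothetical:

```{r}
# 'Typical' workflow: fit several models, keep the single best one --
# plus a simple seasonal-naive benchmark for comparison
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))
h     <- length(test)

fc_ets    <- forecast(ets(train), h = h)         # exponential smoothing
fc_arima  <- forecast(auto.arima(train), h = h)  # automatic ARIMA
fc_snaive <- snaive(train, h = h)                # seasonal-naive benchmark

# Out-of-sample RMSE: how much do the 'fancy' models add over the benchmark?
sapply(list(ets = fc_ets, arima = fc_arima, snaive = fc_snaive),
       function(fc) accuracy(fc, test)["Test set", "RMSE"])
```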

Model combination

‘Law’ of forecast combination

‘The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. In many cases one can make dramatic performance improvements by simply averaging the forecasts.’ (Clemen 1989)
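To illustrate, a minimal sketch of the simplest possible combination, an unweighted average of two point forecasts (same stand-in series as above; the forecastHybrid package listed in the appendix automates this kind of combination):

```{r}
# Equal-weight combination of two point forecasts -- often hard to beat
library(forecast)
fc_ets   <- forecast(ets(AirPassengers), h = 12)
fc_arima <- forecast(auto.arima(AirPassengers), h = 12)
fc_comb  <- (fc_ets$mean + fc_arima$mean) / 2  # simple average
```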

Ensemble modeling

Two sources of uncertainty (Gneiting and Katzfuss 2014):

  • Uncertainty about the initial conditions
  • Uncertainty about the model itself

Met Office UK & Bauer, Thorpe, and Brunet (2015)

Relevance for psychology

Probabilistic Forecasting

Probabilistic forecasting methods

Long history of probabilistic weather forecasting

‘The probability of rain was much smaller than at other times.’ (Dalton, 1793; cited in Murphy 1998)
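What this looks like in code: a minimal sketch of a probabilistic forecast in R, i.e., a predictive distribution rather than a single point forecast (AirPassengers again as a stand-in series):

```{r}
# Probabilistic forecast: prediction intervals plus simulated sample paths
library(forecast)
fit <- ets(AirPassengers)
fc  <- forecast(fit, h = 12, level = c(50, 90))  # 50% and 90% intervals

# Approximate the full predictive distribution via simulation
paths <- replicate(1000, simulate(fit, nsim = 12, future = TRUE))
quantile(paths[12, ], probs = c(0.05, 0.5, 0.95))  # 12-step-ahead quantiles
```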

Relevance for psychology

Combining Information

Extending mixed models

Flexible Mixed Model

From

\[ \hat{y} = \underbrace{X\beta}_{\text{fixed effects}} + \underbrace{Z\upsilon}_{\text{random effects}} \]

to

\[ \hat{y} = \overbrace{\text{ml}_{\text{fixed}}(X)}^{\text{fixed effects}} + \overbrace{Z\upsilon}^{\text{random effects}} \]

(Kilian, Ye, and Kelava 2023)
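One way to get this structure in practice, sketched here with the gpboost package from the appendix (Sigrist 2022) rather than the authors’ own mixedML implementation; the data (y, X, person_id) are simulated purely for illustration:

```{r}
# Boosted trees for the fixed-effects part, parametric random intercepts
# for the person-level part
library(gpboost)
set.seed(1)

person_id <- rep(1:50, each = 10)          # 50 persons, 10 observations each
X <- matrix(rnorm(500 * 2), ncol = 2)
y <- sin(2 * X[, 1]) + 0.5 * X[, 2] +      # nonlinear 'fixed' part
  rnorm(50, sd = 0.8)[person_id] +         # person-level random intercepts
  rnorm(500, sd = 0.3)                     # residual noise

gp_model <- GPModel(group_data = person_id, likelihood = "gaussian")
bst <- gpboost(data = X, label = y, gp_model = gp_model,
               nrounds = 100, learning_rate = 0.05, verbose = 0)

# Predictions combine the tree ensemble and the estimated random effects
pred <- predict(bst, data = X, group_data_pred = person_id,
                predict_var = TRUE, pred_latent = TRUE)
yhat <- pred$fixed_effect + pred$random_effect_mean
```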

Relevance for Psychology

  • Improving on what we already have
  • Current combinations of ‘idiographic’ and ‘nomothetic’ forecasts: often ad hoc

Summary

Takeaways

  1. Forecasting competitions can lead to methodological improvement
  2. Model uncertainty and model combination are integral parts of forecasting
  3. Probabilistic forecasting is important for decision making
  4. Combining predictions is challenging, but lots of cool stuff in the making


Feel free to contact me:

bjoern.siepe@uni-marburg.de

Slides at bsiepe.github.io/talks

References

Aastveit, Knut Are, James Mitchell, Francesco Ravazzolo, and Herman van Dijk. 2018. “The Evolution of Forecast Density Combinations in Economics.” Tinbergen Institute Discussion Paper, Amsterdam and Rotterdam.
Bauer, Peter, Alan Thorpe, and Gilbert Brunet. 2015. “The Quiet Revolution of Numerical Weather Prediction.” Nature 525 (7567): 47–55. https://doi.org/10.1038/nature14956.
Begoli, Edmon, Tanmoy Bhattacharya, and Dimitri Kusnezov. 2019. “The Need for Uncertainty Quantification in Machine-Assisted Medical Decision Making.” Nature Machine Intelligence 1 (1): 20–23. https://doi.org/10.1038/s42256-018-0004-1.
Boylan, John E., Paul Goodwin, Maryam Mohammadipour, and Aris A. Syntetos. 2015. “Reproducibility in Forecasting Research.” International Journal of Forecasting 31 (1): 79–90. https://doi.org/10.1016/j.ijforecast.2014.05.008.
Chatfield, Chris. 1996. “Model Uncertainty and Forecast Accuracy.” Journal of Forecasting 15 (7): 495–508. https://doi.org/10.1002/(SICI)1099-131X(199612)15:7<495::AID-FOR640>3.0.CO;2-O.
Chen, Irene Y., Shalmali Joshi, Marzyeh Ghassemi, and Rajesh Ranganath. 2021. “Probabilistic Machine Learning for Healthcare.” Annual Review of Biomedical Data Science 4 (1): 393–415. https://doi.org/10.1146/annurev-biodatasci-092820-033938.
Clemen, Robert T. 1989. “Combining Forecasts: A Review and Annotated Bibliography.” International Journal of Forecasting 5 (4): 559–83.
Dorie, Vincent, George Perrett, Jennifer L. Hill, and Benjamin Goodrich. 2022. “Stan and BART for Causal Inference: Estimating Heterogeneous Treatment Effects Using the Power of Stan and the Flexibility of Machine Learning.” Entropy 24 (12): 1782. https://doi.org/10.3390/e24121782.
Fildes, Robert, and Keith Ord. 2004. “Forecasting Competitions: Their Role in Improving Forecasting Practice and Research.” In A Companion to Economic Forecasting, edited by Michael P. Clements and David F. Hendry, 1st ed. John Wiley & Sons, Ltd. https://doi.org/10.1002/9780470996430.
Gneiting, Tilmann, and Matthias Katzfuss. 2014. “Probabilistic Forecasting.” Annual Review of Statistics and Its Application 1 (1): 125–51. https://doi.org/10.1146/annurev-statistics-062713-085831.
Hastie, Trevor, Jerome H. Friedman, and Robert Tibshirani. 2017. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed., corrected at 12th printing. New York, NY: Springer.
Hyndman, Rob J. 2020. “A Brief History of Forecasting Competitions.” International Journal of Forecasting, M4 Competition, 36 (1): 7–14. https://doi.org/10.1016/j.ijforecast.2019.03.015.
Kaplan, David. 2021. “On the Quantification of Model Uncertainty: A Bayesian Perspective.” Psychometrika 86 (1): 215–38. https://doi.org/10.1007/s11336-021-09754-5.
Kilian, Pascal, Sangbeak Ye, and Augustin Kelava. 2023. “Mixed Effects in Machine Learning – A Flexible mixedML Framework to Add Random Effects to Supervised Machine Learning Regression.” Transactions on Machine Learning Research.
Koning, Alex J., Philip Hans Franses, Michèle Hibon, and H. O. Stekler. 2005. “The M3 Competition: Statistical Tests of the Results.” International Journal of Forecasting 21 (3): 397–409. https://doi.org/10.1016/j.ijforecast.2004.10.003.
Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2018. “Statistical and Machine Learning Forecasting Methods: Concerns and Ways Forward.” PLOS ONE 13 (3): e0194889. https://doi.org/10.1371/journal.pone.0194889.
Makridakis, Spyros, Evangelos Spiliotis, Vassilios Assimakopoulos, Zhi Chen, Anil Gaba, Ilia Tsetlin, and Robert L. Winkler. 2022. “The M5 Uncertainty Competition: Results, Findings and Conclusions.” International Journal of Forecasting, Special issue: M5 competition, 38 (4): 1365–85. https://doi.org/10.1016/j.ijforecast.2021.10.009.
Murphy, Allan H. 1998. “The Early History of Probability Forecasts: Some Extensions and Clarifications.” Weather and Forecasting 13 (1): 5–15. https://doi.org/10.1175/1520-0434(1998)013<0005:TEHOPF>2.0.CO;2.
Petropoulos, Fotios, Rob J. Hyndman, and Christoph Bergmeir. 2018. “Exploring the Sources of Uncertainty: Why Does Bagging for Time Series Forecasting Work?” European Journal of Operational Research 268 (2): 545–54. https://doi.org/10.1016/j.ejor.2018.01.045.
Salditt, Marie, Sarah Humberg, and Steffen Nestler. 2023. “Gradient Tree Boosting for Hierarchical Data.” Multivariate Behavioral Research, January, 1–27. https://doi.org/10.1080/00273171.2022.2146638.
Sigrist, Fabio. 2022. “Gaussian Process Boosting.” Journal of Machine Learning Research 23: 1–46.
Strobl, Carolin, and Friedrich Leisch. 2022. “Against the ‘One Method Fits All Data Sets’ Philosophy for Comparison Studies in Methodological Research.” Biometrical Journal, advance online publication. https://doi.org/10.1002/bimj.202200104.
Wang, Xiaoqian, Rob J. Hyndman, Feng Li, and Yanfei Kang. 2023. “Forecast Combinations: An over 50-Year Review.” International Journal of Forecasting 39 (4): 1518–47. https://doi.org/10.1016/j.ijforecast.2022.11.005.
Wörtwein, Torsten, Nicholas Allen, Lisa B. Sheeber, Randy P. Auerbach, Jeffrey F. Cohn, and Louis-Philippe Morency. 2023. “Neural Mixed Effects for Nonlinear Personalized Predictions.” In Proceedings of the International Conference on Multimodal Interaction, 445–54. https://doi.org/10.1145/3577190.3614115.
Yao, Yuling, Gregor Pirš, Aki Vehtari, and Andrew Gelman. 2022. “Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful.” Bayesian Analysis 17 (4): 1043–71. https://doi.org/10.1214/21-BA1287.
Yao, Yuling, Aki Vehtari, Daniel Simpson, and Andrew Gelman. 2018. “Using Stacking to Average Bayesian Predictive Distributions (with Discussion).” Bayesian Analysis 13 (3): 917–1007. https://doi.org/10.1214/17-BA1091.

Appendix

Interesting packages

```{r}
#| eval: false
library(forecast)        # OG package for time series forecasting
library(stan4bart)       # full-on probabilistic, multilevel, model combination
library(forecastHybrid)  # for model combination
library(Mcomp)           # data from the M-competitions
library(gpboost)         # xgboost-like, but probabilistic and multilevel
```

```{python}
#| eval: false
import sktime      # time series forecasting with an sklearn-like interface
import skpro       # probabilistic forecasting
import gluonts     # probabilistic forecasting, including deep learning
import ngboost     # xgboost-like, but probabilistic!
import mapie       # model-agnostic conformal prediction
import mlforecast  # machine learning forecasting
```