Overconfident forecasts in infectious disease epidemiology

In 1984, Picard and Cook published a classic paper on cross-validation of regression models (JASA 79: 575-583) in which they defined the Optimism Principle: “a model chosen via some selection process provides a much more optimistic explanation of the data used in its derivation than it does of other data that will arise in a similar fashion”.

Earlier this month, Picard and Osthus posted (on medRxiv) a paper showing that infectious disease epidemiological forecast models can suffer from overconfidence: “the advertised coverage probabilities of [forecast] intervals fell short of their subsequent performances.” They analyze a set of 20 COVID-19 epi forecasting models registered at the CDC’s COVID-19 Forecast Hub, finding that the “statistical evidence for overconfidence is overwhelming.” This is arguably a very timely and very visible example of the Optimism Principle. Furthermore, “there is no apparent relation between purported and actual COVID-19 forecasting ability”. They propose a calibration procedure that uses current feedback about a model’s forecast error to widen forecast intervals in the next forecast.
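To make the general idea concrete, here is a minimal sketch of coverage-based interval widening. This is an illustration of the concept, not the authors’ actual procedure; the function names, the multiplicative widening rule, and all numbers are hypothetical.

```python
# Sketch: check empirical coverage of past forecast intervals, then widen
# the next interval when observed coverage falls short of nominal coverage.
# This is NOT the Picard-Osthus calibration; it only illustrates the idea
# of using feedback about past interval performance.

def empirical_coverage(intervals, outcomes):
    """Fraction of past outcomes that fell inside their forecast intervals."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)

def widen(interval, nominal, observed):
    """Widen an interval when observed coverage lags nominal coverage.

    The widening factor (nominal / observed) is one ad hoc choice among many.
    """
    lo, hi = interval
    mid = (lo + hi) / 2
    half = (hi - lo) / 2
    if 0 < observed < nominal:
        half *= nominal / observed
    return (mid - half, mid + half)

# Hypothetical history: nominal 80% intervals that covered only 3 of 5 outcomes.
past_intervals = [(90, 110), (95, 105), (100, 120), (80, 100), (105, 115)]
past_outcomes = [100, 112, 108, 95, 130]
cov = empirical_coverage(past_intervals, past_outcomes)  # 3/5 = 0.6
next_interval = widen((100, 120), nominal=0.80, observed=cov)
```

An overconfident model (observed coverage below nominal) gets its next interval stretched; a well-calibrated one is left alone.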

Link to the paper:

Discussion questions:
(1) Is the authors’ assessment of the overconfidence of epi forecasting models consistent with your experience as an observer or user of such models, or even as a developer of them?
(2) Do policy makers (or the public) interpret the width of forecast intervals as measures of uncertainty? If so, does this affect their decisions or attitudes? In other words, would any such overconfidence matter?
(3) If so, does the alleged poor performance of such models erode policy makers’ (and the public’s) confidence in epi forecast models, and more broadly, their attitudes about “science”?


isn’t ‘forecast’ a misnomer? i thought they were merely running simulations to evaluate different scenarios, as scientists might, but then the media deliver this to the public as a forecast? i found the danish modeller in this interview to be very humble and realistic about the value of modelling: Denmark's state modeller: Why we've ended ALL Covid laws - YouTube


Here is my cynical take:
With regard to (2), decision makers will present as overconfident if they can spin a story around the data that expands their discretionary power. They will focus on the uncertainty if someone attempts to use the data to undermine that power.

With regard to (3), the public will forget quickly, because the media will distract them with positive spin elsewhere.

Here is a nice editorial by researcher Matthew Mercuri (“Just follow the science”):

In response to questions about policy decisions related to the coronavirus, government officials around the world have invoked the importance of science. Their decisions, we are told, would “follow the science,” or something similar (eg, be “based on,” “led by,” “guided by”).* As a practicing scientist, I would appear remiss to suggest that such a practice is not a good thing.


Yes. Shades of “Y2K wasn’t real because nothing bad happened” here. Forecasts which are intended to inform policy with a view to avoiding disaster shouldn’t end up being correct. We want them to become counterfactuals, that’s the whole point.


For context, the authors are writing about one-day-ahead forecasts. Probably not enough time for policy decisions to influence outcomes?

No, it’s not. I wasn’t responding to the OP.