Ethics of Rushing COVID19 Risk Prediction Models

This may be an unusual post relative to normal questions, but I would really appreciate a thoughtful discussion on the topic.

In the US, we currently do not have sufficient PCR testing for COVID19. As a result we have to ration testing until we can scale it to the needed availability. We do not know when this will be, but it is likely weeks away. I believe our failures are unique among OECD countries, although testing shortages certainly affect other parts of the world as well.

Risk Prediction Models:
Risk prediction models are used frequently in medicine, but they are challenging to create & validate. Good models take time to develop, and bad models can cause more harm than good. Prediction models should therefore be subject to testing and validation prior to implementation for use in health care decision making.

Ethical Dilemma:
There is a growing number of patients in the US with suspected COVID19, but insufficient testing for all of them. As a result, many patients with uncertain pre-test probability of disease remain in limbo as to their disease status and need for isolation. Compounding the problem, untested patients who don't isolate may worsen the pandemic.

Ethical Question:
Should we rush an unvalidated risk prediction model, aimed at high specificity, to improve estimates of the post-test probability of disease for patients who are unable to access testing?

Ethical breakdown:
Rushing a model in an emergent situation, I would like to argue, is not unlike the FDA's emergency provisions for untested drugs.

In this situation, a model aimed at high specificity would in theory help identify patients likely to have COVID19, and encourage isolation. On the other hand, those with false-positive predictions will be able to get tested in days or weeks when testing becomes available, so the potential harm of unneeded self-isolation is hopefully short-lived.

However, a bad model, even one geared for high specificity, might lead to false reassurance if it returns a negative prediction. This may be mitigated by not reporting "negative" results; instead, results can be provided on a range from indeterminate to highly likely to have disease.
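To make the post-test-probability arithmetic concrete, here is a minimal sketch in Python. The sensitivity, specificity, and pre-test probability values, and the category cutoffs, are illustrative assumptions only, not estimates from any COVID19 study:

```python
def post_test_probability(pre_test_p, sensitivity, specificity):
    """Update a pre-test probability via the positive likelihood ratio
    LR+ = sensitivity / (1 - specificity), working on the odds scale."""
    lr_pos = sensitivity / (1 - specificity)
    pre_odds = pre_test_p / (1 - pre_test_p)
    post_odds = pre_odds * lr_pos
    return post_odds / (1 + post_odds)

def graded_report(p):
    """Report a graded category instead of a dichotomous negative/positive."""
    if p >= 0.80:
        return "highly likely to have disease"
    elif p >= 0.30:
        return "possible disease"
    return "indeterminate"

# Illustrative numbers only: 10% pre-test probability, a model with
# 70% sensitivity and 95% specificity.
p = post_test_probability(0.10, sensitivity=0.70, specificity=0.95)
print(round(p, 2), graded_report(p))  # → 0.61 possible disease
```

Note that even a fairly specific model moves a 10% pre-test probability only to roughly 60%, which is why reporting a graded probability rather than a "positive/negative" label seems safer.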

Counter points:
This post is spurred in part by many statisticians on twitter (whom I greatly respect) voicing concerns against rushing unvalidated prediction scores. They make excellent points, as linked below.

The risk prediction model:
This discussion is actually more than academic. There is a pre-print for a risk prediction model of COVID19 based on data from China. Should we use these values?

Thanks to any who read this post; thoughts and other opinions would be appreciated. It's rare that an urgent clinical need finds its way here on Data Methods, so I appreciate people's time and perspective to ensure we take care of patients in an appropriate and ethical manner.


Outside of the ethical considerations and some of the statistics/methods concerns in their pre-print (small sample size, forward selection, table 3 being a univariable evaluation of predictors, etc.), I'm not sure this model would be of much practical use even if it were valid. They simplify their model to a score like this in the end:


This makes it impossible to reach their threshold of 10 or more points if you don't have either a positive CT scan or close contact with a confirmed case. In a setting where tests are limited, the latter will often be unknown, and I'm not sure they will perform a CT for every person with respiratory infection complaints (in my country this only happens for severe cases or situations where the diagnosis is uncertain). And even if they do, this will probably happen less as soon as large numbers of people have to be screened.
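To sketch the structural problem in code: the point weights below are hypothetical, since I am not reproducing the pre-print's actual values; they only illustrate the situation where the two highest-weighted items are needed to reach the threshold, so the score is uninformative whenever CT and contact status are unavailable.

```python
# Hypothetical point weights for illustration only -- NOT the pre-print's
# actual values.
WEIGHTS = {
    "ct_positive": 6,    # hypothetical
    "close_contact": 5,  # hypothetical
    "fever": 2,          # hypothetical
    "lymphopenia": 2,    # hypothetical
}
THRESHOLD = 10

def score(findings):
    """Sum the weights of the findings that are present."""
    return sum(WEIGHTS[name] for name, present in findings.items() if present)

# Without a CT scan or a known contact, the maximum reachable score is 4,
# well below the threshold of 10:
findings = {"ct_positive": False, "close_contact": False,
            "fever": True, "lymphopenia": True}
print(score(findings), score(findings) >= THRESHOLD)  # → 4 False
```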

As such, this model will indeed give a high score to patients who are very likely to have a COVID19 infection, but in essence, I think, it does nothing more than mimic the decision making doctors are already doing (respiratory infection symptoms + COVID19 contact = likely COVID19 infection, e.g. a family member of a confirmed case who develops symptoms). In other words, this model probably highlights the individuals with a very high pre-test probability, but does it really add to the decision making we are doing already, and is this highlighted group really the group you want to test? If you decided based on this, you would only test people for whom the diagnosis is almost certain (think of the family-member example), while I think we are mostly concerned about testing those for whom the diagnosis is uncertain (e.g. contact with COVID19 is unknown/uncertain).


Thanks for starting this discussion, Raj.

I think we should keep in mind that prediction models can do more harm than good if they are developed poorly. Especially in this situation, where time and opportunities for validating these models are very limited and the models may affect life-or-death medical decisions, this is not the moment to ignore the standards and guidance found in PROBAST, TRIPOD, books (e.g. Harrell’s, Steyerberg’s, Riley’s), and many papers (e.g. see BMJ Methods and Reporting). As I said in the tweet you quote, I expect we will regret using rushed prediction models that don’t follow the standards. It is, unfortunately, the lives of patients we are talking about here.


Thanks Sebastiaan. I hope everybody sees the possible impact of dichotomizing age at >44 y/o. This is embarrassing.


Thanks Maarten.

One thing I would add is that none of those guidelines discusses design or use in emergency situations.

For drugs, there are extraordinary circumstances where it is considered ethical to give them without efficacy data.

Should there be a similar exception for risk prediction models?

(Note I agree that the referenced paper is not good)



Completely agree that the score system they provide is clinically useless.

On the other hand, if we had access to the data, could we create a better prediction model? Should we?

At the moment, I find the high specificity of CD4, CD8, & NK cells intriguing. There is lots of other evidence that a low lymphocyte count is a characteristic of COVID19. In addition, other epidemiologic modelling offers a potential range of values for community prevalence. So as a clinician, I can combine all this with patient symptoms into a sort of gestalt risk of having COVID19, and try to advise patients accordingly.

It seems to me that rather than relying on a gestalt, any modeling based on data (even with a large range of predictions) would be superior.
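As a sketch of what such data-based modeling could look like at its very simplest, the gestalt above can be formalized as serial Bayes updates: start from a community-prevalence estimate and apply one likelihood ratio per finding. All numbers below are assumptions for illustration, not published estimates, and the approach naively assumes the findings are conditionally independent:

```python
def bayes_update(prob, lr):
    """Apply one likelihood ratio to a probability, working on the odds scale."""
    odds = prob / (1 - prob) * lr
    return odds / (1 + odds)

# Illustrative assumptions only -- not published estimates:
community_prevalence = 0.02   # pre-test probability from epidemiologic modelling
lr_symptoms = 3.0             # assumed LR+ for compatible symptoms
lr_lymphopenia = 4.0          # assumed LR+ for low lymphocyte count

# Naively treats the findings as conditionally independent given disease
# status -- a strong assumption a properly developed model would check.
p = community_prevalence
for lr in (lr_symptoms, lr_lymphopenia):
    p = bayes_update(p, lr)
print(round(p, 2))  # → 0.2
```

Even this toy version makes the uncertainty explicit in a way a gestalt cannot, which is part of why I suspect modeling from data would be superior.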


This is a good question, I don’t know the answer.

I do think there is a difference between drugs and prediction models. In emergency situations, drugs may be essential for patients to survive; this is not the case for prediction models (as far as I know): they are only there to provide information that helps in making decisions.

That said, I can hardly imagine the horrible situation some medical doctors currently find themselves in, especially in Italy at the moment. And I can imagine the need for prediction models to help make important medical decisions. So, if we are going to develop such models, let us all use the existing expertise and guidance to avoid making avoidable mistakes in development that we will later regret. Time is of the essence, I know, but following the expertise and guidance doesn’t need to make it a slow process.


Thank you Maarten. I completely agree, and would much rather follow the expertise of you and others than whatever may start turning up on Twitter!


It is a shame that sensitivity and specificity were even computed as they lead to downstream errors such as dichotomization. Instead one should attempt to extract full value from basic lab tests by fitting flexible nonlinear models. And temperature should not be dichotomized either.


Not to detract from the discussion, but I love the SHARE feature on Data Methods.


For the “Italian situation” … i.e., ICU triage … we have the APACHE model, which was purpose-built for exactly that, and which has 30+ years of history, study and improvement. Unless COVID19 presents acute respiratory symptoms and physiological measures that are novel, APACHE would seem to be a good fit for that catastrophic scenario.

As I understand it, in China where the model discussed above comes from, they’ve set up dedicated “Fever Clinics” as assessment & treatment centers, so their default action is to send all people with indications to a Fever Clinic, and at those facilities they’re doing CT scans at a very high volume. Their model reflects their clinical practice … which is very different from what other places are doing, certainly very different from the current US practice. So their model may not generalize well to other settings.

Cassie Kozyrkov, chief decision scientist at Google, has a really good perspective/framework for considering what I’ll call “advisory, what should we do?” models:

Q1: What is the decision-maker’s default action? i.e., what would s/he do, if a decision has to be made in the next 2 seconds, based on information at hand?

Q2: What information would cause the decision-maker to choose some other action?

Think about which kinds of decision-makers are involved in COVID19 … members of the public at large, people with suspected or known COVID19, clinicians and care givers, health care facility administrators, govt agencies, policy-makers, employers, etc … where would predictive models help people make different/better decisions?

Also worth bearing in mind that some of these point-of-care models were/are designed with an expectation that the use case will be a checklist, and the clinician involved is going to be adding up the earned points in her/his head. In that case, the model scoring has to be very simple, or it’s not going to be used. As has been famously noted, all models are wrong, but some are (nevertheless) useful.


just fyi: they are discussing COVID19 data on the RadStats mailing list (rad=radical).

E.g., from John Bibby:
Gender effects and coronavirus
It seems clear that:
A. Coronavirus overwhelmingly kills older people.
B. Older people tend to be female.
One would therefore expect to find:
C. Coronavirus tends to kill women.
However, the opposite is the case, i.e.:
C’. Coronavirus kills men more than women.
This suggests quite a strong gender effect in favour of women.



Could be that men tend to be the chief long-range disease vectors, and the social rings of those kinds of players (also largely men) therefore tend to get infected earlier. Consider: international business travelers … strong majority are men. Consider: high-level politicians who travel and mingle widely … e.g., the recent confefe at Mar-A-Lago … tend to be men.

So could be a timing artifact that distorts the data we have to date.

Edit: this is pure speculation. I don’t actually know anything about the demographics of humans as disease vectors.


I didn’t know about APACHE (or ICU triage scores in general) but found this 2011 publication which seems to say exactly that these scores are ineffective in pandemic influenza emergencies.


Great find! Let’s hope the people who manage ICU capacity are aware of that paper, and of any subsequent relevant research.

This is where we need the biochemists, infectious disease experts, etc. to be part of any model building - especially pre-specifying variables and presenting evidence of possible biological plausibility.

I’d like to add one more ethical question.
Should consent be waived for all those tested, so that large data sets of patient-level data become available as soon as possible?
**FOR:** We are our brothers’ & sisters’ keeper - the greater good outweighs individual rights.
**AGAINST:** Even if anonymised, it may be possible to reconstruct who had a test.

IMHO now is not the time to worry about confidentiality. Now is the time to get answers that help people.


I agree with Frank. There are ethical arguments that supersede confidentiality when someone else’s life is at risk. For example, if a patient tells me they want to end the life of another person (and it is a credible threat), our ethical duty as physicians is to report that person and warn the potential victim.

First do no harm.

Well. Regardless of the myriad things to worry about in this topic, it hasn’t stopped this group from pushing out a web-based diagnostic predictor AND a mortality predictor: