Determining post-test probability of Covid-19

Marginalizing over covariates will make the predictions depend on the covariate distributions in the clinical population and will not recognize the existence of high-risk persons.

I agree that marginalizing over covariates leads to subpar predictions for certain individuals, which is how I understand the last point made, although I am not certain I understand correctly (apologies if not). I also agree that it would be best to condition on the covariates, and that a study in which all covariates are collected would be ideal.

However, it is also important to prevent high-risk persons from being exposed to covid. High-risk persons are more likely to be exposed when someone with a false-negative test result goes about their daily activities. I think that a model that marginalizes over covariates, giving us the rules above, might partially prevent this. A model that conditions on covariates would prevent it even more, but I think the marginal model might still help.

Very interesting; I had not thought enough about how self-knowledge of being at high risk would alter behavior, thus covering up the high-risk effects.


Yes, it is as if, letting z be the covariates, the post-test probability becomes a pre-covariate probability, and the subsequent post-covariate probability is then “guesstimated”: p(dz|z,test) = p(z|test,dz)p(dz|test)/p(z|test).

For example, if one is taking a test to check whether they are still contagious, they might test negative but implicitly adjust for the fact that they had covid and then recovered (i.e., some time has elapsed since symptoms resolved, and it is known that the disease is most transmissible pre-symptoms). Similarly, if one has had a sick contact, they can adjust accordingly.

This type of rough adjustment is not ideal, and it would be helpful to have cohort data for the covariates z to figure out how to adjust precisely, but it at least approximates the precise adjustment, and it could prevent a person at high risk for having covid from putting false confidence in a negative test and thereby spreading the disease.

—

If one were to use the linked tool, a lingering question for me revolves around p(dz). I have drawn the plots that link p(dz) to p(dz|test), but I still don’t know what to tell a user in terms of setting p(dz).

We have the standard post-test probability function, p(dz|test) = p(test|dz)p(dz)/p(test). Often it is thought of as f(x) = p(test|dz)x/p(test), where 0 ≤ x ≤ 1. For example, this is how it is conceptualized in the Fagan nomogram, which I essentially reproduced in the plots (with the likelihood ratio fixed according to the covid test characteristics). In this conceptualization, we can simply set x arbitrarily and obtain a post-test probability.
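
This conceptualization can be sketched numerically. A minimal example, with purely illustrative sensitivity and specificity values (not the characteristics of any particular covid test):

```python
def post_test_prob(pretest, sens, spec, positive=True):
    """Post-test probability p(dz|test) via Bayes' rule,
    treating the pretest probability x as a free input."""
    if positive:
        # p(test+) = sens*x + (1 - spec)*(1 - x)
        return sens * pretest / (sens * pretest + (1 - spec) * (1 - pretest))
    # p(test-) = (1 - sens)*x + spec*(1 - x)
    return (1 - sens) * pretest / ((1 - sens) * pretest + spec * (1 - pretest))

# Illustrative values only: sens = 0.85, spec = 0.99
for x in (0.01, 0.1, 0.5):
    print(x, post_test_prob(x, 0.85, 0.99),
          post_test_prob(x, 0.85, 0.99, positive=False))
```

As in the Fagan nomogram, sweeping x from 0 to 1 traces out the whole curve; the question raised below is what single x a user should actually enter.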

Conceptualizing x as a “free” variable, however, is not really valid. The information behind x can cause Bayes’ rule to break down. If one wants to set p(dz) and one accounts for information z, it is no longer p(dz) but p(dz|z); we are then no longer estimating p(dz|test) but p(dz|test,z); and, moreover, we have implicitly written that p(dz|test,z) = p(test|dz)p(dz|z)/p(test), which is false unless p(test|dz) = p(test|dz,z).

The latter is true when the test is independent of z given dz, and, in this case, we can account for covariates z directly in the pretest probability, without having to adjust implicitly later. I believe this independence is the ideal on which the use of Bayes’ rule and unconditional sens/spec is built. In other words, the ideal that sens/spec are functions only of the test might be restated as “tests are independent of covariates besides the disease.” If we had a full cohort study with z, test, and dz for each subject, we could prove or disprove this independence. Prince-Guerra et al. (2021), for example, already disproved this independence for z = symptoms. However, if we do not have this cohort data but still wish to obtain p(dz|test) and then perform an informal adjustment for covariates as described above, where does this leave us?
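
The breakdown can be made concrete with a toy joint distribution in which the test’s sensitivity depends on z. All numbers below are invented for illustration; they are not estimates from Prince-Guerra et al.:

```python
import itertools

# Toy joint p(dz, z, test) over binary variables; z = symptomatic indicator.
p_dz1 = 0.05                          # disease prevalence (illustrative)
p_z1 = {1: 0.7, 0: 0.1}               # P(z=1 | dz)
p_t1 = {(1, 1): 0.90, (1, 0): 0.60,   # P(test+ | dz, z): sensitivity depends on z
        (0, 1): 0.02, (0, 0): 0.01}

joint = {}
for dz, z, t in itertools.product((0, 1), repeat=3):
    p = (p_dz1 if dz else 1 - p_dz1)
    p *= p_z1[dz] if z else 1 - p_z1[dz]
    p *= p_t1[(dz, z)] if t else 1 - p_t1[(dz, z)]
    joint[(dz, z, t)] = p

def prob(pred):
    return sum(p for k, p in joint.items() if pred(*k))

# Exact p(dz=1 | test=1, z=1) computed from the joint
exact = prob(lambda dz, z, t: dz and z and t) / prob(lambda dz, z, t: z and t)

# Naive: plug the covariate-adjusted prior p(dz|z) into Bayes' rule with the
# *marginal* sens/spec, i.e. pretend p(test|dz) = p(test|dz,z)
sens = prob(lambda dz, z, t: dz and t) / prob(lambda dz, z, t: dz)
fpr = prob(lambda dz, z, t: not dz and t) / prob(lambda dz, z, t: not dz)
prior = prob(lambda dz, z, t: dz and z) / prob(lambda dz, z, t: z)
naive = sens * prior / (sens * prior + fpr * (1 - prior))

print(exact, naive)  # the two disagree: test is not independent of z given dz
```

With these numbers the exact posterior is about 0.94 while the naive plug-in gives about 0.96; with sensitivities that vary more sharply in z the gap can be much larger.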

How we set p(dz) seems to depend on how one thinks about it, which is odd. I can imagine that, if we sample a random person from a population, then p(dz) should be roughly equal to the population prevalence of the disease. This occurs in screening, in theory. Somehow, intuitively, this seems reasonable; we are sampling this individual unconditionally, even though the sampling depends on the population — it conditions on the population. This, however, would in a sense allow us to have different “unconditional” values of p(dz=1), for different times of year or geographic locations for example, even though we are surreptitiously conditioning on the year or location. Then, the idea that we might have a plot of p(dz=1|test=1) for varying p(dz=1) makes sense. So the plots are reasonable.

However, if I am wondering whether I have covid but am not allowed to use any information as a guide, i.e., p(dz=1) must be unconditional, I think I would feel most comfortable setting p(dz=1) = 0.5. This represents total uncertainty. In this case, p(dz=1) is fixed at 0.5. We are constrained to a single input for p(dz=1), and the idea of a nomogram that gives different post-test probabilities for different p(dz=1) no longer makes sense. In this case, it is better to just return a single number, p(dz|test) = p(test|dz)(0.5)/p(test).

So with these two possibilities, it is unclear how to recommend setting p(dz=1).

Another way to justify the second approach is that we do not know the true p_0(dz=1), but we know that it is between 0 and 1, so we can set our best guess to be p(dz=1) = argmin_{\theta} \int_0^1 (\theta - x)^2 dx = 0.5.
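
As a quick sanity check on that minimization (the closed form is \int_0^1 (\theta - x)^2 dx = \theta^2 - \theta + 1/3, minimized at \theta = 0.5), one can minimize a Riemann-sum approximation numerically:

```python
def expected_sq_loss(theta, n=10_000):
    # midpoint Riemann sum approximating the integral of (theta - x)^2 over [0, 1]
    return sum((theta - (i + 0.5) / n) ** 2 for i in range(n)) / n

grid = [i / 1000 for i in range(1001)]
best = min(grid, key=expected_sq_loss)
print(best)  # 0.5
```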

Without reading the details you seem to be focusing on my biggest question about the simple use of Bayes’ rule: defining “prevalence”. That’s why a nice cohort study is needed. Direct use of a logistic model in a large cohort doesn’t need prevalence.


Thank you for an interesting discussion that I had missed until now because I have been engrossed with completing the latest edition of a book. My understanding is that the first step in estimating diagnostic probabilities is to formulate a series of ‘sufficient’ diagnostic criteria. The next step is to incorporate them into differential diagnoses (e.g. of upper respiratory symptoms (URTS)), when the probability of each possibility conditional on URTS would become a ‘prior’. The sufficient criteria can also be used as screening test results in asymptomatic people to ‘detect’ those with the diagnosis (who almost certainly have the ‘disease’). If the positive result of a test is a sufficient criterion then the post-test probability is automatically one. However, this does not guarantee that they have the ‘disease’; it only justifies the diagnosis (i.e. assuming that they have the disease) and acting appropriately. This is an extract from the first chapter of the above book, the 4th edition of the Oxford Handbook of Clinical Diagnosis that I have just submitted to Oxford University Press, where I go into more detail:

How are new diseases discovered and diagnostic criteria formed?

We are alerted to a new disease by unpleasant new outcomes, often preceded by new symptoms and other findings. Covid-19 was a vivid example, where suddenly many people became very ill or died with acute hypoxaemic respiratory failure (AHRF) following acute upper respiratory tract (URT) symptoms similar to the common cold. In addition, large numbers of people in many countries developed these outcomes. This led to a hypothesis that a new virus was responsible.

The resulting studies identified the RNA structure of SARS-CoV-2 and, from this, led to the development of RT-PCR and lateral flow device (LFD) tests to detect the virus’s RNA fragments in upper respiratory tract swabs, and to the development of vaccines. It was assumed on theoretical grounds that nothing other than SARS-CoV-2 could cause positive RT-PCR and LFD tests, so these were adopted as ‘sufficient’ criteria for Covid-19 that were positive only in those in some phase of the disease, and a positive result became an indication to ask the person to self-isolate to avoid infecting others in case they were excreting the virus.

The ‘diagnostic lead’ of acute URT symptoms now had an additional differential diagnosis to that of a common cold etc. in the form of Covid-19. It can be diagnosed (i.e., assumed to be present) with a positive RT-PCR or a positive LFD test. Either of the latter is sufficient to justify assuming the diagnosis, but a negative result does not exclude the diagnosis from the list, as the virus fragments cannot be assumed to be excreted and detected in all phases of the illness. Other sufficient criteria can also be used instead, such as a characteristic appearance on a pulmonary CT scan or ‘acute URT symptoms with hypoxia on pulse oximetry’, the latter also being an indication for urgent hospital admission. Therefore it is not ‘necessary’ to have a positive PCR or LFD result for the diagnosis; other sufficient criteria can sometimes be used instead. So a positive PCR or LFD result is ‘sufficient’ but not ‘necessary’ for the diagnosis. If a single test is both ‘necessary’ and ‘sufficient’, then it will identify all those, and only those, for whom the diagnosis can be made, and be ‘definitive’. Such tests are rare.

With these criteria, it now becomes possible to estimate the proportion of patients with acute URT symptoms who turn out to have the diagnosis of Covid-19 (i.e., an assumption that they have the disease of Covid-19). It also becomes possible to estimate the frequencies of other symptoms, signs, and test results, or their frequency distributions, in those with assumed Covid-19, thus building up a picture of its findings (e.g., loss of taste) and outcomes.

Experience and scientific theories allow us to set up provisional sufficient criteria for confirming diagnoses and offering treatments. This also allows us to make observations on other patients to estimate the probabilities of differential diagnoses, or the likelihoods of findings in those with such diagnoses, as explained in Chapter 13.

The risk (or posterior probability) of passing the virus on, and the reduction of that risk with self-isolation, might be established with a cluster RCT using reverse contact tracing to identify how many contacts contracted the disease from those testing RT-PCR and LFD positive and negative (and by making many strong assumptions and solving some simultaneous equations!). I have drafted a ‘kite-flying’ paper along these lines [ 1808.09169.pdf (arxiv.org) ] and would be grateful to @f2harrell and you for your views on this!

I agree with Frank that the best way forward is to estimate ‘posterior’ probabilities directly (e.g. by counting the number of patients in a cohort with URTS who turn out to have one or more of the sufficient criteria of Covid-19). If numerical results are available then this can be done using logistic regression. Having set up the list of differential diagnoses of URTS, then other contextual information (e.g. the presence of a Covid-19 epidemic) changes the posterior probabilities of the list, so that a newly most probable diagnosis of Covid-19 can be confirmed with a positive PCR or LFD or low oxygen saturation on finger pulse oximetry.


This is exceptionally helpful.

is the classical approach, and I’m not sure that this indirectness is needed.


By ‘indirectness’, do you mean distinguishing between diagnosis and disease? Or is it the implication that a ‘prior’ probability (e.g. Covid-19 conditional on URTS) has to be combined with an ‘indirect’ likelihood (e.g. of loss of taste in someone with Covid-19) to give an ‘indirect’ posterior probability of Covid-19 (conditional on URTS and loss of taste)?

The full sentence by me in your quote refers to screening, so would I be correct in assuming that your comment related to the way a diagnostically sufficient test result implies the presence of a disease indirectly via a diagnosis? You say that you are not sure why this indirectness is needed, implying that the test result can be used to imply the disease directly. In most cases a screening test does not provide such a diagnostically sufficient criterion but a list of differential diagnoses, of which one (e.g. breast cancer) is the target of the screening. The other differential diagnoses could include an innocent lump requiring no intervention.

A disease is a pattern of sensations and observations, experienced by patients and others, that changes in severity over time. Diseases therefore vary in severity and are more often mild, and more difficult to identify, in those being screened. Diseases are on a spectrum from mild, self-limiting disease needing no intervention to severe ill health that does need intervention. The question of where to place the severity threshold for labelling the patient is not easy. This threshold becomes part of a sufficient criterion for a diagnosis imposed on the disease by diagnosticians.

A diagnosis is a label (based partly on a threshold of illness severity) and a theory: it offers an explanation for what has been observed already and a hypothesis about the nature of a patient’s illness or disease and about what may happen in future with and without interventions of various kinds. In practical terms, the diagnosis offers a menu of possible actions, the appropriateness of which often depends on RCTs and on a disease severity that suggests an absolute risk reduction in outcomes.

In conclusion, the diagnosis is an assumption that a particular disease is present, as justified by a diagnostic criterion. The latter may be sufficient, necessary, or both (when the criterion is definitive). When the disease is severe and its outcomes known, then I agree that a test result can also imply the presence of the disease directly. Is this what you meant?

I think you interpret me correctly. The main things I was trying to get at are

  • define the diagnosis and patient signs/symptoms/risk factors solidly (even better measure the severity of the diagnosis)
  • get a cohort study with a lot of diversity of risk factors etc.
  • fit a logistic binary or ordinal model to estimate the probability of disease given patient characteristics, directly
  • dispense with hard-to-define quantities such as prevalence, sensitivity, specificity (hard to define because of strange conditioning)
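
As a sketch of the third point, here is a minimal from-scratch logistic fit on a simulated cohort. The feature names, coefficients, and the simulation itself are all invented for illustration; in practice one would use a real cohort and an established package such as rms or statsmodels:

```python
import math
import random

random.seed(0)

def simulate_cohort(n=2000):
    """Toy cohort: standardized age and a symptom indicator -> disease status."""
    rows = []
    for _ in range(n):
        age = random.gauss(0, 1)
        sympt = 1 if random.random() < 0.3 else 0
        logit = -2.0 + 0.8 * age + 1.5 * sympt   # assumed "true" model
        p = 1 / (1 + math.exp(-logit))
        rows.append(([1.0, age, sympt], 1 if random.random() < p else 0))
    return rows

def fit_logistic(data, lr=0.5, epochs=300):
    """Full-batch gradient ascent on the Bernoulli log-likelihood."""
    beta = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, y in data:
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj
        beta = [b + lr * g / len(data) for b, g in zip(beta, grad)]
    return beta

beta = fit_logistic(simulate_cohort())

def prob_disease(age_std, symptomatic):
    """Direct estimate of P(disease | characteristics): no prevalence,
    sensitivity, or specificity appears anywhere."""
    z = beta[0] + beta[1] * age_std + beta[2] * symptomatic
    return 1 / (1 + math.exp(-z))

print(prob_disease(0.0, 0), prob_disease(0.0, 1))
```

The point of the sketch is the last function: the model returns the probability of disease conditional on patient characteristics directly, with no strangely conditioned intermediate quantities.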

Thank you. I agree with what you suggest. In the reverse order to your bullet points:

• I agree completely that prevalence, sensitivity, specificity, etc. are hard to define and should be dispensed with. Instead, we should work with probabilities of outcomes conditional on evidence such as symptoms, signs, test results, etc. In the differential diagnostic setting, each symptom etc. will be associated with a list of mutually exclusive diagnoses as ‘outcomes’. The Oxford Handbook of Clinical Diagnosis (OHCD) provides hundreds of these lists with an outline of sufficient diagnostic criteria to confirm each diagnosis. However, if a diagnosis appears in the lists of more than one symptom, for example, then that diagnosis becomes more probable to a diagnostician. If only one diagnosis is common to a number of lists, then that is the most probable diagnosis conditional on those multiple symptoms. I have developed a derivation of Bayes’ rule using dependence assumptions that models the above reasoning using probabilities of diagnoses conditional on individual findings (not the traditional likelihood of each finding conditional on a diagnosis). When a diagnosis becomes probable, test(s) are then done to try to demonstrate the presence of one of its sufficient diagnostic criteria, thus hopefully ‘confirming’ it.

• In fitting a logistic binary or ordinal model to estimate the probability of a disease (i.e. in my vocabulary, each of the differential diagnoses possible conditional on each of the patient characteristics in the cohort), would you be estimating the probability of each diagnosis given all the patient’s characteristics (maybe one at a time)? If I understand correctly, this would be done by using the probability of each diagnostic criterion conditional on each patient characteristic. (Would the logistic models involve some form of conditional independence?) However, I would also like to see this approach applied to a cohort of patients with a confirmed diagnosis, to estimate the probability of one of its outcomes with and without a treatment.

• Would the cohort study with a lot of diversity of risk factors etc. be based on data similar to those of the 1970s, such as de Dombal et al.’s study on acute abdominal pain applying ‘naïve’ Bayes? If so, I still have such data, which were given to me by that group. They contained 131 characteristics that were the results of about 30 questions (or ‘tests’) and 7 diagnoses.

• Defining the diagnosis and patient signs/symptoms/risk factors solidly is the challenge of course. Historically this has been done in an ad-hoc (very non-solid) manner, with the severity of the disease not taken into consideration. I explore how this situation could be improved in the final chapter of the Oxford Handbook of Clinical Diagnosis (I am awaiting the first PDF proofs at present).
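
The de Dombal-style ‘naïve’ Bayes computation mentioned above can be sketched as follows. The findings, diagnoses, and records here are invented toy data, not the Leeds dataset:

```python
# Toy training records: (findings present, diagnosis) -- invented for illustration
records = [
    ({"rlq_pain", "fever"}, "appendicitis"),
    ({"rlq_pain", "nausea"}, "appendicitis"),
    ({"epigastric_pain", "nausea"}, "gastritis"),
    ({"epigastric_pain"}, "gastritis"),
    ({"rlq_pain", "fever", "nausea"}, "appendicitis"),
]

findings_list = sorted({f for fs, _ in records for f in fs})
diagnoses = sorted({d for _, d in records})

def naive_bayes_posterior(present):
    """P(dx | findings) proportional to P(dx) * prod_f P(f status | dx),
    assuming findings are conditionally independent given the diagnosis
    ('naive' Bayes), with Laplace smoothing of the per-finding frequencies."""
    scores = {}
    for dx in diagnoses:
        rows = [fs for fs, d in records if d == dx]
        score = len(rows) / len(records)          # prior from class frequency
        for f in findings_list:
            count = sum(f in fs for fs in rows)
            p_f = (count + 1) / (len(rows) + 2)   # smoothed P(f present | dx)
            score *= p_f if f in present else (1 - p_f)
        scores[dx] = score
    total = sum(scores.values())
    return {dx: s / total for dx, s in scores.items()}

print(naive_bayes_posterior({"rlq_pain", "fever"}))
```

The conditional-independence assumption is exactly the one queried in the second bullet: it is what lets the per-finding probabilities be multiplied.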

I would be very interested in what you think of all this and how our approaches are related.


Glad to see this discussion. I wish I could contribute more frequently to this thread; I am on a difficult rotation at the moment. I hope to think carefully about the points raised here, and I also hope to read your linked manuscript @HuwLlewelyn, thank you.

Just a summary of my initial thoughts when bringing this up, if of interest: although the indirect way is worse than the direct way, I still think one should determine whether the indirect way is better than nothing at all. In particular, many are faced with the task of interpreting covid-19 tests at the moment, and, to my knowledge, there is no direct, conditional model available. One can, however, use the indirect approach.

I am also interested in the discussion about marginal and conditional models as they relate to test interpretation.


@HuwLlewelyn continues to provide deeply thought-out and thought-provoking comments. I wish I had the knowledge and time to do them more justice. Regarding the multiple-diagnosis problem, we would benefit from having large general clinical epidemiologic cohorts that would allow us to fit logistic regression models simultaneously, using, for example, shared random effects or copulas. In some settings a single multinomial logistic model can be used to address the differential diagnosis problem.
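
A single multinomial logistic model turns the differential into one probability vector per patient. A minimal sketch with invented coefficients (in practice these would be estimated from a cohort, e.g. with statsmodels' MNLogit or R's nnet::multinom):

```python
import math

# Hypothetical fitted coefficients: intercept, standardized age, exposure indicator.
# "common_cold" is the reference category (all-zero coefficients).
coef = {
    "common_cold": [0.0, 0.0, 0.0],
    "covid19":     [-1.5, 0.4, 1.8],
    "influenza":   [-0.7, 0.3, 0.2],
}

def differential(age_std, exposure):
    """Softmax over the linear predictors: one probability per diagnosis."""
    x = [1.0, age_std, exposure]
    logits = {dx: sum(b * xi for b, xi in zip(bs, x)) for dx, bs in coef.items()}
    m = max(logits.values())                      # subtract max for stability
    unnorm = {dx: math.exp(v - m) for dx, v in logits.items()}
    total = sum(unnorm.values())
    return {dx: u / total for dx, u in unnorm.items()}

print(differential(0.0, 1))  # a known exposure shifts mass toward covid19
```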


Thank you again for your comments. I must emphasise that the system of differential diagnoses that I describe in the Oxford Handbook of Clinical Diagnosis is what I observed my clinical teachers and researchers to use when arguing about diagnoses and tests and also what the discussants used in grand rounds of the Massachusetts General Hospital as published in the New England Journal of Medicine. Clearly the advisors of Oxford University Press agreed with me by getting me to ‘bottle’ the approach in a book for use more widely by medical students and trainees and others!

In order to understand this reasoning process completely and to do research to create new ‘tests’ that perform well during this reasoning, I derived a theorem that models this traditional human way of dealing flexibly with multiple diagnoses. I can also see how logistic regression and other statistical models can be used in conjunction with the mathematical model for this ‘human’ reasoning. This would also allow the data from Covid-19 to be addressed.

I have not been trained in, nor had experience of, applying the wide range of statistical methods available, and I need to work with those who have, to apply them with my model more widely. Perhaps if those with medical and statistical training (e.g. you, @samw235711) and full-time statisticians (e.g. you, @f2harrell) also read the forthcoming 4th edition of the Oxford Handbook of Clinical Diagnosis, these methods could be combined to enhance medical practice and research.


Hi Huw

Does the framework in your book account for the fact that “next steps,” clinically speaking, are very heavily influenced not just by the content of the differential diagnosis we construct mentally, or by how commonly each condition on this differential is seen in a given clinical setting, but also by which diagnoses we must not miss? This is an essential part of clinical reasoning in office-based primary care.

Every day, we encounter patients who present with symptoms that have the potential to reflect a serious underlying diagnosis, but which often/usually turn out not to be due to a serious condition. We have to decide how best to triage those patients in a setting where we aren’t able to access additional testing.

For example, pleuritic chest pain is a common presenting symptom in primary care. Our suspicion of underlying pulmonary embolism will increase if the patient has a history of “risk factors” that increase the risk of DVT/PE e.g., active cancer, recent immobilization, post-operative state, prior history of VTE. These are the “no-brainer” decisions; we send these patients to the ER for further workup. But very often, patients have none of these “risk factors” for PE, and yet we don’t have any other obvious explanation for their symptoms (and this fact actually raises our suspicion for PE). In these situations, we often refer the patient to the ER for further investigation anyway, because the downside of missing the PE diagnosis is potentially so serious. I’m sure ER doctors face these same dilemmas, but at least they have options for further testing (e.g., D-dimer) at their fingertips- a luxury that physicians in office-based settings don’t have.


Hi again Erin

Thank you for raising a good point. Yes, I explain this in the first two chapters, along with the importance of sharing decisions. The decision about which of the differential diagnoses in a list to investigate first, by doing tests to try to confirm some or exclude others, depends on their probabilities conditional on the presenting complaint (or some other finding), on their seriousness, and on the patient’s perceptions. However, the book focuses mainly on outlining what tests can be done and how to interpret their results.

As you imply, one way of excluding a serious diagnosis such as PE is to look very carefully for a simple alternative explanation (e.g. tenderness of a rib or intercostal muscle suggestive of musculo-skeletal cause). If there is only one mild symptom of recent onset and no risk factors etc, then I emphasise that an important possible diagnosis, especially in primary care, is a ‘self-limiting condition of unknown cause’. The question of what tests to do depends on the clinical setting and local availability and guidelines. If the practice is in a remote area where tests take a long time to be done and for results to come back, it might even be necessary to start provisional treatment (e.g. subcutaneous LMW heparin) to be stopped if the results are negative in due course.

There is clearly more to clinical practice than diagnosis, but diagnosis is of course very important, and it is easier with well-designed, powerful tests, which I discuss in the final chapter.


To clarify, I didn’t say that I’d use chest wall tenderness as a reliable method to “exclude” PE. A high proportion of primary care patients, if you palpate their chest wall vigorously enough, will report tenderness, so I’m not generally reassured by this physical finding alone, especially in the absence of any history of chest wall trauma and when a potentially more serious diagnosis is on the differential.

However, I do agree that many primary care presentations are undiagnosable and that many of these will end up being self-limited. Family physicians’ and ER physicians’ roles are primarily to “catch the bad stuff.” Over time, primary care physicians have to become very comfortable with managing uncertainty. The key is providing proper instruction as to when the patient should seek followup (e.g., if symptoms are getting worse rather than better/if new symptoms develop/if symptoms don’t resolve within a certain timeframe). I’m sure my patients wouldn’t feel reassured by hearing this, but sometimes I look back, at the end of a clinic day, and realize that I don’t know what any of them had! :slightly_smiling_face:


I only mentioned chest tenderness as an example; I’m sorry if I gave the impression that it was your suggestion. All physical signs have to be elicited carefully of course. Palpating or even asking questions forcefully will tend to elicit a false positive result, which I warn against in the book. However, a negative finding despite forceful elicitation will tend to be a true negative.

Chest wall pain due to trauma will usually be recognised by the patient anyway without medical advice. Sharp chest pains made worse by inspiration, causing diagnostic uncertainty, may be caused by a strained muscle due to some innocuous event such as a cough or lifting an unremarkable weight, or by PE or an infective pleural reaction, etc. In modern western medicine, a negative D-dimer would be used to make a PE less probable (not excluded) and a musculo-skeletal cause or pleuritic lung infection etc. more probable. A PE may still be a possible cause, but all we can do is our best with available technology and medical consensus in the form of guidelines.

I agree with you that, despite all this, we will always be left with a degree of uncertainty about what is really going on, and we must keep in touch with the patient to determine the outcome.
