# Precision and decision making

I guess what I’m saying is that I don’t think this is really possible without some sort of loss function. How expensive is the test? Who bears those costs? What care is foregone as a result of loss of that money/clinician time? From whose perspective is the decision being made? How long will the test be relevant? What benefit does it provide over the current best-practice (genetic risk + history and environment I assume?). I guess you could try to assume that the OR is directly proportional to utility of the decision, but I have a hard time believing that will lead to good decision making. Even then, the uncertainty related to the decision is more relevant for whether to delay implementation to fund future research to increase precision or implement now and also fund research.

Say that you use some rule that says you shouldn’t recommend treatment if your 95% credible interval includes 2 (because of some sense that the risks of surgery, or the risks plus costs, then outweigh the benefits). Then in this case you would not recommend use of the screening tool even though it’s more probable to be effective than not. The link above is a seminal paper on exactly this problem (i.e. how to incorporate uncertainty into decision making).

Decision analysis for a given population would be done by a research team and then used by clinicians and patients as part of the decision making process if the patient in front of them is relevant to that analysis.

Some other thoughts that come to mind:

• Your OR is really only a tool to turn an assumed baseline risk into absolute probability for a given population. Making a cut point based on an OR assumes that you have a well enough defined patient population that they all have similar balances of benefits/harms at that OR.
• All of this also assumes your predictions are well-calibrated, your OR of 3.01 can be interpreted causally, etc…

I think we are talking at cross purposes. My point is not whether 3 or 4 or 2 or any other value is the right threshold, but, if we have a threshold, which credible interval is relevant. You selected 95%, but why? Why not ask whether the 50% credible interval excludes 2, or even just use the point estimate?

I just used 95 to be consistent with what you were using above.

The relevant interval is going to depend on whatever loss function you implicitly used when coming to your OR = 3 threshold. Maybe the easier question is at what point would you no longer recommend treatment? OR = 1.3? 2? 2.5? Are you risk averse? Is the patient? Maybe the best approach would be to make a plot with ORs ranging from 1-5 on the x axis and P(OR at least that large) on the y.
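The plot suggested above can be sketched numerically. This is only a rough illustration: it assumes the posterior for log(OR) is approximately normal (as it would be under a flat prior), and it borrows the OR of 3.0 with interval 1.6 to 5.7 quoted elsewhere in this thread. The function names are made up for the example.

```python
from math import log
from statistics import NormalDist

# Assumed inputs: OR 3.0 with 95% interval 1.6 to 5.7 (figures quoted in this thread).
OR_POINT, OR_LOW, OR_HIGH = 3.0, 1.6, 5.7


def log_or_posterior(point, low, high):
    """Normal approximation to the posterior of log(OR), assuming a flat prior."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    sd = (log(high) - log(low)) / (2 * z)
    return NormalDist(mu=log(point), sigma=sd)


def prob_or_at_least(threshold, posterior):
    """P(OR >= threshold) under the approximate posterior."""
    return 1.0 - posterior.cdf(log(threshold))


posterior = log_or_posterior(OR_POINT, OR_LOW, OR_HIGH)
thresholds = [1.0 + 0.5 * i for i in range(9)]  # 1.0, 1.5, ..., 5.0
curve = [(t, prob_or_at_least(t, posterior)) for t in thresholds]

# Each row is one point of the curve: OR threshold on x, exceedance probability on y.
for t, p in curve:
    print(f"P(OR >= {t:.1f}) = {p:.2f}")
```

Under these assumptions P(OR ≥ 2) comes out at about 0.89 even though the 95% interval includes 2, which is the gap between an interval-based rule and a probability statement.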

Whether or not you recommend using the tool shouldn’t be based on any cut-point from that plot though, if your goal is to maximize patient benefit.

I respectfully disagree. It is perfectly possible to estimate the accuracy of a test by estimating its prospective sensitivity and specificity. Here are a couple of examples: https://doi.org/10.1136/bmj.327.7426.1267 and https://doi.org/10.1111/j.1464-5491.2005.01494.x. The “prospective” sensitivity and specificity of the test may be the best way to assess the accuracy of any tool used for prediction, at least from the clinical perspective. If we are to make a treatment offer to a patient based on the results of a predictive test, which is the issue in this case (if I got it right), we should know how good our prediction would be conditional on the finding of the predictive tool. A relative measure of effect does not tell us that. We do not offer treatments based on risk ratios or odds ratios. As suggested in a previous comment, what is needed in this case is some sort of loss function, such as a cost-effectiveness analysis. Otherwise, we cannot know the net benefit of using this prediction tool or its incremental cost-effectiveness ratio compared with the standard option (i.e. not using the prediction tool). For instance, when we use the Framingham equation to inform the decision of offering preventive treatment, we do not do it because the risk of cardiovascular events in patients with a high Framingham score is 5 or 10 times higher than in patients with a low Framingham score. We do it because we know that a sizable fraction of patients with a high score will develop a cardiovascular event and that, even if not all of them will develop a cardiovascular event, the benefit of this approach (selecting patients by their expected risk) is greater than the cost (including the cost of not treating patients with a low score who will develop a cardiovascular event).

I have obviously failed to explain the issue sufficiently. The Framingham score is somewhat distinct from a genetic test. The genetic test is either positive or negative - there is not some sort of continuous measure with the test being labelled as positive above a given threshold.

The penetrance or disease risk is akin to the positive predictive value: it’s the probability that someone who tests positive will get the disease in the future. So we have a direct estimate of the PPV, and the issue is the precision of the risk estimate associated with a positive test.

I would no longer offer intervention if I were certain the OR<3. The question is how certain?

Thanks for clarifying this.

I think you have been clear. You’re asking a question about uncertainty in decision making, but I think the issue is that we approach recommendations for medical decision making very differently. I just don’t think you can answer any questions about how you should deal with this kind of uncertainty without really knowing the costs of being wrong/benefits of being right.

If we take for granted that the OR of 3 is a good decision making rule, and that any value below 3 would result in withholding the test, then the value of information (i.e. future research) would be at its peak. The question then is whether future research is actually worth it, since as you mention:

So one approach to whether you should make a decision today vs do further research is to conduct a value of information analysis which requires:

1. An estimate of the size of the population for whom the test is relevant
2. A time horizon over which this test would be relevant (i.e. when it will be replaced)
3. The cost of future research
4. The time required for additional research
5. A loss function

If the cost of future research (in \$ and foregone health) is more than the value of that information, then the right decision is to make the decision today that maximizes utility. This way you’re taking into account all potential benefits and harms, including the harm of delaying access. This assumes you aren’t risk averse, so it will change if you are (i.e. being a little wrong is worse than the truth being a little better). You may be interested in reading Gianluca Baio, Karl Claxton, Mark Sculpher, Andrew Briggs (white book). This is why I don’t think you can separate the problems of specifying a cut point and assessing how you should handle uncertainty in your parameter.
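To make the five ingredients listed above concrete, here is a toy expected value of perfect information (EVPI) calculation. Everything numeric in it is a placeholder: the linear loss function centred on OR = 3, the population size, the time horizon, the research duration and the research cost are all hypothetical, and the posterior is the same normal approximation on the log(OR) scale using the 3.0 (1.6 to 5.7) estimate quoted in this thread. A real analysis would use a proper net-benefit model and discounting.

```python
from math import exp, log
from statistics import NormalDist, fmean


def or_draws(point=3.0, low=1.6, high=5.7, n=2000):
    """Evenly spaced quantiles of an assumed log-normal posterior for the OR."""
    sd = (log(high) - log(low)) / (2 * NormalDist().inv_cdf(0.975))
    post = NormalDist(mu=log(point), sigma=sd)
    return [exp(post.inv_cdf((i + 0.5) / n)) for i in range(n)]


def net_benefit_of_offering(odds_ratio, threshold=3.0, value_per_unit=1.0):
    """Hypothetical loss function: linear in the OR, zero at the threshold.
    Not offering the intervention is the reference decision with net benefit 0."""
    return value_per_unit * (odds_ratio - threshold)


def evpi_per_person(draws):
    nb = [net_benefit_of_offering(d) for d in draws]
    value_with_perfect_info = fmean(max(x, 0.0) for x in nb)  # best choice per draw
    value_of_deciding_now = max(fmean(nb), 0.0)  # best choice on average
    return value_with_perfect_info - value_of_deciding_now


def population_evpi(draws, people_per_year, horizon_years, research_years):
    """Scale up per-person EVPI; information only helps after the research ends."""
    years_of_use = max(horizon_years - research_years, 0.0)
    return evpi_per_person(draws) * people_per_year * years_of_use


draws = or_draws()
# All four numbers below are hypothetical placeholders, in the same value units.
total = population_evpi(draws, people_per_year=1000, horizon_years=10, research_years=3)
research_cost = 500.0
print("fund research first" if total > research_cost else "decide now")
```

EVPI is an upper bound on what any study could be worth, so if even this bound falls below the research cost, deciding now is the better option under the stated (risk-neutral) assumptions.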

Using retrospective reverse information-flow quantities of sensitivity and specificity is like making three right turns to do a left turn. Coupled with the fact that sens and spec vary significantly by patient types it’s very hard to see why the use of transposed conditionals ever caught on in the first place. If you like sens and spec then you’d have to like the following way of summarizing a hypertension randomized trial: patients with higher blood pressures were more likely to have been randomized to treatment A.

Let me clarify my original concern, as I’m finding it hard to grasp the relevance of your arguments. If I understood correctly, what motivated the original question was the investigators’ desire to develop a clinical decision rule: When should we offer an oophorectomy to a patient with a positive genetic test? To answer that question the investigators are looking for ways to refine the precision of the odds ratio for the association between the gene variant(s) and the incidence of ovarian cancer. My point (limitation?) is that I fail to see how such a rule could be based on a measurement of the association between the gene variant(s) and the disease. As far as I understand, such a rule cannot be defined without taking into consideration the properties of the test (sensitivity and specificity), the expected incidence of the disease, and the absolute and relative costs and benefits of the proposed intervention. I fail to see what this has to do with the flow of information in estimating sensitivity and specificity or the direction of causality in interpreting randomized trials.

They are not that different, since we make the Framingham score positive or negative by using a predefined cut point. I think Tim’s suggestions are right on target: “we approach recommendations for medical decision making very differently”… you should conduct a value of information analysis (I’d say a cost-effectiveness analysis) to define the best cut point. Assuming you could get the best precision possible for the OR, I still do not see how you would go from the OR to a clinical decision rule (a decision to offer or not to offer an oophorectomy to a patient with a positive test). Of course, my not being able to figure out how to do this does not mean it is impossible.

Yes thanks for pointing us back to the original question. When you mentioned sensitivity and specificity those were unnecessary to the discussion.

While having good precision in estimating ORs (and adjusted regression effects in general) is important, the discussion really needs to be on the absolute risk scale with full adjustment for all relevant clinical variables. Then one can simply plot post-test probabilities against pre-test probabilities to see relevance (where “test” = new information such as genetics) for clinical decision making, as is shown here.
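As a minimal sketch of that pre-test versus post-test comparison, assume the new information acts as a single adjusted odds ratio applied to each patient’s pre-test odds. The OR of 3.0 is the point estimate quoted in this thread; the grid of pre-test probabilities is made up for illustration.

```python
def post_test_probability(pre_test_prob, odds_ratio):
    """Apply an (assumed adjusted) odds ratio to a pre-test probability."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * odds_ratio
    return post_odds / (1.0 + post_odds)


# Hypothetical pre-test risks from the clinical variables alone.
pre_test_grid = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40]
pairs = [(p, post_test_probability(p, 3.0)) for p in pre_test_grid]

# Each row is one point of the plot: pre-test on x, post-test on y.
for pre, post in pairs:
    print(f"pre-test {pre:.2f} -> post-test {post:.3f} (change {post - pre:+.3f})")
```

The same OR moves a low pre-test risk by a few percentage points and a moderate one by far more, which is why the relative measure alone says little about clinical relevance.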

I made it explicit in my original post that this is not the question. The question is, given a defined clinical threshold based on a risk, how precisely do we need to estimate this risk. I can only suggest you go back to my original question and reread it carefully.

I see your point now, and I believe we’re on the same page. This is my overall take: A precise OR would be best for assessing whether the new genetic test adds to the “standard” clinical data in predicting who will get the disease. This is a necessary step before determining how to make the best use, from the clinical perspective, of an improved risk score scale. Some form of cost-effectiveness or value of information analysis should be used in the latter step.

I’ll repeat, there is no cut point. This is about risk estimation in women with a particular genotype.

I went over your question again. This is what threw me off course

And then you added:

The emphasis on when to offer prophylactic surgery made me jump to “what information does one need to figure out if prophylactic surgery should be offered to patients with a positive genetic test?” And my guess is that one needs to know the predictive properties of the test, what proportion of the population has a positive test, what the baseline incidence of the disease is, and what the costs and benefits of an oophorectomy are. Basically, one needs to be in a position to carry out a cost-effectiveness analysis.

I apologize for misunderstanding your question. Still, the exchange has been useful for me and, hopefully, for others.

I see precise estimation of an effect (e.g., an OR) as a first step, but impact for decision making cannot be judged using relative measures. Instead, alterations in absolute risk distributions due to knowing the new information are more central to the original question. The assessment of added information needs to come from things like those discussed here.

Indeed, it is the absolute risk that is relevant to the decision, as I think I made clear in my original question, but the question remains. The distribution of the likely absolute risks, and how certain we would like to be that it is above a given threshold, is conceptually similar. In germline genetics, absolute risk estimates are often based on applying odds ratios, as approximations of relative risks, to population rates.
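One way to see why the two framings are conceptually similar: for a fixed baseline risk, absolute risk is a monotone function of the OR, so “how sure are we that the risk exceeds a threshold” maps directly onto “how sure are we that the OR exceeds an implied value”. The sketch below assumes a normal approximation for log(OR) using the 3.0 (1.6 to 5.7) estimate quoted in this thread and treats the baseline risk as known without error. The baseline risk of 1.5% and the risk threshold of 4% are hypothetical placeholders, not figures from the discussion.

```python
from math import log
from statistics import NormalDist


def odds(p):
    return p / (1.0 - p)


def implied_or_threshold(risk_threshold, baseline_risk):
    """OR at which a carrier's absolute risk equals risk_threshold."""
    return odds(risk_threshold) / odds(baseline_risk)


def prob_risk_exceeds(risk_threshold, baseline_risk, point=3.0, low=1.6, high=5.7):
    """P(absolute risk > risk_threshold), assuming log(OR) is approximately normal."""
    sd = (log(high) - log(low)) / (2 * NormalDist().inv_cdf(0.975))
    posterior = NormalDist(mu=log(point), sigma=sd)
    or_needed = implied_or_threshold(risk_threshold, baseline_risk)
    return 1.0 - posterior.cdf(log(or_needed))


BASELINE_RISK = 0.015   # hypothetical population risk
RISK_THRESHOLD = 0.04   # hypothetical risk at which intervention would be offered

exact = implied_or_threshold(RISK_THRESHOLD, BASELINE_RISK)
approx = RISK_THRESHOLD / BASELINE_RISK  # treating the OR as a relative risk
print(f"OR needed: {exact:.2f} (odds scale) vs {approx:.2f} (RR approximation)")
print(f"P(risk > threshold) = {prob_risk_exceeds(RISK_THRESHOLD, BASELINE_RISK):.2f}")
```

For rare outcomes the odds-scale and relative-risk versions of the implied threshold are close, which is the approximation referred to above; uncertainty in the baseline risk itself would widen the result and is ignored here.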

What is the “lay” translation of the meaning of “Protein truncating variants were associated with an increased risk with an odds ratio of 3.0 (95% CI 1.6 – 5.7)”?

Does this statement mean that case-control studies showed that the “odds” that women with breast cancer were found to be carrying the allele in question were approximately 3-fold higher than the “odds” that women without breast cancer were found to be carrying the allele?

You seem to be highlighting uncertainty around how to interpret the confidence intervals around results from gene association case-control studies. Specifically, such studies might identify many alleles that appear to be “associated” with ovarian cancer. I think what I’m hearing you say is that point estimate magnitude from these types of studies is typically used as the primary guide to decide which alleles might warrant a closer look. And I think you’re asking whether a point estimate with a wide confidence interval should be considered as compelling (with regard to deciding about “next steps”) as a point estimate of similar magnitude but with a narrower confidence interval?

Since I’m not trained in this area, I only have a vague understanding of what the “next steps” might be after deciding which alleles warrant a closer look. Clearly, there are many other considerations that might factor into estimating an individual woman’s absolute risk of developing ovarian cancer, quite apart from consideration of whether she carries any particular allele…But I don’t think you’re asking about next steps- rather, I think you’re asking about the best way to “triage” all the ORs you might get from gene association case-control studies.

This is a bit of a late reply, Frank, but I believe that in specific circumstances sens and spec for prognostic questions make sense.

Assume the following. You have in mind women at risk of developing breast cancer, and have a prognostic model/marker. You have developed a more intensive screening program, to detect cancer early. You are willing to invite marker positive women to your program.

In this case, (prognostic) sensitivity expresses the proportion of women who will develop breast cancer that are in your program, and specificity is the proportion of women who will not develop breast cancer and are not invited to your more intensive program.

In this case, I am considering the cumulative 5-year sensitivity and the dynamic specificity.

For guiding decision making, these will probably be helpful statistics. Of course, a fully informed decision about more intensive screening will never rely on just sens/spec, but on the full range of consequences.

Patrick

Since sensitivity and specificity condition on outcome status, they are only applicable to retrospective studies such as case-control studies. And they require test results to be purely binary and outcomes to be purely binary.