“However this baseline risk should not be estimated in any old way. (For example some cardiovascular risk calculators will predict that a lipid lowering drug will reduce CV risk in a person with an extremely high blood pressure when the lipid levels in that person are impeccably low risk!)”
Are you questioning the validity of currently used CV risk calculators (e.g., Framingham, derived from massive, decades-long observational studies) or are you highlighting the fact that CV risk is multifactorial?
If you are suggesting that our estimates of “baseline” CV risk for individual patients should be based on observations derived from RCTs rather than large, long-term observational cohorts, then I’m confused: how exactly could we derive valid risk estimates from RCTs (which might be orders of magnitude smaller and shorter-term)?
As an aside, I have patients with LDL values close to 2.0 mmol/L (not on a statin) who were subsequently diagnosed with advanced coronary and/or carotid atherosclerosis. I (and their cardiologists) still treat them with statins to reduce their future MI risk.
Beautiful arguments, related to the open research question of how best to debias these different types of datasets. To keep these considerations from distracting us from the main theme, the aforementioned upcoming paper uses observational data derived from RCTs. This is consistent with the use of externally developed risk models described in the PATH statement.
Having said that, prospective, rigorously conducted observational studies focused on data collection across departments/institutions/countries can also address these considerations as well as, or better than, external RCTs used to develop risk models. This is because the purpose of RCTs is to estimate relative treatment efficacy, whereas such protocols are specifically designed for risk modeling. See here for an example of a companion prospective, technology-enabled data collection protocol that has now enrolled hundreds of patients and is allowing us to redefine disease states related to the novel toxicities observed with immune checkpoint therapies.
The Framingham data is used to estimate total CV risk. I have no quarrel with that. My concern is that an RR (e.g. 0.5) based on the treatment of one risk factor (e.g. a statin for hyperlipidaemia) should only be applied to the risk component due to that risk factor alone (e.g. 2%). This individual risk reduction of 2% x 0.5 = 1% should then be subtracted from the total risk due to hyperlipidaemia, smoking, high BP, diabetes, gender etc. (e.g. 10% - 1%, giving a reduced total risk of 9%, if one assumes additive risk factors). Instead, some risk tools appear to apply the RR of 0.5 to the total risk (e.g. 0.5 x 10% = 5%), which exaggerates the risk reduction. Something along these lines would be my response also to @Pavlos_Msaouel’s “Why not?” Ideally the estimated risks (e.g. of MI) should be calibrated against the frequency of outcomes (e.g. of MI) in an observational study, provided the risks were constant enough over time.
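For concreteness, here is a minimal sketch of the two calculations being contrasted, using the illustrative numbers above (the 2% lipid-attributable component, the RR of 0.5 and the additive combination of risk factors are assumptions for illustration only, not estimates from any real dataset):

```python
# Illustrative arithmetic only, using the example numbers from the post above.
# Assumes risk factors combine additively, which is itself a simplification.

total_risk = 0.10        # total 10-year CV risk from all risk factors combined
lipid_component = 0.02   # portion of that risk attributable to hyperlipidaemia
rr_statin = 0.5          # relative risk applied to the lipid-attributable component

# Apply the RR only to the component the treatment acts on
component_reduction = lipid_component * (1 - rr_statin)  # 0.02 * 0.5 = 0.01
risk_component_wise = total_risk - component_reduction   # 0.10 - 0.01 = 0.09

# What some risk tools appear to do instead: apply the RR to the total risk
risk_rr_on_total = total_risk * rr_statin                # 0.10 * 0.5 = 0.05

print(f"RR applied to lipid component only: {risk_component_wise:.0%}")  # 9%
print(f"RR applied to total risk:           {risk_rr_on_total:.0%}")     # 5%
```

The gap between the 9% and 5% figures is exactly the exaggeration being described.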
Yes, I think we are pretty much aligned. What I do also want to highlight is that the computer scientists have developed syntaxes (graphical and mathematical) that can be used to explicitly represent these considerations. These tools are powerful and upcoming generations of biostatisticians (or data analysts or whatever we want to call them) will very likely be taking advantage of them a lot.
That approach to gauging the benefits of statins doesn’t seem consistent with the inclusion criteria for the large pivotal statin RCTs: these criteria didn’t hinge on patients having some specified pretreatment LDL level, but rather on the presence of clinical evidence of cardiovascular disease. In other words, the thing being randomized was not an LDL level, but rather a high-risk patient.
Since the patients enrolled in the pivotal statin trials were “high-risk” for cardiovascular events not simply due to their LDL levels, but rather as a result of their clinical cardiovascular disease (which in turn would have reflected their global risk profiles), it seems kosher (to me) to apply the relative risk reduction seen in these trials to patients’ global CV risk (as estimated from observational studies), and not just to the LDL-attributable CV risk. I’m not an expert on this topic, but the implication that statins would only confer meaningful CV risk reduction for patients with elevated LDL seems at odds with everything I’ve ever been taught…
You are right. The rough estimate of the risk ratio (RR) for statins from high-CV-risk trials, irrespective of lipid levels, is about 0.8. Therefore if the untreated risk is 10% over 10 years, then on average taking a statin will reduce it to 8%. This hides a variation in risk reduction (a.k.a. heterogeneity of treatment effect), which is estimated as a 20% reduction per 1 mmol/L reduction in LDL cholesterol.
This implies that when statins are given without knowing the LDL cholesterol, the average reduction in LDL cholesterol is 1 mmol/L, giving a 20% reduction in risk and the above RR of 0.8. However, if the LDL cholesterol is reduced by 3 mmol/L, from 4 to 1 mmol/L, then there will be roughly a 3 x 20 = 60% reduction in CV risk, so the RR is 0.4. However, if the LDL cholesterol is already 1 mmol/L, then it is unlikely that taking a statin will reduce it beyond measurement error and achieve any significant risk reduction.
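To make the arithmetic here explicit, a small sketch of this rough rule of thumb (the 20% relative reduction per 1 mmol/L and the additive scaling are taken from the post itself, mirroring the “3 x 20 = 60%” calculation; this is an illustration, not a fitted model):

```python
# Rough rule of thumb from the post: ~20% relative risk reduction per 1 mmol/L
# reduction in LDL cholesterol, scaled additively as in "3 x 20 = 60%".
# (A proportional scaling, e.g. 0.8**3 ≈ 0.51, is another convention; it is not
# what the post describes and is mentioned only for comparison.)

def rr_from_ldl_drop(ldl_drop_mmol_l, rrr_per_mmol=0.20):
    """Risk ratio implied by the additive approximation, floored at zero."""
    return max(1.0 - rrr_per_mmol * ldl_drop_mmol_l, 0.0)

for drop in (1.0, 3.0, 0.0):
    print(f"LDL reduction {drop} mmol/L -> RR ~ {rr_from_ldl_drop(drop):.2f}")
# 1 mmol/L -> RR ~ 0.80 (the average trial result quoted above)
# 3 mmol/L -> RR ~ 0.40 (e.g. LDL reduced from 4 to 1 mmol/L)
# 0 mmol/L -> RR ~ 1.00 (LDL already ~1 mmol/L: little scope for further benefit)
```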
You are right that the public health policy derived from the results of overall CV risk trials is to treat irrespective of the individual person’s LDL cholesterol. This avoids time-consuming testing, interpretation, etc. and is estimated to be the most cost-effective approach from an economic point of view, based on doctors’ time, cost of tests, and so on. However, the personal cost of this approach is that many people with low LDL cholesterol are treated when they have little prospect of benefit from a statin and may be subjected to its adverse effects.
Many object to this blunderbuss approach, especially those who belong to the over-diagnosis/over-treatment movement. Perhaps if a patient suffers adverse effects (e.g. muscle pain) it might be sensible to practise ‘personalised’ (as opposed to public health) medicine: stop the statin, measure the LDL and HDL cholesterol, and if these show very low risk for that particular patient, advise against starting again. If such a patient then suffers a CV event, the ‘cause’ is probably some other risk factor, not lipids. Perhaps @Scott and JP might like to comment on this.
Hmm, it would be interesting to hear from others on this topic. Although this thread didn’t start out with a focus on statins, the topic provides a perfect clinical example of the challenges of applying RCT findings to individual decision-making…
There are clearly differences of opinion around the potential benefits of statins even at lower LDL. I’d have trouble withholding a statin from a patient with an LDL of 2.0 and documented CAD (and, in my experience, so would cardiologists).
The section of UpToDate entitled “Management of low density lipoprotein cholesterol (LDL-C) in the secondary prevention of cardiovascular disease” includes the following statement in the Summary section:
“For patients with CVD, independent of the baseline low density lipoprotein cholesterol (LDL-C) level, we recommend lifelong high-intensity statin therapy ([atorvastatin] 40 to 80 mg or [rosuvastatin] 20 to 40 mg) rather than moderate intensity statin or no LDL-C lowering therapy ([Grade 1A]). For patients who do not tolerate these doses, the maximally tolerated dose of a statin should be used.”…
And while my own approach as a family physician has been to tailor statin treatment (including dose) primarily to the patient’s CV risk rather than a number (LDL), it’s certainly always important to consider patient preference. Fortunately, statins are extremely well-tolerated for the vast majority of patients.
I accept what you say but would point out that in the case of people with existing CVD we are talking about secondary prevention, where it is widely accepted that statins should be prescribed irrespective of lipid profiles, perhaps because of a high prior probability of lipid abnormalities. This is supported by a widely quoted theory that statins also have an anti-inflammatory effect that reduces or reverses atheroma formation (perhaps via LDL cholesterol reduction, as there is also a theory that the latter causes endovascular inflammation). I am unaware of RCTs assessing the efficacy of statins on the frequency of CV events in people with low LDL cholesterol, but below is an observational study in which about 40% of the people already had CVD, so it was partly a case of secondary prevention [1].
The risk reductions that I was referring to in my previous post concerned primary prevention when no CV events have yet taken place. I agree with you that this discussion illustrates the complexities physicians face when trying to interpret RCTs and observational studies and practising personalised medicine.
What’s the net N for the sum total of all statin RCTs? How are statins not a solved problem by now?
A) The trials weren’t designed/conducted right
B) The trials weren’t analyzed right
C) We’re making progress, but need to stay the course and do more trials
D) Something else is wrong
I think that the answer is D, because ‘diagnostic’ tests are not assessed properly. In my opinion, they are not assessed properly for use in providing conditional probabilities of outcomes on intervention and control, and the absolute risk (or probability) differences. Instead, risk ratios or odds ratios are applied incorrectly to total risk, as explained in post 46. The assessment of medical tests focuses on estimating sensitivities and specificities, which only give a rough idea of a test’s value in screening. Tests are also used to give lists of differential diagnoses, to differentiate between them, and to form sufficient and necessary diagnostic criteria. The latter are closely related to their use in providing conditional probabilities of outcomes on treatment and control. So in my view the problem is not with RCTs but with the backwardness of implementing their results and of using diagnostic tests. I explain how much of this might be put right in the Oxford Handbook of Clinical Diagnosis (e.g. http://oxfordmedicine.com/view/10.1093/med/9780199679867.001.0001/med-9780199679867-chapter-13#med-9780199679867-chapter-13-div1-20) and in a recent preprint (see post 38).
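As one way of picturing the kind of assessment being argued for, here is a toy sketch with invented counts (not from any real trial, nor from the Handbook): it reports the probability of the outcome conditional on a pre-treatment test result under intervention and control, together with the absolute risk difference in each stratum, rather than sensitivity and specificity alone.

```python
# Invented counts for illustration only: outcome probabilities conditional on a
# pre-treatment test result, in the treated and control arms, plus the absolute
# risk difference within each test-result stratum.
from collections import namedtuple

Arm = namedtuple("Arm", "events total")

strata = {
    "test positive": {"treated": Arm(30, 200), "control": Arm(60, 200)},
    "test negative": {"treated": Arm(10, 200), "control": Arm(12, 200)},
}

for label, arms in strata.items():
    p_treated = arms["treated"].events / arms["treated"].total
    p_control = arms["control"].events / arms["control"].total
    print(f"{label}: risk on treatment {p_treated:.1%}, on control {p_control:.1%}, "
          f"absolute risk difference {p_control - p_treated:.1%}")
# In this made-up example the test-positive stratum gains far more in absolute
# terms than the test-negative stratum, which sensitivity and specificity alone
# would not reveal.
```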
What is the problem you’re referring to, David? Is it the problem of accurately estimating 10-year MI risk in a primary prevention setting (i.e., are you highlighting the suboptimal nature of existing CV risk estimators/calculators in common use)? Or are you questioning whether it’s valid to extrapolate the relative benefits of statins from secondary prevention trials to a primary prevention context? Or are you questioning whether the relative efficacy of statins might decrease as a patient’s baseline LDL decreases?
From a purely practical standpoint, I don’t really perceive a major problem, provided that we consider that the primary role of statins is to reduce CV risk rather than just LDL.
I know none of the MDs on this site need to hear this, but non MDs might be interested in how primary care physicians and cardiologists usually approach discussions around statin treatment:
Everyone agrees that patients with clinical evidence of CAD (e.g., angina or a history of MI) need statins: these patients have already proven that they have vascular lipid plaque. This treatment context is called “secondary” prevention;
For those without symptoms (e.g., no angina or past history of MI), we use risk calculators derived from large, long-term observational cohort studies (as imperfect as they may be) to estimate 10-year MI risk. Examples of risk factors included in these calculators include: age, smoking status, lipid levels, presence/absence of diabetes and hypertension, etc. We share the 10-year absolute risk estimate with patients and we then usually quote the relative statin effect, as derived from large historical statin RCTs (which were conducted in a secondary prevention context). We apply this relative statin effect to each patient’s estimated absolute baseline risk, thereby providing them with an estimate of the absolute risk reduction they might expect over a given period of time if they were to start a statin (a small numerical sketch of this arithmetic follows after this list);
A third group of patients has arisen in the past few years and the size of this group is expanding rapidly as our imaging techniques improve. These are patients who have imaging done for reasons unrelated to their vascular health, whose CT scans and ultrasounds (chest, abdomen) incidentally show evidence of atherosclerosis. There are no guidelines for how to manage these asymptomatic patients who nonetheless have anatomical evidence of atherosclerosis. My informal observation is that cardiologists tend to recommend statins (usually high intensity) for them (and this has been my own practice as well). These patients fall somewhere “between” the primary and secondary prevention context, not really fitting into either category.
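For non-clinicians following along, the counselling arithmetic described for the second group above amounts to something like the sketch below (the 15% baseline risk and the 0.75 relative risk are placeholders, not outputs of any particular risk calculator or trial):

```python
# Placeholder numbers only: how a calculator-estimated baseline risk and a
# trial-derived relative risk are combined into an absolute risk reduction.

baseline_10yr_risk = 0.15   # hypothetical 10-year MI risk from a risk calculator
rr_statin = 0.75            # hypothetical relative risk taken from statin RCTs

treated_risk = baseline_10yr_risk * rr_statin
absolute_risk_reduction = baseline_10yr_risk - treated_risk

print(f"Estimated 10-year risk without a statin: {baseline_10yr_risk:.0%}")
print(f"Estimated 10-year risk on a statin:      {treated_risk:.1%}")
print(f"Estimated absolute risk reduction:       {absolute_risk_reduction:.1%}")
```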
Was there a better way to design the original large statin trials? If so, what would they have looked like? Is there a better way for us to be managing statin discussions than the method we’re using now?
Erin, part of what instigated my MCQ was the tentative-sounding heuristic you had mentioned in an earlier post:
Shouldn’t we be doing studies in such a way that we could by now approach these questions in a less purely heuristic/phenomenalistic manner? Or is this type of therapy too marginal to have such expectations of it? Is precision medicine practicable only in cases where the risks and risk reductions are large in absolute terms?
I’m not sure I’d describe this approach as a “heuristic.” I haven’t really read anything that convinces me (at least in terms I can understand, not coming from a math background), that we’re all committing an egregious error by practising the way we have been now for many years.
I don’t like prescribing pills and most patients don’t like taking them. But it’s important to remember that risk stratification can fail in both directions. Since our predictive ability is suboptimal, one person might be overtreated, but another undertreated. From a public health standpoint and also the standpoint of a doctor who likes her patients, I’m much more concerned that our current risk calculators will still misclassify an occasional young man as “low risk” for coronary disease, only to have him suffer a fatal MI at age 50, than I am that I might be prescribing a statin for a couple of decades to a person who was never “destined” to develop significant atherosclerosis.
I think there’s a risk in overthinking things, at least where this particular drug class is concerned. Considerations will certainly be different for medications with more significant toxicity and unestablished efficacy. But as far as statins go, we’re talking about a dirt cheap drug class (all now generic) with decades of safety data and a generally benign side effect profile.
You seem to be wondering why, after all these years, we still have not developed a more “personalized” approach to statin therapy in the primary prevention context. I would like that too, but it’s easier said than done. As you know, this is basically the idea underlying coronary calcium scoring (the utility of which has proven to be fairly controversial for several reasons). And maybe let’s just all agree not to go down the polygenic risk score rabbit hole.
There are undoubtedly many very smart people who have been thinking long and hard about how to achieve what you’re suggesting using a method that is accurate, widely accessible, and easily affordable by healthcare systems. I hope they succeed some day.
This section of your Handbook (just before the one you linked) invokes a supposed distinction between ‘hypotheses’ and ‘theories’ that I find always muddles things.
In fact, you do here much the same thing Cifu & Prasad did in Ending Medical Reversal, recounted in my review:
In the standard understanding of science shared alike by professional scientists and the educated lay public, the process of science involves advancing theories, which (if they are to have a scientific status) must yield predictions that can be tested. (We have all recently seen a stunning example of this process at work in the detection of gravitational waves predicted by Einstein’s General Theory of Relativity.) Against this understanding, however, in the section of Chapter 11 headed How Science and Medical Science Differ, Prasad and Cifu wish to teach us this: “When medical science functions properly, medical paradigms suggest hypotheses: for example the hypothesis that tight blood sugar control benefits diabetics. The next step in the scientific method is to test this hypothesis, in this case by a randomized trial testing whether strict blood-sugar control is better than lenient control.”
Let’s review this linguistic snatch-and-grab in slow-mo. The first bit of legerdemain is to replace theory with Kuhn’s inscrutable paradigm, thus depriving theory of an essential characteristic — its amenability to criticism. The word hypothesis, used interchangeably with theory by a philosopher such as Karl Popper, is also disposed of — hidden in plain sight by debasing it as a substitute for prediction. This latter trick has two profitable effects for the authors’ argument. Firstly, it sweeps aside the dangerous idea of prediction, lest it remind us of the power and precision of which theories are capable. (Heaven help the authors should the idea of precision medicine based on predictive models cross the reader’s mind whilst reading this chapter!) Secondly, in its thus-debased usage, hypothesis now all too readily suggests the ‘null hypothesis’ central to the authors’ favored RCT methodology. The reader who now takes the authors’ linguistic Monopoly money for philosophical gold, need hardly think (and is this the point?) in order to accept the author’s RCT recipe for medical science.
The similarity is that you both debase the idea of a hypothesis by conflating it with prediction. (In further conflating imagining with predicting, you rise to even greater heights of philosophical villainy!) It is interesting to see these same maneuvers used repeatedly in the defense of EBM.
I didn’t think that I asserted that hypotheses, theories, guesses and predictions were equivalent, thereby conflating them. The subsequent discussion makes it clear that the situation is more complex and that there are usually rival theories and hypotheses. These are tested by seeking evidence that supports one or more and undermines others. This is complicated by having to bear in mind that some potentially rival hypotheses may not even have been considered yet. My main point is that theories become hypotheses when they are being tested against rival possibilities. This is what happens during the differential diagnostic process, when the evidence sought includes the results of treatments aimed at a provisional working diagnosis as well as symptoms, physical signs and test results. How would you have written my section? I would be grateful for your advice.
Consider taking inspiration from the way Bayesianism dispels the muddle that frequentism makes of almost everything. In place of frequentism’s unclear (pseudo-)distinction between ‘fixed’ and ‘random’ effects, hierarchical modeling can offer us a coherent perspective in which everything is a random effect. Even closer to home vis-à-vis your own text, consider the Bayesian doctrine that “everything is a parameter”, enabling us to see e.g.
… the limitations of unbiasedness as a general principle: it requires unknown quantities to be characterized either as ‘parameters’ or ‘predictions’, with different implications for estimation but no clear substantive distinction.
[BDA3 p.94]
Likewise, I would encourage you to adopt the radical perspective that “everything is a conjecture”, and then to use more standard language to make the distinction between ‘background knowledge’ that (for the moment, within a given problem-situation) is not regarded as problematic, as against the contrasting theories that are being compared. (Compare how @zad & @Sander say that “interpretations of statistical outputs … condition on background assumptions.”)
At all costs, avoid writing such things as “[h]ypotheses are guesses or predictions” and “[a] ‘theory’ is a prediction” if you don’t want people to conclude you think Hypotheses ⊂ Guesses ∪ Predictions and Theories ⊂ Predictions. (My actual interpretation of “guesses or predictions” is that you meant in the usual idiom to convey synonymity, i.e. that Guesses = Predictions; I employed the set union here only out of an abundance of caution.)
Why not adhere to customary usage in philosophy of science here? Why not say simply that theories generate (not ‘become’) predictions, which enable empirical tests? The ‘classic’ examples from physics (cf. gravitational waves as mentioned above) always seem to me the best touchstones, but diagnosis lends itself fully to the same kind of account. The differential diagnosis constitutes a list of alternative theories of the presenting illness, and we wish to criticize and select from among these theories. Each theory generates predictions (about exam findings, lab values, even historical details we might delve for in a subsequent interview), and to the extent that different diagnoses generate different predictions about these observable facts, then we have a basis for selecting from among the theories. Consider that the usual custom in medicine is to ‘cross off’ (or ‘rule out’) possibilities from the differential, in a characteristically falsificationist spirit. I’ve heard it said by some that you should order a test only if you think you know how it will result — that is, only if you have a theory that predicts the result and could potentially be falsified by the test.
Thank you. I will take on board what you have said for the 4th edition. As you say, the difficulty is reconciling traditional concepts (e.g. ‘ruling out’ or ‘confirming’ a diagnosis) with probabilistic concepts. I emphasise that predictions are always probabilistic but decisions are ‘decisive’. One can be decisive without being certain of the outcome. Therefore ‘ruling out’ a diagnosis is a decision not to pursue its implications, not a prediction that it is certain to be absent. My understanding is that a theory is something that we have decided to assume for the time being. When we decide to test it against other theories, it becomes one of our hypotheses.
The following analogy (which I understand to have been present already in the original 1934 Logik der Forschung) has much to offer here. In place of the ‘confirmed’ diagnosis, it offers a ‘piling’ that is “firm enough to [guide the treatment of the patient] for the time being.” That is, we consign it to ‘background knowledge’ and bring other problems into the foreground. As you say, a diagnosis can always be revisited [cue discussion of ‘anchoring’, etc].
The way you are using ‘prediction’ suggests to me that we have become accustomed to using it in two distinct senses that are in danger of overlapping:
We might speak of the predictions that a scientific theory (including a diagnosis) yields, which we use to test the theory (or diagnosis)
But we might also offer ‘predictions’ as prognoses [1,2] that are inherently probabilistic, and which we use for entirely different purposes, such as end-of-life decision-making
While subjective probabilities may play an absolutely essential role in the latter, there is a very strong argument indeed [3] that deciding between theories is not usefully carried out using Bayesian probabilities.
Popper, K., and D. W. Miller. “Why Probabilistic Support Is Not Inductive.” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 321, no. 1562 (1987): 569–91. https://doi.org/10.1098/rsta.1987.0033.
Just speaking in general, decisions aren’t ‘decisive’. Physicians often put patients on a treatment for a trial period, or test a drug for a while and change the dosage. Watchful waiting can precede surgery. Revisiting decisions occurs most often when the decision was a borderline one in the first place.