As a final comment, I’ll just say that your made-up example is certainly an interesting thought experiment that shows how inferences from RCTs and observational studies could be very different. Maybe this is just a profound failure of imagination on my part, but I have trouble putting much stock in a vision of the future of medicine that is very unlikely to be realized due to insurmountable ethical and logistical/pragmatic barriers. Specifically, I have trouble envisioning a future in which we come to trust that patients will be just “naturally” drawn toward treatments that also happen to be strongly and positively correlated with their prognosis.
And I realize that you (as an MD I think?) are very much aware of the complexity of human physiology and behaviour. But I’m not at all sure that it’s possible to internalize the sheer magnitude of this complexity other than through interacting with thousands of people over many years, discussing in detail the reasons for their various health-related decisions, and seeing case after case that doesn’t conform to the “expected” trajectory (for myriad reasons related to poorly-understood biologic complexity/comorbidity and fickle human behaviour).
Thank you, what an insightful comment. Yes, I’m a junior MD. Rightly or wrongly, I share your scepticism. Is it the case that the only non-toy DAG we are willing to “accept” is the one that shows no arrows pointing to exposure in a randomised experiment? Kind regards, Kuba.
Instead of an acceleration of knowledge, we have many examples where observational studies either resulted in non-working treatments being adopted into clinical practice or caused a delay in launching a proper RCT.
Do you view that as an inherent feature of observational studies? I always thought there was a large amount of room for improvement in their design and conduct. How does this relate to E.T. Jaynes’ observation in Probability Theory: The Logic of Science that:
“Whenever there is a randomized way of doing something, there is a nonrandomized way that yields better results for the same data, but requires more thinking.” (p. 512, emphasis in the original)
There are at least two types of observational studies:
Those that are prospectively planned with a significant amount of resources used to collect high-quality data without much missing data, for which it is still likely that important confounding by indication can ruin the study, and
Those that use convenient data collected with no funding under no protocols, with lots of measurement errors and missing data, for which meaningful results are all but guaranteed not to happen
The biggest problem with randomization is that it usually arrives late to the scene. For more controversial thoughts see here.
“Those that are prospectively planned with a significant amount of resources used to collect high-quality data without much missing data, for which it is still likely that important confounding by indication can ruin the study”
For the sake of discussion, I’d like to explore the relative merits of random experiments vs observational research from a decision theoretic viewpoint.
“We therefore see that randomization can play an important role even in the personalistic, Bayesian view of inference. This is contrary to the opinion resulting from the basic theorem in decision theory, that for any randomized decision procedure there exists a nonrandomized one which is not worse than it, to the effect that randomization is unnecessary in the Bayesian approach. The reason for the difference is that the use of a random mechanism is not necessary, it is merely useful.” [my emphasis]
Would it be fair to say that there are certain contexts that arise in medical research that Bayesian purists like ET Jaynes have not accounted for, that make randomization more than a “merely useful” device?
I’d concede that randomization is a very useful device, but I’m not going to insist all causal claims require an RCT. The pragmatic view treats positive assertions of beneficial treatments (those require RCTs, in addition to other preliminary studies) differently from assertions about harms from previously approved treatments (i.e., post-marketing surveillance is inherently observational).
For a strong defense of randomization, Senn’s “Fisher’s Game with the Devil” (pdf) is worth reading.
For a Bayesian perspective on the role of randomization in a rigorous decision theoretic framework for experimental design and analysis, this paper by Dennis Lindley is essential reading.
Lindley, DV (1982) The Role of Randomization in Inference. Proceedings of the Biennial Meeting of the Philosophy of Science Association (link)
My oversimplified way of thinking about it is this. Whether Bayesian or frequentist, randomization is an important device that assists you in getting the data generating model right, and in making model misspecification less harmful.
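A toy simulation may make this concrete. This is a minimal sketch with made-up numbers (not any of the examples in this thread): an unmeasured prognostic factor drives both treatment choice and outcome, so the naive observational contrast is badly confounded, while the randomized contrast recovers the true (null) effect without the analyst having to model that factor.

```python
# Minimal sketch, hypothetical numbers: an unmeasured prognostic factor u drives both
# treatment choice and outcome, so the naive observational contrast is confounded,
# while randomized assignment breaks the u -> treatment arrow.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                               # unmeasured prognostic factor

# Observational: sicker patients (low u) are more likely to seek treatment
t_obs = rng.random(n) < 1 / (1 + np.exp(2 * u))
y_obs = u + 0.0 * t_obs + rng.normal(size=n)         # true treatment effect is zero

# RCT: treatment assigned by coin flip, independent of u
t_rct = rng.random(n) < 0.5
y_rct = u + 0.0 * t_rct + rng.normal(size=n)

print(y_obs[t_obs].mean() - y_obs[~t_obs].mean())    # clearly negative despite a null effect
print(y_rct[t_rct].mean() - y_rct[~t_rct].mean())    # close to the true effect of zero
```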
“Indeed, consider an extreme case where the observational study shows 100% survival for all option-having patients, as if each patient knew in advance where danger lies and managed to avoid it. Assume further that a non-zero fraction of patients in the RCT control arm die. Such a finding, though extreme and unlikely, immediately rules out Model-1 which claims no treatment effect on any individual.”
The extensive use of numerical examples in this (now very long) thread has prevented me from putting my finger on the reason why some of the ideas expressed above seem conceptually “not quite right.” But after several reads, I have homed in on the sentences that caused me to pause.
The assertion in the block quote above is not correct. Seeing some patients in the control arm of the RCT die after not being offered treatment, while other patients in the observational study survive after “choosing” to take the drug does NOT rule out Model 1. This statement reflects a fundamental misunderstanding of the inferences we can make from an RCT.
Patients in RCTs of experimental treatments can die for MANY reasons and many of these reasons can be COMPLETELY UNRELATED to the treatment being tested. Inferences of efficacy that are based on RCT results are not made at the level of individual patients, but rather at the group level (as reflected by overall outcome rate differences between arms). Even if between-arm differences in outcome suggest intrinsic efficacy of the therapy being tested, we can’t then “drill down” to the level of individual trial subjects after the trial is done and say “he was saved by the treatment” or “she died because she was not offered treatment.”
In other words, there would be nothing remotely clinically or inferentially compelling about a scenario in which a patient who “chose” to take a treatment in an observational setting happened to survive, while a patient who was “assigned” to the control arm of an RCT used to test that treatment happened to die. These two observations are still perfectly compatible with the possibility that the drug being tested has no intrinsic efficacy in any patient. It does NOT “logically” follow from these observations that there must be some occult qualitative difference between the RCT patient and the observational study patient (e.g., a gene that dictated death or survival in response to drug exposure) that explains their differing outcomes.
I’ve tried to understand whether observational studies are necessary for personalisation of treatment by drawing a directed acyclic graph (DAG):
Randomisation removes all arrows pointing to treatment
Randomisation does not remove any arrows pointing to outcome
Randomisation does not change the common effect of treatment and covariates (all arrows collide on outcome)
It follows that if some features modify the effect of treatment (e.g., different response in men & women), we should be able to identify them in an RCT.
“Observational data incorporates individuals’ whims. Whimsy is a proxy for much deeper behavior. This leads to confounding, which is ordinarily problematic for causal inference and leads to spurious conclusions, sometimes completely reversing a treatment’s effect (Pearl, 2014). Confounding then needs to be adjusted for. However, here confounding helps us, exposing the underlying mechanisms its associated whims and desires are a proxy for.”
Which I interpret as:
Patients could intuitively know how they will respond to treatment
Underlying mechanisms determine response to treatment
Confounding (choice of treatment ← mechanism → outcome) can help us identify mechanisms that affect response to treatment.
“Now that we have demonstrated conceptually how certain combinations of observational and experimental data can provide information on individual behavior that each study alone cannot.”
By definition, this statement is true: the difference between observational studies and RCTs is that we don’t let patients behave intuitively and choose their exposure in a controlled experiment. If people knew exactly whether they would be cured or harmed by a drug, an observational study could show a much higher rate of success than an RCT. However, assuming that both studies were conducted in the same population, one could easily expose the same mechanism in an RCT by including an interaction term in the model (e.g. sex as effect modifier), as sketched below.
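To make the interaction-term point concrete, here is a minimal sketch. The data, effect sizes, and the choice of the statsmodels formula interface are all hypothetical illustrations, not anything from the paper under discussion: with randomized treatment, a treatment-by-sex interaction term picks up a qualitative effect modification from the RCT data alone.

```python
# Hypothetical sketch: in simulated RCT data where the drug helps one sex and harms
# the other, a treatment x sex interaction term recovers that effect modification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20_000
sex = rng.integers(0, 2, n)                  # 0 = male, 1 = female (simulated)
treat = rng.integers(0, 2, n)                # randomized assignment
# Assumed data-generating model: benefit in females, harm in males
logit_p = -0.5 + 1.0 * treat * sex - 0.8 * treat * (1 - sex)
y = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

df = pd.DataFrame({"y": y, "treat": treat, "sex": sex})
fit = smf.logit("y ~ treat * sex", data=df).fit(disp=0)
print(fit.params)                            # the treat:sex coefficient flags the interaction
```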
What am I missing? The only reason I can think of that would make the same subgroup have different outcomes in two studies is unequal distribution of an unobserved effect modifier in these studies.
This looks like an interesting discussion growing here. I’ll try to learn from all of you and clarify my parts in this thread. I’ll start by addressing this post by @HuwLlewelyn and then I’ll follow up with the rest after I sleep.
Thank you @HuwLlewelyn for reading our paper. You mentioned that in order for a drug to both save 10% of the population and kill another 10%, “the drug would have to create both its outcome and its counterfactual outcome at the same time.” Why can’t a drug benefit some people while harming others with the same mechanism? Wouldn’t a drug that lowers blood pressure qualify? Although having multiple causal mechanisms from a treatment works in our analysis too. The key in our analysis is that all study participants (both experimental and observational) receive the same treatment.
We assume consistency, such that a person choosing treatment will be affected in exactly the same way if they had been randomly placed into the treatment group of an RCT. And a person avoiding treatment will be affected in exactly the same way if they had been randomly placed into the control group of an RCT. Of course, it’s possible that people self-medicating without medical knowledge or skills will not receive the same treatment as in an RCT conducted in a clinical setting. The causal graph should then be modeled as such. In this scenario, the observational study participants are essentially receiving a different treatment than the RCT participants.
In our example data, we observed 30% of female drug-avoiders dying. You mentioned that “the proportion of 30% dying with no drug should have been lowered by 28%.” How do you know this? Is the assumption that, because P(benefit) = 28%, 28% of drug-avoiders would die and they would’ve survived had they chosen to take the drug? Doesn’t this assume that the fact of choosing to avoid the drug doesn’t affect the proportion of people who benefit from the drug? What it seems like you’re trying to say is P(yₓ|x’, y’) = 28/30, where yₓ is survival had the drug been taken, x’ is not choosing the drug, and y’ is death. This is known as the Probability of Sufficiency (PS). It turns out that PS = 100% in this case. So for every single female that chose not to take the drug and died, they would’ve survived had they taken the drug. That’s even higher than 28/30.
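For readers following the notation, these are the standard counterfactual quantities from Pearl’s work that the discussion relies on (restated here for reference; not part of the original post), with x = choosing/taking the drug and y = survival:

$$
\begin{aligned}
P(\text{benefit}) = \text{PNS} &= P(y_x,\, y'_{x'}) \\
\text{PS} &= P(y_x \mid x', y') \\
\text{PN} &= P(y'_{x'} \mid x, y)
\end{aligned}
$$

So PS here is the probability that a patient who avoided the drug and died would have survived had they taken it, which is the quantity stated above to equal 100% for the females in this example.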
I thought it was interesting how you split the proportion of individuals dying into those dying from the disease and those dying from adverse drug effects in Tables 2, 3, 5, and 6. How did you get only 2% of female patients who chose to take the drug dying due to the disease alone in Table 2? It should be that 0% of those female patients die from adverse drug effects and 100% of those female patients die from the disease alone, since P(harm|female) = 0. The situation is different for males, since P(harm|male) = 0.21. But where are those 21% of men (420 out of 2,000 in the observational study) distributed? They would have to be split between drug-choosers dying and drug-avoiders surviving. Your Table 3 shows 406 out of 2,000 patients surviving from the drug alone and 0 patients dying due to not taking the drug. This means P(benefit|female) ⩽ 20.3%. But we know P(benefit|female) = 27.9%.
I like the idea of having a split of how patients respond to the drug by their choices of taking the drug, but I don’t understand how you arrived at these numbers. It would be helpful if you could shed some light on this.
With expert knowledge and data from outside an RCT that explains a functional mechanism at play, you can certainly learn the effect U has on the outcome. However, from an RCT alone, how would you be able to tell that a certain percentage of participants (and therefore population if the RCT is representative) benefit from the treatment as opposed to having a positive outcome regardless of treatment (without the assumption of monotonicity, which is no harm from treatment)? You can compute bounds from the data, but that may only provide wide, uninformative bounds. Observational data can narrow these bounds, sometimes significantly. In your example of an RCT with 50% vs 50% results, how do you know it’s 50% of the population benefits and 50% is harmed, versus 50% of the population has a positive outcome regardless of treatment and 50% has a negative outcome regardless of treatment? You know it must be the former because there’s 100% survival in the observational study.
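As a rough illustration of how the bounds narrow, here is a small sketch using the Tian–Pearl (2000) bounds on P(benefit). The bounds themselves are standard; the function name and the specific 50/50 split of choosers and avoiders in the observational data are assumptions added purely for illustration of the “50% vs 50%” scenario described above.

```python
# Sketch of the Tian & Pearl (2000) bounds on P(benefit) = P(y_x, y'_x'),
# combining RCT arm survival rates with observational joint frequencies.
def pns_bounds(p_y_do_x, p_y_do_x0, p_xy, p_xy0, p_x0y, p_x0y0):
    """p_y_do_x / p_y_do_x0: survival under do(treat) / do(no treat) from the RCT;
    the remaining arguments are observational joint probabilities P(choice, outcome)."""
    p_y = p_xy + p_x0y                                   # observational P(survive)
    lower = max(0.0,
                p_y_do_x - p_y_do_x0,
                p_y - p_y_do_x0,
                p_y_do_x - p_y)
    upper = min(p_y_do_x,
                1.0 - p_y_do_x0,
                p_xy + p_x0y0,
                p_y_do_x - p_y_do_x0 + p_xy0 + p_x0y)
    return lower, upper

# RCT: 50% survival in both arms. Observational: everyone survives, and (assumed
# for illustration) half chose the drug and half avoided it.
print(pns_bounds(0.5, 0.5, 0.5, 0.0, 0.5, 0.0))          # -> (0.5, 0.5)
```

With the RCT alone the bounds on P(benefit) would be [0, 0.5]; adding the observational data collapses them to exactly 0.5, which is the “you know it must be the former” step in the paragraph above.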
What if you can’t get at or be sure of the underlying mechanisms? How do you know there are interesting underlying mechanisms worth investing in understanding?
I’m eager to learn how you used observational data to make individualized inferences! I’ll be reading your paper soon.
What would prevent observational data from providing the insights we claim? Is it a potential problem with the math? Or a problem of finding real-world observational datasets that would showcase similar insights as our examples?
Thank you @ESMD for your informative response regarding observational data’s usefulness in medical studies and the regulatory and clinical processes of following up RCTs with observational studies.
We weren’t suggesting regulators and physicians are blind to observational studies. We’re suggesting that combining observational data with RCTs, in the way we’ve shown, is very useful. I’m curious about the details of how observational data is typically incorporated into treatment decisions.
Regarding follow-up observational studies as opposed to simultaneous or reverse orders of RCT → observational study, a follow-up observational study is fine for our analyses. Did we imply that was a problem?
Having said that, it seems important to realize that a drug suggesting no efficacy in an RCT (equal proportion of success in treatment and control arms) doesn’t negate the possibility that the drug is very effective for some portion of participants.
It’s just that demonstrations of such situations are few and far between. The lack of multiperiod randomized crossover studies contributes to this problem. And there is an abundance of weak or completely ineffective treatments that apparently don’t work for anyone.
This often occurs if a postmarket observational study suggests that one member of a drug class might pose a higher risk than another member for a certain adverse event. For example, several years ago, doctors became concerned that certain types of oral contraceptives might pose higher risks of venous thromboembolism (VTE) than others. Some observational studies comparing different types of oral contraceptives suggested that the risk of VTE might be higher for pills containing ingredients (e.g., drospirenone) that had anti-androgen effects on the skin. Although most combined oral contraceptives can help with acne, pills with this extra ingredient were felt to be more effective at controlling both acne and hirsutism. Even though results of the observational studies were conflicting, many physicians decided, on the basis of this safety signal, that in the absence of a compelling additional clinical indication (like hirsutism), the added benefit of drospirenone-containing pills was probably not worth a potentially higher risk of VTE (beyond that conferred by a regular combined pill). As a result, we tend to have a slightly higher clinical threshold for prescribing drospirenone-containing pills.
Physicians constantly make these types of assessments. If we have the option to choose between several treatments for a given problem, all of which have demonstrated efficacy, the option we are more likely to recommend is the one whose known side effect profile would pose fewer problems for the patient in front of us (knowing that patient’s unique medical history). So arguably, we are already practising some form of “personalized” medicine every single day.
No, you didn’t imply this anywhere. But my interpretation of your argument is that our reliance on RCTs to establish treatment efficacy might be causing us to abandon therapies that could in fact be very efficacious in properly-chosen subsets of patients. My point is simply that, all math aside, even if such an assertion were true, its ultimate value is fatally undermined by ethical and pragmatic/logistical barriers that are necessary in drug regulation and medical practice. Discussions around exposing patients to treatments are constrained by unique considerations that are not present when discussing other types of “personalization” (e.g., how to target internet advertisements toward consumers most likely to respond positively to them).
Let’s pretend that we live in an imaginary world, in which drug regulators decided to follow up every RCT that failed to show efficacy of a new treatment, with an observational study in which patients got to choose whether or not to try the same treatment. Now let’s pretend, using your example, that everyone who tried the treatment in the observational study said they felt better. Now what should the regulator do? The regulator’s options are as follows:
Approve the drug for anyone who wants to try it, since there is now a signal of efficacy in at least some patient subsets. Maybe the patients who feel better after taking the treatment have some unique/rare genotype that was somehow excluded from our RCT (?) Why not just let patients decide if they want to try it to see if it helps them to feel better or not? If they don’t, then won’t they reasonably conclude that they are just not one of the lucky “responders” and stop the treatment? What’s the harm (see below)?; OR
Demand that the sponsor do more legwork to try to identify the unique/intrinsic quality of the people who responded positively to the drug in the observational study. After all, we don’t want indiscriminate release of a new drug for which 10,000 patients (or their insurance companies or health systems) will have to be treated in order to find one person with the occult characteristic that makes them respond to the drug (or maybe we do, if we’re the sponsor…lots of $$$ to be made with such an approach…).
But where to start? Do we genotype all the patients who felt better after receiving the drug in the observational study? What if they all had the same copy of a certain allele of the 3rd gene on their 7th chromosome? Or maybe the 15th gene on their 13th chromosome? What if we notice that all the patients who felt better were born in the month of May? What if all the patients who felt better attended college/university or had a large mole on their left arm? Do we really expect that such a hypothesis-free exercise is going to yield magical new insights into what made these few patients respond? Note that this scenario is not at all analogous to the concept of selecting certain cancer therapies for patients whose tumours harbour certain genetic mutations; this type of practice has a strong foundation in known biology and is very different from the hypothesis-free data-dredging exercise described above.
For the sake of argument, let’s say we could identify a unique copy of a specific gene that is common to everyone who “responded” to the treatment in the observational study. Now what? Do we genotype every patient who wants to try the drug in order to determine if they could potentially be a “responder”? Who will pay for this? The patient? The healthcare system? Do we repeat this process for every drug that fails to show intrinsic efficacy in an RCT?
Re “what’s the harm?” above: The flip-side to imagining that there could be individual patients who could be helped by every drug, is that we would also have to consider the possibility that there could be individual patients who could be harmed by every drug. And arguably, since virtually every drug has potential side effects, it would be unethical to expose large numbers of patients to a drug for which there is no reasonable expectation of efficacy. So we circle around to the beginning. If there is no logistically feasible way to identify very small subset(s) of potential “responders” within a population, it effectively becomes ethically indefensible to approve such a drug. Many patients could waste their life savings on treatments they have no chance of benefitting from, and many could potentially be harmed.
We generally don’t say that an RCT suggests no efficacy, but rather that the RCT failed to prove efficacy; this is a subtle but important difference. Is it possible that we have inadvertently excluded some phenotypically/genotypically unique subset of patients from our trial who might have benefited from the therapy? Yes. But there are as many potentially unique genotypes as there are people in the world. At the end of the day, we have to make decisions at the population level when approving drugs (not at the level of individual patients). We can’t simply let anybody with a home chemistry lab sell their creations to the public at large. While the home remedy might theoretically help an occasional patient, it could also kill many more…
But my point was that the argument about what type of evidence would effectively “exclude Model 1” seems to have a fatal flaw. I was trying to point out that seeing a different patient outcome arise when a patient has had the opportunity to choose his treatment versus when his treatment is assigned to him really tells me nothing useful about the intrinsic efficacy of the treatment in question.
I recall @Robert_Matthews mentioning in a video (with Jim Berger and @Sander on p-values) that the first 5 RCTs of streptokinase were interpreted as “negative”, but a meta-analysis of them showed a beneficial effect. It has been hard to find the original studies (or the meta-analysis he refers to), but I think this would make a great case study for applying decision theory to clinical trials and daily medical practice. The best I could find was:
It is mainly a problem of how you appear to use the math to model reality. The math can be perfect, but its use and interpretation can be misleading. This is why pure mathematicians are typically so bad at doing statistics. They miss the “science” part of statistical science. In particular, your paper claims that the approach (emphasis mine): “…incorporates individuals’ whims and desires that govern behavior under free-choice settings. And, since such whims and desires are often proxies for factors that also affect outcomes and treatments (i.e., confounders), we gain additional insight hidden by RCTs”.
This free choice assumption is incorrect in the vast majority of settings. Note that most treatments are typically prescribed by clinicians. Patients don’t just choose them freely. But neither do contemporary clinicians paternalistically just force the treatment on a patient. What we strive for is a middle ground between free choice and paternalism called shared decision-making.
I do applaud your efforts and will continue to follow them closely. If there is one article from the whole statistics literature on the topic I would absolutely recommend you read and carefully dissect to inform your endeavors, it would be this one.
Thank you @scott for your detailed response to my post 62. I will take each of your questions / points in turn.
My understanding is that ‘benefit’ implies an average change in outcome (e.g. change in probability of death) for a set of patients in a beneficial direction that will be caused by some mechanism. Harm implies a change for the same set of patients in an opposite direction to that of benefit due to another mechanism. Clearly one of these changes and its counterfactual cannot happen for the same set at the same time. However, the set might be subdivided into subsets (with different underlying processes) where those in one subset are harmed and those in the other benefit. The average result obtained by combining both subsets into a single set might result in no change, benefit or harm. Harm and benefit may take different forms (e.g. pains in various sites of the body or different restrictions such as breathlessness, weakness, etc.) that can be represented by a number of sets. These harms and benefits can be combined, by making many strong assumptions, using the concept of ‘utility’ in decision analysis. However, in your example harm is represented by death due to disease or death due to some other unspecified process (e.g. an adverse drug effect).
A drug that lowers BP to more normal levels benefits by preventing long or medium term structural damage to blood vessels resulting in rupture or blockage. The drug can cause harm in the short term by causing too low a BP, rapidly depriving tissues of blood flow especially of the brain and heart thus causing damage by a different mechanism (e.g. falls and fractures).
Allowing patients to decide their own treatment immediately changes the way they are selected for treatment or the way they take it, thus potentially forming two different subsets: (1) those treated in exactly the same way as in the RCT and (2) those treated differently.
Maybe I have misunderstood, but this assumption appears to be inconsistent with your data. In the RCT there is a beneficial 79-51=28% reduction in the proportion dying on treatment in males and females. However, in the observational study there is a harmful increase of 73-30=43% in the proportion dying in females on treatment and no change in death rate in males (30-30=0%), so they are affected differently by the treatment in the RCT and observational study.
According to your assumption of consistency (or exchangeability), the change in the proportion dying due to disease on treatment compared to no treatment should be the same in the RCT and the observational study group. The effect of treatment should be to reduce this risk by a difference of 28% in both groups. Therefore, if there is consistency and 30% die with no treatment in the observational study, then on treatment the risk of dying from disease in the observational group should be 30-28=2%, so the proportion avoiding death due to disease is 100-2=98% (but many others in the observational study will have died of adverse effects).
What I am saying is that, for males and females in the observational study, considering death due to disease and survival from it alone, p(Y|X) = 0.98 and p(Y’|X) = 1-0.98 = 0.02, and also p(Y|X’) = 0.7 and p(Y’|X’) = 1-0.7 = 0.3. Thus for disease p(Benefit) = p(Y’|X’) - p(Y’|X) = 0.3-0.02 = 0.28, or p(Y|X) - p(Y|X’) = 0.98-0.7 = 0.28. This p(Benefit) = 0.28 is based on the RCT result of p(Yx|X) - p(Yx|X’) = 0.49-0.21 = 0.28, or p(Y’x|X’) - p(Y’x|X) = 0.79-0.51 = 0.28.
I’m unclear about the meaning of p(Yx|X’, Y’). Does it mean p(Yx|X’∧Y’)? If so, then does not {Yx∧Y’} = ∅ and so {Yx∧X’∧Y’} = ∅ (i.e. both are empty sets as both contain an intersection of an event Yx and its complement or counterfactual Y’)? If so, is not p(Yx|X’, Y’) = PS (your Probability of Sufficiency) undetermined? What does ‘The probability of survival HAD the drug been taken conditional on the patient not choosing the drug AND being dead’ mean? Is it equal to the proportion of those who have died and not chosen the drug before death who would have survived if they had chosen the drug? Surely if this were the case, their choice before death is immaterial. The probability of survival from the observational study if they had taken the drug is p(Y|X) = 0.27 for females (see column 5, row 4 in my Table 1) and p(Y|X) = 0.3 for males (see column 5, row 4 in Table 4). When you say that PS = 100% in this case, I have clearly misunderstood what you are doing here. I would be grateful for clarification.
In the observational study where no drug was taken, the proportion of males and females dying of disease was 30%. You state that such patients would respond in the same way to the drug, reducing the proportion dying of disease by 28%. Therefore, in the observational study, the expected proportion of males and females dying from disease on the drug would be 30-28=2%. However, 73% of females actually died in the observational study on the drug, so the excess deaths were 73-2=71%, these requiring a different explanation in terms of mechanism of death rather than disease, e.g. adverse drug effects. However, 30% of males actually died in the observational study on the drug, so the excess deaths were 30-2=28%, these also requiring a different explanation in terms of mechanism of death, e.g. adverse drug effects that cause fewer deaths than in females.
How did your calculations arrive at this result of zero female patients dying from adverse drug effects so that 100% of those female patients die from disease alone? Also how did your calculations arrive at this result of 21% of male patients dying from adverse drug effects?
I assume that you are referring to Table 4. In the observational study, 420 men survived and 180 died out of the 600 choosing not to take the drug (see column 6, row 4 of my Table 4). As these 600 men were not taking the drug, none could have died from its adverse effects, so 180 had to have died of the disease. In other words, 600 men avoided adverse effects by not taking the drug, so that 180/600 ended up dying of the disease and 420 survived the disease. Therefore, conditional on the fact that a man had died when not taking the drug, there was obviously a ‘retrospective’ (looking back) probability of 180/180 = 1 that he had died of the disease and a probability of 1-1=0 that he had died from an adverse drug effect.
In column 5 row 5 of my Table 4, 420 died out of a total of 1400 on the drug. Out of these 420 who died, 28 were killed by the adverse effects before they had a chance of dying from the disease, the remaining 392/420 dying of disease. Therefore conditional on the fact that a man had died when taking the drug, there was a ‘retrospective’ probability of 28/420= 0.067 that he had died of adverse effects and a probability of 1-0.067=0.933 that he had died from the disease.
In my Table 1 column 5, a total of 1022 out of 1400 females died after choosing to take the drug, and from Table 3 column 2, 994 died of an adverse drug effect before they had a chance of dying from the disease. This means that 1400-994=406 females on the drug did not die of adverse effects, but 28 out of these 406 went on to die of disease. This implies that conditional on being female and having chosen to take the drug and dying, there was a ‘retrospective’ probability of 994/1022 = 0.973 that the death was due to the harm of an adverse drug effect and a probability of 1-0.973=0.027 that the death had been due to disease. However, of the 600 females who chose not to take the drug, 180 died and 420 survived. Obviously none of the females who did not take the drug could have died of its adverse effects, so conditional on being female and dying when not on the drug, the ‘retrospective’ probability of having died of its adverse effect is zero and of having died of the disease is 1-0=1.
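A quick numerical check of the ‘retrospective’ probabilities just derived, using only the counts quoted above (a sketch; the function name is just for illustration):

```python
# Recomputes the 'retrospective' cause-of-death probabilities from the counts above.
def cause_of_death_split(adverse_deaths, disease_deaths):
    total = adverse_deaths + disease_deaths
    return adverse_deaths / total, disease_deaths / total

print(cause_of_death_split(28, 392))   # males on drug:      (~0.067 adverse, ~0.933 disease)
print(cause_of_death_split(0, 180))    # males, no drug:     (0.0, 1.0)
print(cause_of_death_split(994, 28))   # females on drug:    (~0.973 adverse, ~0.027 disease)
print(cause_of_death_split(0, 180))    # females, no drug:   (0.0, 1.0)
```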
Comment
The ‘retrospective’ or ‘hindsight’ probabilities in the two preceding sections are unhelpful for making decisions to treat or not to treat. Decisions are based on ‘prospective’ probabilities. Thus if a drug is used in the same way in the observational study as in the RCT, for men and women the probability of death due to disease without treatment is 0.79 and with treatment it is 0.51 (showing 28% benefit of avoiding death). The treatment has a clear advantage over no treatment and barring other unwanted effects, it should be given.
However, when patients self medicate with no supervision so that they often suffer fatal adverse effects, the situation is very different. The probability of a female dying on treatment is now 0.73 and the probability of dying on no treatment is 0.3 so that the treatment should not be given ‘over the counter’ in this unsupervised way. The probability of a male dying on treatment is now 0.3 and the probability of dying on no treatment is the same at 0.3, so that treatment offers no advantage in terms of reducing the risk of death (but will cause inconvenience, may cause expense and minor side effects) and should not be given unsupervised ‘over the counter’.
I know it because I created this extreme example of qualitative interaction. That said, I don’t see why we couldn’t learn this from RCT data alone. For example, researchers could explore heterogeneity of treatment effect in relation to genetic variants based on biobank samples collected during the experiment. Sure, it requires additional resources, but so does conducting high-quality observational studies.
This is a terrific discussion. One general comment related to the value of observational data. By experience we’ve found that observational data are much more useful for understanding drug safety than for helping to understand drug efficacy. The reason is pretty simple. Physicians don’t prescribe drugs with the intention of affecting their patients’ safety profiles. Safety events are not intended consequences of the treatment. Confounding by indication is far less pronounced. Efficacy, on the other hand, involves outcomes that are intentionally being attempted to be modified, and confounding by indication is much more lethal.
Here is an interesting introduction to statistical methods that attempt to address the challenge of confounding by indication with a case study involving data from traumatic brain injury.
Dr. Harrell’s point is a subtle one I need to think about. I do not hold the “Religious Bayesian” perspective that argues randomization is harmful. Randomized experiments are minimax strategies to address a situation where there is a high probability of bias dominating variance of an estimator of treatment effect.
But I do believe randomization is given much too great a weight in evidence assessment than a principled Bayesian decision analysis would justify. Those who read studies (as opposed to conducting them) only have a report that the study was randomized without any way to judge how adequate that randomization procedure actually is. If we accept the likelihood principle (as any Bayesian does), design info is irrelevant (strictly speaking), and all information is contained in the likelihood.
As a fundamental matter of first principles, I do not see why a purely observational approach (conditional on reliable, trustworthy data) might not eventually discover causal mechanisms (including valuable treatments), much like evolutionary processes converge to beneficial adaptations.
Why couldn’t a sequence of Bayesian-derived and Bayesian-analyzed observational studies ultimately converge to the appropriate causal model?