Causal inferences from RCTs- could “toy” clinical examples promote understanding?

Misunderstandings about the types of causal inferences we can make from RCTs abound. They make their way into journals, propagating like viruses and lodging in the brains of other researchers and clinicians. Many have tried, in vain, to dispel them and to undo damage that has already been done.

These misunderstandings can impact clinical care. Every day, clinicians address causal questions posed by their patients. The most ubiquitous inferential error made by patients in assessing their own symptoms is “post hoc, ergo propter hoc.” This is not an easy error to explain to patients, and some physicians make it too. And when a clinician makes it, he ends up misinforming his patients.

Maybe it could help to provide examples of the types of cause/effect questions that clinicians might face from patients and to reason through the responses that the clinician should provide, in layman’s terms.

The purpose of this post is to promote understanding by eliciting accurate, yet simple, narrative explanations (without statistical jargon) for commonly-misunderstood causal RCT concepts. All discussion is appreciated.

“Toy” examples

Example 1- A pharmaceutical company conducted a randomized, placebo-controlled clinical trial, in which patients with moderate to severe asthma were randomized to add either a new drug inhaler or a placebo inhaler to their usual treatment. Patients were followed for 3 years and all of their asthma exacerbations were recorded by their family physicians. At the end of 3 years, the exacerbation rate in each arm of the trial was compared. The rate was 30% higher in the placebo arm than in the new treatment arm.

Susan was enrolled in this clinical trial and her family physician had a record of all of her asthma exacerbations in the years leading up to the trial. After her trial participation ended and the study’s results had been published, Susan found out that she had been receiving the new drug (i.e. non-placebo) inhaler during the trial. Her family physician noted that the frequency of her asthma exacerbations had been lower during the time she was enrolled in the trial. After the conclusion of the trial, he tells Susan: “That new inhaler worked really well for you!”

Question- Is Susan’s physician’s inference/statement defensible? Why or why not?

Example 2- A pharmaceutical company studied the ability of a new diabetes drug to lower blood sugar, as compared with approved “standard of care” drugs. In order to ensure that the new drug was not having unintended adverse cardiovascular effects, the drug regulator mandated that the trial follow a large number of patients for several years. At the end of the trial, while analyzing adverse events recorded during the trial, the regulator noticed a higher rate of fractures among patients in the new treatment arm. In spite of this safety signal, the drug was ultimately approved, with the product monograph flagging the observed increased fracture rate among patients enrolled in the trial.

Cathy was enrolled in this clinical trial. During the course of the trial, she suffered a painful vertebral compression fracture. After seeing the product monograph labelling for the approved drug, Cathy asks her family physician if the new drug had “caused” her fracture.

Question- How should Cathy’s family physician respond?

Example 3- A global public health crisis caused by a novel infection prompted some physicians to begin prescribing an already-approved medication for “off-label” purposes, on the basis of mechanistic reasoning, hoping that it might work in treating the novel disease. A large RCT studying this question was ongoing but wouldn’t be completed for several months. Arguing that there was “nothing to lose,” Dr. Jones prescribed the drug off-label to any of his patients who asked for it. Other doctors were more circumspect than Dr. Jones, preferring to wait to see the results of the RCT. At the end of the trial, 3% of patients in each arm of the RCT had died. In spite of the RCT results, Dr. Jones continued to prescribe the drug off-label, arguing that:

“All of the patients to whom I have prescribed the drug have survived,” which is a pretty strong signal (to me, at least) that the drug MUST be doing something, in at least SOME people. MAYBE, instead of concluding that the drug is a “dud,” we should consider the possibility that the RCT result appeared neutral because the drug is actually capable of helping some people but harming others, AND that the proportions of those helped and harmed IN EACH ARM of the trial (as a result of being offered or not offered the drug) BALANCED OUT PERFECTLY, to yield the same mortality rate in each arm (?) AND MAYBE, if we were to just leave patients to their own devices and stop standing in their way (like we did during the RCT), those who were “destined” to benefit would somehow INSTINCTIVELY seek out the drug (patients know their bodies best, after all), while those “destined” to be harmed would not seek it. This is a free country after all…who am I to stand in the way of what a patient wants?”

Question- What’s wrong with Dr. Jones’ reasoning?

5 Likes

Interesting question! I’d offer just the following reflections:

  1. The physician here has missed an opportunity to ask about Susan’s experience. Before her unblinding, had she formed any idea whether she was getting placebo or the study drug? What did she notice (or not notice) that shaped her guess about that? Did she experience any adverse effects from the drug? Also, one often hears that participants in trials get exceptional care even in a ‘standard care’ arm. What was Susan’s experience in that respect? Did her participation in the trial change her own self-management of her asthma? Did she re-home her cat during this time?

  2. It would be worth keeping in mind that the current state of science has much to do with whether Cathy’s question can be answered scientifically. Since a risk of fractures was apparently not anticipated (emerging as a mere signal while hunting for CV effects), perhaps there is not yet even a proposed mechanism. One can imagine, however, that at some time in the future a mechanism gets worked out and is linked to some genetic polymorphism. If Cathy proved to have this polymorphism, but no other risk factors on trial entry, then the drug retrospectively would seem the most likely culprit. Also, remarks from the radiology report might be helpful. (How does the rest of the spine look? What about interval change from prior imaging?) If this painful fracture was treated operatively, then the surgeons might have the best-informed opinion on Cathy’s question.

  3. This scenario is totally outlandish, and would never happen in reality.

4 Likes

Hi David

Thanks for taking a stab at this! I’ll give any others who are interested a bit more time to respond before providing my own reasoning, but my answers to Questions 1-3 in the original post would be:

  1. Susan’s physician’s statement is not defensible.
  2. Cathy’s physician should say that, while it’s not impossible that the new diabetes drug contributed to her compression fracture, there is no way to say, with any certainty, that it did contribute.
  3. I completely agree- Dr. Jones’ reasoning brands him as a clinical menace who should probably lose his medical license.
2 Likes

Okay, here’s an attempt at answering Question 1 using simple language:

The trial result suggests that the new asthma inhaler is not a “dud.” In other words, it can work; it can reduce patients’ asthma exacerbation rates. If the sponsor is lucky, the drug regulator will approve it (though the regulator will usually require more than one positive trial for approval). However, this result doesn’t tell Susan’s physician whether the new inhaler had any effect on Susan’s personal exacerbation rate, nor whether it will reduce her exacerbation rate if she continues to use it after approval. Similarly, the trial result will not allow Susan’s doctor to predict the effect of the inhaler in other future individual patients.

Most clinical trials aiming to demonstrate intrinsic drug efficacy will randomize each patient to receive a single experimental treatment over the course of the trial. Patients are exposed, in a blinded manner, to EITHER the new drug or a placebo for the entire duration of the trial. At the end of the trial, average outcomes among patients in one arm are compared with average outcomes among patients in the other arm.

Susan’s exacerbation rate was lower during the time she was enrolled in the trial but we can’t say why it was lower given this trial design. Many chronic medical conditions wax and wane in severity over time, often for unclear reasons. Therefore, we can’t usually (see below) determine the cause of a patient’s improvement after observing only one exposure period. If a patient improved during treatment, we can’t tell whether the drug made her better or she just got better on her own. We will be much more confident that the drug is causing her improvement if we see her improve repeatedly, each time she goes on the drug, and worsen repeatedly, each time she stops the drug.

During any particular time window, there will be many factors in Susan’s life that contribute to her asthma exacerbation rate (e.g., whether she is cat-sitting for a relative, seasonal allergic triggers, whether she has managed to quit smoking, viral exposures…). Her exacerbation rate will not be determined solely by the type of inhaler she happens to be using. If she were to participate in an identical trial 6 months later, and to be randomized again to the new drug inhaler, her personal exacerbation rate might actually be higher during the trial than before the trial. What we wouldn’t know is whether her rate would have been even higher if she hadn’t been using the new inhaler during that particular time window … Other patients might have experienced a lower exacerbation rate during the trial even though they were randomized to placebo; they just got better on their own.
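To make the point about chance variation concrete, here is a toy simulation (every number in it is invented for illustration, not taken from the trial): a simulated patient whose true underlying exacerbation rate never changes at all still shows a lower count "during the trial year" than "the year before" a large share of the time, purely by chance.

```python
import math
import random

random.seed(1)

def yearly_exacerbations(rate):
    """Poisson-distributed count of exacerbations in one year (Knuth's method)."""
    threshold, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p < threshold:
            return k
        k += 1

# Assumed, for illustration: a stable true rate of ~4 exacerbations/year,
# unchanged by any inhaler, cat, or season.
true_rate = 4.0
n_sim = 20_000

# Count how often the "trial year" draw is lower than the "year before" draw,
# even though nothing about the patient has changed.
looked_better = sum(
    1
    for _ in range(n_sim)
    if yearly_exacerbations(true_rate) > yearly_exacerbations(true_rate)
)
frac = looked_better / n_sim
print(f"'Improved' during the trial in {frac:.0%} of simulations, "
      "with no real change at all")
```

Something over 40% of simulated "trial years" look like improvements even though nothing changed; add a genuinely waxing-and-waning underlying rate and single-period attribution becomes shakier still.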

The positive overall trial result implies that the drug plausibly could help Susan. Her physician can feel comfortable prescribing it to her in the postmarket setting. But if Susan wanted to find out whether she is actually (rather than just plausibly) being helped by the new drug, she would need to participate in an “N-of-1” trial during which she was switched, in a pre-planned and blinded fashion, between the new drug inhaler and a placebo inhaler. If she were to experience lower exacerbation rates during several periods of new drug exposure (“positive rechallenges”), as compared with several periods of placebo exposure (“positive dechallenges”), she could infer that the drug is helping her. All those other potential triggers for her exacerbations would not be expected to wane, coincidentally, in concert with her periods of new drug inhaler use.

In short, we usually need to observe periods of both exposure and non-exposure if we want to assess causality at the level of individual patients. However, most clinical trial designs only allow us to observe one intervention for each patient (exposure to new drug OR placebo). In general, we can’t infer causality for individual patients if we have observed neither positive dechallenge nor positive rechallenge. Exceptions would include clinical scenarios where 1) drug exposure results in reversal or dramatic slowing of a condition that otherwise has a highly predictable clinical trajectory (e.g., epinephrine rapidly reversing the symptoms of an anaphylactic reaction or an aggressive tumour “melting away” on imaging after a patient is exposed to a new cancer therapy); 2) drug exposure is followed by rapid occurrence of an adverse event that is unlikely to occur spontaneously (e.g., a multi-system hypersensitivity reaction); or 3) drug exposure is later found, through observational studies, to be strongly associated in a population with an otherwise rare clinical condition (e.g., in utero diethylstilbestrol exposure and subsequent development of clear cell vaginal cancer in female offspring).
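For concreteness, the N-of-1 logic can also be sketched as a toy simulation (the rates, the effect size, and the number of crossover periods are all invented): if the drug truly halves a patient's exacerbation rate, the blinded drug-on periods beat the drug-off periods in the vast majority of simulated N-of-1 trials - a within-patient signal that a single exposure period can never provide.

```python
import math
import random

random.seed(7)

def poisson(rate):
    """Poisson-distributed event count (Knuth's method)."""
    threshold, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= random.random()
        if p < threshold:
            return k
        k += 1

# Invented numbers, for illustration only:
placebo_rate = 4.0      # her exacerbations per period while off the drug
drug_multiplier = 0.5   # assume the drug truly halves HER rate
n_periods = 4           # 4 blinded drug periods and 4 blinded placebo periods

reps = 5_000
wins = 0
for _ in range(reps):
    on_drug = sum(poisson(placebo_rate * drug_multiplier) for _ in range(n_periods))
    off_drug = sum(poisson(placebo_rate) for _ in range(n_periods))
    if on_drug < off_drug:
        wins += 1

print(f"Drug periods beat placebo periods in {wins / reps:.0%} "
      "of simulated N-of-1 trials")
```

With only one period of each kind (n_periods = 1), the same comparison is much less reliable - which is exactly Susan's single-exposure situation.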

3 Likes

This seems to me the clearest and most vivid part of the explanation. In an answer to Susan, it might make sense to start with this: “To really answer your question, you would have to do some self-experimentation …”. Having set forth this N-of-1 trial concept, the shortcomings of the RCT vis-à-vis her individualized question could then be explained more easily I think.

3 Likes

Thanks David. Agree it would have been good to lead with the “N-of-1” paragraph, though explaining why the crossover approach is needed, in non-statistical language, required all the other rambling :)

It’s interesting how causality assessment is such a huge and reflexive/routine part of clinical practice, yet physicians don’t really recognize the process for what it is. The challenges associated with assessing causality in individual patients are appreciated at a deep level by most physicians, even though most of us would struggle to offer statistics-based explanations for our assessments … There is definitely more than one way to internalize these concepts. Physicians tend to internalize them by applying clinical reasoning after receiving, during our training, a crash course on the limits of inductive reasoning. We don’t learn these concepts by studying statistical theory.

Having spent many years doing causality assessments for adverse events reported in clinical trials and the post-market setting, I can attest that it was very difficult, if not impossible, to make non-clinicians (without statistical backgrounds) understand clinical causal reasoning. For example, there was a pervasive belief among non-clinicians that seeing an adverse event listed in a drug’s product monograph or in a clinical trial database implied that the drug should be considered a likely culprit when assessing any future report of that event. For people without either a clinical OR a statistical background, causality assessment is not an easy task.

3 Likes

Excellent thread. This classic article by Rothman and @Sander may also help when thinking through such situations.

In example 1, there is an intriguing asymmetry: if Susan’s symptoms improved with the new inhaler, there is more potential noise than if her symptoms had worsened or if she had experienced unexpected side effects. The reduced exacerbation rate with the new inhaler in the RCT would make it more surprising to see higher exacerbation in Susan’s case — and that surprise is information if validated. While it could very well turn out to be a false lead (and often is), exploring these potential signals can be more fruitful than exploring those that confirm our expectations.

2 Likes

My understanding: the frequency and severity of asthmatic events vary - depending on exposures, for example. Susan’s physician should instead have said: we don’t know if it helped you, for the above reason. Thank you for your service to clinical science, which will help others with your condition in the future.

2 Likes

Proposed simple language answer for Question 2 in the original post:

Compression fractures are common clinical events, even in the absence of exposure to drugs. Non-clinicians (like Cathy) probably would not know this. For this particular adverse event, there was no opportunity to “dechallenge” Cathy to see whether her fracture would resolve - compression fractures don’t resolve once they have occurred. The between-arm fracture rate imbalance in the trial constitutes a signal that the drug might increase the risk of fracture, so we can’t exclude the possibility that the drug contributed to Cathy’s fracture. But her physician can’t, with any certainty, blame the drug. If the adverse event in question had been much rarer, clinically speaking - one known to be often associated with drug exposure - and had it occurred soon after her exposure (e.g., an anaphylactic reaction), the physician could be more confident about the role of the drug.
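One way to make that base-rate reasoning concrete - with entirely invented numbers, since the scenario provides none - is the excess-fraction calculation from classical epidemiology: even taking the trial's fracture imbalance at face value, most fractures occurring on the drug would have occurred anyway.

```python
# Invented illustration rates (the scenario gives no actual numbers):
rate_standard = 10 / 1000   # fractures per patient-year, standard-of-care arm
rate_new_drug = 13 / 1000   # a 30% relative increase on the new drug, assumed

rr = rate_new_drug / rate_standard   # relative risk
excess_fraction = (rr - 1) / rr      # share of on-drug fractures beyond background

print(f"Relative risk = {rr:.2f}")
print(f"Share of on-drug fractures that are 'excess' cases = {excess_fraction:.0%}")
```

Under these assumed numbers, only about 23% of on-drug fractures are "excess" cases - and, as Greenland and others have stressed, even that excess fraction is not, in general, the "probability of causation" for any particular patient like Cathy.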

Neither Susan’s nor Cathy’s physician has ever taken a statistics course. So what allows them to accurately gauge causality in scenarios 1 and 2? Answer: over many years of training and practice, both of them have seen that 1) patients’ asthma control often waxes and wanes for unclear reasons, even when they have had no change in their usual inhaler regimens; and 2) patients commonly present with compression fractures, and many of these patients are on no medications. Through clinical practice, they have internalized the reason why causality cannot usually be determined at the level of individual patients enrolled in a trial.

1 Like

A friend once promoted her belief that idiotype vaccines work - that she had benefited from a cancer vaccine given to her, in a single-arm study, after intensive chemotherapy for lymphoma.

Later, when a large randomized trial found no difference between groups given the vaccine or a placebo vaccine as consolidation following chemotherapy, she argued that some patients had benefited. More worrisome to me, the PI of the study also claimed that a subset had benefited, based on having a genomic variant of lymphoma. It became an issue for me as an advocate, because false beliefs, when promoted online, can persuade frightened patients to travel abroad to access treatments based on claims that have no real basis.

Patients with life-threatening diseases read the literature - or are assisted by their children and friends. Therefore the words we choose in the literature matter - explaining to the public what trials can and cannot tell us matters. Plain-language sections on the purpose of trials might be added to all reports of study results - for example, noting that an RCT can tell us which group, but not which individual, benefited from an intervention. The purpose is to guide future patients; the results cannot explain individual outcomes.

3 Likes

Thanks Karl! You make some great points.

Your story about unscrupulous practitioners (who are becoming increasingly common) has prompted me to offer an answer to Question 3 in the original post.

Dr. Jones is a clinical menace because he is making the same inferential error that patients (without medical training) make- post hoc, ergo propter hoc. He sees that his own patients survived after taking a treatment that he prescribed, and he concludes that he has “saved” them (i.e., that the only reason they are still alive is his prescription). He has inferred individual-level causality without grounds. His prescribing is grounded in neither a prior demonstration of intrinsic drug efficacy (as would have been provided by a “positive” RCT) nor a demonstration of individual causality (as his rogue prescribing, unlike an “N-of-1” trial, did not permit him to observe individual patients’ responses to drug dechallenge and then rechallenge). So his practice lacks any possible justification and he is on very shaky ground, medicolegally speaking.
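The "all my patients survived" observation also collapses under simple arithmetic. With the scenario's 3% mortality in both trial arms, survival of an entire modest panel of patients is unremarkable even if the drug does nothing (the panel size below is invented for illustration):

```python
# From the scenario: 3% mortality in BOTH arms, i.e., 97% survival either way.
p_survive = 0.97
# Invented for illustration: suppose Dr. Jones gave the drug to 20 patients.
n_patients = 20

# Probability every one of them survives even with a completely inert drug
p_all_survive_anyway = p_survive ** n_patients
print(f"Chance all {n_patients} patients survive with a totally inert drug: "
      f"{p_all_survive_anyway:.0%}")
```

That works out to roughly a 54% chance - better than even odds - that all 20 survive with a completely inert drug. Dr. Jones's "pretty strong signal" is no signal at all.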

Showing that a therapy can work is a prerequisite to claiming that it did work or will work for individual patients. In your lymphoma example, no regulatory authority would have approved a vaccine that hadn’t demonstrated efficacy via an RCT. So if there were practitioners, somewhere in the world, offering to provide the vaccine in spite of the non-positive RCT, they would, presumably, have been offering a non-licensed product (?) This type of practice, targeted toward desperate, vulnerable patients, is despicable.

Preventative therapies (presumably like the lymphoma vaccine in your story) would not be amenable to either a crossover design RCT or an N-of-1 trial. We can’t give a vaccine, then take it away repeatedly in order to assess “response” at the level of individual patients. Once it’s been given, it’s in the body and can’t be taken away. In this scenario, an RCT would involve a single period of observation for each patient (patients would be randomized to placebo or vaccine, then monitored to see what proportion in each arm experienced lymphoma recurrence over time). With this design, a “positive” trial would only tell us that the vaccine is not a dud. This is the bare minimum evidentiary requirement that must be met for a prescriber to justify prescribing a preventative therapy.

2 Likes

I’m not sure how common it is, but some investigators offer endless single-arm studies of products that appeal to wishful thinking, with designs that are unethical (in my view) because they cannot realistically answer the study question. The most egregious of these “investigators” is Burzynski in Texas. Other, perhaps well-meaning, investigators offer lower-risk vaccines with plausible but unproven benefit outside of the US in single-arm studies. So patients may continue to be offered study drugs, not licensed drugs.

1 Like

Wow, that’s…disturbing. Don’t these trials need ethics approval and approval by a regulatory agency before they proceed? Even though I don’t live in the U.S., I’m pretty sure FDA wouldn’t green-light this practice (?)

1 Like

The ethical issue I see in some single-arm studies is only my own take, or worry - I can’t declare most of them unethical, especially at the point of IRB review. I worry about feasibility and pilot studies where the sponsor’s intent to take the next step - to prove or disprove clinical benefit - is questionable; sometimes these are done for a “cousin” indication, for which off-label use can then be indirectly promoted by the sponsor (again, a worry for which I have no proof). That said, the endless Burzynski trials that patients pay for out of pocket (in the great state of Texas) are criminal.

1 Like