Treatment effect heterogeneity: or is it?

I frequently see the claim that humans respond differently to the same medication, and I believe this is true.

Can we discuss how one would prove this & quantify it?

You enroll 500 participants with BP > 150/90 mm Hg, randomize them to active drug or placebo, and measure BP later.

I believe some attribute inter-individual differences in change from baseline in the treatment arm to differences in drug effect. This seems completely silly, as highlighted by the fact(?) that you’d find inter-individual differences in change from baseline in the placebo arm. You would also find inter-individual differences in change from baseline if you enrolled, gave no treatment, and simply re-measured BP later, so say I anyway. Life is noisy, measurements are imperfect, and regression toward the mean has a relationship with the initial value.
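
To make the point about the placebo arm concrete, here is a minimal simulation sketch. Everything numeric in it is an assumption chosen for illustration, not an estimate from any trial: a usual systolic BP centred near 145 mm Hg, a measurement/biological noise SD of 8 mm Hg, enrolment only if the screening reading exceeds 150, and a drug effect of exactly -10 mm Hg for every participant. The function name `simulate_arm` is hypothetical.

```python
import random
import statistics

def simulate_arm(n, true_effect, rng, threshold=150.0):
    """Simulate change from baseline in systolic BP for one trial arm.

    Every participant receives exactly the same true_effect, so there is
    no treatment effect heterogeneity by construction.
    """
    changes = []
    while len(changes) < n:
        usual = rng.gauss(145.0, 12.0)          # person's stable long-run SBP (assumed)
        baseline = usual + rng.gauss(0.0, 8.0)  # screening reading = usual + noise
        if baseline <= threshold:
            continue                            # enrolment requires SBP > 150
        follow_up = usual + true_effect + rng.gauss(0.0, 8.0)
        changes.append(follow_up - baseline)
    return changes

rng = random.Random(2024)
placebo = simulate_arm(250, 0.0, rng)    # no drug effect at all
active = simulate_arm(250, -10.0, rng)   # identical -10 mm Hg effect for everyone

for name, arm in (("placebo", placebo), ("active", active)):
    print(name,
          "mean change:", round(statistics.mean(arm), 1),
          "SD of change:", round(statistics.stdev(arm), 1))
```

Under these assumptions the placebo arm shows a negative mean change (regression toward the mean after selecting on a high reading) and both arms show a similar spread of individual changes, even though the drug effect is identical in everyone. So the spread of change from baseline in the active arm cannot, on its own, be read as variation in drug response.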

First, are my intuitions above correct? Second, in your field of interest, has actual treatment effect heterogeneity been measured in a meaningful way? Can you share a reference? It seems a cross-over study of different drugs administered in random order to a group of people would be needed to comment on whether there are actual interindividual differences in drug response? Is that so?



I'm not sure it answers your question, but Stephen Senn used BP as an example in this paper to describe regression to the mean and thus inter-patient differences: Three things that every medical writer should know about statistics. Hence he concludes we “should think comparatively. Controlled clinical trials are about comparisons…” There was a recent discussion on Twitter indicating that this point somehow escapes some people: discussion re heterogeneity in control response across studies. But Senn has a very good presentation here (although the example is migraine, not BP): Is precision medicine terminally ill


Stephen Senn is the authority here so I’m glad Paul provided those links. Demonstration of HTE in general requires multi-period crossover studies as Senn details, and he makes the point that for typical parallel group (non-crossover) designs, results can’t distinguish larger efficacy in a small number of patients from smaller efficacy in all patients.
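
A small simulation sketch of that point, using a binary "responded / did not respond" outcome. The numbers are hypothetical: placebo response probability 0.3 for everyone, and two drug scenarios with the same average response of 0.5. In `same_for_all` every patient moves to 0.5; in `few_large` 40% of patients move to 0.8 and the rest stay at 0.3. The function and scenario names are made up for illustration.

```python
import random
import statistics

def drug_response_prob(scenario, rng):
    """Per-patient probability of responding while on drug (assumed values)."""
    if scenario == "same_for_all":
        return 0.5                          # every patient: 0.3 -> 0.5
    # "few_large": 40% of patients jump to 0.8, the rest stay at 0.3
    return 0.8 if rng.random() < 0.4 else 0.3

def parallel_group_rate(scenario, n, rng):
    """Drug-arm response rate when each patient is observed exactly once."""
    hits = sum(rng.random() < drug_response_prob(scenario, rng) for _ in range(n))
    return hits / n

def crossover_counts(scenario, n, periods, rng):
    """Responses per patient when each patient has several drug periods."""
    counts = []
    for _ in range(n):
        p = drug_response_prob(scenario, rng)   # fixed within a patient
        counts.append(sum(rng.random() < p for _ in range(periods)))
    return counts

rng = random.Random(7)
scenarios = ("same_for_all", "few_large")

# One observation per patient: both scenarios target the same 0.5 rate.
rates = {s: parallel_group_rate(s, 20000, rng) for s in scenarios}

# Four drug periods per patient: between-patient spread becomes visible.
variances = {s: statistics.variance(crossover_counts(s, 20000, 4, rng))
             for s in scenarios}

for s in scenarios:
    print(s, "parallel-group rate:", round(rates[s], 3),
          "variance of per-patient counts:", round(variances[s], 2))
```

With one observation per patient, the drug-arm outcome is a single Bernoulli draw with probability 0.5 in both scenarios, so the two are indistinguishable. With repeated drug periods per patient, the per-patient response counts are more spread out in `few_large` than a common probability of 0.5 would produce, which is the kind of signal a multi-period crossover is designed to pick up.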


Thank you, Paul & Frank. Great answers, and I think my intuition was correct about the issues at hand. I will dig in further.


Brian, the longitudinal experience of the care of a patient must produce a similar faith in any physician, I wager. As a hypertensologist, you will have seen some patients achieve good BP control on low monotherapy doses, and others whose hypertension ably ‘defends itself’ against attack on multiple [mechanistic] fronts with intensive combination therapy.

How long could an anesthesiologist remain an HTE-skeptic in the OR? (Indeed, the inventor of pharmacometrics was Lewis B. Sheiner, MD — an anesthesiologist!) How about an intensivist titrating drips in the ICU? The crossover trial might be appreciated as mirroring this longitudinal dimension in clinical care, and perhaps the N-of-1 trial as an even more faithful scientific (and ethical) image of care.

How did we get ourselves into this fix, where the very existence of HTE must be ‘proven’? Surely, the idea that HTE=0 belongs fully in the category of null hypotheses which Meehl described so well as “quasi-always false.” My guess: we’ve uncritically accepted the RCT (with ‘hard’, univariate endpoints) as a ‘gold standard’. Because such ‘confirmatory’ trials are so ill-adapted to eliciting or demonstrating HTE, we suddenly find ourselves doubting the clear deliverance of the vital experience of clinical care.

Though I like to emphasize judging each study on its own merit, the track record for rigorous demonstration of HTE is so weak that my prior probability is very informative against the belief that HTE exists on an appropriate scale. And often it requires multi-period crossover studies to be convincing about HTE in general.
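
One way to write down why a parallel-group trial is so uninformative here, sketched in potential-outcomes notation. The symbols are my own assumptions, not taken from any of the linked papers: $Y_i(1)$ and $Y_i(0)$ are patient $i$'s outcomes on drug and on control, $\tau_i$ is the individual effect, and $\rho$ is the correlation between the two potential outcomes.

```latex
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0) \\
\operatorname{Var}(\tau) &= \sigma_1^2 + \sigma_0^2 - 2\rho\,\sigma_1\sigma_0 \\
(\sigma_1 - \sigma_0)^2 &\le \operatorname{Var}(\tau) \le (\sigma_1 + \sigma_0)^2
\end{align*}
```

A parallel-group trial estimates $\sigma_1$ and $\sigma_0$ from the two arms, but no patient is ever observed under both conditions, so $\rho$ is not estimable and $\operatorname{Var}(\tau)$ is only bounded. If the two arms have equal variances, the lower bound is zero, which is consistent with no HTE at all. Repeated periods within a patient are what supply information about the patient-by-treatment component.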

This is a good paper reporting a study design that should allow estimation of HTE as far as I can tell

David, I believe that variation in the human genome & lifestyle and consequent variation in human physiology does mean it’s likely people will respond differently to the same medications. And I agree that one environment in which it should be most manifest is the ICU: potent, rapid-acting drugs with known read-out in terms of blood pressure, etc. But there is always the nagging concern “compared to what?” Yes, BP improves if you start x, y, or z intervention and to a different extent. Would it have improved in some people if you merely watched? The answer might be yes, but people will almost always take some action. Will the BP improve more in people with a particular initial value compared to others with a different initial value? I’m afraid I find measuring the HTE pretty, pretty subtle.


Whether HTE manifests in RCT data is an entirely different question from whether it exists. I bet if you sat beside an anesthesiologist through N=2 OR cases, you’d believe in HTE, Frank! That you could have this “very informative prior against HTE” after having analyzed data from $N \sim O(10^6)$ RCT subjects speaks more to the limits of RCTism than to the existence of HTE.

This is where I was aiming in a recent tweetorial where I identified such biostatistical views as “I have an informative prior against HTE!” with Ernst Mach’s famous “I don’t believe atoms exist!” Like Mach, who rejected atoms because they did not appear as an element of his immediate experience (e.g., the readings on dials of lab equipment), it seems biostatisticians reject HTE because it does not appear in their own immediate experience of tabulated RCT data.

We could let this argument go on forever. I’ll just register strong disagreement and stop at that. To clarify one thing: HTE tends to be either absent or non-measurable in my estimation in chronic or multifactorial disease in which pharmacology/physiology is not the major player.

Brian, I wonder whether it may be a fool’s errand to seek ‘HTE’ directly in the sort of data typically tabulated for purposes of medical statistics. Perhaps a productive discussion of the question of HTE in clinical setting X necessarily depends on first advancing one or more mechanistic theories containing realistic parameters which we could confront by bringing multiple lines of evidence into convergence. (This confrontation and convergence might be regarded as similar to what statisticians would call—respectively—‘parameter identification’ and ‘prior elicitation’.)

Maybe ‘the question of HTE’ will remain interminable (like the ages-old debate on atoms going back to Democritus) until we begin advancing substantive mechanistic theories like Boltzmann’s statistical mechanics or Einstein’s theory of Brownian motion. Continuing to operate entirely within the fold of biostatistics and its limited methods may amount to the same thing as perseverating over formal manipulation of the same old equations of thermodynamics, hoping in vain for new insight.

I don’t see where biostatistics is limited here. I see that (1) data are limited and (2) we need more basic science to teach us which things to measure.


Here, I agree with you, Frank. I am, after all, pointing to Boltzmann’s statistical mechanics as progress within Physics which we ought to emulate in Medicine! I have previously suggested that incorporating time-series methods like recursive filtering into the biostatistical core would go far in addressing the deficiencies in the current biostatistical outlook. So this problem does not necessarily implicate the very foundations of statistics.

The deepest root of the problem might best be sought in culture and power, although a discussion of that sort goes beyond the bounds of this forum, probably.