Dichotomization

Johannes_Schwenke · June 7, 2026, 10:52pm

Stephen has given great presentations on this topic; I shall try to add a link later. Guyatt et al. are, unfortunately, just mistaken. Even if the treatment works a little bit for everyone, dichotomizing can make it look like it works for some and not for others. You then end up with an NNT > 1, even though everyone who received the drug benefited. It might be worth recreating Stephen’s analysis in R and sharing it so people can experiment, e.g., in a Shiny app.
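As a rough illustration of the point (a minimal sketch in Python rather than R, with made-up numbers, and not a reproduction of Stephen’s analysis): suppose every patient gains exactly 2 months from treatment, and a "responder" is anyone who stays relapse-free for at least 8 months. The data, the size of the benefit, and the threshold are all assumptions chosen for illustration.

```python
# Hypothetical data: months until relapse for 10 untreated patients.
untreated = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
BENEFIT = 2      # assumed: treatment adds 2 months for every single patient
THRESHOLD = 8    # assumed: "responder" means at least 8 months until relapse

# Potential outcomes under treatment for the same 10 patients.
treated = [y + BENEFIT for y in untreated]


def responder_rate(outcomes, threshold=THRESHOLD):
    """Proportion of outcomes at or above the responder threshold."""
    return sum(y >= threshold for y in outcomes) / len(outcomes)


rate_untreated = responder_rate(untreated)   # 3 of 10 reach the threshold
rate_treated = responder_rate(treated)       # 5 of 10 reach the threshold
risk_difference = rate_treated - rate_untreated
nnt = 1 / risk_difference

# Proportion of patients whose outcome is better with treatment than without.
proportion_benefiting = sum(t > u for t, u in zip(treated, untreated)) / len(untreated)

print(f"responder rates: {rate_untreated:.1f} untreated vs {rate_treated:.1f} treated")
print(f"NNT from the dichotomized outcome: {nnt:.1f}")
print(f"proportion who actually benefit: {proportion_benefiting:.1f}")
```

The dichotomized outcome suggests an NNT of about 5, as if only one patient in five were helped, although by construction every patient gains 2 months.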

Sander · June 7, 2026, 11:39pm

It looks to me that the main problem that Guyatt et al. missed and that Senn forcefully pointed out is an example of what is now known as the nonidentifiability of individual causal effects from average causal effects. This nonidentification problem has been analyzed extensively in many contexts, including the common confusion in the attributable-fraction literature of the excess fraction (the fractional increase in caseload from a cause) with the etiologic fraction and the probability of causation (the proportion of cases affected by a cause), reviewed for example here:

Greenland, S. (2015). Concepts and pitfalls in measuring and interpreting causal attribution, preventive potential, and causation probabilities. Annals of Epidemiology, 25, 155-161. https://doi.org/10.1016/j.annepidem.2014.11.005

and the confusion of the so-called “D-value” (the probability that a randomly chosen treated patient has a higher response than a randomly chosen control) with the proportion of patients who have a higher value after the treatment (the probability of harm if a higher value is bad, or the probability of benefit if a higher value is good), which is reviewed here:

Greenland, S., Fay, M.P., Brittain, E.H., Shih, J.H., Follmann, D.A., Gabriel, E.E., and Robins, J.M. (2020). On causal inferences for personalized medicine: how hidden causal assumptions led to erroneous causal claims about the D-value. The American Statistician, 74, 243-248. https://doi.org/10.1080/00031305.2019.1575771 (open access version available at PMC)
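The gap between the D-value and the proportion with a higher value after treatment can be seen in a toy table of potential outcomes. This is only a sketch: the three patients and their responses are hypothetical, ties are simply counted as "not higher", and the full table used here is exactly what a real trial never gets to observe.

```python
from itertools import product

# Hypothetical potential outcomes (untreated, treated) for three patients.
patients = [(1, 2), (2, 3), (3, 1)]

untreated = [y0 for y0, _ in patients]
treated = [y1 for _, y1 in patients]

# D-value: chance that a randomly chosen treated response exceeds a randomly
# chosen untreated response. It uses only the two marginal distributions.
n_comparisons = len(treated) * len(untreated)
d_value = sum(t > c for t, c in product(treated, untreated)) / n_comparisons

# Proportion of patients whose own response is higher when treated.
# It needs both potential outcomes of the same patient.
proportion_higher = sum(y1 > y0 for y0, y1 in patients) / len(patients)

print(f"D-value: {d_value:.3f}")
print(f"proportion with higher response when treated: {proportion_higher:.3f}")
```

Here the treated and untreated distributions are identical (both are 1, 2, 3), so the D-value is 3/9, yet two of the three patients have a higher response when treated.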

In all these cases the confusion can be seen in works by otherwise well-qualified statisticians, not just pure clinicians. Yet it is easy to illustrate the conceptual mistakes by using two 4-patient examples, each with two perfectly matched pairs in which one member of each pair will be treated and the other will remain untreated. Perfectly matched means that within each pair the patient responses to a given treatment will be the same. Given those responses for each pair, we can calculate both average and individual differences in responses.

In the following you might imagine the response is number of months until relapse; then we see that the average differences do not tell us much about the individual differences or the proportions that benefit (responders):

Example 1.

Patients in pair 1 have response 25 when treated, 15 when not treated, a difference of 10.
Patients in pair 2 have response 15 when treated, 5 when not, also a difference of 10.
Then the average responses when treated and when untreated are 20 and 10,
so the average difference is 10 and that equals all the individual differences.
The proportion that benefit from treatment among the treated is 100%.

Example 2.

Patients in pair 1 have response 25 when treated, 5 when not treated, a difference of 20.
Patients in pair 2 have response 15 whether treated or not, a difference of 0.
Then the average responses when treated and when untreated are again 20 and 10,
so the average difference is 10 as before but that equals no individual difference
and the proportion that benefit from treatment is only 50%.
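
The arithmetic in Examples 1 and 2 can be checked with a short script. This is a minimal sketch; the function name `summarize` and the representation of each pair as a (treated, untreated) tuple are illustrative choices, not anything from the cited papers.

```python
def summarize(pairs):
    """Summarize perfectly matched pairs given as (treated, untreated) responses."""
    treated = [t for t, _ in pairs]
    untreated = [u for _, u in pairs]
    diffs = [t - u for t, u in pairs]
    return {
        "mean_treated": sum(treated) / len(pairs),
        "mean_untreated": sum(untreated) / len(pairs),
        "average_difference": sum(diffs) / len(diffs),
        "individual_differences": diffs,
        # A pair "benefits" if its response is strictly higher when treated.
        "proportion_benefiting": sum(d > 0 for d in diffs) / len(diffs),
    }


# Months until relapse for each pair: (when treated, when not treated).
example_1 = [(25, 15), (15, 5)]
example_2 = [(25, 5), (15, 15)]

for name, pairs in [("Example 1", example_1), ("Example 2", example_2)]:
    print(name, summarize(pairs))
```

Both examples give the same three averages (20, 10, and a difference of 10), while the individual differences and the proportion benefiting differ.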

The problem of an ordinary one-period study is that we don’t observe perfectly matched or even well-matched pairs. Instead it is as if we only get to see one member of any such pair (if it ever existed) and then only under one treatment condition (treated or untreated).
Hence we only get to see the averages, which can’t tell us the individual differences or the proportion benefited or harmed. The basic data can provide bounds on those quantities within the observed population, but without further assumptions those bounds tend to be uselessly wide.
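
To give a feel for how wide such bounds can be, here is a minimal sketch for the simplest case, which is an assumption added for illustration: a binary good/bad outcome rather than months until relapse. With only the two marginal proportions known, the proportion who benefit (good outcome if treated, bad if untreated) is limited only by the standard Fréchet-type bounds. The function name and the example proportions are hypothetical.

```python
def benefit_bounds(p_treated, p_untreated):
    """Bounds on the proportion with a good outcome if treated and a bad
    outcome if untreated, given only the two marginal proportions with a
    good outcome (assumed known exactly, i.e. no sampling error)."""
    lower = max(0.0, p_treated - p_untreated)
    upper = min(p_treated, 1.0 - p_untreated)
    return lower, upper


# Hypothetical trial: 50% good outcomes when treated, 30% when untreated.
lower, upper = benefit_bounds(0.5, 0.3)
print(f"proportion benefiting lies between {lower:.2f} and {upper:.2f}")
```

With these made-up numbers the proportion benefiting could be anywhere from 20% to 50%, and the interval would not narrow with a larger sample because the proportions are already treated as exact.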

This nonidentification problem is more basic than problems from random variation, as it would remain unchanged even if we had a billion patients. It is also distinct from problems due to dichotomization or other coarsenings or degradations of data (such as the mistake of using percentile categories) as discussed in sources mentioned earlier in the thread and in

Greenland, S. (1995). Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology, 6, 450-454.
Greenland, S. (1995). Problems in the average-risk interpretation of categorical dose-response analyses. Epidemiology, 6, 563-565.

As I see it, the main place statistical methods can help us out of the individual nonidentification problem is by developing models that predict individual responses under different treatments, where accuracy considerations depending on the quality and amount of data come into play. That is one of Frank’s areas of expertise so I’d leave further observations about that to him.

ESMD · June 8, 2026, 3:04am

Indeed. I’ve watched many of the Senn talks. The concept of “NNT” seems to be founded in yet another misconception: the idea that the risk difference obtained from an RCT reflects some sort of stable/intrinsic property of a therapy, rather than being an ephemeral result that reflects the particular convenience sample from which it was generated.

f2harrell · June 9, 2026, 9:41am

Great discussion. It is really too bad that Guyatt writes on statistical topics but does not have the necessary statistical background for doing so.

The simplest error of logic I can think of that fans of responder analysis make is that thresholds for interpretation of results should lead to thresholds and associated dichotomization of the raw response variable. Were that the case, Olympic track coaches would not need stopwatches and cars would not need speedometers.

Another simply stated problem with responder analysis is that a treatment may move those above a threshold to be far above the threshold, but the proportion “responding” would miss that.
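
A toy illustration of that last point, as a minimal sketch with invented numbers (the outcomes and the threshold of 10 are assumptions): the treatment leaves patients below the threshold unchanged and moves those already above it much further above.

```python
THRESHOLD = 10  # assumed responder cutoff, in months until relapse

# Hypothetical outcomes in two comparable groups of four patients.
untreated = [4, 6, 12, 14]
treated = [4, 6, 22, 24]   # those above the cutoff end up 10 months further above


def mean(values):
    return sum(values) / len(values)


def responder_rate(outcomes, threshold=THRESHOLD):
    """Proportion of outcomes at or above the responder threshold."""
    return sum(y >= threshold for y in outcomes) / len(outcomes)


rate_difference = responder_rate(treated) - responder_rate(untreated)
mean_difference = mean(treated) - mean(untreated)

print(f"difference in responder proportions: {rate_difference:.2f}")
print(f"difference in mean months until relapse: {mean_difference:.2f}")
```

The responder proportions are identical in the two groups, so the responder analysis reports no effect, while the mean time until relapse differs by 5 months.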

ESMD · June 10, 2026, 1:09pm

It’s crucial that the historical roots of the responder analysis fiasco be diagnosed accurately; otherwise, proposed treatments to abolish it won’t work. To this end, it feels like its absolute crux is a failure to understand the concept of “causal nonidentifiability,” as explained above by Dr. Greenland.

The question that drug regulators were asking in the 1990s was “what proportion of patients will benefit from this treatment that we are being asked to approve?” Instead of drug sponsors telling regulators that they were asking an unanswerable question, they dutifully tried to provide the answer. But the technique they proposed to answer the question was causally unsound.

The next question to ask is: why didn’t statisticians employed by drug companies at the time rise up en masse against this unreasonable request from regulators? And why didn’t regulators understand causal nonidentifiability well enough that they knew not to make the request in the first place? Is it because the concept simply wasn’t widely understood at the time among statisticians? This seems like a reasonable conclusion, given that a widely-cited 2016 publication by a statistician further reinforced the error and the practice of responder analysis remains alive and well today.

It seems to me that without this misunderstanding around causal nonidentifiability, responder analysis never would have caught on in the first place. Therefore, isn’t wide propagation of a plain language explanation of causal nonidentifiability what will be needed to abolish it? If responder analysis is the Death Star, then causal nonidentifiability seems like the reactor core: target that issue and the whole pernicious, fortified sphere of misunderstanding will blow up.

@Pavlos_Msaouel is pulling his hair out over the fact that randomized non-comparative trials (RNCTs) seem to be catching on in oncology. He is witnessing the birth of another Frankensteinian statistical practice. Statisticians are again chucking statistical fundamentals out the window in trying to address an intractable clinical question: how to make reliable inferences from small numbers of patients?

f2harrell · June 10, 2026, 3:11pm

Why indeed. I have been so disappointed over my career with fellow statisticians who would rather remain quiet than create a stink. This relates to the criticism @Stephen once received after having the audacity to suggest that measurement issues are within the domain of statistics.

davidcnorrismd · June 10, 2026, 5:05pm

“The Audacity of Nope”