“In oncology, responder analysis is often used, e.g., to describe features of patients who responded versus those that did not (see example Figure 4 in our recent phase I trial paper here). While it does typically involve dichotomization (or some type of categorization), such responder analysis is orthogonal (and far more exploratory) to the main purpose of RCTs which is to compare outcomes between treatment groups.”
Pavlos, these statements highlight just how much nuance is needed when discussing the pitfalls of “responder analysis.” It seems really important to explicate these nuances in language that clinicians will understand. Understanding why this concept is problematic requires not just statistical intuition but also the ability to reason clinically.
As noted in the “Causal inferences from RCTs” thread (linked in the second post above), attempting to assess causality/treatment “response” at the level of individual patients enrolled in an RCT is often a phenomenologically invalid exercise. Reasons include the fact that 1) many medical conditions (e.g., asthma) have a waxing/waning natural history; and 2) many clinical events (e.g., compression fracture) are not amenable to assessment of the effects of drug dechallenge/rechallenge. In these types of clinical scenarios, causality at the level of individual RCT participants either can be determined only if the effect is replicated within the individual (e.g., via dechallenge/rechallenge in a crossover or N-of-1 design), as in the asthma example, or can’t be determined with any certainty AT ALL because the outcome is permanent, as in the compression fracture example.
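A minimal sketch of the waxing/waning point, under loudly stated assumptions: the symptom scale, the noise levels, and the "at least 20% improvement from baseline" responder rule are all hypothetical and chosen only for illustration. It simulates a fluctuating condition and counts how many patients would be labelled "responders" when the treatment does nothing at all.

```python
import random


def simulate_responder_fraction(n_patients, treatment_effect, seed):
    """Fraction of patients labelled 'responders' in a waxing/waning condition.

    All numbers are hypothetical: each patient has a stable long-run symptom
    level, and every visit adds an independent random fluctuation to it.
    """
    rng = random.Random(seed)
    responders = 0
    for _ in range(n_patients):
        stable_level = rng.gauss(50, 10)  # patient's long-run symptom score
        # Score at enrolment: stable level plus that day's fluctuation
        # (floored at 1.0 so the percentage rule below stays meaningful).
        baseline = max(stable_level + rng.gauss(0, 10), 1.0)
        # Score at follow-up: a new fluctuation, minus any true drug effect.
        follow_up = stable_level + rng.gauss(0, 10) - treatment_effect
        # Hypothetical responder rule: at least 20% improvement from baseline.
        if follow_up <= 0.8 * baseline:
            responders += 1
    return responders / n_patients


# An inert treatment (true effect = 0) still produces "responders".
inert_fraction = simulate_responder_fraction(10_000, treatment_effect=0.0, seed=1)
# A treatment with a modest true effect produces more of them.
active_fraction = simulate_responder_fraction(10_000, treatment_effect=5.0, seed=2)

print(f"'Responders' with an inert treatment: {inert_fraction:.1%}")
print(f"'Responders' with an active treatment: {active_fraction:.1%}")
```

In this toy setup, a sizeable share of patients meet the responder rule even when the true effect is zero, so an individual patient's improvement cannot be attributed to the drug. Only the difference between the two fractions carries information about the treatment, and that is a group-level comparison.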
However, as you note, there IS a subset of clinical phenomena for which clinicians will be able to assign causality at the level of an individual patient, even though we have NOT been able to witness the effect of either drug dechallenge or rechallenge. Malignancies have a directionally predictable trajectory in the absence of treatment: they will only get worse over time (not fluctuate, like asthma does). Sometimes deterioration is quick (e.g., pancreatic cancer) and sometimes deterioration is very slow (e.g., indolent lymphomas or some prostate cancers). The key point is that malignancies don’t spontaneously improve over time. We don’t see them melt away on imaging in the absence of treatment; if this occurs, the diagnosis of malignancy was likely incorrect. Therefore, if we see evidence of tumour burden lessening over time in a patient who has been exposed to a therapy, it’s very reasonable to conclude that it must have been the therapy that caused that improvement, even though we haven’t tested the patient’s response to treatment dechallenge and then rechallenge (criteria that are considered essential for assessing individual-level causality in the context of medical conditions for which a waxing/waning, rather than steadily deteriorating, clinical course is the norm). If a malignant tumour shrinks over time after an intervention, it would be valid to infer that the patient had an “objective response” to the therapy. Of course, inferring treatment efficacy in individual patients in the context of a single-arm early phase cancer study requires that the techniques we use to assess tumour burden serially/over time are reliable.
You are pointing out that while certain clinical scenarios (especially oncology) DO allow us to distinguish “responders” from “non-responders” to a therapy, the study design context in which this is done MATTERS. Specifically, Phase 1 is not Phase 3. While “responder analysis” might be reasonable in the context of a single arm Phase 1 oncology study, it is NOT going to be useful in the context of a Phase 3 RCT, for which the main goal is between-arm comparison. Researchers who try to apply an individual-level “responder analysis” in the context of trials with multiple arms (e.g., Phase 3 RCTs) are betraying a fundamental misunderstanding about the entire purpose of Phase 3 trials. In order to understand the purpose of Phase 3 trials, researchers must internalize (deeply) the purpose of concurrent control.
Let’s be explicit about the reason(s) why we would criticize the notion of calling a patient in a multi-arm randomized trial (e.g., pivotal Phase 3 RCT) a “responder” but not criticize the idea of calling a single arm Phase 1 oncology study patient a “responder.” In other words, why is it considered nonsensical to run “randomized non-comparative trials” (Randomized non-comparative trials: an oxymoron?) but okay to assess “Objective Response Rate” in a single arm Phase 1 study of a cancer treatment? At first glance, criticizing the former process but not the latter seems hypocritical. Let’s be absolutely clear about why these two positions are not actually in conflict.
If we agree that tumour shrinkage can, phenomenologically, be causally attributed to therapy at the level of an individual patient in a single arm Phase 1 oncology study, why should we avoid the temptation to “drill down” to individual patients enrolled in a Phase 3 oncology RCT? For that matter, if tumour shrinkage over time can reliably signal that a therapy is “biologically active,” why don’t we just approve ALL such therapies after Phase 1 and completely skip Phases 2 and 3? Why not just approve all drugs for which we can document tumour shrinkage on imaging after exposure? Of course, the answer is that, when approving new therapies of any kind, biological activity (e.g., tumour-shrinking ability) is not the ONLY important feature of the treatment that regulators need to consider. In most disease areas, RCTs are not comparing a new therapy with NO therapy (or inert placebo), but rather a NEW therapy against “standard of care” therapy. Over time, we want: 1) our treatments to become more efficacious than existing therapies so that patient outcomes will improve; AND 2) our treatments to become less toxic (physically and financially), so that 3) benefit/harm ratios for our therapeutic arsenal become more positive. If a new oncology drug shrinks patients’ tumours dramatically over a short time (as initially noted in a Phase 1 study), but, as noted during a Phase 3 pivotal trial, does so at a similar rate to the standard of care drug (an assessment that requires a between-arm/comparative analysis), yet with much higher rates of intolerable side effects (also a comparative analysis), then we might not want to approve that therapy (or at least not as a “first-line” treatment).
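To make the between-arm point concrete, here is a rough sketch using entirely invented counts (200 patients per arm; none of these numbers come from a real trial). The Wald interval is just the simplest textbook choice for a difference in proportions, not a recommendation for how a real Phase 3 analysis should be done.

```python
import math


def risk_difference(events_new, n_new, events_std, n_std, z=1.96):
    """Difference in proportions (new minus standard) with a Wald 95% CI."""
    p_new = events_new / n_new
    p_std = events_std / n_std
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    return diff, diff - z * se, diff + z * se


# Hypothetical Phase 3 counts, invented purely for illustration.
N_PER_ARM = 200
RESPONSES_NEW, RESPONSES_STD = 120, 116  # patients with tumour shrinkage
TOXICITY_NEW, TOXICITY_STD = 80, 40      # patients with intolerable side effects

resp_diff, resp_lo, resp_hi = risk_difference(
    RESPONSES_NEW, N_PER_ARM, RESPONSES_STD, N_PER_ARM
)
tox_diff, tox_lo, tox_hi = risk_difference(
    TOXICITY_NEW, N_PER_ARM, TOXICITY_STD, N_PER_ARM
)

# Within-arm view: the new drug looks impressive when examined in isolation.
print(f"Response rate, new drug alone: {RESPONSES_NEW / N_PER_ARM:.0%}")
# Between-arm view: little gain in response, clearly more toxicity.
print(f"Response difference: {resp_diff:+.2f} ({resp_lo:+.2f} to {resp_hi:+.2f})")
print(f"Toxicity difference: {tox_diff:+.2f} ({tox_lo:+.2f} to {tox_hi:+.2f})")
```

Counting "responders" within the new-drug arm says only that the drug is biologically active. The comparison against the concurrent control arm is what shows that, in these made-up data, the drug adds little efficacy while adding considerable harm.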
In short, while a demonstration of “biologic response” might be a valid goal of a Phase 1 oncology study (and is an important step in the search for therapies with enough promise in humans to advance to later phase studies), it will be an insufficient bar for judging whether or not to approve an oncology drug (except, perhaps, in the case of highly aggressive diseases with universally very poor prognoses). By the time we get to Phase 3, we are past the point of being concerned only with demonstrating biological activity of the therapy (and therefore “drilling down” to the level of individual patients in an attempt to assess tumour response). This is NOT the primary goal of a Phase 3 trial. And this is why “responder analysis,” as conducted using a “randomized non-comparative” design, makes no sense at all. Why have more than one study arm if you’re not going to compare arms in some way, but rather just focus on individual responses within arms? Such designs betray a researcher’s failure to understand the purpose/value of concurrent control. Once researchers have enough information about a new therapy to enrol patients in an RCT (i.e., a 2-arm study), they are saying that their primary goal is no longer the study of individual patients; it is the weighing of risks and benefits comparatively, at the level of groups, to determine whether or not the drug should be approved, and in which clinical context(s).
As you know, I’m not an oncologist OR a statistician, so it’s possible I have all this wrong. I’m happy to be corrected.