Even small (clinically insignificant) effects matter, if we expose a lot of people to them

Consider the following statements:

  • “The drug shortens the duration of common cold symptoms by 1 hour (from 1 week with placebo). This is clinically insignificant, but hey, there are one million illnesses a year, so if everyone took the drug, it would mean 125 thousand workdays saved!”
  • “The drug reduces the severity of the symptoms of depression by only 2 points on a 50-point scale, which is considered clinically insignificant. However, given that we have at least 100 thousand patients suffering from this disease, treating everyone would result in a benefit of 200 thousand points, which is definitely clinically significant!”
  • “This oral anticoagulant increases the risk of intracranial bleeding by only 0.1%, which seems insignificant, but given the huge number of patients taking the drug (one million, with a baseline risk of 0.01/year), this ‘clinically insignificant’ effect actually means 10 very severe side effects caused by the drug each year!”

[Let’s assume for simplicity that these effect sizes are sure, i.e. measured in very large, well-designed trials.]
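For reference, the multiplication behind the three statements can be sketched as below. This only checks the arithmetic; it says nothing about whether the totals are meaningful. The 8-hour workday and the reading of 0.1% as a relative increase over the 0.01/year baseline are assumptions, chosen because they reproduce the quoted figures.

```python
# Population-level totals quoted in the three statements.
# Assumptions (not stated explicitly in the post): an 8-hour workday,
# and "0.1%" read as a relative increase over the baseline risk.

HOURS_PER_WORKDAY = 8

# Common cold: 1 hour saved per illness, one million illnesses a year.
cold_workdays_saved = 1_000_000 * 1 / HOURS_PER_WORKDAY

# Depression: 2 points per patient, 100 thousand patients.
depression_points = 100_000 * 2

# Anticoagulant: one million patients, baseline risk 0.01 per year.
baseline_bleeds = 1_000_000 * 0.01
extra_bleeds = baseline_bleeds * 0.001  # 0.1% relative increase

print(cold_workdays_saved)   # 125000.0
print(depression_points)     # 200000
print(round(extra_bleeds))   # 10
```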

What are your thoughts on the validity and soundness of such reasoning?


My 2 pence:

  • If the drug has a small effect, it is likely that some (small) side effect will outweigh the benefit. The smaller the effect, the more critically possible side effects must be taken into consideration.

  • The “significance” of the benefit you gave is measured on a “community basis” (the “individual” benefit is insignificant). So it’s fair to balance the possible community benefit against the community costs. If the drug is given to so many people, even relatively cheap drugs will cause considerable costs for the community. How much money does the health system have to tie up to save 125 thousand workdays? Or: from which part of the health system do we take the money to prescribe these drugs, and is there a net benefit in doing so?


Each of these examples is a case on its own, and really underlines the importance of understanding clinical context.

Shortening symptoms of the cold by one hour is not going to change patient behaviour. They are not going to go back to work an hour earlier. The appropriate effect size measure would have been working days lost. Symptom duration is a proxy for this, but proxy measures have a long history of backfiring – remember anti-arrhythmia agents and MI?

The antidepressant result is, again, not a real life effect size. Time to remission and one-year outcome are the measures of choice. The scale, of course, is not numeric, and consequently the two point change could be due to reducing scores on a symptom such as poor appetite. Standardised scales are invaluable for making treatment decisions, but do not measure the outcome that the patient and clinician are working towards. Psychiatry is slowly moving from “what’s the matter with you?” to “what matters to you”, so I would wait for a more appropriate measure of treatment outcome.
As to your arithmetic – what would happen if you gave 100,000 potted plants 1 ml of water each? It adds up to a lot of water, but your plants die.

Risks of serious adverse events such as intracranial bleeding are present all around us. Oral contraception increases the risk of deep vein thrombosis and, as far as I remember, could account for one death per 4,000,000 user-years. Is this an acceptable risk? Pregnancy, by the way, carries a risk of DVT twice as large as that associated with oral contraception.

Risk is also risk in context. If a surgical procedure to remove a cyst from the hand carries a very small risk that the person will lose sensation in the tips of two of their fingers (I’m making this up, but bear with me), the surgeon may not mention it to the patient. However, the current ruling from the British Medical Council is that ‘serious’ risk means ‘serious to the patient’, not ‘medically serious’. A patient who is a professional violinist would be horrified at the idea of losing sensation in their fingertips, so even though the risk is very small, the doctor must tell the patient.

In each case, the results of an analysis of data are only the beginning of a messy process of decision making!


Thank you for the comments!

Yes, I agree, that’s one of the important points here in my opinion as well: it obviously doesn’t matter if we compare one million times the risks to one million times the benefit, or we compare the risks to the benefit. Question is: can this question (these questions) really be reduced to such simple arithmetic…?

That was absolutely intentional, I can totally imagine that answers are different to these questions (that’s why I have chosen these examples).

Well, one could argue that 7 out of 8 patients get better one hour earlier in the middle of the day, so it indeed doesn’t matter (from this aspect): 0 work hours saved. But 1 in 8 gets better early in the morning, so the drug makes the difference between going to work or not: 8 work hours saved. So on average, we really do have 1 work hour saved.
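The averaging here can be written out as an expected value. A minimal sketch, assuming (as the argument does) an 8-hour workday and that the 1-hour shift crosses the “go to work or not” threshold for 1 patient in 8:

```python
from fractions import Fraction

# Each outcome: (probability, work hours saved)
outcomes = [
    (Fraction(7, 8), 0),  # recovers an hour earlier mid-day: no workday regained
    (Fraction(1, 8), 8),  # recovers just in time: a full 8-hour workday regained
]

expected_hours_saved = sum(p * hours for p, hours in outcomes)
print(expected_hours_saved)  # 1
```

The mean is 1 hour, but no individual patient gains exactly 1 hour: under these assumptions the benefit is all-or-nothing.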

Yes, of course we could argue about these, but let’s put these issues aside, and – as a thought experiment – accept that these are the metrics we use, so that we can focus on the other issues of the questions.

Now, that’s very important I think. I intentionally didn’t want to present my own opinion in the opening post, but I also believe that this is one of the essential issues here: is 5x2 = 1x10…? I.e., is five people with slightly better appetite the same as one with much less anxiety and much better mood? Because such calculations of “200 thousand point benefit for the population” practically assume this!

Also note the difference between this and the previous example: work hours are just numbers, so usual arithmetic applies, but here I find that we have many more questions about this logic. (But I am absolutely open to any discussion about this…)

That’s exactly the reason why I’ve written “can this question (these questions) really be reduced to such simple arithmetic” to @Jochen (also see my previous remark).

Very good point indeed. However, if we assume that being a violinist is not associated with a different chance of being prescribed oral anticoagulation, then this doesn’t matter on the population level (averages will apply).

I realize that your question is primarily trying to get at whether the math behind these types of extrapolations is defensible. But from a clinical standpoint, these types of extrapolations, if used to justify an intervention, are often moot.

Patients make decisions based on the probability that they, not the population at large, will see benefit or harm from a treatment. I suspect this is why we’re starting to see that “positive” RCT results in some areas (e.g., cardiology) are not being adopted on a large scale in the clinic. Cardiovascular trials these days have to be huge in order for a new treatment to show benefit, since we already have several medications that reduce risk. So today we see massive trials that show a “statistically significant” benefit from a treatment, but which ultimately don’t impact practice that much in the real world because absolute risk decreases are so small. In my experience, it’s very unlikely that a patient who is already taking four medications for his heart will agree to add yet another if there is only a small chance he will benefit (particularly if he has to pay for that medication and it’s associated with serious potential harms). In other words, any potentially significant “population” impact of the small RCT effect becomes irrelevant if individual patients are unlikely to accept the treatment.

Your question takes on another meaning in the context of decisions that have to be made at the population level (e.g., decision-making bodies that have to decide whether they will cover the cost of a new treatment or vaccine). Coverage for the cost of vaccines is a good example. Today, certain infections are less common than they used to be (e.g., meningococcus, pneumococcus), likely because children these days are vaccinated. Vaccine manufacturers now promote newer variations on these vaccines that will cover residual strains of these bacteria. The absolute incidence of infections with these residual strains is low, so the chance that any given person will see a benefit from the newer vaccines is also low. Probably as a result of this small expected benefit, the cost of the newer vaccines is not covered by our government and they are effectively used primarily by wealthier patients who can afford them.

Don’t know whether these examples are of any interest, but I think you raise an important question. First, is it mathematically sound for authors to extrapolate small effects to a population level in the way you describe? And second, even if these extrapolations were valid, how do they translate (if at all) to individual patients and decision-makers?


A few remarks by a patient advocate on this very interesting discussion - specific to COVID vaccine trials. The relative individual and population effects for an intervention depend on the indication - and for COVID the population effects are potentially enormous.

The individual risks are more abstract for a prevention intervention. These may affect public buy-in, which is required to achieve the urgently needed population effects.

The possible side effects might be reported as they are in consent documents in frequency tables. The formats are available here https://ctep.cancer.gov/protocolDevelopment/informed_consent.htm

In 100 people receiving …

All effects (let’s say a 60% reduction in risk of acquiring COVID or needing hospitalization) need to be communicated in plain language for individuals and for the population – return to normal life and economic health (in x months) – such as sporting events, back to school and work.

But winning public trust in the methods of study seems critical if the findings are to help us emerge from the crisis. This will require explaining how randomized trials work in general and the design of the COVID vaccine trials in particular.

My question: can someone point me to an RCT protocol for the current study vaccines - or a summary outline of how they work? I have assumptions about this, but would like to be able to cite the sources for the editorials I will write to explain why the RCT vaccine trials can be trusted to tell us about the good and bad effects of the vaccine and why we can trust that the results will be reported honestly.


Not sure if this is exactly what you’re looking for:


You raise good points. Uptake of coronavirus vaccines will almost certainly vary substantially from country to country. Unfortunately, some countries are likely to have their work cut out for them… As is true for most medical interventions, there will always be a subset of people you will never convince. Best to focus on those “on the fence,” for whom education and reassurance are likely to have the biggest impact. For many patients, their “tipping point” is the prospect of protecting more vulnerable loved ones.


So the FDA guidance for vaccine trials was helpful to me. Thank you. But I did not see criteria for when to do the analysis - to see if the 50% threshold for efficacy has been met.

Initial thought is that it must be event driven: a certain number of COVID cases, hospitalizations, ICU, and deaths in the study population triggers unblinding. Q: If so, what might that magic number be in a 30,000 person RCT study? Shouldn’t every study vaccine sponsor agree to the same criteria? Doing so would be an incentive to enroll persons at higher risk of exposure and the worst outcomes in order to efficiently get to the finish line: mature outcome data.

I suppose that interim safety monitoring by DSMB can call for unblinding at pre-specified intervals which might also inform if there are efficacy signals - that might trigger the company (if informed by the DSMB) to submit the findings to FDA. If this is the roadway to FDA submission, I’m worried.

In summary, the number of events to trigger unblinding should be pre-specified and explained to all stakeholders - in order to depoliticize vaccine research and foster public trust.

What am I missing?


You raise good questions- I’m not qualified to answer them but I’d be interested to hear responses from statisticians experienced in clinical trial design.

Like you, I would assume that trialists need to prespecify the number of events (e.g., number of documented COVID cases among trial participants) that would have to be recorded before they could unblind their data to check how many cases involved vaccine-treated versus placebo/control-treated participants. Otherwise, what would stop trialists from checking very frequently for a between-arm imbalance in events that would be sufficiently large to satisfy the predefined efficacy “threshold”? I suspect that multiple non-prespecified peeks at the data might not be a kosher approach, statistically speaking… Specifically, how could we be confident that this degree of between-arm imbalance would be maintained or increase (versus attenuate) if the trial were to continue?
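The concern about repeated unplanned looks can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the thread: a 1:1 trial with no true vaccine effect, so each case is equally likely to fall in either arm; a naive two-sided z-test at the nominal 5% level; and hypothetical values for the number of cases and looks. It does not describe how any actual COVID vaccine trial was monitored.

```python
import math
import random

def simulate(n_trials=1000, total_cases=500, look_every=50, z_crit=1.96, seed=1):
    """Return (false-positive rate with repeated looks, rate with one final look)."""
    rng = random.Random(seed)
    hits_peeking = 0
    hits_final = 0
    for _ in range(n_trials):
        vaccine_cases = 0
        crossed = False
        for n in range(1, total_cases + 1):
            # Under the null, a case is in the vaccine arm with probability 0.5.
            vaccine_cases += rng.random() < 0.5
            if n % look_every == 0:
                # Interim look: naive z-test on the split of cases so far.
                z = (vaccine_cases - n / 2) / math.sqrt(n / 4)
                if abs(z) > z_crit:
                    crossed = True
        hits_peeking += crossed
        # Single analysis once all cases have accrued.
        z_final = (vaccine_cases - total_cases / 2) / math.sqrt(total_cases / 4)
        hits_final += abs(z_final) > z_crit
    return hits_peeking / n_trials, hits_final / n_trials

peeking_rate, final_rate = simulate()
print(f"any of 10 looks 'significant': {peeking_rate:.3f}")
print(f"single final analysis:         {final_rate:.3f}")
```

With no true effect, the chance of crossing the nominal threshold at some look is well above the chance at a single final analysis, which is why group-sequential designs pre-specify the looks and use stricter boundaries at each one.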

Since this topic is important but not directly related to the original post, you might want to re-post your questions as a new topic- you might get more and better answers.


In time-to-event analysis, the effect on overall survival depends on the baseline hazard. Therefore, if the disease is indolent, and there is enough time, small therapeutic effects can drastically change the evolution of a disease. For example, in oncology it is common that RCTs in second and third chemotherapy lines are declared positive with HRs around 0.50, which seems very impressive, but this is translated into a very small difference in medians. In contrast, in the first line, when the tumors are indolent, you can have modest therapeutic effects (e.g., HR 0.85) which translate into large differences in survival medians. I think of this analogously to the idea of changing the trajectory of an asteroid on its way to Earth through small impulses administered constantly, instead of using brute force.


In time-to-event analysis, the effect on overall survival depends on the baseline hazard.

This is a very good point and there is one fascinating nuance that is rarely discussed: the effect of disease prognosis (indolent vs aggressive) goes in opposite ways depending on which measure of “absolute” difference is used. Assuming exponential distribution for survival: for indolent diseases a HR = 0.50 will show large differences in median survival but small differences in absolute risk reduction metrics (e.g., 3-month survival probability). For aggressive diseases, a HR = 0.50 will show small differences in median survival but large differences in absolute risk reduction metrics. Here is a fantastic paper elaborating on this phenomenon. We discuss during this ASCO elearning course the implications of this contradiction for clinical decision making. This is why @f2harrell is such a big fan of relative measures (e.g., HR and OR) that do not depend on base rates to compare treatments in RCTs. We can then transport those into the specific patient context in clinic, e.g., using prognostic risk scores.


Do the impacts of modest treatment effects you describe here also apply to indolent lymphoma?

I wouldn’t think so. Perhaps the difference is that it can start off indolent and turn aggressive, and that durable complete responses are common with combination therapy and are associated with survival similar to that of the general population.

This is a great discussion.

In a scenario where (almost) everyone is expected to experience an important outcome (e.g. an aggressive malignancy) in a relatively short timeframe (months to a few years), the interpretation of differences in median survival time becomes more intuitive. If a person expects to live 3 years but a new treatment buys them an additional 6 months, most people would at least consider the treatment.

I wonder if, in primary prevention where people may experience an important outcome over a long time frame, it may be harder to appreciate the importance of median event-free time gained. Follow-ups are not long enough to see >50% experience the event so maybe this is why we rarely see median survival estimates in this context.


In a scenario where (almost) everyone is expected to experience an important outcome (e.g. an aggressive malignancy) in a relatively short timeframe (months to a few years), the interpretation of differences in median survival time becomes more intuitive

It is even more complicated, as shown with practical examples in the ASCO elearning course. We hope to write these considerations up for a journal in the coming months so that all will have access. If someone has an aggressive cancer, then focusing on median survival can actually be misleading: in such scenarios median survival differences between two treatments can be minuscule whereas absolute risk reductions at specific time points (e.g., at 3 months) can be large.

You are correct that in situations with low baseline hazard rates (e.g., an indolent malignancy) it may be tough to find enough events to gain enough precision to compare two or more strategies. At the same time, however, the median survival differences become inflated.

Think of it this way: under typical assumptions and all other things being equal, suppose the HR = 0.5 for two treatments, e.g., in a randomized controlled trial in clear cell renal cell carcinoma (ccRCC). For poor prognosis (aggressive) ccRCC, if the median survival is 2 months with the control treatment, then it increases to 4 months with the new treatment (a median survival difference of 2 months). But for favorable prognosis ccRCC, if the median survival is 18 months with the control treatment, then it will increase to 36 months with the new treatment (a median survival difference of 18 months).
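These medians follow from the exponential model: the median is ln(2) divided by the hazard, so multiplying the hazard by the HR divides the median by the HR. A minimal sketch, assuming exponential survival and proportional hazards (the function name is only illustrative):

```python
def treated_median(control_median_months, hazard_ratio):
    """Median survival on the new treatment under exponential survival.

    median = ln(2) / hazard, so hazard * HR  ->  median / HR.
    """
    return control_median_months / hazard_ratio

for label, control_median in [("poor prognosis", 2), ("favorable prognosis", 18)]:
    new_median = treated_median(control_median, 0.5)
    print(f"{label}: {control_median} -> {new_median:g} months "
          f"(gain {new_median - control_median:g})")
# poor prognosis: 2 -> 4 months (gain 2)
# favorable prognosis: 18 -> 36 months (gain 18)
```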

This is why taking into account patient preferences / utilities is key in clinical decision making.



Treatment X looks to do exactly as it claims in both scenarios: slow the hazard to half! Why do you say it is inflated in the indolent scenario? If one were to think in terms of trade-offs (be they side effects or cost), 18 months gained looks like a better deal than 2 months gained.

(Of course, that 2 months is sooner and maybe future value of time is relatively discounted depending on values/preferences, but let’s ignore that pesky detail).


Let’s say that the HR was 0.75 instead of 0.5 in the indolent scenario. It would still buy you more time than in the aggressive scenario. I would tend to think we ought to value the time rather than the hazard ratio.

Perhaps another way to see it is that in the aggressive scenario, where the ship is sinking quite fast, you need a really big intervention to move the needle appreciably. In the indolent scenario, less is needed to achieve the same. With this would come the conclusion that gaining life and quality of life in aggressive cancers is a difficult and expensive task, whereas gaining the same duration of life and quality of life in indolent diseases is less so (contingent on all kinds of assumptions about time on intervention, cost, etc.).
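Both remarks can be checked under the exponential assumption used elsewhere in this thread, taking 2 months and 18 months as illustrative control medians for the aggressive and indolent scenarios. The helper names are hypothetical.

```python
def months_gained(control_median, hazard_ratio):
    # Exponential survival: treated median = control median / HR.
    return control_median / hazard_ratio - control_median

def hr_needed(control_median, target_gain):
    # Solve control_median / HR - control_median = target_gain for HR.
    return control_median / (control_median + target_gain)

# HR 0.75 in the indolent scenario vs HR 0.5 in the aggressive one.
print(months_gained(18, 0.75))  # 6.0
print(months_gained(2, 0.5))    # 2.0

# HR required to gain 2 months of median survival in each scenario.
print(hr_needed(2, 2))   # 0.5
print(hr_needed(18, 2))  # 0.9
```

Under these assumptions, a much weaker relative effect (HR 0.9 rather than 0.5) buys the same 2 months of median survival when the disease is indolent.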


Treatment X looks to do exactly as it claims in both scenarios: slow the hazard to half! Why do you say it is inflated in the indolent scenario? If one were to think in terms of trade-offs (be they side effects or cost), 18 months gained looks like a better deal than 2 months gained.

Great question! You are focusing on the statistical estimation scale (the HR) but, as we already discussed, baseline patient risk will inform the prediction (probability) scale and how we choose to focus on this second scale will depend on patient goals/utilities.

Think of it now this way: let’s say that the new treatment is more costly (e.g., more side effects, expensive, logistically intensive) than the control. And let’s say that what a specific patient we are seeing in clinic wants is to make sure that they see their daughter graduate college in 3 months. In that scenario (assuming exponential distribution for survival to simplify calculations here) the 3-month survival probability will be different depending on the ccRCC prognosis: for the favorable risk ccRCC the 3-month survival probability is 89% for the control and 94% for the new drug. Thus the absolute risk reduction is only 5% and the patient will very likely be alive at 3 months regardless of which treatment course we decide to use. Thus, choosing the less costly option is preferable here.

But if the patient had poor prognosis ccRCC the 3-month survival probability will be 35% for the control and 60% for the new drug. Thus the absolute risk reduction here is 25% and the patient will be more likely to be alive at 3 months if we choose the new drug.

The exact opposite conclusions would be reached if our patient were more interested in extending their median survival by at least 6 months: for poor risk ccRCC we would choose the control and for favorable risk ccRCC we would choose the new drug.
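The 3-month survival probabilities in this example can be reproduced with a few lines. A minimal sketch, assuming exponential survival as stated above and the control medians of 18 and 2 months with HR = 0.5; the function name is only illustrative.

```python
def survival_prob(t_months, median_months):
    """S(t) under exponential survival, written in terms of the median."""
    return 0.5 ** (t_months / median_months)

HR = 0.5
for label, control_median in [("favorable", 18), ("poor", 2)]:
    s_control = survival_prob(3, control_median)
    # Proportional hazards with exponential survival: new median = median / HR.
    s_new = survival_prob(3, control_median / HR)
    print(f"{label}: control {s_control:.1%}, new {s_new:.1%}, "
          f"difference {s_new - s_control:.1%}")
```

The output should land within about a percentage point of the rounded figures quoted in the example. The same HR gives a small absolute difference at 3 months in the favorable-risk group and a large one in the poor-risk group, the reverse of the pattern seen for medians.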

Similarly, utility considerations change depending on the disease. Even across different cancers.


Nice, now I understand. Thank you for taking the time.
