Debunking myths: intention to treat, missing follow-up, and imputation

I’ve seen a general trend (mainly from conversations and scattered reading, so nothing concrete I can point to) toward blanket statements such as: if you have missing data, then “intention to treat requires imputation” and “if you do not use imputation then your analyses are per protocol”. I strongly disagree, but I would like somebody to either tell me I’m wrong or confirm/adjust my thinking below:

As I see it, ITT and PP are not even on the same spectrum. ITT is the effect of being allocated to treatment; in do-calculus it is p(Y | do(A = 1)) versus p(Y | do(A = 0)). PP is generally undefined, as it is entirely subjective, depending on what researchers define as compliance with the protocol. It depends on what we observe rather than what we do, and is therefore potentially confounded (by factors such as patient preference). It would be something like p(Y | C = 1) versus p(Y | C = 0), where C = 1 is when we observe somebody being compliant and C = 0 when we do not.

If we decide that we want to estimate ITT and have missing data, our ITT estimate may be biased. If data are MCAR, we lose some power but the ITT estimate is unbiased. If data are MAR, imputation could help. If data are MNAR, imputation could also help, provided we are willing to make further assumptions. My point is that in no case does the ITT estimate suddenly become a PP estimate. It may be a really bad ITT estimate, but it is still an attempt to estimate ITT.
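To make the MCAR case concrete, here is a quick simulation sketch (all numbers are made up for illustration): when outcomes are dropped completely at random, the between-arm contrast stays unbiased, but its sampling variability grows — exactly the “lose power, keep unbiasedness” trade-off described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 500, 2000
true_effect = 1.0                        # hypothetical effect of assignment on Y

full_est, cc_est = [], []
for _ in range(n_sims):
    a = rng.integers(0, 2, n)            # randomized assignment
    y = true_effect * a + rng.normal(size=n)
    full_est.append(y[a == 1].mean() - y[a == 0].mean())
    observed = rng.random(n) > 0.3       # MCAR: 30% of outcomes lost at random
    yo, ao = y[observed], a[observed]
    cc_est.append(yo[ao == 1].mean() - yo[ao == 0].mean())

full_est, cc_est = np.array(full_est), np.array(cc_est)
print(full_est.mean(), cc_est.mean())    # both close to 1.0: no bias
print(full_est.std(), cc_est.std())      # complete-case spread is larger: lost power
```

Under MAR or MNAR the missingness indicator would depend on covariates or on y itself, and the complete-case contrast would no longer center on the true effect.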

What I’m trying to fight against are blanket statements suggesting that something should “always be done”. Surely we must look at each study on its own terms: sometimes we may decide that imputation is the way to go, sometimes not, and yet still talk about ITT estimates. In some cases we may prefer a non-imputed ITT estimate with a thorough attrition/sensitivity analysis to understand the potential risk of bias. Using imputation as a way to “sweep bias under the rug” is, in my eyes, worse; I would rather researchers did a proper assessment of the risk of bias than show me what MICE produced (with all due respect to MICE, which is a pretty fantastic tool).

Tell me I’m wrong (or not).



Firstly, the concepts of ITT and PP are not so useful. As you say, PP was for a long time a pretty undefined thing that followed no real logic. Instead, the estimand concept makes a lot more sense. The way you describe ITT actually corresponds really closely to a “treatment policy” estimand. What most people used to do as PP analyses has no exact equivalent (because once you clearly define what you want, what was previously done does not make so much sense). One thing that can come close to what people sometimes said they wanted out of a PP analysis (e.g. “effect of taking the treatment correctly for at least X weeks”) could be a “principal stratum” estimand. It turns out that what was often done in the past came sort of close to targeting a hypothetical estimand (“What would have happened if, hypothetically, everyone had been able to take their assigned treatment to the end of the trial?”) under certain assumptions.

Secondly, if I target a treatment policy estimand, I cannot think of many situations where one should not impute missing data for patients who decided to no longer take part in a trial. For example, if a treatment works and a patient quits the trial, they probably stop taking the treatment, and their unobserved off-treatment data would presumably be worse than that of those who continue treatment. Assuming MCAR in this scenario, or even imputing under MAR from those who continue on treatment, would be pretty silly. Imputing based on those who stopped treatment but continued assessments might be more sensible; or, for some treatments with only short-term symptomatic effects, it is reasonably common to impute from the placebo group (jump-to-reference imputation).
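A minimal sketch of the jump-to-reference idea, with entirely made-up numbers: dropouts in the treated arm get values drawn from a distribution fitted to the placebo (reference) arm, which pulls the estimated contrast toward the reference — reflecting the assumption that patients who quit also stop benefiting from treatment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up outcomes for illustration (higher = better)
placebo = rng.normal(0.0, 1.0, 200)   # reference arm, fully observed
treated = rng.normal(1.0, 1.0, 200)   # treatment arm
dropped = rng.random(200) < 0.25      # treated patients who quit the trial

# Jump-to-reference: replace dropouts' unobserved values with draws
# from a distribution fitted to the reference (placebo) arm
imputed = treated.copy()
imputed[dropped] = rng.normal(placebo.mean(), placebo.std(), dropped.sum())

print(treated[~dropped].mean() - placebo.mean())  # complete-case contrast
print(imputed.mean() - placebo.mean())            # J2R contrast, pulled toward 0
```

Real reference-based imputation for longitudinal data is done within a multivariate model with proper uncertainty propagation; this only illustrates the direction of the adjustment.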

Saying that one does not need to impute and can just analyze the observed data is itself a form of imputation with incredibly strong assumptions. These are usually much stronger and more questionable than those made by a sensible imputation targeting a clearly specified estimand. While I do not like blanket statements, and a single standard approach will not fit all situations, I’d suspect that a treatment policy estimand will typically be best targeted by using some form of imputation. Such approaches have a chance of being reasonable (and of course sensitivity analyses should be done). MCAR is essentially always a wrong assumption, and the main argument I’ve heard for it is that it’s easy and may not matter too much. That is, the main scenario where it is “okay” is when the analysis approach just does not matter, such as when nearly everyone finished treatment as planned anyway and had all assessments (if an analysis assuming MCAR disagrees with more sensible analyses, I know which analysis I would question more).

A treatment that patients do not wish to take, especially if this stems from a side effect, seems to me something to penalize rather than a setting for imputation. In a longitudinal ordinal analysis, wishing to stop treatment could be counted as a mild but clearly unfavorable clinical event.

Thank you Bjoern for taking the time to answer. You have certainly hit on some crucial points which helped me sort out some of my thoughts, and the reference is very useful.

After reading your reply I revisited the explanation and elaboration of SPIRIT, and I’m struck by two things that I haven’t really given much thought before:

  1. It actually says: “The ambiguous use of labels such as “intention to treat” or “per protocol” should be avoided unless they are fully defined in the protocol.”

  2. Yet it also says: “it is widely recommended that all participants be included in an intention to treat analysis, regardless of adherence”

The second point I assume the authors wrote because “intention to treat” is a more common term than “treatment policy estimand”; yet I think what they meant was that a treatment policy estimand should be primary. The first point I find fascinating; I wish I had reacted properly to it 10 years ago and abolished the use of the term back then.

So, I’m left with the following suggestion: we should stop using the terms ITT and PP altogether and just describe the analysis model in the protocol; that will tell the reader everything necessary to judge its appropriateness (no need for ambiguous labels).

But it does not entirely solve the impute-or-not question (for me at least). I agree that no imputation is a strong assumption (and a form of imputation in its own right), although I can think of worse ones, such as carrying the baseline value forward or assuming a negative outcome for everyone who did not respond. But can we really assume that applying, for instance, MICE will always be better than not imputing at all?

The reason I am stuck in my thinking is that, for the imputation model to be reasonable, it requires knowledge of the causal structure behind the missingness, and this is rarely (if ever) known. An incorrect imputation model risks biasing an estimate, or making a biased estimate seem more precise. If I am not confident that I have the knowledge/data to define an imputation model, I may be better off not imputing and highlighting this as a limitation (with attrition analyses). I agree with you that MCAR is almost always wrong, but so is MAR! In reality, the safest bet is that data are MNAR, and then imputing under MAR can go really wrong, with the result that a biased estimate ends up with a narrower credible/confidence interval, suggesting more certainty.
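To illustrate the worry with a small, purely illustrative simulation: when missingness depends on the unobserved outcome itself (MNAR), filling in from the observed distribution — which is effectively what a MAR-style approach does when the missingness mechanism is unmodeled — simply reproduces the selection bias while making the dataset look complete.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
y = rng.normal(1.0, 1.0, n)          # true outcomes in one arm (mean 1.0)

# MNAR: the worse the outcome, the more likely it goes unobserved
p_missing = 1 / (1 + np.exp(y))      # low y -> high dropout probability
missing = rng.random(n) < p_missing

observed = y[~missing]
imputed = y.copy()
# Fill in from the observed distribution (the MAR-style assumption)
imputed[missing] = rng.choice(observed, missing.sum())

print(y.mean())        # the truth, about 1.0
print(observed.mean()) # biased upward: sicker patients are missing
print(imputed.mean())  # the imputation inherits the same bias
```

And since the imputed dataset has the full sample size, a naive standard error computed from it will be smaller than the complete-case one — a biased estimate dressed up with extra apparent precision.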

Again, thank you Bjoern for your reply!


That one I’m a bit uncertain about. I find that very often non-statisticians, and even statisticians, struggle to understand what particular modeling, imputation, and analysis-set decisions imply. One then ends up with rules of thumb, things one always does, and so on, which can lead to discussions where people talk past each other (as in my favorite estimand example, the first dapagliflozin FDA advisory committee meeting, which I used in an early estimand paper). That is what really convinced me that stating what question you are trying to answer (= the estimand) up front, and then justifying in detail why the analysis approach actually answers it (and what assumptions are being made), is a good approach.

I totally agree that blindly applying MICE is not what people should do. I have a rather negative reaction to that, but also to analyzing only the patients with complete data. The selection biases from who has missing data can be striking, and the concern is that they could turn a nice randomized trial into an unresolvable tangle: observationally describing populations of patients conditional on post-baseline outcomes and then comparing treatments.

One of the most typical responses to that, which I have seen e.g. from regulators, is: “Well, in that case, do an MNAR approach that is likely to be conservative, i.e. one that is a bit biased towards you not finding what you are trying to show in the trial (such as that your treatment is better than placebo). Yes, sure, this will lower your power a bit and perhaps underestimate the treatment effect a bit, but it gives you an incentive to avoid unnecessary missing data.”
However, it really should depend on the question one wants to answer.

On another note: when assuming MCAR is appropriate, then imputing under MAR is also appropriate, is it not? So I’m still struggling to think of a case where a complete-case analysis would be my first choice.

You are certainly right that if I’m interested in the treatment policy estimand (what happens as a consequence of assigning a treatment, irrespective of actual intake/adherence), imputing missing future longitudinal data “as if patients had continued treatment” (when we know that is definitely not the case) does not make sense. In this scenario, you’d want to impute the values we would have observed, which means the off-treatment values the patient actually had (but which we did not measure). These values would presumably be worse than if the patients had continued treatment. The trickiest scenario arises if patients then start a new therapy that is not part of the trial (as in the estimands-for-diabetes paper I linked), or if you do not even know whether they did.

That is one of the attractions of the approach you describe: considering stopping the study treatment as a negative outcome, or in estimand language, a composite strategy. And just to spell it out for anyone who might stumble on this thread: that does not imply that you must then dichotomize your data into treatment success or failure (aka “responder” vs. “non-responder”). There are plenty of good proposals for ordinal outcomes, and even for staying on a continuous scale (e.g. the trimmed-means approach).
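For anyone curious, here is a rough sketch of the trimmed-means idea (the function name and all numbers are mine, purely for illustration): dropouts are ranked as the worst possible outcomes, then a fixed fraction of the worst values is trimmed from each arm before comparing means, so stopping treatment is penalized without dichotomizing the outcome.

```python
import numpy as np

def trimmed_mean_contrast(y_trt, y_ctl, dropped_trt, dropped_ctl, trim=0.25):
    """Sketch of the trimmed-means idea (names and numbers are mine):
    dropouts are ranked as the worst outcomes, then the worst `trim`
    fraction is removed from each arm before comparing means."""
    def arm_mean(y, dropped):
        y = y.copy()
        y[dropped] = -np.inf               # dropout counts as a worst outcome
        k = int(np.ceil(trim * len(y)))    # trim must exceed the dropout rate
        return np.sort(y)[k:].mean()       # discard the worst k values
    return arm_mean(y_trt, dropped_trt) - arm_mean(y_ctl, dropped_ctl)

# Illustration with made-up data (higher outcome = better)
rng = np.random.default_rng(3)
y_t, y_c = rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.0, 200)
d_t, d_c = rng.random(200) < 0.1, rng.random(200) < 0.1
print(trimmed_mean_contrast(y_t, y_c, d_t, d_c))
```

The published versions of this approach also handle inference and the choice of trimming fraction carefully; this only conveys the ranking-plus-trimming mechanics.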


Again, thank you Bjoern for taking the time to answer. You have certainly helped me straighten out some of my thinking, and I will take all your advice into consideration moving forward. I think I will have to do some simulation work to really get my head around what happens when data are missing under different conditions and different imputation approaches are used — and I have a feeling that I will end up closer to your approach. Thank you!