This is very interesting and very different from other approaches I’ve seen. The proof is in the pudding. Can you either create realistic simulations with known truths and apply the method, or apply it to observational datasets where we know the truth from randomized studies?
Sort of getting off topic here, but I prefer to think of the proposed method as a window into the development of a treatment policy, rather than a new way to derive a treatment policy.
Essentially, it shows how an existing treatment policy would need to change to further increase expected reward, where the expected reward is estimated from observational data under the associated assumptions.
If this window shows things that are odd, then the user would know to revise their assumptions. It does not really allow one to test these assumptions directly (which is not possible), but it adds another layer of assessment.
I go into more detail in this post.
I think we need to come back to the original question. The authors used the cloning approach to deal with immortal time bias. They therefore used inverse probability weighting to deal with selection bias introduced by censoring non-adherent clones. They calculated the probability of not being censored, based on factors that might have been considered by the treatment team to decide whether ECMO should be initiated or not, such as physiological characteristics, disease severity, ventilation variables, specific treatments, and time.
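For readers who have not seen the clone-and-censor step written out, here is a minimal sketch of how I understand it, not the authors' actual code. The function name, the two strategies ("start treatment within a grace period" versus "never start"), and the tuple layout are my own assumptions for illustration.

```python
def clone_and_censor(treat_time, end_time, grace):
    """Create two clones of one patient, one per strategy (hypothetical sketch).

    treat_time: time treatment started, or None if never treated
                (assumed to be earlier than end_time when not None)
    end_time:   time of the event or end of follow-up
    grace:      length of the grace period for starting treatment

    Returns two tuples (arm, follow_up_end, artificially_censored).
    """
    treated_in_grace = treat_time is not None and treat_time <= grace

    # Clone 1, strategy "start treatment within the grace period".
    if treated_in_grace or end_time <= grace:
        # Either adherent, or follow-up ended before the grace period did,
        # so the observed data are still compatible with this strategy.
        treat_clone = ("treat", end_time, False)
    else:
        # Untreated when the grace period ran out: censor the clone there.
        treat_clone = ("treat", grace, True)

    # Clone 2, strategy "never start treatment" (an assumption of this sketch).
    if treat_time is not None:
        # Censor the clone at the moment treatment actually starts.
        control_clone = ("control", treat_time, True)
    else:
        control_clone = ("control", end_time, False)

    return treat_clone, control_clone


# A patient treated on day 5, followed to day 100, with a 30-day grace period.
early = clone_and_censor(5, 100, 30)
# A patient never treated, followed to day 100.
never = clone_and_censor(None, 100, 30)
```

Every patient contributes to both arms from time zero, which is what removes the immortal time, but each deviation from the assigned strategy becomes an artificial censoring event that then has to be handled by weighting.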
In my opinion, this is an extremely opaque way of dealing with the selection bias they introduced by cloning in an attempt to mitigate immortal time bias. It's just trading immortal time bias for selection bias, and there is no certainty that inverse probability of censoring weighting mitigates the selection bias, given that there is no guarantee the covariates in the censoring model are the right ones.
I think that cloning to get rid of immortal time bias is too opaque to be a useful method, and my hunch is that all it achieves is to bury the true effect estimate in a null result, except (perhaps) when effects are huge.
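To make concrete what the weighting does and why it hinges on the chosen covariates, here is a deliberately simplified sketch. It assumes a single categorical covariate and one time point, whereas real analyses use time-varying, model-based weights; the function name and toy data are hypothetical.

```python
from collections import defaultdict


def ipc_weights(rows):
    """Inverse probability of censoring weights within strata (toy sketch).

    rows: list of (stratum, uncensored) pairs, where uncensored is a bool.
    Each uncensored row gets weight 1 / P(uncensored | stratum);
    censored rows get weight 0 because they drop out of the analysis.
    """
    total = defaultdict(int)
    kept = defaultdict(int)
    for stratum, uncensored in rows:
        total[stratum] += 1
        kept[stratum] += int(uncensored)

    weights = []
    for stratum, uncensored in rows:
        if uncensored:
            p_uncensored = kept[stratum] / total[stratum]
            weights.append(1.0 / p_uncensored)
        else:
            weights.append(0.0)
    return weights


# Toy data: clones in the "severe" stratum are censored more often.
rows = [
    ("severe", True), ("severe", False), ("severe", False), ("severe", True),
    ("mild", True), ("mild", True), ("mild", True), ("mild", False),
]
weights = ipc_weights(rows)
```

The uncensored clones are up-weighted to stand in for the censored ones in the same stratum. That only removes the selection bias if the strata (here a single made-up severity variable) capture everything that drives both censoring and outcome, which is exactly the part nobody can verify.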
It seems I was right about this:
This paper examines the Stanford heart transplant data and clearly demonstrates how the different methods operate. The cloning method (also used by the authors of the BMJ paper) moves the results to the null, and even beyond it if the grace period is set to a small value. When I run this using our methods, the HR is 0.19 using iterative time distribution matching (an improvement over prescription time distribution matching) that we recently proposed.
This also explains why other authors found no impact of bariatric surgery on cardiovascular outcomes. In my view, this paper gives an impossible result and should raise a red flag regarding the usefulness of inverse probability of censoring weights after cloning!