Keeping on the theme of teaching statisticians the sources of error from outside their domain, here is a recent paper from Critical Care Medicine.
The term I use to describe the adverse effect on RCT reproducibility here is “administrative heterogeneity”.
Administrative heterogeneity is induced by the use of broad consensus criteria (defining a “syndrome”) as entry criteria for an RCT.
Note that the overlap between these measurements (criteria) is much less than would be expected if they were titrated criteria rather than a new consensus each time.
Now sepsis comprises a plurality of different diseases (called phenotypes).
So here we see the concept of “layered heterogeneity”: administrative heterogeneity is layered over disease heterogeneity, which in turn sits over the heterogeneity within each disease and, finally, over patient heterogeneity.
Now there has not been any reproducibly positive sepsis RCT for 30 years. One reason might be that each new RCT has a different mix of responsive vs nonresponsive phenotypes as a function of administrative heterogeneity. There are other considerations but, given the presence of marked administrative heterogeneity, maybe we need look no further than that to find sufficient measurement error.
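
To make the phenotype-mix argument concrete, here is a rough Python sketch. Everything in it is an assumption for illustration, not data from the paper: the baseline mortality (0.35), the absolute risk reduction in the responsive phenotype (0.10), the arm size, and the function names `expected_risk_difference` and `simulate_trial` are all hypothetical. It treats sepsis as just two phenotypes, one that responds to the treatment and one that does not, and shows how the trial-level effect shrinks as the entry criteria admit a smaller responsive fraction.

```python
import random


def expected_risk_difference(frac_responsive, risk_reduction_responsive=0.10):
    """Expected absolute risk difference when only one phenotype responds.

    The nonresponsive phenotype contributes zero benefit, so the trial-level
    effect is the responsive effect diluted by the responsive fraction.
    All numbers are hypothetical.
    """
    return frac_responsive * risk_reduction_responsive


def simulate_trial(frac_responsive, n_per_arm=500, base_mortality=0.35,
                   risk_reduction_responsive=0.10, seed=0):
    """Simulate one two-arm trial and return the observed risk difference
    (control mortality minus treated mortality). Parameters are hypothetical."""
    rng = random.Random(seed)

    def arm_deaths(treated):
        deaths = 0
        for _ in range(n_per_arm):
            # Each enrolled patient is responsive with probability frac_responsive,
            # which stands in for the mix admitted by that trial's entry criteria.
            responsive = rng.random() < frac_responsive
            p_death = base_mortality
            if treated and responsive:
                p_death -= risk_reduction_responsive
            deaths += rng.random() < p_death
        return deaths

    control = arm_deaths(treated=False)
    treated = arm_deaths(treated=True)
    return (control - treated) / n_per_arm


if __name__ == "__main__":
    # Three hypothetical trials whose entry criteria admit different mixes.
    for frac in (0.2, 0.5, 0.8):
        expected = expected_risk_difference(frac)
        observed = simulate_trial(frac, seed=1)
        print(f"responsive fraction {frac:.1f}: "
              f"expected {expected:.3f}, observed {observed:.3f}")
```

Under these assumptions the same treatment has a different expected effect in every trial purely because the enrolled mix differs, before any within-disease or patient-level heterogeneity is added on top.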