The ANDROMEDA-SHOCK trial discussion on this platform considered how a Bayesian approach might allow a more nuanced interpretation of smaller RCTs whose results don’t reach “statistical significance,” but whose early data seem to be “leaning” in one direction. The gist of that discussion seemed to be that a Bayesian approach to such trials could prevent us from throwing the baby out with the bathwater.

My question complements the one above. There are examples in medicine where small, early RCTs have suggested treatment efficacy (defined as a favourable, sometimes impressive, point estimate with a confidence interval that doesn’t cross 1). Some of these early “positive” trials have led to changes in practice that later had to be reversed when larger RCTs did not corroborate the effect. There are publications that try to explain the root cause(s) of the apparent instability of findings from small RCTs, but I don’t find the explanations consistent or easy to follow.

My questions are:

- Is the apparent unreliability of small RCTs (which I’ll define as marked point-estimate differences between small and large trials) an intrinsic/insurmountable problem rooted in the underlying “math”/scarce-data considerations, OR is the unreliability due primarily to factors related to small-trial design and dissemination? Examples might include:

  - design-related flaws (e.g., poor blinding), which might be more common in smaller trials;
  - important differences between patients enrolled in small/early trials compared with later/larger trials. For example, could earlier trials selectively enrol sicker patients, whose benefit from treatment might be more easily discerned than in the more heterogeneous/healthier patients enrolled in a subsequent larger trial?;
  - a greater chance that treatment-related harms will occur in larger trials and offset any treatment benefit?;
  - publication bias (if only the select few “positive” small trials, out of the entire universe of conducted small trials, get published); although many small trials remain unpublished, I suspect this is overall less of a problem with RCTs than with observational studies;
  - a combination of the above factors?
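To see how much instability can arise from sampling variation alone, here is a minimal simulation sketch. All numbers (true effect, control-arm risk, trial sizes) are illustrative assumptions, not data from any real trial. It draws many small and large two-arm trials with the same modest true benefit, compares the spread of odds-ratio estimates, and shows how keeping only trials with a nominally significant favourable result (a crude stand-in for publication bias) inflates the apparent effect in small trials:

```python
# Sketch: same true effect, different trial sizes. Illustrative assumptions:
# true log odds ratio -0.2 (OR ~0.82), control-arm event risk 30%.
import math
import random

random.seed(0)

TRUE_LOG_OR = -0.2          # assumed modest true benefit
BASE_RISK = 0.30            # assumed control-arm event risk

def simulate_trial(n_per_arm):
    """Return (log OR estimate, standard error) for one simulated trial."""
    p_ctrl = BASE_RISK
    odds_trt = (p_ctrl / (1 - p_ctrl)) * math.exp(TRUE_LOG_OR)
    p_trt = odds_trt / (1 + odds_trt)
    e_ctrl = sum(random.random() < p_ctrl for _ in range(n_per_arm))
    e_trt = sum(random.random() < p_trt for _ in range(n_per_arm))
    # Haldane continuity correction avoids zero cells in small trials
    a, b = e_trt + 0.5, n_per_arm - e_trt + 0.5
    c, d = e_ctrl + 0.5, n_per_arm - e_ctrl + 0.5
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def summarize(n_per_arm, n_sims=2000):
    """ORs from all simulated trials, and from the 'significant' subset."""
    ests = [simulate_trial(n_per_arm) for _ in range(n_sims)]
    all_or = [math.exp(lo) for lo, _ in ests]
    # "published" subset: 95% CI excludes 1 in the favourable direction
    sig_or = [math.exp(lo) for lo, se in ests if lo + 1.96 * se < 0]
    return all_or, sig_or

for n in (50, 2000):
    all_or, sig_or = summarize(n)
    mean_all = sum(all_or) / len(all_or)
    mean_sig = sum(sig_or) / len(sig_or) if sig_or else float("nan")
    print(f"n/arm={n}: mean OR over all trials {mean_all:.2f}; "
          f"{len(sig_or)} 'significant' trials, mean OR among them {mean_sig:.2f}")
```

Under these assumptions the small trials scatter widely around the true OR, and the few small trials that cross the significance threshold necessarily overshoot the true effect, whereas the large trials cluster near it. That is the pure "math"/scarce-data component, before any design flaws or enrolment differences are added.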

Regardless of the root cause(s) of small-RCT result instability, it seems there are perils to interpreting small trials using the “all or nothing” frequentist approach, both when the trial rejects the null hypothesis and when it fails to reject it.

- Are there examples where people have tried to re-analyze small “positive” RCTs (defined as p < 0.05) using a Bayesian approach, to see whether it might have tempered enthusiasm for the early results (e.g., applying a skeptical prior and finding a posterior probability of benefit too low to justify a change in practice)?
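On the last question, one common way such re-analyses are done (in the spirit of Spiegelhalter’s work) is a normal approximation on the log odds-ratio scale, where a skeptical prior and the trial likelihood combine in closed form. Below is a minimal sketch using a hypothetical small “positive” trial (OR 0.50, 95% CI 0.27 to 0.93) and an assumed skeptical prior centred on no effect; every number is illustrative, not taken from any published re-analysis:

```python
# Sketch of a normal-approximation Bayesian re-analysis on the log OR scale.
# Trial result and prior scaling are hypothetical/illustrative.
import math
from statistics import NormalDist

# Hypothetical small "positive" trial (likelihood), expressed as log OR
or_hat, ci_lo, ci_hi = 0.50, 0.27, 0.93
theta_hat = math.log(or_hat)
se_hat = (math.log(ci_hi) - math.log(ci_lo)) / (2 * 1.96)

# Skeptical prior: centred on no effect, scaled so P(OR < 0.5) = 0.05 a priori
prior_mean = 0.0
prior_sd = -math.log(0.5) / NormalDist().inv_cdf(0.95)

# Conjugate normal update: precision-weighted average of prior and data
w_prior = 1 / prior_sd**2
w_data = 1 / se_hat**2
post_mean = (w_prior * prior_mean + w_data * theta_hat) / (w_prior + w_data)
post_sd = math.sqrt(1 / (w_prior + w_data))

post = NormalDist(post_mean, post_sd)
p_any_benefit = post.cdf(0.0)              # P(OR < 1)
p_meaningful = post.cdf(math.log(0.8))     # P(OR < 0.8), an assumed threshold

print(f"posterior median OR ~ {math.exp(post_mean):.2f}")
print(f"P(OR < 1)   ~ {p_any_benefit:.2f}")
print(f"P(OR < 0.8) ~ {p_meaningful:.2f}")
```

With these illustrative inputs, the point estimate is pulled from OR 0.50 toward the null (posterior median around 0.64), and the posterior probability of a clinically meaningful effect (OR < 0.8, an assumed threshold) comes out near 0.8: arguably not enough to change practice, despite the nominally significant frequentist result.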