Can this be a possible statistical solution for a mid-trial realization?



Suppose that in a trial, sites misinterpreted the inclusion criteria, so that 10% of the enrolled patients do not meet them (not a protocol violation per se, but a misinterpretation of the intended inclusion criteria). These 10% of the participants were randomized and will be included in the ITT analysis, but have NO chance of benefit from the therapy. Is there any opportunity to propose that, at the end of the trial, if:

p > xx (‘significantly’ > 0.05): negative
p between xx and xx (> 0.05 but not ‘too much’): redefine the p criterion for a positive trial more strictly and reanalyze the data excluding these 10% of the patients; if the remaining patients meet the redefined, more stringent p criterion, conclude a positive study

Is there such a statistical option?


Good question. I’d like to hear from some experienced clinical trial statisticians and trialists about the literature or guidelines for exclusion of “baseline protocol violations”. In the meantime, I’d like to know how the misinterpretation of inclusion criteria was detected, and whether other possible violations were given the chance to be detected.

Since “statistical significance” has caused so much harm to science and all p-value thresholds are arbitrary, I’d hesitate to make any recommendation based on p-values. I would either suffer with including everyone or exclude the 10%, but without using outcomes in any way in making the decision.
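
To illustrate why using outcomes in the exclusion decision is a problem, here is a minimal simulation sketch of the two-step rule proposed in the question, run under the null hypothesis. The thresholds (`p_upper = 0.10`, `alpha_strict = 0.025`) are hypothetical stand-ins for the unspecified "xx" values, and the function name `simulate_type1` is my own. The sketch assumes a known-variance z-test in which the eligible 90% and the ineligible 10% contribute independent standard normal z statistics, so that the ITT statistic is their weighted combination.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_type1(n_sims=200_000, f=0.10, alpha=0.05,
                   p_upper=0.10, alpha_strict=0.025, seed=1):
    """Estimate type I error of the ITT test alone and of the two-step rule.

    f            : fraction of randomized patients who are ineligible
    p_upper      : hypothetical upper bound of the 'not too much' zone
    alpha_strict : hypothetical stricter threshold for the reanalysis
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    itt_rejections = 0
    two_step_rejections = 0
    for _ in range(n_sims):
        # Under the null, each subset's z statistic is standard normal.
        z_eligible = rng.gauss(0.0, 1.0)
        z_ineligible = rng.gauss(0.0, 1.0)
        # ITT z statistic: subsets weighted by sqrt of their sample share.
        z_itt = sqrt(1 - f) * z_eligible + sqrt(f) * z_ineligible
        p_itt = 2 * (1 - std_normal.cdf(abs(z_itt)))
        p_eligible = 2 * (1 - std_normal.cdf(abs(z_eligible)))
        if p_itt < alpha:
            itt_rejections += 1
            two_step_rejections += 1
        elif p_itt < p_upper and p_eligible < alpha_strict:
            # "Rescue" step: reanalyze without the ineligible patients.
            two_step_rejections += 1
    return itt_rejections / n_sims, two_step_rejections / n_sims


itt_rate, two_step_rate = simulate_type1()
print(f"ITT alone: {itt_rate:.4f}   two-step rule: {two_step_rate:.4f}")
```

Under these assumptions the two-step rule can only add rejections to those of the ITT test, so its type I error sits above the nominal level no matter how strict the second threshold is made, although the excess is small in this setup.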


Agree, except that there is a regulatory bar that also has to be met, not just the scientific lesson learned. Hence some analytic standard for such situations would be helpful.


I have some preliminary comments:

  1. First, no adjustments to the p value threshold. That would all be post hoc and potentially data-driven. You may not be able to achieve a more rigorous p value threshold in either case, so I don’t see anything gained.

  2. You can do both an ITT analysis with all subjects, and a Per Protocol (PP) analysis with the subset of subjects that had no major protocol violations. The PP analysis would also exclude the subjects described above that should have been screen failures. It is possible that the two results will be in the same direction but that neither will achieve a p value of <=0.05, because the screen failures compromise the end points to some degree in the ITT analysis and the PP analysis has effectively reduced power. If the a priori study sample size was increased above the calculated size to account for loss to follow-up (LTFU), dropouts, etc., which is commonly done, then you may still have a sufficient sample size in the PP analysis.

  3. If this is for a regulatory submission, then a discussion with the relevant body would be in order, to understand how they may react and any recommendations that they may have. There may even be some consideration for a decision based upon the preponderance of evidence from the trial, which I have seen done, even when the primary end point was strictly missed.

  4. This is potentially a great example of a situation where an independent DSMB/DMC could have played a critical role. Had the misinterpretation of the inclusion/exclusion criteria been observed earlier during reviews by an independent panel, that might have avoided a situation involving such a large proportion of the sample. That is, there should have been red flags raised earlier, presuming that the misinterpretation could be identified/quantified by a review of baseline subject characteristics that should not be present. There would have been a detailed process review of what was happening, and a determination that the issue was either limited to one or just a few sites, with remedial action taken at those sites, or possibly a recognition that the issue was widespread, in which case a protocol revision would have been recommended for expedient approval and implementation.


Excellent ideas, Marc. Assuming that the original plan had ITT as the primary analysis, what does everyone think about amending the SAP to make PP the primary and ITT the secondary analysis? Here by PP (per protocol) I mean considering only pre-randomization information in excluding subjects.


Thanks Frank.

If this is for a regulatory submission, then any such changes to the SAP would have to be approved by the relevant body. Certainly for any publication based upon the study results, this would need to be disclosed.

To consider an approach using only pre-randomization criteria would seem reasonable and I would label this group as a Modified ITT (mITT) sample. In this case, unless the sample size was increased as I noted above, there is a good chance that the pre-defined alpha of 0.05 will be missed due to the reduced effective power with a loss of 10% of the sample.
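
The power trade-off between the two analysis sets can be sketched with a normal approximation. All numbers here are hypothetical (standardized effect 0.5, 64 patients per arm, 10% ineligible), and the sketch assumes that the ineligible patients have zero treatment effect, the same outcome variance as everyone else, and are spread evenly across the arms.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sigma, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Non-centrality: true difference divided by its standard error.
    ncp = delta / (sigma * sqrt(2 / n_per_arm))
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


# Hypothetical design values, chosen only for illustration.
delta, sigma, n_per_arm = 0.5, 1.0, 64
f = 0.10  # fraction of randomized patients who cannot benefit

# Design power: everyone eligible, full effect, full sample.
power_design = power_two_sample(delta, sigma, n_per_arm)
# ITT: full sample, but the average effect is diluted to (1 - f) * delta.
power_itt = power_two_sample((1 - f) * delta, sigma, n_per_arm)
# mITT: full effect, but only the eligible (1 - f) share of the sample.
power_mitt = power_two_sample(delta, sigma, round((1 - f) * n_per_arm))

print(f"design: {power_design:.3f}  ITT: {power_itt:.3f}  mITT: {power_mitt:.3f}")
```

Under these assumptions the ITT analysis has its effect diluted by a factor of (1 - f), while dropping the ineligible patients shrinks the non-centrality only by about sqrt(1 - f). So the mITT analysis loses less power than the ITT analysis, though both fall short of the design power.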

There may still be a desire to conduct a PP analysis and that could consider post-randomization information, which is commonly done. ICH E9 gives some examples of such criteria:

a. The completion of a certain prespecified minimal exposure to the treatment regimen
b. The availability of measurements of the primary variable(s)
c. The absence of any major protocol violations, including the violation of entry criteria

and it is possible that such an analysis is already defined in the SAP.

There is also additional guidance in ICH E9(R1) relative to PP analyses, bias and other considerations that may be relevant here.


Putting aside this excellent point (that these deviations signal the possibility of other, as-yet undetected problems in trial conduct), what does the Likelihood Principle have to say here?


Marc, this is all excellent advice in my view. Boosting the sample size by the number of patients ignored in the mITT analysis would be a good signal from the investigators that they are taking the situation seriously.
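
As a rough sketch of the arithmetic behind such a boost: the helper `additional_enrollment` and the counts below are hypothetical, and the second call assumes some fraction of future enrollees could still turn out to be ineligible if the misinterpretation is not fully corrected.

```python
from math import ceil


def additional_enrollment(n_ineligible, future_ineligible_rate=0.0):
    """Extra patients to randomize so that the eligible count reaches the plan.

    n_ineligible           : patients already randomized who will be excluded
    future_ineligible_rate : assumed share of new enrollees who are ineligible
    """
    if not 0.0 <= future_ineligible_rate < 1.0:
        raise ValueError("future_ineligible_rate must be in [0, 1)")
    return ceil(n_ineligible / (1.0 - future_ineligible_rate))


# Hypothetical example: 13 of 130 randomized patients were ineligible.
extra_if_fixed = additional_enrollment(13)             # criteria now applied correctly
extra_if_persisting = additional_enrollment(13, 0.10)  # 10% still slip through
print(extra_if_fixed, extra_if_persisting)
```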


That’s a good question. As a slight aside, what makes Bayesian sequential analysis work without penalizing for earlier data looks is full conditioning, i.e., the analysis makes use of full previously collected information and does not “uncondition” any of the data out of the analysis. But I think a Bayesian analysis of randomized females would be valid; it conditions on even more information (sex) and the conditioning respects the flow of time, i.e., the sex of each patient is predetermined and not affected by treatment. Maybe some of this thinking also applies to likelihood inference.