Randomization in a diagnostic study - necessary?

Ivo1 · March 14, 2023, 9:45pm

Study-Design:

Rationale: Step-wise implementation of a gatekeeper test before expensive imaging tests to reduce costs while maintaining patient safety. Diagnostic algorithms stratify patients according to the gatekeeper test (provides continuous values from 0 to >2000). A score of 0 is regarded as a negative gatekeeper test. Recruitment: 2 phases of 1 year each.

#patients: 1000 patients per year, ¼ of patients are expected to be low risk (score=0)
First Phase: All patients get imaging
Second Phase: Patients with score=0 get no further imaging, all others do
Event of interest: All-cause death (0.3% per year, should be similar during both phases)
Claim: Non-inferiority of the second phase with a margin of 0.7% (absolute mortality should be <=1.0%/year)

Events: Low event rates.
The only patients affected are the ones with score=0
¼ * 1000 patients/year = 250 patients/year
Expected number of events at 0.3%: 250 * 0.003 = 0.75 events
Expected number of events at 0.7%: 250 * 0.007 = 1.75 events
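
As a quick check of the expected event counts for the score=0 group, here is a minimal Python sketch. The function names are made up for illustration, and the Poisson approximation for the chance of seeing no deaths at all in a year is an added assumption, not part of the study plan.

```python
import math

# Figures taken from the study description above
patients_per_year = 1000
low_risk_fraction = 0.25                               # ¼ of patients expected to have score=0
n_low_risk = patients_per_year * low_risk_fraction     # 250 patients/year


def expected_events(n, annual_rate):
    """Expected number of deaths among n patients at a given annual rate."""
    return n * annual_rate


def prob_zero_events(n, annual_rate):
    """Chance of observing no deaths at all (Poisson approximation, an assumption)."""
    return math.exp(-expected_events(n, annual_rate))


for rate in (0.003, 0.007):
    print(
        f"rate={rate:.3f}: "
        f"expected events={expected_events(n_low_risk, rate):.2f}, "
        f"P(no events)={prob_zero_events(n_low_risk, rate):.2f}"
    )
```

With fewer than two expected deaths per phase, a single death more or less would dominate any comparison between the phases.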

Question: 1) Does (temporal, nested controlled trial) randomization add any value to this diagnostic study? Could such a design reduce study sample size?
2) If 1) yes, how can it be done? We cannot change procedures daily, but possibly every 3-6 months.
3) What’s the drawback of just analyzing retrospectively? All patients had imaging, so I could just calculate the score for all patients and compare the score against imaging.
4) What’s the best way to calculate sample size if the event rates are so low?

JosieS · March 15, 2023, 12:10am

I’m not 100% sure I understand your proposed design. Is Phase 1 a prospective historical control group for Phase 2? If so, it’s a weak design because there may be changes over time which have nothing to do with the diagnostic strategy and you can’t just insist that there won’t be, or prove after the fact that there weren’t. Especially when we’re in an ongoing pandemic and your outcome is all-cause mortality. That also means you can’t use retrospective controls to beef up the numbers because all-cause mortality has fluctuated wildly in recent years.

What age group has an all-cause mortality of 0.3% per year? They must be quite young. An increase to up to 1% is a very big difference to declare trivial. It might help with the sample size calculations but it won’t help you convince anyone that a difference of that size is, in fact, trivial.

You typically need very large sample sizes (hundreds of thousands) to measure the effects of a screening test on all-cause mortality (but it’s context-dependent and I do not understand your context). This is (probably) why very few cancer screening programmes have demonstrated benefits in terms of all-cause mortality despite demonstrating differences in cancer-specific mortality. It’s possible that the screening tests are causing as many deaths as they prevent but the most obvious reason is that they’re trying to detect the same size drop in a much, much larger ocean.

I don’t understand why individuals can’t be randomised in this context (that I don’t understand). Is it not possible to randomise them to go straight to a scan or to undergo the gatekeeper tests and conditional scans? It can be complicated to get institutional buy-in if many different clinicians/departments need to sign up to the protocol but it’s the strongest design if it’s feasible to implement.

If it’s impossible to randomise individually you can use a cluster design but you need a lot more than two clusters, and you will need a larger total sample size than for individual randomisation (methods for sample size calculations for cluster trials discussed here).
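
To give a rough feel for why a cluster design needs more patients, here is a minimal sketch of the standard design-effect inflation, 1 + (m - 1) × ICC, for equal cluster sizes m. The intracluster correlation (ICC) values and the individual-randomisation sample size used below are purely hypothetical placeholders, not estimates for this study.

```python
import math


def design_effect(cluster_size, icc):
    """Standard design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def inflated_sample_size(n_individual, cluster_size, icc):
    """Total sample size needed under cluster randomisation."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))


# Hypothetical inputs: n_individual and the ICC values are assumptions
n_individual = 1000
cluster_size = 250      # e.g. one year of score=0 patients at one centre
for icc in (0.0, 0.001, 0.01):
    deff = design_effect(cluster_size, icc)
    total = inflated_sample_size(n_individual, cluster_size, icc)
    print(f"ICC={icc}: design effect={deff:.2f}, total n={total}")
```

Even a small ICC inflates the total considerably when clusters are large, which is one reason many smaller clusters are usually preferable to a few big ones.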

To get a realistic (useful) sample size you will probably need to recruit more centres, which helps with credibility and generalisation too. If you do cluster randomise, pay attention to seasonality. You can’t run one strategy for six months and compare it to another strategy run for the next six months. You either need to run all clusters for one calendar year, or run more clusters for 3 or 6 months each and ensure that they are balanced for seasonality (ideally within years across many centres and many years).

Ivo1 · March 15, 2023, 2:38pm

Thank you for the detailed reply. The setup would be

Prospective Study

Phase1: All get imaging
Phase2: Score=0 get no imaging, all others do

Randomization: The hospital argues that randomly assigning Score=0 patients to one branch is not feasible because it would affect clinical flow too much.

I understood from your reply that cluster randomization would further increase the required sample size.

We will reach out to you for further advice.

JosieS · March 16, 2023, 1:20am

So, if you randomised individually you’d only be randomising the Score=0 patients after the score was known? Does that mean that all the gatekeeping tests are done for everyone anyway even when they will all end up getting imaging (i.e. your Phase 1)? If not, then you would (probably) need to randomise everyone upfront, to immediate scan OR gatekeeping tests and scan conditional on Score>0. “Probably” because it depends a bit on what these tests are and whether doing them delays the scan for those who need it, has the potential to introduce additional risk, or influences later clinical management.

I do not know the clinical or institutional context but is it really that hard to just not refer some people for a scan, given that they can manage this in the context of your proposed Phase 2? Or is it the need for informed consent that they’re concerned about? It undeniably is hard to get informed consent; it’s a lot of extra resource and not everyone will consent. Cluster trials usually can’t feasibly ask for consent so, if it gets past the ethics committee, you might be able to reach target recruitment faster even if you need a larger sample size.

But you would still need a lot more than two clusters and you will likely need more than 1000 patients/year to reach a realistic sample size within a reasonable timeframe. More than trebling the death rate does not look like a reasonable definition for the largest difference that would still be acceptable, even if the baseline risk is very small. Equivalence designs need very large sample sizes and there’s no way around that.
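
To illustrate how strongly the margin drives the numbers, here is a rough sketch of the textbook normal-approximation sample size for non-inferiority of two proportions under individual randomisation. The one-sided alpha of 0.025, the 80% power and the stricter 0.1% margin are assumptions chosen for illustration, and the normal approximation is unreliable at event rates this low, so treat the output as an order of magnitude at best and not as a substitute for a proper calculation by a trial statistician.

```python
import math
from statistics import NormalDist


def noninferiority_n_per_group(p_control, p_new, margin, alpha=0.025, power=0.80):
    """Approximate n per group for non-inferiority of two proportions.

    Normal approximation; alpha is one-sided. alpha and power defaults
    are illustrative assumptions, not values from the study plan.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_new * (1 - p_new)
    # Distance between the true difference and the non-inferiority margin
    effect = margin - (p_new - p_control)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


baseline = 0.003  # 0.3% annual all-cause mortality, assumed equal in both groups

# Margin from the original post (0.7%) versus a hypothetical stricter margin (0.1%)
for margin in (0.007, 0.001):
    n = noninferiority_n_per_group(baseline, baseline, margin)
    print(f"margin={margin:.3f}: about {n} score=0 patients per group")
```

These counts refer to Score=0 patients only, so at roughly 250 such patients per year each group would take several years to fill even with the generous margin, and far longer with a margin that sceptics would accept.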

It’s also very hard to see how you could achieve demonstrably comparable clusters with only one centre, because you can’t balance seasons within years and the population death rate varies considerably between years as well as between seasons. You’ll get lost in noise and make it very hard to convince sceptics that your results mean anything.

You can’t really do this with advice off an internet forum. You need an experienced trial statistician to work with. Mistakes you make now cannot be rectified later. If your institution doesn’t have a suitable research unit already, you could try to find a trials unit with experience in your clinical area and see if they’re interested in getting involved. They’d be able to help you put a sound research design together (and find other centres to make it happen sometime this century). Funding allowing, of course. But if funding does not allow you might be better off not doing it until and unless there is sufficient funding to do it well.