I want to develop and internally-externally validate a diagnostic multinomial clinical prediction model using a dataset with repeated measures. The project consists of diagnosing multidrug-resistant bacteria in patients with suspected infection at hospital admission.
These infections are detected in cultures, and a patient might have multiple cultures at admission. Our preliminary data showed 10,000 patients with around 19,000 cultures (median = 1, p25 = 1, p75 = 3, mean = 2.2 cultures per patient).
I want to account for these repeated measurements in the performance metrics by running a non-parametric pairs cluster bootstrap (resampling patients, so that each drawn patient contributes all of their cultures), as sketched below.
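For concreteness, here is a minimal sketch of one such resample in R, assuming a data frame `dat` with one row per culture and a (hypothetical) patient identifier column `patient_id`:

```r
## One pairs cluster bootstrap resample: sample patients with replacement,
## and let each drawn patient contribute all of their cultures.
cluster_resample <- function(dat, id = "patient_id") {
  ids   <- unique(dat[[id]])
  drawn <- sample(ids, length(ids), replace = TRUE)
  ## rbind duplicates the rows of patients drawn more than once
  do.call(rbind, lapply(drawn, function(i) dat[dat[[id]] == i, , drop = FALSE]))
}
```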
**My question is at exactly which point in my framework I should run this bootstrap.** My current framework consists of estimating, with the whole dataset:
- Apparent performance
- Optimism with a separate bootstrap procedure (sketched after this list)
- Optimism-adjusted performance
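For reference, this is roughly how I picture the optimism loop (a Harrell-style sketch; `fit_model()` and `perf()` are placeholders for my model-fitting code and a single performance metric). I have reused `cluster_resample()` from above so the inner resampling respects patients, although whether that inner bootstrap should also be clustered is part of what I am unsure about:

```r
## Sketch of the optimism bootstrap; fit_model() and perf() are placeholders.
optimism_adjusted <- function(dat, B = 200) {
  apparent <- perf(fit_model(dat), dat)
  optimism <- replicate(B, {
    boot <- cluster_resample(dat)   # patient-level resample; an unclustered
                                    # resample would be the usual alternative
    m <- fit_model(boot)
    perf(m, boot) - perf(m, dat)    # bootstrap minus test performance
  })
  apparent - mean(optimism)         # optimism-adjusted performance
}
```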
Then, in internal-external cross-validation, I will estimate the external-validation performance metrics for each held-out cluster (hospital), along the lines of the loop below.
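```r
## Leave-one-hospital-out loop, assuming a `hospital` column in `dat` and the
## same placeholder fit_model()/perf() functions as above.
iecv <- sapply(unique(dat$hospital), function(h) {
  train <- dat[dat$hospital != h, ]
  test  <- dat[dat$hospital == h, ]
  perf(fit_model(train), test)   # external performance in the held-out hospital
})
```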
I was planning to estimate these performance metrics (sketched in R after the list):
- c-statistics with Hmisc::rcorr.cens
- Calibration intercept (a) with a model like y ~ a + 1*LP, i.e., the linear predictor entered as an offset with its slope fixed at 1
- Calibration slope (b) with a model like: y ~ a + b*LP
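In R these would look roughly like the following, written here for a binary (one-vs-rest) simplification of the multinomial case; `y` is the observed 0/1 outcome and `lp` the linear predictor (both placeholder names):

```r
library(Hmisc)

p <- plogis(lp)                       # predicted probability from the LP
cstat <- rcorr.cens(p, y)["C Index"]  # concordance (c-statistic)

## Calibration intercept: LP enters as an offset, so its slope is fixed at 1
## and only the intercept `a` is estimated.
cal_int <- glm(y ~ 1, offset = lp, family = binomial)

## Calibration slope: LP enters as an ordinary covariate; `b` is its coefficient.
cal_slope <- glm(y ~ lp, family = binomial)
```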
Then, I would run a separate non-parametric pairs cluster bootstrap (again resampling patients) for each metric to get confidence intervals, as in the sketch below. Remember, I would do this at each step mentioned above: both with the whole dataset and within each internal-external loop.
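That is, something like this percentile interval, reusing `cluster_resample()` from above, where `metric()` is a placeholder function that returns one number (e.g. the c-statistic) from a dataset:

```r
## Percentile CI for any single metric from the pairs cluster bootstrap.
boot_ci <- function(dat, metric, B = 1000, probs = c(0.025, 0.975)) {
  stats <- replicate(B, metric(cluster_resample(dat)))
  quantile(stats, probs = probs, na.rm = TRUE)
}
```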
I would love to hear some input from you! Thanks