Improving model recalibration

We are developing a prognostic model for cognitive development in very preterm infants using advanced MRI scans. We used three types of MRI information: functional connectivity (FC), structural connectivity (SC), and morphometry. Additionally, we incorporated clinical data to enrich the model. This resulted in high-dimensional data with more than 1,000 features per subject, making model development challenging, so we needed to perform dimensionality reduction.
We selected non-negative matrix factorization (NMF) for feature extraction because it facilitates the identification of statistically redundant subgraphs, allowing overlapping and flexible co-occurrence of components. NMF’s non-negativity constraints also ease the interpretation of the subgraphs as additive, positively contributing elements. We applied NMF separately to the SC and FC graph measures. Before applying NMF, the morphometry data (which included some variables with negative values) were logarithmically transformed and then Min-Max scaled to satisfy NMF’s non-negativity requirement.
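As a minimal sketch of this preprocessing and factorization step, assuming scikit-learn and using randomly generated stand-in data (the array shapes and component count here are illustrative, not the study's actual morphometry matrix):

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Toy stand-in for a subjects-by-features morphometry matrix (positive values).
X = rng.lognormal(size=(100, 500))

# Log-transform, then Min-Max scale so every entry is non-negative,
# as NMF requires.
X_scaled = MinMaxScaler().fit_transform(np.log(X))

# Reduce to 26 components, matching the morphometry dimension in the text.
nmf = NMF(n_components=26, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X_scaled)   # subject-level component loadings (100 x 26)
H = nmf.components_               # component-by-feature "subgraphs" (26 x 500)
```

The non-negative loadings in `W` are what would then enter the prognostic model in place of the raw features.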
After applying NMF to the SC, FC, and morphometry data separately, the dimensions were reduced to 31 SC variables, 29 FC variables, and 26 morphometry variables. We then added 6 clinical biomarkers, 1 cMRI injury variable, and age at scan, for a total of 94 variables.
To develop and evaluate the models, we used bootstrap optimism-corrected validation with a kernel-based SVM using the ANOVA kernel. The bootstrap optimism correction involved the following steps:

  1. Develop the model M1 using the whole data.
  2. Evaluate M1 on the whole data to obtain the apparent performance (AppPerf).
  3. Generate a bootstrap dataset by sampling with replacement.
  4. Develop model M2 using the bootstrap dataset (applying the same modeling and predictor-selection methods as in step 1).
  5. Evaluate M2 on the bootstrap dataset to obtain the bootstrap performance (BootPerf).
  6. Evaluate M2 on the whole data to obtain the test performance (TestPerf).
  7. Calculate the optimism (Op) as the difference between the bootstrap and test performance: Op = BootPerf − TestPerf.
  8. Repeat steps 3 through 7 nboot times (n = 500).
  9. Average the optimism estimates from step 8 and subtract the average from the apparent performance (step 2) to obtain the optimism-corrected performance estimate for all relevant prognostic test properties.
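The steps above can be sketched as follows. This is an illustrative toy version assuming scikit-learn: scikit-learn has no built-in ANOVA kernel, so an RBF-kernel SVR stands in for the actual model, the data are randomly generated, and R² is used as the performance measure:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score

# Hypothetical data standing in for the 94 reduced predictors and outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = X[:, 0] + 0.5 * rng.normal(size=120)

def fit_model(Xb, yb):
    # RBF SVR as a stand-in for the ANOVA-kernel SVM used in the study.
    return SVR(kernel="rbf").fit(Xb, yb)

# Steps 1-2: apparent performance on the whole data.
m1 = fit_model(X, y)
app_perf = r2_score(y, m1.predict(X))

# Steps 3-8: bootstrap loop accumulating optimism estimates.
n_boot, optimism = 500, []
for _ in range(n_boot):
    idx = rng.integers(0, len(y), len(y))        # resample with replacement
    m2 = fit_model(X[idx], y[idx])
    boot_perf = r2_score(y[idx], m2.predict(X[idx]))
    test_perf = r2_score(y, m2.predict(X))
    optimism.append(boot_perf - test_perf)       # Op = BootPerf - TestPerf

# Step 9: optimism-corrected performance estimate.
corrected = app_perf - np.mean(optimism)
```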

To recalibrate this model, we used the following steps:

  • Develop and evaluate the apparent model using the whole data.
  • Resample the data with replacement to generate a bootstrap dataset.
  • Develop and evaluate the bootstrap model using the bootstrap data.
  • Develop the recalibration machine on the outcome of the bootstrap model.
  • Evaluate the bootstrap model on the whole data to obtain the test outcome.
  • Apply the recalibration machine developed in step 4 to the test outcome.
  • Repeat steps 2 through 6 for the number of bootstrap iterations (n = 500).
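A minimal sketch of this recalibration-within-bootstrap loop, assuming scikit-learn, with toy data, an RBF SVR standing in for the ANOVA-kernel SVM, and simple linear recalibration shown as one possible "recalibration machine":

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Hypothetical data standing in for the reduced predictors and outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
y = X[:, 0] + 0.5 * rng.normal(size=120)

test_slopes = []
for _ in range(500):
    idx = rng.integers(0, len(y), len(y))            # bootstrap sample
    m = SVR(kernel="rbf").fit(X[idx], y[idx])        # bootstrap model
    p_boot = m.predict(X[idx])
    # Fit the recalibration machine on the bootstrap-model outcome.
    recal = LinearRegression().fit(p_boot.reshape(-1, 1), y[idx])
    # Test outcome: bootstrap model evaluated on the whole data,
    # then passed through the recalibration machine.
    p_test = recal.predict(m.predict(X).reshape(-1, 1))
    # Track the resulting calibration slope (observed regressed on predicted).
    slope, intercept = np.polyfit(p_test, y, 1)
    test_slopes.append(slope)
```

Averaging `test_slopes` (and the corresponding intercepts) over the 500 iterations gives the kind of summary reported in Table 1.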

To develop the recalibration machines, we used the following methods:

  • Simple linear recalibration
  • Piecewise linear recalibration
  • Nonlinear recalibration (using Generalized additive models)
  • Nonlinear recalibration (using Generalized Linear Models)
  • Isotonic recalibration
  • Quantile mapping
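Two of these machines can be sketched briefly, assuming scikit-learn and toy miscalibrated predictions (the shrinkage and offset below are invented for illustration): simple linear recalibration maps predictions through a line fitted by regressing the observed outcome on the predictions, while isotonic recalibration fits a monotone, piecewise-constant mapping.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression

# Toy miscalibrated predictions p of a true outcome y (hypothetical values).
rng = np.random.default_rng(0)
y = rng.normal(100, 15, size=200)
p = 0.7 * y + 20 + rng.normal(0, 3, size=200)

# Simple linear recalibration: observed regressed on predicted,
# then predictions mapped through the fitted line.
lin = LinearRegression().fit(p.reshape(-1, 1), y)
p_lin = lin.predict(p.reshape(-1, 1))

# Isotonic recalibration: monotone mapping from predicted to observed.
iso = IsotonicRegression(out_of_bounds="clip").fit(p, y)
p_iso = iso.predict(p)
```

By construction, regressing `y` on `p_lin` returns slope 1 and intercept 0 on the data the line was fitted to; the question in this thread is how far those ideal values degrade when the machine is applied to new (test) predictions.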

The calibration plot of this model without recalibration is shown below:
[calibration plot image]

After applying the above recalibration methods, we got the following results:

Table 1. Calibration slope and intercept of the test outcome after recalibration with different methods

| Method | Slope | Intercept |
|---|---|---|
| Simple linear recalibration | 1.37 | -34.43 |
| Piecewise linear recalibration | 1.27 | -24.76 |
| Nonlinear recalibration (generalized additive models) | 1.29 | -27.11 |
| Nonlinear recalibration (generalized linear models) | 1.30 | -27.91 |
| Isotonic recalibration | 1.38 | -35.29 |
| Quantile mapping | 1.29 | -26.66 |
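For reference, the slope and intercept reported in the table can be computed by regressing the observed outcomes on the (recalibrated) predictions; an ideally calibrated model gives slope 1 and intercept 0. A small self-contained helper, with a hypothetical worked example of shrunk and shifted predictions:

```python
import numpy as np

def calibration_slope_intercept(y_true, y_pred):
    # Regress observed outcomes on predictions; ideal calibration
    # corresponds to slope = 1 and intercept = 0.
    slope, intercept = np.polyfit(y_pred, y_true, 1)
    return slope, intercept

# Hypothetical check: predictions shrunk by 0.7 and shifted by +20.
y = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
p = 0.7 * y + 20
s, i = calibration_slope_intercept(y, p)
# s ≈ 1.43 (= 1/0.7) and i ≈ -28.57 (= -20/0.7): over-shrunk predictions
# yield a calibration slope above 1, as in the table.
```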

Is there a method to improve this recalibration and bring the model closer to the ideal (intercept = 0 and slope = 1)?

I may have missed it but I didn’t see the number of observations or the distribution of Y. Those are all-important. If the effective sample size is not at least around 2000, the amount of overfitting expected may easily make the bootstrap underestimate overfitting. 100 repeats of 10-fold CV may be called for.

You are right to use data reduction; I’m not familiar with the matrix factorization approach. But it does strike me that the chances of all this being meaningful are low without injection of biological knowledge into the data reduction process.

Dear Dr. Harrell,
Thank you for your response. I was wondering if you could kindly share a reference that provides a mathematical proof showing that a small sample size causes the bootstrap optimism correction to underestimate overfitting?

It’s not the small sample size on its own. It’s when N << p. Sorry I don’t have a reference handy but there is a good one showing that intensive cross-validation is still unbiased in that case.