We are developing a prognostic model for cognitive development in very preterm infants using advanced MRI scans. To this end, we used three types of MRI information: functional connectivity (FC), structural connectivity (SC), and morphometry. We also incorporated clinical data to enrich the model. This produced high-dimensional data with more than 1,000 features per subject, which makes model development challenging, so we needed to reduce the dimensionality of the data.
We selected non-negative matrix factorization (NMF) for feature extraction because it facilitates the identification of statistically redundant subgraphs, allowing for overlapping and flexible co-occurrence of components. NMF’s non-negativity constraints also make the subgraphs easier to interpret as additive, positively contributing elements. We applied NMF separately to the SC and FC graph measures. Before applying NMF, the morphometry data (which included some variables with negative values) were logarithmically transformed and then Min-Max scaled to meet the non-negativity requirement of NMF.
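The preprocessing and factorization described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: the post does not say how negative values were handled before the log transform, so the per-column shift below is an assumption, and the NMF is a plain multiplicative-update implementation in NumPy rather than any particular library routine. The data, the rank of 5, and the function names `log_minmax` and `nmf` are hypothetical.

```python
import numpy as np

def log_minmax(X, eps=1e-12):
    """Shifted log transform, then column-wise Min-Max scaling to [0, 1]."""
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: shift each column so its minimum is 1, making the log defined
    shifted = X - X.min(axis=0) + 1.0
    logged = np.log(shifted)
    span = logged.max(axis=0) - logged.min(axis=0)
    return (logged - logged.min(axis=0)) / np.maximum(span, eps)

def nmf(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative-update NMF: V (n x m) is approximated by W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical data: 60 subjects, 40 features, some values negative
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
V = log_minmax(X)          # non-negative input for NMF
W, H = nmf(V, rank=5)      # W holds the reduced per-subject features
```

Here the rows of `W` play the role of the reduced feature set (for example, the 26 morphometry components), while `H` describes how each component loads on the original variables.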
After applying NMF to the SC, FC, and morphometry data separately, the dimensions were reduced to 31 SC variables, 29 FC variables, and 26 morphometry variables. We then added 6 clinical biomarkers, 1 cMRI injury variable, and age at scan, for a total of 94 variables.
To develop and evaluate the models, we used bootstrap optimism-corrected validation and a kernel-based SVM with the ANOVA kernel. The bootstrap optimism correction involved the following steps:
1. Develop model M1 using the whole dataset.
2. Evaluate M1 on the whole dataset to obtain the apparent performance (AppPerf).
3. Generate a bootstrap dataset by sampling with replacement.
4. Develop model M2 on the bootstrap dataset (applying the same modeling and predictor selection methods as in step 1).
5. Evaluate M2 on the bootstrap dataset to obtain the bootstrap performance (BootPerf).
6. Evaluate M2 on the whole dataset to obtain the test performance (TestPerf).
7. Calculate the optimism (Op) as the difference between the bootstrap performance and the test performance: Op = BootPerf - TestPerf.
8. Repeat steps 3 through 7 nboot times (n = 500).
9. Average the optimism estimates from step 8 and subtract this value from the apparent performance (step 2) to obtain the optimism-corrected performance estimate for all relevant prognostic test properties.
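The optimism-correction loop can be sketched as below. This is an illustration under stated assumptions: a simple ridge regression stands in for the ANOVA-kernel SVM (it is not the model used here), R² stands in for whichever performance measure is of interest, and the data are synthetic. The names `fit_ridge`, `predict`, `r2`, and `optimism_corrected` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Stand-in model (NOT the ANOVA-kernel SVM): ridge regression with intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2(y, y_hat):
    """ASSUMPTION: R^2 as the performance measure; any other metric could be used."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def optimism_corrected(X, y, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    m1 = fit_ridge(X, y)                              # develop M1 on whole data
    app_perf = r2(y, predict(m1, X))                  # apparent performance
    optimism = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                   # bootstrap sample with replacement
        m2 = fit_ridge(X[idx], y[idx])                # develop M2 on bootstrap data
        boot_perf = r2(y[idx], predict(m2, X[idx]))   # BootPerf
        test_perf = r2(y, predict(m2, X))             # TestPerf on whole data
        optimism[b] = boot_perf - test_perf           # Op = BootPerf - TestPerf
    mean_opt = optimism.mean()
    return app_perf, mean_opt, app_perf - mean_opt    # corrected = AppPerf - mean(Op)

# Synthetic example: 60 subjects, 20 predictors, one true signal
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = 100 + 5 * X[:, 0] + rng.normal(scale=10, size=60)
app, opt, corrected = optimism_corrected(X, y, n_boot=100)
```

Any modeling step that depends on the outcome (predictor selection, tuning) would need to be repeated inside `fit_ridge`'s replacement so that each bootstrap model is built the same way as M1.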
To recalibrate this model, we followed these steps:
1. Develop and evaluate the apparent model using the whole dataset.
2. Resample the data with replacement to generate the bootstrap dataset.
3. Develop and evaluate the bootstrap model using the bootstrap dataset.
4. Develop the recalibration machine on the outcome of the bootstrap model and apply it to that outcome.
5. Evaluate the bootstrap model on the whole dataset to obtain the test outcome.
6. Apply the recalibration machine developed in step 4 to the test outcome.
7. Repeat steps 2 to 6 for each bootstrap iteration (n = 500).
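The recalibration procedure above can be sketched as follows, using simple linear recalibration as the example. As before, this is only an illustration: ridge regression stands in for the actual SVM, the calibration line is assumed to come from regressing the observed outcome on the prediction, and all function names and data are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Stand-in model (NOT the ANOVA-kernel SVM): ridge regression with intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def fit_linear_recal(pred, obs):
    """Simple linear recalibration: least-squares fit of obs = a + b * pred."""
    slope, intercept = np.polyfit(pred, obs, 1)
    return lambda p: intercept + slope * np.asarray(p)

def calibration_line(pred, obs):
    """ASSUMPTION: slope and intercept from regressing observed on predicted."""
    slope, intercept = np.polyfit(pred, obs, 1)
    return slope, intercept

def bootstrap_recalibration(X, y, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    lines = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)                    # resample with replacement
        model = fit_ridge(X[idx], y[idx])              # bootstrap model
        boot_pred = predict(model, X[idx])             # bootstrap outcome
        recal = fit_linear_recal(boot_pred, y[idx])    # recalibration machine
        test_pred = predict(model, X)                  # test outcome on whole data
        lines[b] = calibration_line(recal(test_pred), y)  # recalibrated test outcome
    return lines.mean(axis=0)                          # mean slope, mean intercept

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))
y = 100 + 5 * X[:, 0] + rng.normal(scale=10, size=60)
mean_slope, mean_intercept = bootstrap_recalibration(X, y, n_boot=100)
```

Note that in this sketch the recalibration machine is fitted on the same bootstrap sample that the model was trained on, mirroring step 4 as described; whether that matches the actual implementation is not stated in the post.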
To develop the recalibration machines, we used the following methods:
- Simple linear recalibration
- Piecewise linear recalibration
- Nonlinear recalibration (using Generalized additive models)
- Nonlinear recalibration (using Generalized Linear Models)
- Isotonic recalibration
- Quantile mapping
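Three of the methods in this list can be sketched compactly in NumPy. These are minimal illustrative versions, not the implementations used for the results in this post: the isotonic fit is a basic pool-adjacent-violators routine, quantile mapping is done by matching sorted predictions to sorted observations, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(pred, obs):
    """Simple linear recalibration: obs = a + b * pred by least squares."""
    slope, intercept = np.polyfit(pred, obs, 1)
    return lambda p: intercept + slope * np.asarray(p)

def fit_isotonic(pred, obs):
    """Isotonic recalibration via pool-adjacent-violators (non-decreasing fit)."""
    order = np.argsort(pred)
    x, y = pred[order], obs[order]
    sums, counts = [], []
    for v in y:
        sums.append(float(v))
        counts.append(1)
        # Merge blocks while the previous block mean exceeds the current one
        while len(sums) > 1 and sums[-2] / counts[-2] > sums[-1] / counts[-1]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fitted = np.repeat(np.array(sums) / np.array(counts), counts)
    return lambda p: np.interp(p, x, fitted)   # linear interpolation between knots

def fit_quantile_map(pred, obs):
    """Empirical quantile mapping: k-th smallest prediction maps to k-th smallest observation."""
    xs, ys = np.sort(pred), np.sort(obs)
    return lambda p: np.interp(p, xs, ys)

# Synthetic example: predictions compressed toward the mean, plus noise
rng = np.random.default_rng(0)
obs = rng.normal(100, 15, size=200)
pred = 100 + 0.5 * (obs - 100) + rng.normal(0, 5, size=200)

lin = fit_linear(pred, obs)
iso = fit_isotonic(pred, obs)
qm = fit_quantile_map(pred, obs)
```

Each function returns a callable that maps raw predictions to recalibrated ones, so any of them could be swapped into a bootstrap loop without changing the surrounding code.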
The calibration plot of this model without recalibration was as below:
After applying the above recalibration methods, we got the following results:
Table 1. Slope and intercept of the test outcome after recalibration with each method
Method | Slope | Intercept |
---|---|---|
Simple linear recalibration | 1.37 | -34.43 |
Piecewise linear recalibration | 1.27 | -24.76 |
Nonlinear recalibration (using Generalized additive models) | 1.29 | -27.11 |
Nonlinear recalibration (using Generalized Linear Models) | 1.30 | -27.91 |
Isotonic recalibration | 1.38 | -35.29 |
Quantile mapping | 1.29 | -26.66 |
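For reference, a calibration slope and intercept of this kind can be computed as in the sketch below. It assumes the line is obtained by regressing the observed outcome on the predicted outcome (the post does not state the direction), and the five data points are made up purely to give a clean result.

```python
import numpy as np

def calibration_line(pred, obs):
    """ASSUMPTION: least-squares fit of obs = intercept + slope * pred."""
    slope, intercept = np.polyfit(pred, obs, 1)
    return slope, intercept

# Hypothetical outcomes on a scale centered near 100
obs = np.array([70.0, 85.0, 100.0, 115.0, 130.0])
# Predictions shrunk halfway toward 100: correct on average, too narrow in range
pred = 100.0 + 0.5 * (obs - 100.0)

slope, intercept = calibration_line(pred, obs)   # slope 2, intercept -100
```

Because a least-squares intercept equals mean(obs) - slope × mean(pred), a slope different from 1 on an uncentered scale pushes the intercept far from 0 even when the mean prediction is unbiased, as in this example. That relationship may be worth keeping in mind when reading the slope and intercept columns of Table 1 together.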
Is there a method to improve this recalibration and bring the model closer to the ideal (intercept = 0 and slope = 1)?