This script assumes that you have already run mgus_bm_prediction_20231125. This analysis has been completed in response to the revision request by the Annals of Internal Medicine, and will explore possible interaction effects between MGUS isotype and total IgG, IgA and IgM and compare this to the
First we fit the newly hypothesized ‘full’ model that includes interactions between total IgG, IgA and IgM and MGUS isotype.
fit_pred_inter <- lrm(outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) +
rcs(m_protein, 4) + rcs(flc_ratio, 4) +
rcs(igg, 4) * isotype + rcs(iga, 4) * isotype +
rcs(igm, 4) * isotype, data = df_pred,
x = TRUE, y = TRUE)
As seen below the model likelihood ratio chi-squared statistic is 441.17 on 45 degrees of freedom, with an R squared of 0.386 and a concordance index of 0.779. It should be noted that this is more than double the degrees of freedom of the current model and much larger than any sample size calculations could possibly justify.
fit_pred_inter
## Logistic Regression Model
##
## lrm(formula = outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) +
## rcs(igm, 4) + rcs(m_protein, 4) + rcs(flc_ratio, 4) + rcs(igg,
## 4) * isotype + rcs(iga, 4) * isotype + rcs(igm, 4) * isotype,
## data = df_pred, x = TRUE, y = TRUE)
##
##
## Frequencies of Responses
##
## 0-4% 5-9% 10-14% >15%
## 455 430 91 67
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 1043 LR chi2 441.17 R2 0.386 C 0.779
## max |deriv| 6e-07 d.f. 45 R2(45,1043)0.316 Dxy 0.558
## Pr(> chi2) <0.0001 R2(45,882.4)0.362 gamma 0.558
## Brier 0.082 tau-a 0.351
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=5-9% 3.1230 2.3381 1.34 0.1816
## y>=10-14% 0.3675 2.3357 0.16 0.8750
## y>=>15% -1.0835 2.3379 -0.46 0.6430
## isotype=IgA 13.8577 12.7418 1.09 0.2768
## isotype=IgG -2.9444 2.7949 -1.05 0.2921
## isotype=Light-chain 7.7250 3.3305 2.32 0.0204
## igg -0.2488 0.2392 -1.04 0.2983
## igg' 1.8215 1.0768 1.69 0.0907
## igg'' -5.6500 3.2514 -1.74 0.0823
## iga -0.0053 0.7176 -0.01 0.9941
## iga' 2.2698 4.4861 0.51 0.6129
## iga'' -4.8554 10.2268 -0.47 0.6349
## igm 0.0392 2.7513 0.01 0.9886
## igm' -0.9653 28.5590 -0.03 0.9730
## igm'' 1.7419 57.0128 0.03 0.9756
## m_protein -0.6831 0.1873 -3.65 0.0003
## m_protein' 45.2439 11.5998 3.90 <0.0001
## m_protein'' -56.8985 14.7544 -3.86 0.0001
## flc_ratio -2.0197 0.3664 -5.51 <0.0001
## flc_ratio' 37.0560 6.7460 5.49 <0.0001
## flc_ratio'' -87.3404 15.9932 -5.46 <0.0001
## isotype=IgA * igg -0.0185 0.2925 -0.06 0.9496
## isotype=IgG * igg 0.2467 0.2992 0.82 0.4096
## isotype=Light-chain * igg -0.8803 0.4241 -2.08 0.0379
## isotype=IgA * igg' -1.1361 1.4319 -0.79 0.4275
## isotype=IgG * igg' -1.1851 1.2699 -0.93 0.3507
## isotype=Light-chain * igg' 2.1659 1.7466 1.24 0.2150
## isotype=IgA * igg'' 4.0980 4.4503 0.92 0.3571
## isotype=IgG * igg'' 3.7314 3.7537 0.99 0.3202
## isotype=Light-chain * igg'' -4.8864 5.0425 -0.97 0.3325
## isotype=IgA * iga -8.0856 7.9991 -1.01 0.3121
## isotype=IgG * iga -0.1215 0.7904 -0.15 0.8778
## isotype=Light-chain * iga -1.9424 1.2368 -1.57 0.1163
## isotype=IgA * iga' 35.2880 35.1642 1.00 0.3156
## isotype=IgG * iga' 0.6161 5.0823 0.12 0.9035
## isotype=Light-chain * iga' 11.8309 7.6353 1.55 0.1213
## isotype=IgA * iga'' -72.6822 72.3738 -1.00 0.3153
## isotype=IgG * iga'' -2.1411 11.7368 -0.18 0.8552
## isotype=Light-chain * iga'' -27.9336 17.3857 -1.61 0.1081
## isotype=IgA * igm -1.4036 3.3699 -0.42 0.6770
## isotype=IgG * igm 2.2873 2.9481 0.78 0.4378
## isotype=Light-chain * igm 4.0732 3.7254 1.09 0.2742
## isotype=IgA * igm' 15.3341 36.7361 0.42 0.6764
## isotype=IgG * igm' -32.8379 31.1566 -1.05 0.2919
## isotype=Light-chain * igm' -49.8770 42.3164 -1.18 0.2385
## isotype=IgA * igm'' -32.8011 74.9076 -0.44 0.6615
## isotype=IgG * igm'' 68.5498 62.5222 1.10 0.2729
## isotype=Light-chain * igm'' 99.4476 86.8224 1.15 0.2520
##
Importantly for all further comparisons, both models must be shown to be equally well calibrated, as we would heavily favor a calibrated over a poorly calibrated model. What constitutes ‘equally well calibrated’ is subjective, and is best judged by examining calibration plots for both the interaction model and the non-interaction models. The interaction model has a fairly noticeable dip in calibration over moderate risk estimates. This immediately raises considerable doubt that a model that includes interaction terms is useful, compared the the pre-specified models without.
plot(calibrate(fit_pred_inter))
##
## Divergence or singularity in 2 samples
##
## n=1043 Mean absolute error=0.009 Mean squared error=0.00016
## 0.9 Quantile of absolute error=0.024
Next we fit the non-interaction model (the pre-specified model).
fit_pred <- lrm(outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) + rcs(m_protein, 4) +
rcs(flc_ratio, 4) + isotype, data = df_pred, x = TRUE, y = TRUE)
As seen below the model likelihood ratio chi-square statistic is 401.11 on 18 degrees of freedom, with an R squared of 0.358 and a concordance index of 0.766.
fit_pred
## Logistic Regression Model
##
## lrm(formula = outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm,
## 4) + rcs(m_protein, 4) + rcs(flc_ratio, 4) + isotype, data = df_pred,
## x = TRUE, y = TRUE)
##
##
## Frequencies of Responses
##
## 0-4% 5-9% 10-14% >15%
## 455 430 91 67
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 1043 LR chi2 401.11 R2 0.358 C 0.766
## max |deriv| 4e-08 d.f. 18 R2(18,1043)0.307 Dxy 0.533
## Pr(> chi2) <0.0001 R2(18,882.4)0.352 gamma 0.533
## Brier 0.085 tau-a 0.335
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=5-9% 4.2561 0.8788 4.84 <0.0001
## y>=10-14% 1.5889 0.8686 1.83 0.0674
## y>=>15% 0.1966 0.8704 0.23 0.8213
## igg -0.3544 0.0937 -3.78 0.0002
## igg' 1.5930 0.4062 3.92 <0.0001
## igg'' -4.4209 1.1912 -3.71 0.0002
## iga -0.1330 0.2827 -0.47 0.6380
## iga' 2.7083 1.7328 1.56 0.1181
## iga'' -6.0916 3.9008 -1.56 0.1184
## igm 1.0890 0.7754 1.40 0.1602
## igm' -16.7238 8.5061 -1.97 0.0493
## igm'' 34.2726 17.1540 2.00 0.0457
## m_protein -0.6602 0.1832 -3.60 0.0003
## m_protein' 46.0506 11.3215 4.07 <0.0001
## m_protein'' -57.9203 14.3889 -4.03 <0.0001
## flc_ratio -2.1258 0.3555 -5.98 <0.0001
## flc_ratio' 38.5632 6.5415 5.90 <0.0001
## flc_ratio'' -90.7608 15.5093 -5.85 <0.0001
## isotype=IgA -0.2766 0.2610 -1.06 0.2892
## isotype=IgG -0.5350 0.2038 -2.63 0.0087
## isotype=Light-chain -0.5640 0.3257 -1.73 0.0833
##
The claibration of the non-interaction model is noticeably better.
plot(calibrate(fit_pred))
##
## n=1043 Mean absolute error=0.011 Mean squared error=0.00016
## 0.9 Quantile of absolute error=0.019
ggplot(
Predict(
fit_pred_inter,
kint = 2,
igg, isotype,
ref.zero = TRUE, fun = exp
), adj.subtitle = FALSE) +
geom_hline(yintercept = 1, lty = 2) +
geom_vline(xintercept = 6.1, lty = 2) +
geom_vline(xintercept = 15.7, lty = 2) +
geom_rug(data = df_pred, aes(x = igg, y = NULL)) +
scale_y_continuous(breaks = c(0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
scale_x_continuous(limits = c(5, 22)) +
coord_trans(y = "log10", ylim = c(0.1, 32)) +
labs(x = "IgG (g/L)", y = "OR of 10% BMPC or more", title = "Total IgG") +
theme_bw() +
theme(legend.position = "bottom")
ggplot(
Predict(
fit_pred_inter,
kint = 2,
igg, isotype,
fun = plogis
), adj.subtitle = FALSE) +
coord_cartesian(ylim = c(0, 0.25)) +
geom_vline(xintercept = 6.1, lty = 2) +
geom_vline(xintercept = 15.7, lty = 2) +
geom_rug(data = df_pred, aes(x = igg, y = NULL)) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(limits = c(5, 22)) +
labs(x = "IgG (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgG") +
theme_bw() +
theme(legend.position = "bottom")
ggplot(
Predict(
fit_pred_inter,
kint = 2,
igm, isotype,
ref.zero = TRUE, fun = exp
)) +
geom_hline(yintercept = 1, lty = 2) +
geom_vline(xintercept = 0.4, lty = 2) +
geom_vline(xintercept = 2.3, lty = 2) +
geom_rug(data = df_pred, aes(x = igm, y = NULL)) +
scale_y_continuous(breaks = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
scale_x_continuous(limits = c(NA, 8)) +
coord_trans(y = "log10", ylim = c(0.01, 32)) +
labs(x = "IgM (g/L)", y = "OR of 10% BMPC or more", title = "Total IgM") +
theme_bw() +
theme(legend.position = "bottom")
ggplot(
Predict(
fit_pred_inter,
kint = 2,
igm, isotype,
fun = plogis
)) +
coord_cartesian(ylim = c(0, 0.1)) +
geom_vline(xintercept = 0.4, lty = 2) +
geom_vline(xintercept = 2.3, lty = 2) +
geom_rug(data = df_pred, aes(x = igm, y = NULL)) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(limits = c(NA, 8)) +
labs(x = "IgM (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgM") +
theme_bw() +
theme(legend.position = "bottom")
ggplot(
Predict(
fit_pred_inter,
kint = 2,
iga, isotype,
ref.zero = TRUE, fun = exp
), adj.subtitle = FALSE) +
geom_hline(yintercept = 1, lty = 2) +
geom_vline(xintercept = 0.7, lty = 2) +
geom_vline(xintercept = 3.7, lty = 2) +
geom_rug(data = df_pred, aes(x = iga, y = NULL)) +
scale_y_continuous(breaks = c(0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
scale_x_continuous(limits = c(NA, 10)) +
coord_trans(y = "log10", ylim = c(0.1, 32)) +
labs(x = "IgA (g/L)", y = "OR of 10% BMPC or more", title = "Total IgA") +
theme_bw() +
theme(legend.position = "bottom")
ggplot(
Predict(
fit_pred_inter,
kint = 2,
iga, isotype,
fun = plogis
), adj.subtitle = FALSE) +
coord_cartesian(ylim = c(0, 0.2)) +
geom_vline(xintercept = 0.4, lty = 2) +
geom_vline(xintercept = 2.3, lty = 2) +
geom_rug(data = df_pred, aes(x = iga, y = NULL)) +
scale_y_continuous(labels = scales::percent) +
scale_x_continuous(limits = c(NA, 6)) +
labs(x = "IgA (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgA") +
theme_bw() +
theme(legend.position = "bottom")
Because the non-interaction model is nested within the interaction model, we can use the likelihood ratio test to test the null hypothesis that both models fit the data equally well, and therefore that the interaction terms are redundant. As seen below, we do not find strong evidence against the null hypothesis.
lrtest(fit_pred, fit_pred_inter)
##
## Model 1: outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) + rcs(m_protein,
## 4) + rcs(flc_ratio, 4) + isotype
## Model 2: outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) +
## rcs(m_protein, 4) + rcs(flc_ratio, 4) + rcs(igg, 4) * isotype +
## rcs(iga, 4) * isotype + rcs(igm, 4) * isotype
##
## L.R. Chisq d.f. P
## 40.06057799 27.00000000 0.05057213
Although not ‘significant’, we can quantify the degree to which the interaction model fits the data better, by calculating the adequacy of the non-interaction model compared to the non-linear model using the models’ likelihood ratio Chi-squared statistics.
pre_test_lr <- fit_pred$stats["Model L.R."] # non-interaction likelihood chi-squared statistic
post_test_lr <- fit_pred_inter$stats["Model L.R."] # interactoni likelihood chi-suqared statistic
Below is a measure of the adequacy of the linear model compared to the non-linear model, 0.909 out of a maximum of 1
add_base <- pre_test_lr/post_test_lr
add_base
## Model L.R.
## 0.9091952
In other words, the fraction of new information provided by interaction model by this metric is 0.09, which is substantial.
frac_new <- 1 - add_base
frac_new
## Model L.R.
## 0.09080478
The contribution of the interaction components of total IgG, IgA and IgM (or alternatively MGUS isotype) can be shown for each immunoglobulin individually, with the overall interaction likelihood ratio Chi-squared statistic of 38.69 on 27 degrees of freedom approximating the overall likelihood ratio test above.
anova(fit_pred_inter)
## Wald Statistics Response: outcome
##
## Factor Chi-Square d.f. P
## isotype (Factor+Higher Order Factors) 45.66 30 0.0335
## All Interactions 38.69 27 0.0675
## igg (Factor+Higher Order Factors) 30.33 12 0.0025
## All Interactions 14.99 9 0.0911
## Nonlinear (Factor+Higher Order Factors) 15.68 8 0.0472
## iga (Factor+Higher Order Factors) 48.08 12 <.0001
## All Interactions 9.14 9 0.4246
## Nonlinear (Factor+Higher Order Factors) 8.73 8 0.3652
## igm (Factor+Higher Order Factors) 21.85 12 0.0393
## All Interactions 7.14 9 0.6221
## Nonlinear (Factor+Higher Order Factors) 12.01 8 0.1508
## m_protein 42.44 3 <.0001
## Nonlinear 23.27 2 <.0001
## flc_ratio 49.17 3 <.0001
## Nonlinear 31.82 2 <.0001
## isotype * igg (Factor+Higher Order Factors) 14.99 9 0.0911
## Nonlinear 8.17 6 0.2257
## Nonlinear Interaction : f(A,B) vs. AB 8.17 6 0.2257
## isotype * iga (Factor+Higher Order Factors) 9.14 9 0.4246
## Nonlinear 5.10 6 0.5313
## Nonlinear Interaction : f(A,B) vs. AB 5.10 6 0.5313
## isotype * igm (Factor+Higher Order Factors) 7.14 9 0.6221
## Nonlinear 6.76 6 0.3435
## Nonlinear Interaction : f(A,B) vs. AB 6.76 6 0.3435
## TOTAL NONLINEAR 91.42 28 <.0001
## TOTAL INTERACTION 38.69 27 0.0675
## TOTAL NONLINEAR + INTERACTION 129.43 37 <.0001
## TOTAL 339.47 45 <.0001
The question remains whether the potentially increased fit of the interaction model to the data compared to the non-interaction model is worthwhile, given its added complexity. Akaike’s information criterion provides a method for penalizing the log likelihood achieved by a given model for its complexity to obtain a more unbiased assessment of the model’s worth. The penalty is to add twice the number ofparameters to the −2 log likelihood. On this metric, the model with the lower AIC should generally be favored. In this case, the non-interaction model has a considerably lower AIC.
AIC(fit_pred_inter)
## [1] 1983.497
AIC(fit_pred)
## [1] 1969.557
Next we compare the variances of the predicted risks from the non-interaction and interaction models. In the prediction setting, increased variance in risk estimates is generally desirable. As shown below, the variance is almost identical between the linear and non-linear models and the high-resolution histogram of predicted risks is visually identical.
pre <- predict(fit_pred, type='fitted')
pre <- pre[,2]
post <- predict(fit_pred_inter, type='fitted')
post <- post[,2]
par(mgp=c(4,1,0), mar=c(6,5,2,2))
xlab <- c(
paste0("Non-interaction\nvariance =", round(var(pre), 3)),
paste0("Interaction\nvariance=",round(var(post), 3))
)
histbackback(pre, post, brks=seq(0, 1, by=0.005), xlab=xlab, ylab= "Proportion (%) having \n≥10% BMPC")
We show that the relative explained variation of predicted risk of the non-interaction model compared to the interaction model is 0.907 (theoretically out of a maximum of 1)
var_pre <- round(var(pre), 3) #linear
var_post <- round(var(post), 3) #non-linear
relativ_expl <- var_pre/var_post
relativ_expl
## [1] 0.9069767
In other words, the fraction of new information contained in the interaction model by this metric is only 0.093
frac_new2 <- 1 - relativ_expl
frac_new2
## [1] 0.09302326
We can show that the predicted risks between the non-interaction and interaction models differ most at moderate predicted risks.
plot(pre, post-pre, xlab='Predicted risk (non-interaction)',
ylab='Difference in predicted risks (interaction - non-interaction)', pch=46)
lo <- Rq(post-pre ~ rcs(pre, 7), tau=0.1) # 0.1 quantile
hi <- Rq(post-pre ~ rcs(pre, 7), tau=0.9) # 0.9 quantile
at <- seq(0, 1, length=200)
lines(at, Predict(lo, pre=at)$yhat, col='red', lwd=1.5)
lines(at, Predict(hi, pre=at)$yhat, col='red', lwd=1.5)
abline(h=0, col='red', lty = 2)
Finally, as this is a prediction model that we intend for clinical use, we should be satisfied that the decision curves associated with both models are similar. If not, we should heavily favor the model that leads to increased net-benefit over plausible risk thresholds.
df_interaction <- df_pred # temporary dataframe only used for this script
Function(fit_pred) # we extract the linear predictor from the non-linear model first
## function(igg = 10.21,iga = 2.167,igm = 0.718,m_protein = 1.91,flc_ratio = 1.237794,isotype = "IgG") {-0.35437745* igg+0.012363328*pmax(igg-6.111,0)^3-0.034311919*pmax(igg-9.2,0)^3+0.023794703*pmax(igg-11.446,0)^3-0.0018461125*pmax(igg-17.462,0)^3-0.13301362* iga+0.098986509*pmax(iga-0.5914,0)^3-0.22264315*pmax(iga-1.641,0)^3+0.13057034*pmax(iga-2.6581,0)^3-0.006913695*pmax(iga-5.8221,0)^3+1.0889587* igm-3.1076573*pmax(igm-0.2561,0)^3+6.3686218*pmax(igm-0.5751,0)^3-3.3346563*pmax(igm-0.9166,0)^3+0.073691742*pmax(igm-2.5759,0)^3-0.66018976* m_protein+0.44395199*pmax(m_protein-0.481261,0)^3-0.55838218*pmax(m_protein-0.881164,0)^3+0.11511927*pmax(m_protein-2.48194,0)^3-0.00068906911*pmax(m_protein-10.665996,0)^3-2.1257973* flc_ratio+0.73647737*pmax(flc_ratio-0.2041965,0)^3-1.7333446*pmax(flc_ratio-1.0186109,0)^3+1.0005139*pmax(flc_ratio-1.6415064,0)^3-0.0036466553*pmax(flc_ratio-7.440332,0)^3-0.27663453*(isotype=="IgA")-0.53499524*(isotype=="IgG")-0.56401632*(isotype=="Light-chain") }
## <environment: 0x7fe9fdb959a0>
df_interaction$noninteraction_pred <- with(
df_interaction,
expit(
1.5888896 -0.35437745 * igg + 0.012363328 * pmax(igg - 6.111, 0) ^ 3 - 0.034311919 *
pmax(igg - 9.2, 0) ^ 3 + 0.023794703 * pmax(igg - 11.446, 0) ^ 3 - 0.0018461125 *
pmax(igg - 17.462, 0) ^ 3 - 0.13301362 * iga + 0.098986509 * pmax(iga - 0.5914, 0) ^ 3
- 0.22264315 * pmax(iga - 1.641, 0) ^ 3 + 0.13057034 * pmax(iga - 2.6581, 0) ^ 3
- 0.006913695 * pmax(iga - 5.8221, 0) ^ 3 + 1.0889587 * igm -
3.1076573 * pmax(igm - 0.2561, 0) ^ 3 + 6.3686218 * pmax(igm - 0.5751, 0) ^
3 - 3.3346563 * pmax(igm - 0.9166, 0) ^ 3 + 0.073691742 * pmax(igm - 2.5759, 0) ^
3 - 0.66018976 * m_protein + 0.44395199 * pmax(m_protein - 0.481261, 0) ^
3 - 0.55838218 * pmax(m_protein - 0.881164, 0) ^ 3 + 0.11511927 * pmax(m_protein - 2.48194, 0) ^ 3
- 0.00068906911 * pmax(m_protein - 10.665996, 0) ^ 3 - 2.1257973 * flc_ratio + 0.73647737 *
pmax(flc_ratio - 0.2041965, 0) ^ 3 - 1.7333446 * pmax(flc_ratio - 1.0186109, 0) ^ 3
+ 1.0005139 * pmax(flc_ratio - 1.6415064, 0) ^ 3 - 0.0036466553 *
pmax(flc_ratio - 7.440332, 0) ^ 3 - 0.27663453 * (isotype == "IgA") - 0.53499524 *
(isotype == "IgG") - 0.56401632 * (isotype == "Light-chain")
)
)
Function(fit_pred_inter) # next we extract the linear predictor from the linear model
## function(isotype = "IgG",igg = 10.21,iga = 2.167,igm = 0.718,m_protein = 1.91,flc_ratio = 1.237794) {+13.857739*(isotype=="IgA")-2.9443755*(isotype=="IgG")+7.7250492*(isotype=="Light-chain")-0.24880468* igg+0.014136849*pmax(igg-6.111,0)^3-0.043850706*pmax(igg-9.2,0)^3+0.033548396*pmax(igg-11.446,0)^3-0.00383454*pmax(igg-17.462,0)^3-0.0052641133* iga+0.082960416*pmax(iga-0.5914,0)^3-0.17746379*pmax(iga-1.641,0)^3+0.097361815*pmax(iga-2.6581,0)^3-0.002858446*pmax(iga-5.8221,0)^3+0.039156474* igm-0.17936969*pmax(igm-0.2561,0)^3+0.32367647*pmax(igm-0.5751,0)^3-0.13952274*pmax(igm-0.9166,0)^3-0.0047840447*pmax(igm-2.5759,0)^3-0.68313533* m_protein+0.43617459*pmax(m_protein-0.481261,0)^3-0.5485312*pmax(m_protein-0.881164,0)^3+0.11302013*pmax(m_protein-2.48194,0)^3-0.00066351445*pmax(m_protein-10.665996,0)^3-2.0197183* flc_ratio+0.70769306*pmax(flc_ratio-0.2041965,0)^3-1.6680227*pmax(flc_ratio-1.0186109,0)^3+0.96409409*pmax(flc_ratio-1.6415064,0)^3-0.0037644823*pmax(flc_ratio-7.440332,0)^3+(isotype=="IgA")*(-0.018482177* igg-0.0088176783*pmax(igg-6.111,0)^3+0.031805606*pmax(igg-9.2,0)^3-0.027042627*pmax(igg-11.446,0)^3+0.0040547002*pmax(igg-17.462,0)^3)+(isotype=="IgG")*(0.24670448* igg-0.0091979458*pmax(igg-6.111,0)^3+0.028960155*pmax(igg-9.2,0)^3-0.022417373*pmax(igg-11.446,0)^3+0.002655164*pmax(igg-17.462,0)^3)+(isotype=="Light-chain")*(-0.88031872* igg+0.016809742*pmax(igg-6.111,0)^3-0.037924311*pmax(igg-9.2,0)^3+0.020366237*pmax(igg-11.446,0)^3+0.0007483327*pmax(igg-17.462,0)^3)+(isotype=="IgA")*(-8.0855649* iga+1.289755*pmax(iga-0.5914,0)^3-2.6564934*pmax(iga-1.641,0)^3+1.3782373*pmax(iga-2.6581,0)^3-0.011498959*pmax(iga-5.8221,0)^3)+(isotype=="IgG")*(-0.12151831* iga+0.022518773*pmax(iga-0.5914,0)^3-0.078256398*pmax(iga-1.641,0)^3+0.066184855*pmax(iga-2.6581,0)^3-0.01044723*pmax(iga-5.8221,0)^3)+(isotype=="Light-chain")*(-1.9424433* iga+0.43241092*pmax(iga-0.5914,0)^3-1.0209564*pmax(iga-1.641,0)^3+0.63429489*pmax(iga-2.6581,0)^3-0.045749404*pmax(iga-5.8221,0)^3)+(isotype=="IgA")*(-1.4035868* igm+2.8494307*pmax(igm-0.2561,0)^3-6.0951884*pmax(igm-0.5751,0)^3+3.3659637*pmax(igm-0.9166,0)^3-0.12020603*pmax(igm-2.5759,0)^3)+(isotype=="IgG")*(2.2872933* igm-6.1020261*pmax(igm-0.2561,0)^3+12.738116*pmax(igm-0.5751,0)^3-6.828749*pmax(igm-0.9166,0)^3+0.19265866*pmax(igm-2.5759,0)^3)+(isotype=="Light-chain")*(4.0731984* igm-9.2682911*pmax(igm-0.2561,0)^3+18.479621*pmax(igm-0.5751,0)^3-9.3252841*pmax(igm-0.9166,0)^3+0.11395424*pmax(igm-2.5759,0)^3) }
## <environment: 0x7fe983a60af0>
df_interaction$inter_pred <- with(
df_interaction,
expit(
0.367516938 +13.857739 * (isotype == "IgA") - 2.9443755 * (isotype == "IgG") + 7.7250492 *
(isotype == "Light-chain") - 0.24880468 * igg + 0.014136849 * pmax(igg - 6.111, 0) ^ 3
- 0.043850706 * pmax(igg - 9.2, 0) ^ 3 + 0.033548396 * pmax(igg - 11.446, 0) ^ 3
- 0.00383454 * pmax(igg - 17.462, 0) ^ 3 - 0.0052641133 * iga +
0.082960416 * pmax(iga - 0.5914, 0) ^ 3 - 0.17746379 * pmax(iga - 1.641, 0) ^
3 + 0.097361815 * pmax(iga - 2.6581, 0) ^ 3 - 0.002858446 * pmax(iga - 5.8221, 0) ^
3 + 0.039156474 * igm - 0.17936969 * pmax(igm - 0.2561, 0) ^ 3 + 0.32367647 *
pmax(igm - 0.5751, 0) ^ 3 - 0.13952274 * pmax(igm - 0.9166, 0) ^ 3 - 0.0047840447 *
pmax(igm - 2.5759, 0) ^ 3 - 0.68313533 * m_protein + 0.43617459 * pmax(m_protein - 0.481261, 0) ^ 3
- 0.5485312 * pmax(m_protein - 0.881164, 0) ^ 3 + 0.11302013 * pmax(m_protein - 2.48194, 0)
^ 3 - 0.00066351445 * pmax(m_protein - 10.665996, 0) ^3 - 2.0197183 * flc_ratio +
0.70769306 * pmax(flc_ratio - 0.2041965, 0) ^3 - 1.6680227 * pmax(flc_ratio - 1.0186109, 0) ^ 3
+ 0.96409409 * pmax(flc_ratio -1.6415064, 0) ^ 3 - 0.0037644823 * pmax(flc_ratio - 7.440332, 0) ^ 3
+ (isotype =="IgA") * (-0.018482177 * igg - 0.0088176783 * pmax(igg - 6.111, 0) ^ 3 +
0.031805606 * pmax(igg - 9.2, 0) ^ 3 - 0.027042627 * pmax(igg - 11.446, 0) ^ 3
+ 0.0040547002 *pmax(igg - 17.462, 0) ^ 3) + (isotype == "IgG") *
(0.24670448 * igg - 0.0091979458 * pmax(igg - 6.111, 0) ^ 3 + 0.028960155 * pmax(igg - 9.2, 0) ^ 3
- 0.022417373 * pmax(igg - 11.446, 0) ^ 3 + 0.002655164 * pmax(igg - 17.462, 0) ^ 3)
+ (isotype == "Light-chain") * (-0.88031872 * igg + 0.016809742 * pmax(igg - 6.111, 0) ^ 3 - 0.037924311 *
pmax(igg - 9.2, 0) ^ 3 + 0.020366237 * pmax(igg - 11.446, 0) ^ 3 + 0.0007483327 * pmax(igg - 17.462, 0) ^ 3
) + (isotype == "IgA") * (-8.0855649 * iga + 1.289755 * pmax(iga - 0.5914, 0) ^ 3
- 2.6564934 * pmax(iga -1.641, 0) ^ 3 + 1.3782373 * pmax(iga - 2.6581, 0) ^ 3 - 0.011498959 *
pmax(iga -5.8221, 0) ^ 3) + (isotype == "IgG") * (
-0.12151831 * iga + 0.022518773 * pmax(iga - 0.5914, 0) ^ 3 - 0.078256398 *
pmax(iga - 1.641, 0) ^ 3 + 0.066184855 * pmax(iga - 2.6581, 0) ^ 3 - 0.01044723 *
pmax(iga - 5.8221, 0) ^ 3
) + (isotype == "Light-chain") * (
-1.9424433 * iga + 0.43241092 * pmax(iga - 0.5914, 0) ^ 3 - 1.0209564 *
pmax(iga - 1.641, 0) ^ 3 + 0.63429489 * pmax(iga - 2.6581, 0) ^ 3 - 0.045749404 *
pmax(iga - 5.8221, 0) ^ 3
) + (isotype == "IgA") * (
-1.4035868 * igm + 2.8494307 *
pmax(igm - 0.2561, 0) ^ 3 - 6.0951884 * pmax(igm -0.5751, 0) ^ 3
+ 3.3659637 * pmax(igm - 0.9166, 0) ^ 3 - 0.12020603 * pmax(igm -2.5759, 0) ^ 3
) + (isotype == "IgG") * (2.2872933 * igm - 6.1020261 * pmax(igm - 0.2561, 0) ^ 3
+ 12.738116 * pmax(igm -0.5751, 0) ^ 3 - 6.828749 * pmax(igm - 0.9166, 0) ^ 3 + 0.19265866 * pmax(igm -2.5759, 0) ^ 3
) + (isotype == "Light-chain") * (4.0731984 * igm - 9.2682911 * pmax(igm - 0.2561, 0) ^ 3
+ 18.479621 * pmax(igm -0.5751, 0) ^ 3 - 9.3252841 * pmax(igm - 0.9166, 0) ^ 3 + 0.11395424 * pmax(igm -2.5759, 0) ^ 3)))
We create the decision curve analysis models using the dca pacakge. First we examine the net-benefit of the non-interaction and interaction models in a opt-out strategy, in which net benefit is expressed in the relative proportion of true negatives. As demonstrated by the resulting decision curve analysis figure, there is no perceivable difference in net benefit between the two models in this setting.
noninteraction_model <- decision_curve(perc_10 ~ noninteraction_pred,
data = df_interaction,
policy = 'opt-out',
fitted = TRUE,
thresholds = seq(0, 1, by = .001),
bootstraps = 500)
interaction_model <- decision_curve(perc_10 ~ inter_pred,
data = df_interaction,
policy = 'opt-out',
fitted = TRUE,
thresholds = seq(0, 1, by = .001),
bootstraps = 500)
noninteraction_model$derived.data %>%
select(thresholds, NB, NB_lower, NB_upper, model) %>%
union_all(
interaction_model$derived.data %>%
select(thresholds, NB, NB_lower, NB_upper, model) %>%
filter(model == "perc_10 ~ inter_pred")
) %>%
mutate(model = factor(model, levels = c("perc_10 ~ noninteraction_pred", "perc_10 ~ inter_pred", "All", "None"))) %>%
ggplot(aes(x = thresholds, y = NB, color = model, fill = model)) +
geom_line(size = 1.1, position = position_dodge(width = 0.005)) +
geom_vline(aes(xintercept = 0.05), lty = 2) +
geom_vline(aes(xintercept = 0.10), lty = 2) +
scale_color_manual(
name = "Follow-up strategy:",
labels = c(
"Defer based on non-interaction model",
"Defer based on interaction model",
"Obtain sample from all",
"Obtain sample from none"
),
values = c(nejm_palette[c(2, 3)], "black", nejm_palette[c(1)])
) +
scale_x_continuous(breaks = seq(0, 1, by = 0.02), labels = scales::percent_format(accuracy = 1)) +
coord_cartesian(xlim = c(0, 0.25), ylim = c(-0.2, 0.7)) +
labs(x = NULL, y = "Net Benefit") +
theme_bw() +
theme(
legend.title = element_blank(),
legend.background = element_rect(fill = NA),
panel.grid.minor.x = element_blank(),
legend.position = c(0.2, 0.82)
)
This non-difference holds in the setting of an opt-in strategy, in which the net-benefit is expressed in the units of true positives.
noninteraction_model <- decision_curve(perc_10 ~ noninteraction_pred,
data = df_interaction,
policy = 'opt-in',
fitted = TRUE,
thresholds = seq(0, 1, by = .001),
bootstraps = 500)
interaction_model <- decision_curve(perc_10 ~ inter_pred,
data = df_interaction,
policy = 'opt-in',
fitted = TRUE,
thresholds = seq(0, 1, by = .001),
bootstraps = 500)
noninteraction_model$derived.data %>%
select(thresholds, NB, NB_lower, NB_upper, model) %>%
union_all(
interaction_model$derived.data %>%
select(thresholds, NB, NB_lower, NB_upper, model) %>%
filter(model == "perc_10 ~ inter_pred")
) %>%
mutate(model = factor(model, levels = c("perc_10 ~ noninteraction_pred", "perc_10 ~ inter_pred", "All", "None"))) %>%
ggplot(aes(x = thresholds, y = NB, color = model, fill = model)) +
geom_line(size = 1.1, position = position_dodge(width = 0.005)) +
geom_vline(aes(xintercept = 0.05), lty = 2) +
geom_vline(aes(xintercept = 0.10), lty = 2) +
scale_color_manual(
name = "Follow-up strategy:",
labels = c(
"Defer based on non-linear model",
"Defer based on linear model",
"Obtain sample from all",
"Obtain sample from none"
),
values = c(nejm_palette[c(2, 3)], "black", nejm_palette[c(1)])
) +
scale_x_continuous(breaks = seq(0, 1, by = 0.02), labels = scales::percent_format(accuracy = 1)) +
coord_cartesian(xlim = c(0, 0.25), ylim = c(-0.2, 0.5)) +
labs(x = NULL, y = "Net Benefit") +
theme_bw() +
theme(
legend.title = element_blank(),
legend.background = element_rect(fill = NA),
panel.grid.minor.x = element_blank(),
legend.position = c(0.2, 0.82)
)
We have shown that the model that includes an interaction term between MGUS isotype and total IgG, IgA and IgM: i) is visually less well calibrated compared to the non-interaction model ii) does not fit the data significantly better on the likelihood ratio test at an alpha of 0.05, but provides roughly 9% increase in the fraction of new information on this metric compared to the non-interaction model model. iii) has a higher Akaike’s information criterion than does the non-interaction model iv) slightly increases the variance of predicted risk, providing a roughly 9% increase in the fraction of new information on this metric compared to the non-interaction model v) has similar net-benefit on decision curve analysis.
Given that the interaction model has poorer calibration and does not result in a definitive increase in net-benefit on decision curve analysis, despite more than doabling the number of predictors, we do not find any compelling reason to include interaction terms in the prediction model.