1 Interaction between isotype and total IgG, IgA and IgM

This script assumes that you have already run mgus_bm_prediction_20231125. This analysis has been completed in response to the revision request by the Annals of Internal Medicine, and will explore possible interaction effects between MGUS isotype and total IgG, IgA and IgM and compare this to the

First we fit the newly hypothesized ‘full’ model that includes interactions between total IgG, IgA and IgM and MGUS isotype.

fit_pred_inter <- lrm(outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) +
                        rcs(m_protein, 4) + rcs(flc_ratio, 4) +
                        rcs(igg, 4) * isotype + rcs(iga, 4) * isotype + 
                        rcs(igm, 4) * isotype, data = df_pred, 
                      x = TRUE, y = TRUE)

As seen below the model likelihood ratio chi-squared statistic is 441.17 on 45 degrees of freedom, with an R squared of 0.386 and a concordance index of 0.779. It should be noted that this is more than double the degrees of freedom of the current model and much larger than any sample size calculations could possibly justify.

fit_pred_inter

## Logistic Regression Model
##  
##  lrm(formula = outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) + 
##      rcs(igm, 4) + rcs(m_protein, 4) + rcs(flc_ratio, 4) + rcs(igg, 
##      4) * isotype + rcs(iga, 4) * isotype + rcs(igm, 4) * isotype, 
##      data = df_pred, x = TRUE, y = TRUE)
##  
##  
##  Frequencies of Responses
##  
##    0-4%   5-9% 10-14%   >15% 
##     455    430     91     67 
##  
##                         Model Likelihood       Discrimination    Rank Discrim.    
##                               Ratio Test              Indexes          Indexes    
##  Obs          1043    LR chi2     441.17       R2       0.386    C       0.779    
##  max |deriv| 6e-07    d.f.            45     R2(45,1043)0.316    Dxy     0.558    
##                       Pr(> chi2) <0.0001    R2(45,882.4)0.362    gamma   0.558    
##                                                Brier    0.082    tau-a   0.351    
##  
##                              Coef     S.E.    Wald Z Pr(>|Z|)
##  y>=5-9%                       3.1230  2.3381  1.34  0.1816  
##  y>=10-14%                     0.3675  2.3357  0.16  0.8750  
##  y>=>15%                      -1.0835  2.3379 -0.46  0.6430  
##  isotype=IgA                  13.8577 12.7418  1.09  0.2768  
##  isotype=IgG                  -2.9444  2.7949 -1.05  0.2921  
##  isotype=Light-chain           7.7250  3.3305  2.32  0.0204  
##  igg                          -0.2488  0.2392 -1.04  0.2983  
##  igg'                          1.8215  1.0768  1.69  0.0907  
##  igg''                        -5.6500  3.2514 -1.74  0.0823  
##  iga                          -0.0053  0.7176 -0.01  0.9941  
##  iga'                          2.2698  4.4861  0.51  0.6129  
##  iga''                        -4.8554 10.2268 -0.47  0.6349  
##  igm                           0.0392  2.7513  0.01  0.9886  
##  igm'                         -0.9653 28.5590 -0.03  0.9730  
##  igm''                         1.7419 57.0128  0.03  0.9756  
##  m_protein                    -0.6831  0.1873 -3.65  0.0003  
##  m_protein'                   45.2439 11.5998  3.90  <0.0001 
##  m_protein''                 -56.8985 14.7544 -3.86  0.0001  
##  flc_ratio                    -2.0197  0.3664 -5.51  <0.0001 
##  flc_ratio'                   37.0560  6.7460  5.49  <0.0001 
##  flc_ratio''                 -87.3404 15.9932 -5.46  <0.0001 
##  isotype=IgA * igg            -0.0185  0.2925 -0.06  0.9496  
##  isotype=IgG * igg             0.2467  0.2992  0.82  0.4096  
##  isotype=Light-chain * igg    -0.8803  0.4241 -2.08  0.0379  
##  isotype=IgA * igg'           -1.1361  1.4319 -0.79  0.4275  
##  isotype=IgG * igg'           -1.1851  1.2699 -0.93  0.3507  
##  isotype=Light-chain * igg'    2.1659  1.7466  1.24  0.2150  
##  isotype=IgA * igg''           4.0980  4.4503  0.92  0.3571  
##  isotype=IgG * igg''           3.7314  3.7537  0.99  0.3202  
##  isotype=Light-chain * igg''  -4.8864  5.0425 -0.97  0.3325  
##  isotype=IgA * iga            -8.0856  7.9991 -1.01  0.3121  
##  isotype=IgG * iga            -0.1215  0.7904 -0.15  0.8778  
##  isotype=Light-chain * iga    -1.9424  1.2368 -1.57  0.1163  
##  isotype=IgA * iga'           35.2880 35.1642  1.00  0.3156  
##  isotype=IgG * iga'            0.6161  5.0823  0.12  0.9035  
##  isotype=Light-chain * iga'   11.8309  7.6353  1.55  0.1213  
##  isotype=IgA * iga''         -72.6822 72.3738 -1.00  0.3153  
##  isotype=IgG * iga''          -2.1411 11.7368 -0.18  0.8552  
##  isotype=Light-chain * iga'' -27.9336 17.3857 -1.61  0.1081  
##  isotype=IgA * igm            -1.4036  3.3699 -0.42  0.6770  
##  isotype=IgG * igm             2.2873  2.9481  0.78  0.4378  
##  isotype=Light-chain * igm     4.0732  3.7254  1.09  0.2742  
##  isotype=IgA * igm'           15.3341 36.7361  0.42  0.6764  
##  isotype=IgG * igm'          -32.8379 31.1566 -1.05  0.2919  
##  isotype=Light-chain * igm'  -49.8770 42.3164 -1.18  0.2385  
##  isotype=IgA * igm''         -32.8011 74.9076 -0.44  0.6615  
##  isotype=IgG * igm''          68.5498 62.5222  1.10  0.2729  
##  isotype=Light-chain * igm''  99.4476 86.8224  1.15  0.2520  
##

Importantly for all further comparisons, both models must be shown to be equally well calibrated, as we would heavily favor a calibrated over a poorly calibrated model. What constitutes ‘equally well calibrated’ is subjective, and is best judged by examining calibration plots for both the interaction model and the non-interaction models. The interaction model has a fairly noticeable dip in calibration over moderate risk estimates. This immediately raises considerable doubt that a model that includes interaction terms is useful, compared the the pre-specified models without.

plot(calibrate(fit_pred_inter))

## 
## Divergence or singularity in 2 samples

## 
## n=1043   Mean absolute error=0.009   Mean squared error=0.00016
## 0.9 Quantile of absolute error=0.024

Next we fit the non-interaction model (the pre-specified model).

fit_pred <- lrm(outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) + rcs(m_protein, 4) +
                  rcs(flc_ratio, 4) + isotype, data = df_pred, x = TRUE, y = TRUE)

As seen below the model likelihood ratio chi-square statistic is 401.11 on 18 degrees of freedom, with an R squared of 0.358 and a concordance index of 0.766.

fit_pred

## Logistic Regression Model
##  
##  lrm(formula = outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 
##      4) + rcs(m_protein, 4) + rcs(flc_ratio, 4) + isotype, data = df_pred, 
##      x = TRUE, y = TRUE)
##  
##  
##  Frequencies of Responses
##  
##    0-4%   5-9% 10-14%   >15% 
##     455    430     91     67 
##  
##                         Model Likelihood       Discrimination    Rank Discrim.    
##                               Ratio Test              Indexes          Indexes    
##  Obs          1043    LR chi2     401.11       R2       0.358    C       0.766    
##  max |deriv| 4e-08    d.f.            18     R2(18,1043)0.307    Dxy     0.533    
##                       Pr(> chi2) <0.0001    R2(18,882.4)0.352    gamma   0.533    
##                                                Brier    0.085    tau-a   0.335    
##  
##                      Coef     S.E.    Wald Z Pr(>|Z|)
##  y>=5-9%               4.2561  0.8788  4.84  <0.0001 
##  y>=10-14%             1.5889  0.8686  1.83  0.0674  
##  y>=>15%               0.1966  0.8704  0.23  0.8213  
##  igg                  -0.3544  0.0937 -3.78  0.0002  
##  igg'                  1.5930  0.4062  3.92  <0.0001 
##  igg''                -4.4209  1.1912 -3.71  0.0002  
##  iga                  -0.1330  0.2827 -0.47  0.6380  
##  iga'                  2.7083  1.7328  1.56  0.1181  
##  iga''                -6.0916  3.9008 -1.56  0.1184  
##  igm                   1.0890  0.7754  1.40  0.1602  
##  igm'                -16.7238  8.5061 -1.97  0.0493  
##  igm''                34.2726 17.1540  2.00  0.0457  
##  m_protein            -0.6602  0.1832 -3.60  0.0003  
##  m_protein'           46.0506 11.3215  4.07  <0.0001 
##  m_protein''         -57.9203 14.3889 -4.03  <0.0001 
##  flc_ratio            -2.1258  0.3555 -5.98  <0.0001 
##  flc_ratio'           38.5632  6.5415  5.90  <0.0001 
##  flc_ratio''         -90.7608 15.5093 -5.85  <0.0001 
##  isotype=IgA          -0.2766  0.2610 -1.06  0.2892  
##  isotype=IgG          -0.5350  0.2038 -2.63  0.0087  
##  isotype=Light-chain  -0.5640  0.3257 -1.73  0.0833  
##

The claibration of the non-interaction model is noticeably better.

plot(calibrate(fit_pred))

## 
## n=1043   Mean absolute error=0.011   Mean squared error=0.00016
## 0.9 Quantile of absolute error=0.019

2 Effect differences for the interaction model

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    igg, isotype,
    ref.zero = TRUE, fun = exp
  ), adj.subtitle = FALSE)  +
  geom_hline(yintercept = 1, lty = 2) +
  geom_vline(xintercept = 6.1, lty = 2) +
  geom_vline(xintercept = 15.7, lty = 2) +
  geom_rug(data = df_pred, aes(x = igg, y = NULL)) +
  scale_y_continuous(breaks = c(0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
  scale_x_continuous(limits = c(5, 22)) +
  coord_trans(y = "log10", ylim = c(0.1, 32)) +
  labs(x = "IgG (g/L)", y = "OR of 10% BMPC or more", title = "Total IgG") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    igg, isotype,
    fun = plogis
  ), adj.subtitle = FALSE)  +
  coord_cartesian(ylim = c(0, 0.25)) +
  geom_vline(xintercept = 6.1, lty = 2) +
  geom_vline(xintercept = 15.7, lty = 2) +
  geom_rug(data = df_pred, aes(x = igg, y = NULL)) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(5, 22)) +
  labs(x = "IgG (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgG") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    igm, isotype,
    ref.zero = TRUE, fun = exp
  ))  +
  geom_hline(yintercept = 1, lty = 2) +
  geom_vline(xintercept = 0.4, lty = 2) +
  geom_vline(xintercept = 2.3, lty = 2) +
  geom_rug(data = df_pred, aes(x = igm, y = NULL)) +
  scale_y_continuous(breaks = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
  scale_x_continuous(limits = c(NA, 8)) +
  coord_trans(y = "log10", ylim = c(0.01, 32)) +
  labs(x = "IgM (g/L)", y = "OR of 10% BMPC or more", title = "Total IgM") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    igm, isotype,
    fun = plogis
  ))  +
  coord_cartesian(ylim = c(0, 0.1)) +
  geom_vline(xintercept = 0.4, lty = 2) +
  geom_vline(xintercept = 2.3, lty = 2) +
  geom_rug(data = df_pred, aes(x = igm, y = NULL)) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(NA, 8)) +
  labs(x = "IgM (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgM") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    iga, isotype,
    ref.zero = TRUE, fun = exp
  ), adj.subtitle = FALSE)  +
  geom_hline(yintercept = 1, lty = 2) +
  geom_vline(xintercept = 0.7, lty = 2) +
  geom_vline(xintercept = 3.7, lty = 2) +
  geom_rug(data = df_pred, aes(x = iga, y = NULL)) +
  scale_y_continuous(breaks = c(0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
  scale_x_continuous(limits = c(NA, 10)) +
  coord_trans(y = "log10", ylim = c(0.1, 32)) +
  labs(x = "IgA (g/L)", y = "OR of 10% BMPC or more", title = "Total IgA") +
  theme_bw() +
  theme(legend.position = "bottom")

ggplot(
  Predict(
    fit_pred_inter,
    kint = 2,
    iga, isotype,
    fun = plogis
  ), adj.subtitle = FALSE)  +
  coord_cartesian(ylim = c(0, 0.2)) +
  geom_vline(xintercept = 0.4, lty = 2) +
  geom_vline(xintercept = 2.3, lty = 2) +
  geom_rug(data = df_pred, aes(x = iga, y = NULL)) +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(limits = c(NA, 6)) +
  labs(x = "IgA (g/L)", y = "Proportion (%) having \n≥10% BMPC", title = "Total IgA") +
  theme_bw() +
  theme(legend.position = "bottom")

3 Likelihood ratio test

Because the non-interaction model is nested within the interaction model, we can use the likelihood ratio test to test the null hypothesis that both models fit the data equally well, and therefore that the interaction terms are redundant. As seen below, we do not find strong evidence against the null hypothesis.

lrtest(fit_pred, fit_pred_inter)

## 
## Model 1: outcome ~ rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) + rcs(m_protein, 
##     4) + rcs(flc_ratio, 4) + isotype
## Model 2: outcome ~ isotype + rcs(igg, 4) + rcs(iga, 4) + rcs(igm, 4) + 
##     rcs(m_protein, 4) + rcs(flc_ratio, 4) + rcs(igg, 4) * isotype + 
##     rcs(iga, 4) * isotype + rcs(igm, 4) * isotype
## 
##  L.R. Chisq        d.f.           P 
## 40.06057799 27.00000000  0.05057213

Although not ‘significant’, we can quantify the degree to which the interaction model fits the data better, by calculating the adequacy of the non-interaction model compared to the non-linear model using the models’ likelihood ratio Chi-squared statistics.

pre_test_lr <- fit_pred$stats["Model L.R."] # non-interaction likelihood chi-squared statistic
post_test_lr <- fit_pred_inter$stats["Model L.R."] # interactoni likelihood chi-suqared statistic

Below is a measure of the adequacy of the linear model compared to the non-linear model, 0.909 out of a maximum of 1

add_base <- pre_test_lr/post_test_lr
add_base

## Model L.R. 
##  0.9091952

In other words, the fraction of new information provided by interaction model by this metric is 0.09, which is substantial.

frac_new <- 1 - add_base
frac_new

## Model L.R. 
## 0.09080478

The contribution of the interaction components of total IgG, IgA and IgM (or alternatively MGUS isotype) can be shown for each immunoglobulin individually, with the overall interaction likelihood ratio Chi-squared statistic of 38.69 on 27 degrees of freedom approximating the overall likelihood ratio test above.

anova(fit_pred_inter)

##                 Wald Statistics          Response: outcome 
## 
##  Factor                                       Chi-Square d.f. P     
##  isotype  (Factor+Higher Order Factors)        45.66     30   0.0335
##   All Interactions                             38.69     27   0.0675
##  igg  (Factor+Higher Order Factors)            30.33     12   0.0025
##   All Interactions                             14.99      9   0.0911
##   Nonlinear (Factor+Higher Order Factors)      15.68      8   0.0472
##  iga  (Factor+Higher Order Factors)            48.08     12   <.0001
##   All Interactions                              9.14      9   0.4246
##   Nonlinear (Factor+Higher Order Factors)       8.73      8   0.3652
##  igm  (Factor+Higher Order Factors)            21.85     12   0.0393
##   All Interactions                              7.14      9   0.6221
##   Nonlinear (Factor+Higher Order Factors)      12.01      8   0.1508
##  m_protein                                     42.44      3   <.0001
##   Nonlinear                                    23.27      2   <.0001
##  flc_ratio                                     49.17      3   <.0001
##   Nonlinear                                    31.82      2   <.0001
##  isotype * igg  (Factor+Higher Order Factors)  14.99      9   0.0911
##   Nonlinear                                     8.17      6   0.2257
##   Nonlinear Interaction : f(A,B) vs. AB         8.17      6   0.2257
##  isotype * iga  (Factor+Higher Order Factors)   9.14      9   0.4246
##   Nonlinear                                     5.10      6   0.5313
##   Nonlinear Interaction : f(A,B) vs. AB         5.10      6   0.5313
##  isotype * igm  (Factor+Higher Order Factors)   7.14      9   0.6221
##   Nonlinear                                     6.76      6   0.3435
##   Nonlinear Interaction : f(A,B) vs. AB         6.76      6   0.3435
##  TOTAL NONLINEAR                               91.42     28   <.0001
##  TOTAL INTERACTION                             38.69     27   0.0675
##  TOTAL NONLINEAR + INTERACTION                129.43     37   <.0001
##  TOTAL                                        339.47     45   <.0001

4 Akaike’s information criterion

The question remains whether the potentially increased fit of the interaction model to the data compared to the non-interaction model is worthwhile, given its added complexity. Akaike’s information criterion provides a method for penalizing the log likelihood achieved by a given model for its complexity to obtain a more unbiased assessment of the model’s worth. The penalty is to add twice the number ofparameters to the −2 log likelihood. On this metric, the model with the lower AIC should generally be favored. In this case, the non-interaction model has a considerably lower AIC.

AIC(fit_pred_inter)

## [1] 1983.497

AIC(fit_pred)

## [1] 1969.557

5 Comparing variances

Next we compare the variances of the predicted risks from the non-interaction and interaction models. In the prediction setting, increased variance in risk estimates is generally desirable. As shown below, the variance is almost identical between the linear and non-linear models and the high-resolution histogram of predicted risks is visually identical.

pre <- predict(fit_pred, type='fitted')
pre <- pre[,2]
post <- predict(fit_pred_inter, type='fitted')
post <- post[,2]

par(mgp=c(4,1,0), mar=c(6,5,2,2))

xlab <- c(
  paste0("Non-interaction\nvariance =", round(var(pre), 3)),
  paste0("Interaction\nvariance=",round(var(post), 3))
)

histbackback(pre, post, brks=seq(0, 1, by=0.005), xlab=xlab, ylab= "Proportion (%) having \n≥10% BMPC")

We show that the relative explained variation of predicted risk of the non-interaction model compared to the interaction model is 0.907 (theoretically out of a maximum of 1)

var_pre <- round(var(pre), 3) #linear
var_post <- round(var(post), 3) #non-linear
relativ_expl <- var_pre/var_post

relativ_expl

## [1] 0.9069767

In other words, the fraction of new information contained in the interaction model by this metric is only 0.093

frac_new2 <- 1 - relativ_expl
frac_new2

## [1] 0.09302326

We can show that the predicted risks between the non-interaction and interaction models differ most at moderate predicted risks.

plot(pre, post-pre, xlab='Predicted risk (non-interaction)',
     ylab='Difference in predicted risks (interaction - non-interaction)', pch=46)
lo <- Rq(post-pre ~ rcs(pre, 7), tau=0.1)  # 0.1 quantile
hi <- Rq(post-pre ~ rcs(pre, 7), tau=0.9)  # 0.9 quantile
at <- seq(0, 1, length=200)
lines(at, Predict(lo, pre=at)$yhat, col='red', lwd=1.5)
lines(at, Predict(hi, pre=at)$yhat, col='red', lwd=1.5)
abline(h=0, col='red', lty = 2)

6 Decision curve analysis

Finally, as this is a prediction model that we intend for clinical use, we should be satisfied that the decision curves associated with both models are similar. If not, we should heavily favor the model that leads to increased net-benefit over plausible risk thresholds.

df_interaction <- df_pred # temporary dataframe only used for this script
Function(fit_pred) # we extract the linear predictor from the non-linear model first

## function(igg = 10.21,iga = 2.167,igm = 0.718,m_protein = 1.91,flc_ratio = 1.237794,isotype = "IgG") {-0.35437745* igg+0.012363328*pmax(igg-6.111,0)^3-0.034311919*pmax(igg-9.2,0)^3+0.023794703*pmax(igg-11.446,0)^3-0.0018461125*pmax(igg-17.462,0)^3-0.13301362* iga+0.098986509*pmax(iga-0.5914,0)^3-0.22264315*pmax(iga-1.641,0)^3+0.13057034*pmax(iga-2.6581,0)^3-0.006913695*pmax(iga-5.8221,0)^3+1.0889587* igm-3.1076573*pmax(igm-0.2561,0)^3+6.3686218*pmax(igm-0.5751,0)^3-3.3346563*pmax(igm-0.9166,0)^3+0.073691742*pmax(igm-2.5759,0)^3-0.66018976* m_protein+0.44395199*pmax(m_protein-0.481261,0)^3-0.55838218*pmax(m_protein-0.881164,0)^3+0.11511927*pmax(m_protein-2.48194,0)^3-0.00068906911*pmax(m_protein-10.665996,0)^3-2.1257973* flc_ratio+0.73647737*pmax(flc_ratio-0.2041965,0)^3-1.7333446*pmax(flc_ratio-1.0186109,0)^3+1.0005139*pmax(flc_ratio-1.6415064,0)^3-0.0036466553*pmax(flc_ratio-7.440332,0)^3-0.27663453*(isotype=="IgA")-0.53499524*(isotype=="IgG")-0.56401632*(isotype=="Light-chain") }
## <environment: 0x7fe9fdb959a0>

df_interaction$noninteraction_pred <- with(
  df_interaction,
  expit(
    1.5888896 -0.35437745 * igg + 0.012363328 * pmax(igg - 6.111, 0) ^ 3 - 0.034311919 *
      pmax(igg - 9.2, 0) ^ 3 + 0.023794703 * pmax(igg - 11.446, 0) ^ 3 - 0.0018461125 *
      pmax(igg - 17.462, 0) ^ 3 - 0.13301362 * iga + 0.098986509 * pmax(iga - 0.5914, 0) ^ 3 
    - 0.22264315 * pmax(iga - 1.641, 0) ^ 3 + 0.13057034 * pmax(iga - 2.6581, 0) ^ 3 
    - 0.006913695 * pmax(iga - 5.8221, 0) ^ 3 + 1.0889587 * igm -
      3.1076573 * pmax(igm - 0.2561, 0) ^ 3 + 6.3686218 * pmax(igm - 0.5751, 0) ^
      3 - 3.3346563 * pmax(igm - 0.9166, 0) ^ 3 + 0.073691742 * pmax(igm - 2.5759, 0) ^
      3 - 0.66018976 * m_protein + 0.44395199 * pmax(m_protein - 0.481261, 0) ^
      3 - 0.55838218 * pmax(m_protein - 0.881164, 0) ^ 3 + 0.11511927 * pmax(m_protein - 2.48194, 0) ^ 3 
    - 0.00068906911 * pmax(m_protein - 10.665996, 0) ^ 3 - 2.1257973 * flc_ratio + 0.73647737 * 
      pmax(flc_ratio - 0.2041965, 0) ^ 3 - 1.7333446 * pmax(flc_ratio - 1.0186109, 0) ^ 3 
    + 1.0005139 * pmax(flc_ratio - 1.6415064, 0) ^ 3 - 0.0036466553 *
      pmax(flc_ratio - 7.440332, 0) ^ 3 - 0.27663453 * (isotype == "IgA") - 0.53499524 *
      (isotype == "IgG") - 0.56401632 * (isotype == "Light-chain")
  )
)

Function(fit_pred_inter) # next we extract the linear predictor from the linear model

## function(isotype = "IgG",igg = 10.21,iga = 2.167,igm = 0.718,m_protein = 1.91,flc_ratio = 1.237794) {+13.857739*(isotype=="IgA")-2.9443755*(isotype=="IgG")+7.7250492*(isotype=="Light-chain")-0.24880468* igg+0.014136849*pmax(igg-6.111,0)^3-0.043850706*pmax(igg-9.2,0)^3+0.033548396*pmax(igg-11.446,0)^3-0.00383454*pmax(igg-17.462,0)^3-0.0052641133* iga+0.082960416*pmax(iga-0.5914,0)^3-0.17746379*pmax(iga-1.641,0)^3+0.097361815*pmax(iga-2.6581,0)^3-0.002858446*pmax(iga-5.8221,0)^3+0.039156474* igm-0.17936969*pmax(igm-0.2561,0)^3+0.32367647*pmax(igm-0.5751,0)^3-0.13952274*pmax(igm-0.9166,0)^3-0.0047840447*pmax(igm-2.5759,0)^3-0.68313533* m_protein+0.43617459*pmax(m_protein-0.481261,0)^3-0.5485312*pmax(m_protein-0.881164,0)^3+0.11302013*pmax(m_protein-2.48194,0)^3-0.00066351445*pmax(m_protein-10.665996,0)^3-2.0197183* flc_ratio+0.70769306*pmax(flc_ratio-0.2041965,0)^3-1.6680227*pmax(flc_ratio-1.0186109,0)^3+0.96409409*pmax(flc_ratio-1.6415064,0)^3-0.0037644823*pmax(flc_ratio-7.440332,0)^3+(isotype=="IgA")*(-0.018482177* igg-0.0088176783*pmax(igg-6.111,0)^3+0.031805606*pmax(igg-9.2,0)^3-0.027042627*pmax(igg-11.446,0)^3+0.0040547002*pmax(igg-17.462,0)^3)+(isotype=="IgG")*(0.24670448* igg-0.0091979458*pmax(igg-6.111,0)^3+0.028960155*pmax(igg-9.2,0)^3-0.022417373*pmax(igg-11.446,0)^3+0.002655164*pmax(igg-17.462,0)^3)+(isotype=="Light-chain")*(-0.88031872* igg+0.016809742*pmax(igg-6.111,0)^3-0.037924311*pmax(igg-9.2,0)^3+0.020366237*pmax(igg-11.446,0)^3+0.0007483327*pmax(igg-17.462,0)^3)+(isotype=="IgA")*(-8.0855649* iga+1.289755*pmax(iga-0.5914,0)^3-2.6564934*pmax(iga-1.641,0)^3+1.3782373*pmax(iga-2.6581,0)^3-0.011498959*pmax(iga-5.8221,0)^3)+(isotype=="IgG")*(-0.12151831* iga+0.022518773*pmax(iga-0.5914,0)^3-0.078256398*pmax(iga-1.641,0)^3+0.066184855*pmax(iga-2.6581,0)^3-0.01044723*pmax(iga-5.8221,0)^3)+(isotype=="Light-chain")*(-1.9424433* iga+0.43241092*pmax(iga-0.5914,0)^3-1.0209564*pmax(iga-1.641,0)^3+0.63429489*pmax(iga-2.6581,0)^3-0.045749404*pmax(iga-5.8221,0)^3)+(isotype=="IgA")*(-1.4035868* igm+2.8494307*pmax(igm-0.2561,0)^3-6.0951884*pmax(igm-0.5751,0)^3+3.3659637*pmax(igm-0.9166,0)^3-0.12020603*pmax(igm-2.5759,0)^3)+(isotype=="IgG")*(2.2872933* igm-6.1020261*pmax(igm-0.2561,0)^3+12.738116*pmax(igm-0.5751,0)^3-6.828749*pmax(igm-0.9166,0)^3+0.19265866*pmax(igm-2.5759,0)^3)+(isotype=="Light-chain")*(4.0731984* igm-9.2682911*pmax(igm-0.2561,0)^3+18.479621*pmax(igm-0.5751,0)^3-9.3252841*pmax(igm-0.9166,0)^3+0.11395424*pmax(igm-2.5759,0)^3) }
## <environment: 0x7fe983a60af0>

df_interaction$inter_pred <- with(
  df_interaction,
  expit(
    0.367516938 +13.857739 * (isotype == "IgA") - 2.9443755 * (isotype == "IgG") + 7.7250492 *
      (isotype == "Light-chain") - 0.24880468 * igg + 0.014136849 * pmax(igg - 6.111, 0) ^ 3 
    - 0.043850706 * pmax(igg - 9.2, 0) ^ 3 + 0.033548396 * pmax(igg - 11.446, 0) ^ 3 
    - 0.00383454 * pmax(igg - 17.462, 0) ^ 3 - 0.0052641133 * iga +
      0.082960416 * pmax(iga - 0.5914, 0) ^ 3 - 0.17746379 * pmax(iga - 1.641, 0) ^
      3 + 0.097361815 * pmax(iga - 2.6581, 0) ^ 3 - 0.002858446 * pmax(iga - 5.8221, 0) ^
      3 + 0.039156474 * igm - 0.17936969 * pmax(igm - 0.2561, 0) ^ 3 + 0.32367647 *
      pmax(igm - 0.5751, 0) ^ 3 - 0.13952274 * pmax(igm - 0.9166, 0) ^ 3 - 0.0047840447 *
      pmax(igm - 2.5759, 0) ^ 3 - 0.68313533 * m_protein + 0.43617459 * pmax(m_protein - 0.481261, 0) ^ 3 
    - 0.5485312 * pmax(m_protein - 0.881164, 0) ^ 3 + 0.11302013 * pmax(m_protein - 2.48194, 0) 
    ^ 3 - 0.00066351445 * pmax(m_protein - 10.665996, 0) ^3 - 2.0197183 * flc_ratio +
      0.70769306 * pmax(flc_ratio - 0.2041965, 0) ^3 - 1.6680227 * pmax(flc_ratio - 1.0186109, 0) ^ 3 
    + 0.96409409 * pmax(flc_ratio -1.6415064, 0) ^ 3 - 0.0037644823 * pmax(flc_ratio - 7.440332, 0) ^ 3 
    + (isotype =="IgA") * (-0.018482177 * igg - 0.0088176783 * pmax(igg - 6.111, 0) ^ 3 + 
    0.031805606 * pmax(igg - 9.2, 0) ^ 3 - 0.027042627 * pmax(igg - 11.446, 0) ^ 3 
    + 0.0040547002 *pmax(igg - 17.462, 0) ^ 3) + (isotype == "IgG") * 
      (0.24670448 * igg - 0.0091979458 * pmax(igg - 6.111, 0) ^ 3 + 0.028960155 * pmax(igg - 9.2, 0) ^ 3 
       - 0.022417373 * pmax(igg - 11.446, 0) ^ 3 + 0.002655164 * pmax(igg - 17.462, 0) ^ 3) 
    + (isotype == "Light-chain") * (-0.88031872 * igg + 0.016809742 * pmax(igg - 6.111, 0) ^ 3 - 0.037924311 *
      pmax(igg - 9.2, 0) ^ 3 + 0.020366237 * pmax(igg - 11.446, 0) ^ 3 + 0.0007483327 * pmax(igg - 17.462, 0) ^ 3
    ) + (isotype == "IgA") * (-8.0855649 * iga + 1.289755 * pmax(iga - 0.5914, 0) ^ 3 
    - 2.6564934 * pmax(iga -1.641, 0) ^ 3 + 1.3782373 * pmax(iga - 2.6581, 0) ^ 3 - 0.011498959 * 
      pmax(iga -5.8221, 0) ^ 3) + (isotype == "IgG") * (
      -0.12151831 * iga + 0.022518773 * pmax(iga - 0.5914, 0) ^ 3 - 0.078256398 *
        pmax(iga - 1.641, 0) ^ 3 + 0.066184855 * pmax(iga - 2.6581, 0) ^ 3 - 0.01044723 *
        pmax(iga - 5.8221, 0) ^ 3
    ) + (isotype == "Light-chain") * (
      -1.9424433 * iga + 0.43241092 * pmax(iga - 0.5914, 0) ^ 3 - 1.0209564 *
        pmax(iga - 1.641, 0) ^ 3 + 0.63429489 * pmax(iga - 2.6581, 0) ^ 3 - 0.045749404 *
        pmax(iga - 5.8221, 0) ^ 3
    ) + (isotype == "IgA") * (
      -1.4035868 * igm + 2.8494307 * 
        pmax(igm - 0.2561, 0) ^ 3 - 6.0951884 * pmax(igm -0.5751, 0) ^ 3 
      + 3.3659637 * pmax(igm - 0.9166, 0) ^ 3 - 0.12020603 * pmax(igm -2.5759, 0) ^ 3
    ) + (isotype == "IgG") * (2.2872933 * igm - 6.1020261 * pmax(igm - 0.2561, 0) ^ 3 
      + 12.738116 * pmax(igm -0.5751, 0) ^ 3 - 6.828749 * pmax(igm - 0.9166, 0) ^ 3 + 0.19265866 * pmax(igm -2.5759, 0) ^ 3
    ) + (isotype == "Light-chain") * (4.0731984 * igm - 9.2682911 * pmax(igm - 0.2561, 0) ^ 3 
      + 18.479621 * pmax(igm -0.5751, 0) ^ 3 - 9.3252841 * pmax(igm - 0.9166, 0) ^ 3 + 0.11395424 * pmax(igm -2.5759, 0) ^ 3)))

We create the decision curve analysis models using the dca pacakge. First we examine the net-benefit of the non-interaction and interaction models in a opt-out strategy, in which net benefit is expressed in the relative proportion of true negatives. As demonstrated by the resulting decision curve analysis figure, there is no perceivable difference in net benefit between the two models in this setting.

noninteraction_model <- decision_curve(perc_10 ~ noninteraction_pred,
                                  data = df_interaction,
                                  policy = 'opt-out',
                                  fitted = TRUE,
                                  thresholds = seq(0, 1, by = .001),
                                  bootstraps = 500)

interaction_model <- decision_curve(perc_10 ~ inter_pred,
                               data = df_interaction,
                               policy = 'opt-out',
                               fitted = TRUE,
                               thresholds = seq(0, 1, by = .001),
                               bootstraps = 500)

noninteraction_model$derived.data %>%
  select(thresholds, NB, NB_lower, NB_upper, model) %>%
  union_all(
    interaction_model$derived.data %>%
      select(thresholds, NB, NB_lower, NB_upper, model) %>%
      filter(model == "perc_10 ~ inter_pred")
  ) %>%
  mutate(model = factor(model, levels = c("perc_10 ~ noninteraction_pred", "perc_10 ~ inter_pred", "All", "None"))) %>%
  ggplot(aes(x = thresholds, y = NB, color = model, fill = model)) +
  geom_line(size = 1.1, position = position_dodge(width = 0.005)) +
  geom_vline(aes(xintercept = 0.05), lty = 2) +
  geom_vline(aes(xintercept = 0.10), lty = 2) +
  scale_color_manual(
    name = "Follow-up strategy:",
    labels = c(
      "Defer based on non-interaction model",
      "Defer based on interaction model",
      "Obtain sample from all",
      "Obtain sample from none"
    ),
    values = c(nejm_palette[c(2, 3)], "black", nejm_palette[c(1)])
  ) +
  scale_x_continuous(breaks = seq(0, 1, by = 0.02), labels = scales::percent_format(accuracy = 1)) +
  coord_cartesian(xlim = c(0, 0.25), ylim = c(-0.2, 0.7)) +
  labs(x = NULL, y = "Net Benefit") +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    legend.background = element_rect(fill = NA),
    panel.grid.minor.x = element_blank(),
    legend.position = c(0.2, 0.82)
  )

This non-difference holds in the setting of an opt-in strategy, in which the net-benefit is expressed in the units of true positives.

noninteraction_model <- decision_curve(perc_10 ~ noninteraction_pred,
                                       data = df_interaction,
                                       policy = 'opt-in',
                                       fitted = TRUE,
                                       thresholds = seq(0, 1, by = .001),
                                       bootstraps = 500)

interaction_model <- decision_curve(perc_10 ~ inter_pred,
                                    data = df_interaction,
                                    policy = 'opt-in',
                                    fitted = TRUE,
                                    thresholds = seq(0, 1, by = .001),
                                    bootstraps = 500)

noninteraction_model$derived.data %>%
  select(thresholds, NB, NB_lower, NB_upper, model) %>%
  union_all(
    interaction_model$derived.data %>%
      select(thresholds, NB, NB_lower, NB_upper, model) %>%
      filter(model == "perc_10 ~ inter_pred")
  ) %>%
  mutate(model = factor(model, levels = c("perc_10 ~ noninteraction_pred", "perc_10 ~ inter_pred", "All", "None"))) %>%
  ggplot(aes(x = thresholds, y = NB, color = model, fill = model)) +
  geom_line(size = 1.1, position = position_dodge(width = 0.005)) +
  geom_vline(aes(xintercept = 0.05), lty = 2) +
  geom_vline(aes(xintercept = 0.10), lty = 2) +
  scale_color_manual(
    name = "Follow-up strategy:",
    labels = c(
      "Defer based on non-linear model",
      "Defer based on linear model",
      "Obtain sample from all",
      "Obtain sample from none"
    ),
    values = c(nejm_palette[c(2, 3)], "black", nejm_palette[c(1)])
  ) +
  scale_x_continuous(breaks = seq(0, 1, by = 0.02), labels = scales::percent_format(accuracy = 1)) +
  coord_cartesian(xlim = c(0, 0.25), ylim = c(-0.2, 0.5)) +
  labs(x = NULL, y = "Net Benefit") +
  theme_bw() +
  theme(
    legend.title = element_blank(),
    legend.background = element_rect(fill = NA),
    panel.grid.minor.x = element_blank(),
    legend.position = c(0.2, 0.82)
  )

7 Summary

We have shown that the model that includes an interaction term between MGUS isotype and total IgG, IgA and IgM: i) is visually less well calibrated compared to the non-interaction model ii) does not fit the data significantly better on the likelihood ratio test at an alpha of 0.05, but provides roughly 9% increase in the fraction of new information on this metric compared to the non-interaction model model. iii) has a higher Akaike’s information criterion than does the non-interaction model iv) slightly increases the variance of predicted risk, providing a roughly 9% increase in the fraction of new information on this metric compared to the non-interaction model v) has similar net-benefit on decision curve analysis.

Given that the interaction model has poorer calibration and does not result in a definitive increase in net-benefit on decision curve analysis, despite more than doabling the number of predictors, we do not find any compelling reason to include interaction terms in the prediction model.

Including interaction terms with total immunoglobulins

Elias Eythorsson

December 1st, 2023