RMS Discussions

Thanks! Will “group=” work if there are systematically missing covariates (i.e. covariates which are missing for an entire trial)?

I doubt it but you’ll have to try.

Thank you. I was able to get aregImpute to work via the approach you suggested. However, I would like to fit a model that makes studyID a random effect (e.g., lmer), which I do not think is compatible with fit.mult.impute. I can theoretically impute the data using MICE instead of aregImpute, but I hesitate because I’d prefer to take advantage of the flexible modeling applied by aregImpute. Do you have an approach or function which is compatible with random effects which I can apply to fit.mult.impute? Do you know of a way to have MICE fit a model combining the information across the imputations performed by aregImpute? Do you have any other suggestions?
Thank you very much!

See if the MICE developer Stef van Buuren’s excellent multiple imputation book covers this.

You can now use aregImpute with Bayesian models using the method of “posterior stacking” as I exemplify in https://hbiostat.org/R/rmsb. But the clustering problem in aregImpute remains. Posterior stacking is more accurate than using Rubin’s rule with MI.

Thank you very much for the tip! It looks like your rmsb package will allow me to account for clustering of StudyID using a Bayesian logistic regression model.
In that case, my plan would be to use MICE to make imputations taking into account the clustering of StudyID, then do posterior stacking with Bayesian logistic regression and include a random effect for studyID. Does that make sense to you?

That does make sense, if the aregImpute part works OK.

I think van Buuren has a method to accommodate the clustering in the imputations, in which case I would use MICE to impute the data and then use stackMI to fit the models. I will force MICE to use PMM, as I think you encouraged in your course, but I will be forgoing the flexible modeling offered by aregImpute. This way I wouldn’t have to rely on aregImpute. Is stackMI compatible with MICE, or is only fit.mult.impute compatible?

It uses mice::complete() so there is a good chance.

Is there a way to “inform” rms (e.g. through datadist) that some columns in your data frame are derived from others? This comes up, for example, if I plan to include an interaction between a biomarker (let’s call it BM) and treatment (ARM) in a model and therefore create a separate derived variable by manually multiplying the main effects (BM_ARM), so that I can impute the interaction directly. However, when I run a model, instead of writing BM*ARM I write BM + ARM + BM_ARM. The problem is that rms doesn’t know that BM_ARM was derived from BM and ARM. This makes it difficult to take advantage of many of the rms functions (e.g. anova, Predict, etc.).
Another situation where this comes up is if I want to include an interaction only with the linear terms of a continuous covariate and a multi-leveled nominal covariate. The ia function doesn’t seem to work in this situation and therefore I manually create the interactions.
Do you have any advice?

You can go to the trouble of connecting the terms using e.g. contrast() to get effects, but rms really wants to derive the variables itself so it always knows how to connect the derived variables to give you relevant tests using e.g. anova().

Is there a way to specify offset terms with lrm() ?

To give context, I am trying to externally validate some prediction models according to the suggestions in Steyerberg’s Clinical Prediction Models and this paper: 1) evaluate the model, 2) update model intercept (recalibration in the large), 3) update intercept and slope (recalibration), and 4) update all model coefficients. The latter two are easy to carry out with lrm, but I can’t figure out how to set the offset terms or set the intercept to 0.

With glm this would look something like
no_recalibration <- glm(outcome ~ -1 + offset(predictor), data = data, family = binomial)
recalibration_in_large <- glm(outcome ~ 1 + offset(predictor), data = data, family = binomial)

Hi Lauren - it’s supposed to work using the same notation you used with glm.

Hi Frank - thanks for the quick reply!

I’m getting this error:

model <- lrm(outcome ~ -1 + offset(predictor), data=data, x=TRUE, y=TRUE)
Error in lrm(outcome ~ -1 + offset(predictor), data = data, x = TRUE, :
object ‘Strata’ not found

model <- lrm(outcome ~ 1 + offset(predictor), data=data, x=TRUE, y=TRUE)
Error in lrm(outcome ~ 1 + offset(predictor), data = data, x = TRUE, y = TRUE) :
object ‘Strata’ not found

Sorry looks like a bug. lrm only seems to support offset if there are other covariates in the model and you are not suppressing the intercept.
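
Until that is fixed, one possible workaround is to do the offset fits with glm, using the notation from the earlier post. This is a minimal sketch on simulated data, not a tested replacement for lrm: the data frame `d`, the seed, and the true intercept and slope are made up for illustration, and `predictor` stands for the linear predictor (log odds) from the model being validated.

```r
# Simulated validation data (hypothetical values, for illustration only)
set.seed(1)
n <- 500
predictor <- rnorm(n)  # linear predictor (log odds) from the existing model
outcome <- rbinom(n, 1, plogis(0.5 + 0.8 * predictor))
d <- data.frame(outcome, predictor)

# Recalibration in the large: re-estimate the intercept, slope fixed at 1
fit_large <- glm(outcome ~ 1 + offset(predictor), data = d, family = binomial)

# Logistic recalibration: re-estimate both intercept and calibration slope
fit_slope <- glm(outcome ~ predictor, data = d, family = binomial)

coef(fit_large)  # one coefficient: the intercept update
coef(fit_slope)  # intercept and calibration slope
```

The offset keeps the coefficient of `predictor` fixed at 1, so the first fit can only shift the intercept; dropping the offset and entering `predictor` as a covariate frees the slope as well.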

Hi Dr. Harrell.
Is there a way to have your anova.rms function print out Likelihood Ratio tests instead of Wald? This would be especially useful in the logistic regression setting.

I wish there were. This would have been much better, but it requires multiple model fits. The best that rms offers is lrtest() once you make two fits.
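
To illustrate what "two fits" means, here is a sketch of a likelihood ratio test computed by hand. It uses base R glm in place of lrm so that it runs without rms installed; the variables `x1`, `x2`, `y` and all simulation settings are hypothetical. With rms fits, lrtest() applied to the two fit objects plays the same role.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(2)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.3 + 0.7 * x1 + 0.4 * x2))

# Two nested fits: the reduced model omits the term being tested (x2)
full <- glm(y ~ x1 + x2, family = binomial)
reduced <- glm(y ~ x1, family = binomial)

# LR chi-square = 2 * (log likelihood of full - log likelihood of reduced)
lr_chisq <- as.numeric(2 * (logLik(full) - logLik(reduced)))
lr_df <- attr(logLik(full), "df") - attr(logLik(reduced), "df")
p_value <- pchisq(lr_chisq, df = lr_df, lower.tail = FALSE)

c(chisq = lr_chisq, df = lr_df, p = p_value)
```

Testing each predictor this way needs one extra fit per term, which is the cost mentioned above.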

Is the “proportion Chisquare” (from the plot.anova function) still correct in the logistic regression setting if it’s Wald-based?
Would you recommend to instead use AIC, which is likelihood based, to rank predictors in plot.anova function in order to decide on how many df to allocate to different variables?

AIC or any measure that is a function of -2 log likelihood will be better. Wald is just quicker. The proportion of \chi^2 from Wald tests is OK (anova.rms) but could be improved by instead basing it on the log likelihood.

To clarify, is the AIC printed out in your plot.anova function based on Wald ChiSquare or is it based on -2 log likelihood? It seems to be on a scale that I am not used to seeing (the terms in the output sometimes have positive AIC sometimes negative).

AIC is defined to be negative but sometimes I convert it to a \chi^2 scale, i.e., \chi^2 minus 2 \times d.f. If I do that in the anova context it is in an approximate sense since it’s substituting Wald for LR \chi^2.
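
To spell out the \chi^2 scale, here is a sketch assuming the usual definition of AIC for a model with maximized likelihood L and p parameters, and comparing a full model with a reduced model that omits the d.f. parameters for the term in question (the subscripts are labels chosen here for illustration):

```latex
\mathrm{AIC} = -2 \log L + 2p

\mathrm{AIC}_{\text{reduced}} - \mathrm{AIC}_{\text{full}}
  = \left(-2 \log L_{\text{reduced}} + 2 \log L_{\text{full}}\right) - 2 \times \mathrm{d.f.}
  = \chi^2_{\mathrm{LR}} - 2 \times \mathrm{d.f.}
```

On this scale a term gets a positive value when its \chi^2 exceeds twice its d.f. and a negative value otherwise, which would explain seeing both signs in the output. In the anova context the Wald \chi^2 stands in for the LR \chi^2, so the result is approximate.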