RMS Discussions

Thank you @f2harrell for the fast response - I really appreciate it! How are the Factors in Final Model selected when using these functions? I think I have probably got the wrong end of the stick as I had assumed that the selection process was an “ensemble” over multiple bootstraps.

Those factors selected are from the original fit, ignoring bootstrapping. The bootstrap then tells you how likely you are to get those same variables selected in future samples you don't have.
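
For concreteness, a hedged sketch (the data frame d, outcome y, and predictors below are hypothetical, not from this thread): fastbw() does the selection on the original fit, while validate() with bw=TRUE repeats that selection in each bootstrap resample and tabulates which factors were retained.

library(rms)
# d is a hypothetical data frame with a binary outcome y
f <- lrm(y ~ rcs(age, 4) + sex + rcs(bp, 4), data = d, x = TRUE, y = TRUE)

fastbw(f)                        # variable selection on the original fit
validate(f, B = 200, bw = TRUE)  # repeats the selection in 200 bootstrap resamples
                                 # and reports which factors were retained in each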

Ah thank you! That’s really helpful. I just had one last question about how stepwise selection based on the residual chi2 works. Using this method, are variables still iteratively removed based on their individual significance until the stopping criterion (in this case the significance of the difference in residuals) is reached? I wondered what type of residuals are used, and how the p-values in the table of results for deleted factors are calculated. I notice these can differ from the ones produced if I use individual p-values as the stopping rule. The former method seems more sensible to me, as I would select a model (as a whole) based on how well it explains variance in the data - is this correct?

Thank you so much for the help, and apologies for so many questions. My background is in medicine, but I have found your Regression Modeling Strategies book and the associated package really helpful for some research I have been conducting.

The help file for the rms fastbw function should give you that info. The method based on the residual χ² uses chunk (pooled d.f.) tests. And beware of using any stepwise method; stepwise selection is usually not recommended compared with careful model pre-specification.
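
To make the two stopping rules concrete, a hedged sketch (the fit below is hypothetical; see ?fastbw for the full argument list): type='residual' stops deleting variables when the pooled (chunk) test of all deleted terms becomes significant, whereas type='individual' judges each candidate variable by its own Wald test.

library(rms)
f <- lrm(y ~ rcs(age, 4) + sex + rcs(bp, 4) + smoking, data = d)  # hypothetical fit

# stopping rule: pooled (residual chi-square) test of all deleted terms
fastbw(f, rule = 'p', type = 'residual', sls = 0.05)

# stopping rule: each candidate variable's individual Wald test
fastbw(f, rule = 'p', type = 'individual', sls = 0.05)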

I am trying to impute data for a meta-analysis across 5 clinical trials. I had read that it is preferred to treat trial as a random effect in the imputation model. Is there a way to implement this in aregImpute? If not, do you have a preferred approach for adjusting for heterogeneity across the different trials in the imputation model?
Thanks!

I want to add that I tried adding StudyID as a fixed effect in the imputation model and received an error that a bootstrap resample had too few unique values for StudyID. I also don’t think I can set it as a “group” variable because there are covariates which are systematically missing in the dataset.

I don’t have any experience with multiple imputation in the context of clustering. Perhaps it’s not a bad idea to ignore study during imputation, or, a little better, to use study as a fixed effect but balance the bootstrap sampling; aregImpute will do the latter when you specify group=.
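
A hedged sketch of that suggestion (variable names, knot count, and number of imputations are placeholders; check ?aregImpute for the exact form the group argument expects - it is assumed here to be a vector the same length as the data):

library(Hmisc)
set.seed(1)

# StudyID enters the imputation model as an ordinary (fixed-effect) predictor, and
# group= balances each bootstrap resample on study so no trial is left out of a resample
imp <- aregImpute(~ outcome + age + biomarker + treat + StudyID,
                  data = d, n.impute = 20, nk = 3,
                  group = d$StudyID)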

Thanks! Will “group=” work if there are systematically missing covariates (i.e. covariates which are missing for an entire trial)?

I doubt it but you’ll have to try.

Thank you. I was able to get aregImpute to work via the approach you suggested. However, I would like to be able to fit a model that allows making StudyID a random effect (e.g. lmer), which I do not think is compatible with fit.mult.impute. I can theoretically impute the data using MICE instead of aregImpute, but I hesitate because I’d prefer to take advantage of the flexible modeling applied by aregImpute. Do you have an approach or function compatible with random effects that I can use with fit.mult.impute? Do you know of a way to have MICE fit a model combining the information across the imputations performed by aregImpute? Do you have any other suggestions?
Thank you very much!

See if the MICE developer Stef van Buuren’s excellent multiple imputation book covers this.

You can now use aregImpute with Bayesian models using the method of “posterior stacking” as I exemplify in https://hbiostat.org/R/rmsb. But the clustering problem in aregImpute remains. Posterior stacking is more accurate than using Rubin’s rule with MI.

Thank you very much for the tip! It looks like your rmsb package will allow me to account for clustering on StudyID using a Bayesian logistic regression model.
In that case, my plan would be to use MICE to make imputations that take the clustering on StudyID into account, then do posterior stacking with Bayesian logistic regression and include a random effect for StudyID. Does that make sense to you?
Thanks!

That does make sense, if the aregImpute part works OK.

I think van Buuren has a method to accommodate the clustering in the imputations, in which case I would use MICE to impute the data (forcing MICE to use PMM, as I think you encouraged in your course, although I would be forgoing the flexible modeling offered by aregImpute) and then use stackMI to fit the models. This way I wouldn’t have to rely on aregImpute. Is stackMI compatible with MICE, or is only fit.mult.impute compatible?
Thanks!

It uses mice::complete(), so there is a good chance it will work.
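
A hedged sketch of that plan (an assumption-laden outline, not a verified recipe): it assumes stackMI() accepts a mice object in place of an aregImpute object, as fit.mult.impute() does, and uses blrm's cluster() notation for a per-study random intercept; the variables, knots, and number of imputations are placeholders.

library(mice)
library(rmsb)
set.seed(1)

# PMM imputations; accommodating study-level clustering in the imputation model is a
# separate mice choice (see van Buuren's book) and is not shown here
imp <- mice(d, m = 20, method = 'pmm')

# fit the Bayesian logistic model on each completed data set and stack the posterior
# draws; cluster(StudyID) requests a random intercept per study
f <- stackMI(outcome ~ rcs(biomarker, 4) + treat + cluster(StudyID),
             blrm, imp, data = d, n.impute = 20)
f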

Is there a way to “inform” rms (e.g. through datadist) that some columns in your data frame are derived from others? This comes up, for example, if I plan on including an interaction between a biomarker (let’s call it BM) and treatment (ARM) in a model and therefore create a separate derived variable by manually multiplying the main effects (BM_ARM), so that I can impute the interaction directly. However, when I run a model, instead of writing BM*ARM I write BM + ARM + BM_ARM. The problem is that rms doesn’t know that BM_ARM was derived from BM and ARM. This makes it difficult to take advantage of many of the rms functions (e.g. anova, Predict, etc.).
Another situation where this comes up is if I want an interaction involving only the linear terms of a continuous covariate and a multi-level nominal covariate. The %ia% function doesn’t seem to work in this situation, so I manually create the interactions.
Do you have any advice?
Thanks.

You can go to the trouble of connecting the terms yourself using e.g. contrast() to get effects, but rms really wants to derive the variables itself so that it always knows how the derived variables are connected and can give you the relevant tests with e.g. anova().
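
For illustration, a minimal sketch in that spirit (outcome, BM, and ARM are the names from the question; the data frame, knot count, and ARM levels are assumptions, and this does not address imputing the interaction term directly): letting rms build the interaction keeps anova(), Predict(), and contrast() aware of how the terms are related.

library(rms)
dd <- datadist(d); options(datadist = 'dd')

# let rms construct the interaction itself instead of a hand-made BM_ARM column
f <- lrm(outcome ~ rcs(BM, 4) * ARM, data = d)

anova(f)             # includes a pooled test of the BM x ARM interaction
Predict(f, BM, ARM)  # predictions that respect the interaction
contrast(f, list(ARM = 'Treatment', BM = 10),
            list(ARM = 'Control',   BM = 10))  # treatment contrast at BM = 10 (hypothetical levels)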

Is there a way to specify offset terms with lrm()?

To give context, I am trying to externally validate some prediction models according to the suggestions in Steyerberg’s Clinical Prediction Models and this paper: 1) evaluate the model as-is, 2) update the model intercept (recalibration in the large), 3) update the intercept and slope (recalibration), and 4) update all model coefficients. The latter two are easy to carry out with lrm, but I can’t figure out how to set the offset terms / fix the intercept at 0.

With glm this would look something like
# 1) evaluate the existing model as-is: intercept fixed at 0, slope fixed at 1 via the offset
no_recalibration <- glm(outcome ~ -1 + offset(predictor), data = data, family = binomial)
# 2) recalibration in the large: re-estimate the intercept only, slope still fixed at 1
recalibration_in_large <- glm(outcome ~ 1 + offset(predictor), data = data, family = binomial)

Hi Lauren - it’s supposed to work using the same notation you used with glm.

Hi Frank - thanks for the quick reply!

I’m getting this error:

model <- lrm(outcome ~ -1 + offset(predictor), data=data, x=TRUE, y=TRUE)
Error in lrm(outcome ~ -1 + offset(predictor), data = data, x = TRUE, :
object ‘Strata’ not found

model <- lrm(outcome ~ 1 + offset(predictor), data=data, x=TRUE, y=TRUE)
Error in lrm(outcome ~ 1 + offset(predictor), data = data, x = TRUE, y = TRUE) :
object ‘Strata’ not found