Bayesian Regression Modeling Strategies


This is the place for questions and answers regarding the R rmsb package.

Earlier questions may be found here.


I just read this great article by you and your colleagues, where you used the rmsb package and stated:

Missing baseline covariate data were multiply imputed using predictive mean matching, and posterior stacking was used to combine Bayesian posterior draws of each separately analyzed completed data set (ref 34).

There is some R code in the Supplement (thanks!) showing the rmsb::stackMI() function, defined in CRAN as:

Runs an rmsb package Bayesian fitting function such as blrm separately for each completed dataset given a multiple imputation result such as one produced by Hmisc::aregImpute. Stacks the posterior draws and diagnostics across all imputations, and computes parameter summaries on the stacked posterior draws.

In your article, reference 34 is Using Stacking to Average Bayesian Predictive Distributions (with Discussion) by Yao et al., where they recommend “stacking of predictive distributions”. Aki Vehtari explained in the Stan Forum (here and here) that this method is implemented through loo::loo_model_weights(..., method = "stacking") (or even loo::stacking_weights) and then applying the estimated weights when generating draws from the posterior predictive distribution (with, e.g., rstanarm::posterior_predict).
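If I follow correctly, a minimal sketch of that workflow with two hypothetical rstanarm fits (fit1 and fit2 are made-up model objects for illustration) might look like this:

library(loo)
library(rstanarm)

# PSIS-LOO for each candidate model
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# Stacking weights per Yao et al.
w <- loo_model_weights(list(loo1, loo2), method = "stacking")

# Mix posterior predictive draws in proportion to the weights
n_draws <- 4000
pp1 <- posterior_predict(fit1, draws = n_draws)
pp2 <- posterior_predict(fit2, draws = n_draws)
pick <- sample(1:2, n_draws, replace = TRUE, prob = as.numeric(w))
pp_stacked <- rbind(pp1[pick == 1, , drop = FALSE],
                    pp2[pick == 2, , drop = FALSE])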

However, looking into the rmsb::stackMI source code, I don’t see any reference to these functions from the loo package.

Two questions:

  • Did you apply the method of stacking to average predictive distributions in your article?
  • If so, how can one apply it through rmsb?

As far as I understand, it seems rmsb::stackMI applies a very similar approach to what brms::brm_multiple does:

Here, pooling results of multiple imputed data sets is simply achieved by combining the posterior draws of the submodels.

which would be a little bit different from what Yao et al. proposed.


My understanding of this version of “stacking” is as an alternative to model averaging, which would be used in a situation where you have multiple candidate models and want to weight them based on their estimated out-of-sample predictive performance.

In the MI case we have a single model and just pass the k multiply imputed datasets in order to propagate uncertainty in the imputed data. In a frequentist framework you would combine your parameter estimates and SEs with Rubin’s rules, but for Bayes you can simply stack (as in rbind) the posteriors together. You can also do this in a model-based way (e.g., place a model on the missing data), but this can be challenging for many reasons.
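For example (a minimal sketch, assuming three hypothetical blrm fits fit1, fit2, fit3 of the same model to three completed datasets, each storing its posterior draws as a matrix in $draws):

# Stack (rbind) the posterior draws across imputations
stacked <- rbind(fit1$draws, fit2$draws, fit3$draws)

# Parameter summaries are then computed on the stacked draws
colMeans(stacked)                              # posterior means
apply(stacked, 2, quantile, c(0.025, 0.975))   # 0.95 credible intervals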

Overall I think it’s just an unfortunate convergence of similar terms describing very different end goals.


I see, it makes sense.

To corroborate this, I think there is no function to extract draws from the posterior predictive distribution for Bayesian models fitted with rmsb. I believe the reason is that there is no generated quantities block in rmsb’s Stan code.

If so, it wouldn’t even be possible to apply Yao et al.'s method with rmsb.

I think you could use the log posterior density plus loo to get the stacking weights, and write a function to draw from the posterior predictive yourself based on the posterior draws provided by rmsb if you really wanted, so it’s probably not fair to say it wouldn’t be possible.
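For a binary logistic model, for instance, a rough sketch could look like this (assuming draws is an n_draws x p matrix of posterior coefficient draws, intercept first, and X is the covariate matrix; these names are made up for illustration):

# Draw from the posterior predictive "outside of Stan"
lp   <- cbind(1, X) %*% t(draws)   # n x n_draws matrix of linear predictors
p    <- plogis(lp)                 # P(Y = 1) for each observation and draw
yrep <- matrix(rbinom(length(p), size = 1, prob = p), nrow = nrow(p))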


Yes, good point. The generated quantities approach is the one recommended by the Stan team, but not the only one. Can I say that your approach would be the “Predict outside of Stan” one, as stated here?

Yup that sounds right! Just don’t ask me how to do it with ordinal models. You might be able to steal some ideas from the ordered probit meta-analysis models here: https://www.ncbi.nlm.nih.gov/books/NBK310366/pdf/Bookshelf_NBK310366.pdf by looking at how they model the actual likelihood.


This custom function for posterior predictive check is compatible with ordinal models!
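For anyone interested, here is a rough sketch of how such draws could be constructed by hand for a K-category proportional odds model. The column layout of the draws matrix (K-1 intercepts followed by the betas) is an assumption for illustration, not rmsb’s documented structure:

# Posterior predictive draws for a k-category proportional odds model
pp_ordinal <- function(draws, X, k) {
  # draws: n_draws x (k - 1 + p) matrix; X: n x p covariate matrix
  alphas <- draws[, 1:(k - 1), drop = FALSE]       # intercepts
  betas  <- draws[, k:ncol(draws), drop = FALSE]   # regression coefficients
  out <- matrix(NA_integer_, nrow(draws), nrow(X))
  for (s in 1:nrow(draws)) {
    lp  <- drop(X %*% betas[s, ])                  # linear predictor
    cge <- plogis(outer(lp, alphas[s, ], "+"))     # P(Y >= j), j = 2..k
    prb <- cbind(1, cge) - cbind(cge, 0)           # cell probabilities
    out[s, ] <- apply(prb, 1, function(p) sample.int(k, 1, prob = p))
  }
  out
}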

For multiple imputation, the stacking just involves combining draws from the posterior distributions. For the COVID-19 prone positioning paper, here are code excerpts:

set.seed(21)

# Multiple imputation with predictive mean matching
mi <- aregImpute(~ total_outcome + Month + Baseline + Treatment + Age +
                   Sex + Race + BMI + Elix,
                 data = day5_full, n.impute = 5, pr = FALSE)

# Fit the Bayesian model to each completed dataset and stack the draws
fitb_mi <- stackMI(total_outcome ~ Treatment + Baseline + Month + Age +
                     Sex + Site + Race + BMI + Elix,
                   blrm, mi, iter = 4000, warmup = 2000, chains = 4,
                   loo = TRUE, cores = 4, data = day5_full)
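The stacked object can then be summarized like a single blrm fit; for example (a usage sketch, assuming the fit above):

dim(fitb_mi$draws)   # rows = posterior draws stacked across the 5 imputations
print(fitb_mi)       # parameter summaries computed on the stacked draws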

Thanks. Similar to brms::brm_multiple() then.

Should I use the -2 log likelihood from method='optimizing' to determine the most appropriate standard deviation for the prior? In your vignette example (rmsb Package Examples), SD=100 did not have the lowest -2 log likelihood compared to SD=10. Why did you still use SD=100?

Priors represent prior information and are not typically chosen to fit data at hand. Priors are chosen to represent the current state of knowledge and sometimes to make procedures perform well, e.g., converge quickly.


Is it possible to use CmdStanR as the backend with rmsb?

I would have to reprogram some things. I think it’s an attractive option. I looked at it once and found a restriction that would affect me, but I can’t remember what that restriction was.


It would be a nice addition. For example, McElreath has shifted his rethinking package entirely to cmdstanr.


I think I remember the issue: I don’t think you can save the compiled Stan code, which makes the start-up time much longer. If I understand it, cmdstanr is for passing “new” Stan source code to Stan. That’s a fatal problem, if I have it right.


Hi, I am having issues with adequate model sampling in a proportional odds model with a random intercept.
Whenever I add the cluster(...) term, these warnings show up:

Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.

Interestingly, I have no issues when fitting the same model with brms. Please see full code here.

How can I solve this problem?

Ha! Found the culprit.

rmsb uses a “Dirichlet distribution for the cell probabilities, which induce a more complex prior distribution for the intercepts (iprior=0, the default)”.

Yet, brms applies a student_t(3, 0, 2.5) prior to the intercepts.

When I mimic this in rmsb, sampling runs smoothly:

PO_rmsb <- rmsb::blrm(y ~ x + cluster(study),
                       # intercept parameters prior: t-distribution with 3 d.f.
                       iprior = 2, 
                       # scale parameter for the t-distribution for priors for the intercepts 
                       ascale = 2.5, 
                       # normal(0, 1.5) prior for "x"
                       priorsd = 1.5, 
                       # random effect prior: half-t distribution with 4 d.f.
                       psigma = 1,
                       # assumed mean of the prior distribution of the RE SD
                       rsdmean = 0,
                       # assumed scale for the half t distribution for the RE SD
                       rsdsd = 2.5,
                       
                       seed = 123,
                       data=d)

However, I wouldn’t have expected this behavior with the Dirichlet distribution.

What do you think @f2harrell?

Great detective work! I never connected this sampling problem (which I’ve experienced myself) with the Dirichlet default. The Dirichlet default makes posterior modes equal MLEs when there are no random effects. I would love for someone to figure out why the Dirichlet prior interferes with random effects, which should just be added on top of the induced logit-scale intercepts created by the Dirichlet.


I wish I had the knowledge to help you with that.