Discrepancies between the rmsb package and RStan

arthur_albuquerque · September 29, 2022, 9:19pm

I’ve been trying to fit and learn how to wrangle results from Bayesian models fitted with the rmsb package.

Below, I fit a cumulative proportional odds model with the blrm function. I then check parameters summaries generated directly by rmsb. I noticed that, if I extract draws from this blrm object, I can reproduce the mean value for each parameter independently.

However, when I (1) extract the RStan object from blrm, (2) extract posterior draws from it, and then (3) generate parameters summaries, they are strikingly different from brlm summaries. Especially the beta (“x=epinephrine”) parameter.

Why does this happen, @f2harrell?

# Install/load package
pacman::p_load(rmsb,
               tidybayes,
               dplyr)


# Load data
a <- c(rep(0,15), rep(1,10), rep(2,29), rep(3,20), rep(4,8), rep(5,8), rep(6,200))
b <- c(rep(0,12), rep(1,17), rep(2,23), rep(3,35), rep(4,12), rep(5,27), rep(6,200))
x <- c(rep('placebo', length(a)), rep('epinephrine', length(b)))
y <- c(a, b)


d = data.frame(x, y)

d$y <-factor(d$y, ordered = T)
d$x<-factor(d$x, levels = c("placebo", "epinephrine"))

# Cumulative Proportional Odds Model
b_adj <- rmsb::blrm(y ~ x, data = d)

# Model output
b_adj
#> Bayesian Proportional Odds Ordinal Logistic Model
#>  
#>  Dirichlet Priors With Concentration Parameter 0.308 for Intercepts
#>  
#>  rmsb::blrm(formula = y ~ x, data = d)
#>  
#>  
#>  Frequencies of Responses
#>  
#>    0   1   2   3   4   5   6 
#>   27  27  52  55  20  35 400 
#>  
#>                     Mixed Calibration/             Discrimination                Rank Discrim.    
#>                 Discrimination Indexes                    Indexes                      Indexes    
#>     Obs616    LOO log L-778.05+/-28.59    g  0.111 [0.002, 0.245]    C    0.541 [0.454, 0.546]    
#>  Draws4000      LOO IC  1556.1+/-57.17    gp     0.005 [0, 0.011]    Dxy 0.082 [-0.093, 0.093]    
#>    Chains4      Effective p6.97+/-0.37    EV     0.001 [0, 0.003]                                 
#>     p    1      B 0.042 [0.042, 0.042]    v       0.018 [0, 0.06]                                 
#>                                           vp             0 [0, 0]                                 
#>  
#>                Mode Beta Mean Beta Median Beta S.E.   Lower   Upper  Pr(Beta>0)
#>  y>=1           3.2033    3.2116    3.2075     0.2172  2.7729 3.6280 1.0000    
#>  y>=2           2.4632    2.4639    2.4628     0.1683  2.1270 2.7969 1.0000    
#>  y>=3           1.6921    1.6903    1.6874     0.1408  1.4208 1.9714 1.0000    
#>  y>=4           1.1603    1.1591    1.1565     0.1284  0.9151 1.4133 1.0000    
#>  y>=5           0.9980    0.9951    0.9925     0.1264  0.7443 1.2328 1.0000    
#>  y>=6           0.7362    0.7321    0.7307     0.1226  0.4987 0.9774 1.0000    
#>  x=epinephrine -0.2156   -0.2179   -0.2172     0.1607 -0.5376 0.0874 0.0862    
#>                Symmetry
#>  y>=1          1.07    
#>  y>=2          1.04    
#>  y>=3          1.00    
#>  y>=4          1.03    
#>  y>=5          1.03    
#>  y>=6          1.03    
#>  x=epinephrine 0.95    
#> 

# Extract draws directly from BLRM object and summarize them (same mean as above)
b_adj$draws |> 
  as_tibble() |> 
  posterior::summarise_draws()
#> # A tibble: 7 × 10
#>   variable        mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
#>   <chr>          <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
#> 1 y>=1           3.21   3.21  0.217 0.210  2.86  3.58    1.00    5132.    2967.
#> 2 y>=2           2.46   2.46  0.168 0.165  2.19  2.74    1.00    5281.    3257.
#> 3 y>=3           1.69   1.69  0.141 0.141  1.46  1.92    1.00    5199.    2735.
#> 4 y>=4           1.16   1.16  0.128 0.131  0.956 1.37    1.00    5153.    3009.
#> 5 y>=5           0.995  0.993 0.126 0.128  0.792 1.20    1.00    5209.    2785.
#> 6 y>=6           0.732  0.731 0.123 0.123  0.533 0.935   1.00    5113.    3101.
#> 7 x=epinephrine -0.218 -0.217 0.161 0.158 -0.484 0.0466  1.00    5035.    3095.

# Extract draws directly from RSTAN object and summarize the parameters of interest (different results!)
b_adj$rstan |> 
  tidybayes::tidy_draws() |> 
  dplyr::select(dplyr::contains(c("alpha","beta"))) |> 
  posterior::summarise_draws()
#> # A tibble: 7 × 10
#>   variable   mean median     sd    mad     q5   q95  rhat ess_bulk ess_tail
#>   <chr>     <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#> 1 alpha[1]  3.10   3.09  0.197  0.189   2.78  3.43   1.00    5123.    2847.
#> 2 alpha[2]  2.35   2.35  0.142  0.138   2.11  2.59   1.00    5270.    3202.
#> 3 alpha[3]  1.57   1.57  0.107  0.107   1.40  1.75   1.00    5113.    3367.
#> 4 alpha[4]  1.04   1.04  0.0900 0.0889  0.897 1.20   1.00    5253.    3061.
#> 5 alpha[5]  0.880  0.878 0.0869 0.0885  0.739 1.02   1.00    5564.    3514.
#> 6 alpha[6]  0.617  0.616 0.0830 0.0820  0.483 0.754  1.00    5855.    3352.
#> 7 beta[1]  -2.70  -2.69  1.99   1.95   -6.00  0.578  1.00    5035.    3095.

^{Created on 2022-09-29 with reprex v2.0.2}

arthur_albuquerque · September 29, 2022, 9:34pm

Just read this in CRAN rmsb page:

The Stan code internally using the qr decompositon on the design matrix so that highly collinear columns of the matrix do not hinder the posterior sampling. The parameters are transformed back to the original scale before returning results to R. Design matrix columns are centered before running Stan, so Stan diagnostic output will have the intercept terms shifted but the results of blrm() for intercepts are for the original uncentered data.

I would still prefer wrangling results from the RStan object. Is there any available function to make these transformations?

f2harrell · September 29, 2022, 9:56pm

That is the explanation. I wrote a little function to QR transform and back-transform that you’ll see called in blrm. You might try to apply that function to Stan draws.