Bayesian Regression Modeling Strategies

cselman · July 23, 2023, 9:38pm

I am running a Mac, 2020 to be exact.

f2harrell · July 24, 2023, 11:42am

Does your Mac have x86 hardware? I can’t produce a binary package for that.

cselman · July 25, 2023, 12:34am

It does not. I do also have access to a Windows computer as well.

f2harrell · July 26, 2023, 8:03pm

Now the Mac builder at https://mac.r-project.org/macbuilder/submit.html is no longer found and I can’t find a new address for it. I do have the new binary version for Windows. Go to howto and note the new argument that allows you to pass MCMC sampler options for both rstan and cmdstanr.

cselman · July 27, 2023, 6:25am

Thanks Frank, this is much appreciated. I can see that sampling.args has been added to blrm. Stupid question, but hypothetically if I wanted to change the target acceptance probability rate (adapt_delta), what code would I specify when using sampling.args? I’ve tried the following but I’m sure it’s wrong:

blrm(y~x, backend = "rstan", 
sampling.args = c(control=list(adapt_delta = 0.9)))

f2harrell · July 27, 2023, 12:15pm

sampling.args=list(adapt_delta=0.9)

f2harrell · July 29, 2023, 11:49am

New binary versions for Linux, Mac, Windows were uploaded to the web site 2023-07-28 around 4pm CDT.

cselman · July 31, 2023, 12:21am

Thanks so much Frank! Everything seems to work now.
Just a note to those who do want to change adapt_delta, the code sampling.args=list(control=list(adapt_delta = 0.9)) seems to do the trick.

cselman · July 31, 2023, 4:04am

One more query - the model no longer runs if I specify the argument keepsep = 'x' (regardless if sampling.args is specified). The following warning message is received:

Error in dimnames(x)$parameters : 
 $ operator is invalid for atomic vectors

If the keepsep argument is removed, everything works as normal. Not sure if this is a bug or something else?

f2harrell · July 31, 2023, 1:19pm

You’re using the default of rstan, right? Can you provide a simple reproducible example using simulated data with a minimum number of variables?

cselman · August 2, 2023, 5:42am

That’s right, yes. The code does work when I use a smaller number of categories. For part of our research we plan to use a 13 category outcome. A (very) simple example is shown below:

## Very simple simulation with one binary variable (eg two-arm trial) and an
# ordinal outcome with 13 categories 
set.seed(02082023)

# Treatment probs 
p0 <- c(rep(1,13)) ## ie uniform when normalised 
p1 <- c(rep(1,7),rep(2,6))

states <- c("A", "B", "C","D","E","F","G","H","I","J", "K","L","M")

# Simulate two arm trial data.
dMulti0  <- rmultinom(1, size = 500, prob = p0)
dMulti1  <- rmultinom(1, size = 500, prob = p1)
sample0  <- rep(states, dMulti0) 
sample1  <- rep(states, dMulti1) 
sample0  <- factor(sample0, levels = states, ordered = T)
sample1  <- factor(sample1, levels = states, ordered = T)

# Munge simulated data.
data           <- rbind(data.frame("x" = 0, "y" = sample0),
                        data.frame("x" = 1, "y" = sample1))
data

## Run an unconstrained PO model 
mod <- blrm(y~x,ppo=~x, data=data, backend = "rstan",keepsep='x', conc = 1, 
            priorsd = 1, priorsdppo = 1, seed = 1234, iter = 2000, 
            chains = 4,sampling.args = list(control=list(adapt_delta=0.99,max_treedepth=12)))```

This returns the following warning message:

[[1]]
Stan model 'lrmcppo' does not contain samples.

[[2]]
Stan model 'lrmcppo' does not contain samples.

[[3]]
Stan model 'lrmcppo' does not contain samples.

[[4]]
Stan model 'lrmcppo' does not contain samples.

Error in dimnames(x)$parameters : 
  $ operator is invalid for atomic vectors
In addition: Warning message:
In .local(object, ...) :
  some chains had errors; consider specifying chains = 1 to debug

When running with just one chain, the warning message is:

[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"
Error in dimnames(x)$parameters : 
  $ operator is invalid for atomic vectors

Thanks in advance for your help!

f2harrell · August 2, 2023, 12:38pm

This is very helpful but will take me some time to get to it. I did not explicitly think of the keepsep implementation when ppo contains the keepsep variable. Does it run without ppo?

cselman · August 9, 2023, 5:08am

Before the latest package update, keepsep worked fine when ppo contained the keepsep variable with any number of categories (i.e. 13). I can also confirm that the model runs without ppo (i.e. running the usual proportional odds model).

Interestingly, the model runs with a 5-category ordinal outcome (keeping everything else the same as before) when using both ppo and keepsep (code below). Could it have to do with the number of categories in the outcome?


## Very simple simulation with one binary variable (eg two-arm trial) and an
# ordinal outcome with 5 categories 
set.seed(02082023)

# Treatment probs 
p0 <- c(rep(1,5))
p1 <- c(rep(1,2),rep(2,3))

states <- c("A", "B", "C","D","E")

# Simulate two arm trial data.
dMulti0  <- rmultinom(1, size = 500, prob = p0)
dMulti1  <- rmultinom(1, size = 500, prob = p1)
sample0  <- rep(states, dMulti0) 
sample1  <- rep(states, dMulti1) 
sample0  <- factor(sample0, levels = states, ordered = T)
sample1  <- factor(sample1, levels = states, ordered = T)

# Munge simulated data.
data           <- rbind(data.frame("x" = 0, "y" = sample0),
                        data.frame("x" = 1, "y" = sample1))
data

## Run an unconstrained PO model 
mod <- blrm(y~x, ppo=~x,data=data, backend = "rstan",keepsep='x', conc = 1, 
            priorsd = 1, priorsdppo = 1, seed = 1234, iter = 2000, 
            chains = 4,sampling.args = list(control=list(adapt_delta=0.99,max_treedepth=12)))

cselman · August 31, 2023, 6:20am

Hi Frank, hope you’re going well. Just wanted to follow up on the above issue - totally understand if you have not had time to get around to this yet.

f2harrell · August 31, 2023, 12:56pm

I ran your test script on the latest binary version (Linux in my case) at Directory Tree and it worked fine.

Note: I’ll be updating rmsb on CRAN soon.

cselman · September 18, 2023, 12:23am

cselman:

## Very simple simulation with one binary variable (eg two-arm trial) and an
# ordinal outcome with 13 categories 
set.seed(02082023)

# Treatment probs 
p0 <- c(rep(1,13)) ## ie uniform when normalised 
p1 <- c(rep(1,7),rep(2,6))

states <- c("A", "B", "C","D","E","F","G","H","I","J", "K","L","M")

# Simulate two arm trial data.
dMulti0  <- rmultinom(1, size = 500, prob = p0)
dMulti1  <- rmultinom(1, size = 500, prob = p1)
sample0  <- rep(states, dMulti0) 
sample1  <- rep(states, dMulti1) 
sample0  <- factor(sample0, levels = states, ordered = T)
sample1  <- factor(sample1, levels = states, ordered = T)

# Munge simulated data.
data           <- rbind(data.frame("x" = 0, "y" = sample0),
                        data.frame("x" = 1, "y" = sample1))
data

## Run an unconstrained PO model 
mod <- blrm(y~x,ppo=~x, data=data, backend = "rstan",keepsep='x', conc = 1, 
            priorsd = 1, priorsdppo = 1, seed = 1234, iter = 2000, 
            chains = 4,sampling.args = list(control=list(adapt_delta=0.99,max_treedepth=12)))```

Hi Frank, I installed the latest version of rmsb and am still receiving the same error (using Windows). Did you run the test script using the 13 category outcome?

f2harrell · September 18, 2023, 5:14pm

The problem is caused by keepsep when using rstan. Attributes of data matrices are a bit different then but can’t figure our how to make rstan not care, so far. I may need to drop support for rstan if this keeps up. If you want to help debug I can send you the R object that offends rstan::sampling().

f2harrell · September 18, 2023, 10:14pm

I think I found the problem. I need to change attributes of matrices handed to rstan and had to change max_treedepth to 30 and seed to 209. Results may not be meaningful. Working on an update to the package now. Looks like keepsep is dangerous for this model/data.

f2harrell · September 18, 2023, 11:54pm

I just realized something I wish I had realized a year ago. I can transform the priors on the original parameters to priors on linear combinations of them that come from the QR orthonormalization. Se we should be able to get rid of keepsep. Another project this week … See this.

By 2023-09-20 I plan to have rmsb 1.0-0 on RMSb which does not require keepsep for the main part of the. I’m still trying to determine when I can use the new pcontrast approach for the non-PO terms. The new approach allows priors to be specified in a more general way, and the QR decomposition is automatically handled. But in running the new version with your test data I’m still getting a sampling error. There may be a particular challenge with the simulated data.

cselman · September 21, 2023, 1:54am

Thanks so much Frank for trying to troubleshoot this anyway. Looking forward to the new package update!