How to model an outcome measure bounded between 0 and 100 using Bayesian analysis

micah · May 22, 2021, 7:19pm

I’m trying to build a Bayesian regression model for the purpose of parameter estimation. The outcome is a patient-reported outcome measure (PROM) that can take any value between 0 and 100. I initially created a model in JAGS:
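mod1_string = " model {
  for (i in 1:length(y)) {
    # JAGS dnorm takes a precision, so 1/25 corresponds to SD = 5
    y[i] ~ dnorm(mu[i], 1/25)
    mu[i] = int +
      b_1 * continuous1[i] +
      b_2 * continuous2[i] +
      b[1] * ordinal1[i] +
      b[2] * ordinal2[i] +
      b[3] * ordinal3[i] +
      b[4] * ordinal4[i]
  }
  int ~ dnorm(-5, 1.0/25.0)

  # Double exponential (Laplace) priors for the ordinal coefficients
  for (j in 1:10) {
    b[j] ~ ddexp(0, sqrt(2))
  }

  # Priors for the continuous coefficients, centered on literature values
  b_1 ~ dnorm(0.06, 1/1)
  b_2 ~ dnorm(0.03, 1/1)
} "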
The priors for my continuous predictors were taken from the literature. The priors for my ordinal predictors were set to be double exponentials centered on 0. What I would prefer is a model that is also bounded between 0 and 100. Should I rescale my outcome to lie between 0 and 1 and then use a logit link on mu? If so, how do I choose the distribution of Y?
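As a rough illustration of the rescaling idea in the question, here is a minimal base R sketch. The helper name `rescale_unit` and all numbers are made up for illustration. The squeeze step, (p(n - 1) + 0.5)/n, is the adjustment described by Smithson & Verkuilen (2006) for keeping values strictly inside (0, 1); it is shown here as one option and is not something taken from this thread.

```r
# Hypothetical helper: map a score on [lower, upper] into the open interval (0, 1).
rescale_unit <- function(score, lower = 0, upper = 100) {
  n <- length(score)
  p <- (score - lower) / (upper - lower)  # proportion in [0, 1]
  (p * (n - 1) + 0.5) / n                 # squeezed strictly inside (0, 1)
}

score <- c(0, 12.5, 50, 87.5, 100, 100)   # made-up scores, including boundary values
y01 <- rescale_unit(score)

# A logit link acts on the mean, not on y: any real-valued linear predictor
# is mapped to a mean strictly between 0 and 1.
eta <- c(-4, 0, 2.5)                      # made-up linear predictor values
mu <- plogis(eta)                         # inverse logit
mu_original_scale <- 100 * mu             # back on the 0 to 100 scale
```

Whatever the linear predictor does, the implied mean stays inside the bounds, which the normal model with an identity link cannot guarantee.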

pmbrown · May 22, 2021, 8:36pm

Isn’t it beta regression that you want? https://core.ac.uk/download/pdf/19485102.pdf

f2harrell · May 23, 2021, 2:39am

What about Bayesian semiparametric regression? The blrm function in the R rmsb package will fit a proportional odds model to a continuous Y. See here for detailed examples.
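To make the proportional odds idea concrete, here is a small base R sketch of how the model turns intercepts and one slope into probabilities. It does not call blrm, and every number in it is made up. With a continuous Y, the model has one intercept for each distinct observed value except the lowest; four levels are used here only to keep the output small.

```r
# Proportional odds model: P(Y >= y_j | x) = plogis(alpha_j + x * beta)
# All values below are made up for illustration.
alpha <- c(2, 0.5, -1)    # one intercept per cut-point, decreasing
beta  <- 0.7              # single slope shared by all cut-points
x     <- c(0, 1)          # two covariate settings

# Rows: values of x. Columns: P(Y >= level 2), P(Y >= level 3), P(Y >= level 4)
exceed <- sapply(alpha, function(a) plogis(a + beta * x))

# Probabilities of the four ordered levels, obtained by differencing
cell <- cbind(1 - exceed[, 1],
              exceed[, 1] - exceed[, 2],
              exceed[, 2] - exceed[, 3],
              exceed[, 3])

# The odds ratio for x = 1 versus x = 0 is the same at every cut-point
odds <- exceed / (1 - exceed)
odds_ratio <- odds[2, ] / odds[1, ]
```

Because only the ordering of Y enters the model, the 0 and 100 bounds and a pile-up of subjects at the maximum need no special handling.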

micah · May 23, 2021, 8:45pm

Thanks. I thought it might be, but I have trouble understanding how to set the shape parameters, even after reading the linked PDF.
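One common way to handle the shape parameters is to write the beta distribution in terms of a mean mu and a precision phi, so that shape1 = mu * phi and shape2 = (1 - mu) * phi. The sketch below is a minimal illustration under that assumption. The helper `beta_shapes`, the numeric values, and the priors in the JAGS fragment are placeholders rather than recommendations, and the JAGS string is untested. Note that priors taken from the literature on the 0 to 100 scale would not carry over unchanged to the logit scale.

```r
# Hypothetical helper: convert a mean (mu) and precision (phi) into beta shapes.
beta_shapes <- function(mu, phi) {
  stopifnot(all(mu > 0 & mu < 1), all(phi > 0))
  list(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

s <- beta_shapes(mu = 0.8, phi = 10)             # illustrative values only
beta_mean <- s$shape1 / (s$shape1 + s$shape2)    # mean of Beta(a, b) is a / (a + b)
beta_var  <- 0.8 * (1 - 0.8) / (10 + 1)          # variance is mu * (1 - mu) / (phi + 1)

# The same parameterization as an untested JAGS fragment (placeholder priors);
# y is assumed to have been rescaled to lie strictly inside (0, 1).
beta_model_string <- " model {
  for (i in 1:length(y)) {
    y[i] ~ dbeta(mu[i] * phi, (1 - mu[i]) * phi)
    logit(mu[i]) <- int + b_1 * continuous1[i]
  }
  int ~ dnorm(0, 1.0/25.0)
  b_1 ~ dnorm(0, 1.0)
  phi ~ dgamma(0.1, 0.1)
} "
```

With this parameterization the regression is on the mean, and phi controls how tightly the observations cluster around it.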

micah · May 23, 2021, 8:51pm

Thanks. I’ve reviewed your notes on this from both the BBR and RMS courses and also run through some of your examples. I ran a model using blrm on my data and got the following warning:
“Some Pareto k diagnostic values are too high.”
I realise from the help file that this has something to do with LOO. My knowledge of LOO is pretty much limited to what is in “An Introduction to Statistical Learning” by James et al. Should I be concerned about this? The Rhat values are all good, as are the effective sample sizes, and graphically the posterior chains demonstrate good convergence.
Also, am I correct in understanding that I cannot change the priors other than their SD?

f2harrell · May 23, 2021, 10:25pm

I don’t think you need to worry about that. You have choices for priors for intercepts and the SD of random effects. For the \beta's you only get to choose the SD, and you need to be cautious when some variables are combined in the orthonormalization phase. There is an argument to keep selected covariates separate so they’ll have identifiable normal priors.

timdisher · May 25, 2021, 11:09am

The best quick resource for interpreting LOO warnings at the moment is here. It is maintained by Aki Vehtari, who is one of the people who designed the PSIS-LOO method for fast cross-validation of Bayesian models.

micah · August 1, 2021, 1:28am

Hi Frank, I’ve just found a 2010 article where you were the statistician: Spindler et al., “The Prognosis and Predictors of Sports Function and Activity at Minimum 6 Years After Anterior Cruciate Ligament Reconstruction.” It used the same outcome (KOOS) on a cohort similar to the one I’m analyzing. In that paper you chose to model the KOOS as a continuous variable. I realise this was a while back, but is there any chance you remember why you chose this approach rather than a proportional odds model? As a reminder, KOOS is a Likert scale transformed to give a score between 0 and 100. In my cohort, and also in the cohort of the referenced paper, the score can be heavily left-skewed, with many subjects reporting the maximum value.

f2harrell · August 1, 2021, 11:45am

KOOS is almost continuous but still should be analyzed with the proportional odds model, which we did use after this paper.

QunnaLi · September 22, 2021, 2:49pm

This is helpful, Dr. Harrell. For vaccine coverage as an outcome variable, which ranges from 0 to 100, do you think it is better to use coverage (0 to 100) as the outcome or to use counts? I have both the number of persons vaccinated and the total number of persons in each facility, but only facility-level aggregated data. I am a little worried about correlated data within a facility if I use a marginal negative binomial model on the counts. A related question: facilities reported cumulative vaccination coverage every week. I think we can only use the most recently reported data for modeling, even though data are available weekly, because we only have facility-level data, vaccinated persons are reported cumulatively, and there is patient turnover.
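On the coverage versus counts part of the question, one point can be checked directly in base R: a binomial model on the counts and a binomial model on the coverage proportion give the same coefficients, provided the facility totals are supplied as weights. The sketch below uses made-up facility data and a hypothetical covariate `x`. It is not a recommendation for this dataset, and it does not address within-facility correlation or overdispersion.

```r
# Made-up facility-level data, for illustration only
facility <- data.frame(
  vaccinated = c(45, 80, 30, 150, 60, 95),
  total      = c(60, 100, 50, 160, 90, 120),
  x          = c(0, 1, 0, 1, 0, 1)   # hypothetical facility-level covariate
)
facility$coverage <- facility$vaccinated / facility$total   # proportion in [0, 1]

# Counts form: vaccinated and not vaccinated per facility
fit_counts <- glm(cbind(vaccinated, total - vaccinated) ~ x,
                  family = binomial, data = facility)

# Proportion form: the denominator enters as weights
fit_prop <- glm(coverage ~ x, family = binomial, weights = total,
                data = facility)

coef_counts <- coef(fit_counts)
coef_prop   <- coef(fit_prop)
```

Dropping the weights from the second model would treat a facility of 50 and a facility of 160 as equally informative, which is the information lost when only the 0 to 100 coverage figure is modeled.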

Thank you!