Hi everyone - I am pretty new to this site, but I’ve been enjoying reading some of these discussions. Thank you to everyone who is contributing - I’m learning so much.
I’ve recently been trying to solve a problem. I am a PhD student studying puberty and mental health in a large adolescent cohort. We have ~2000 saliva samples that were kept in storage for years at -20 °C before being analysed for steroid hormones by our collaborators. I’ve conducted a stability analysis to estimate their degradation in storage, and I’d like advice on analysing the data from a Bayesian perspective. I’ve been motivated to this approach because:
- I’m interested in estimating the actual rate of degradation, not just null hypothesis testing.
- I want to use hierarchical modelling and regression for non-linear relations. I’ve had some success using brms for this purpose previously.
Our approach was to collect saliva from 16 healthy volunteers. These samples have now been assayed for our biomarkers of interest at 6 timepoints, after storage for 2 years in conditions identical to those of the adolescent cohort. I now want to know the probable degradation for each biomarker in this smaller sample of 16.
After some research on this topic, I think there are 3 plausible kinetic processes by which biomarkers could degrade (there are more of course, but here are three):
1) Linear degradation
y = y_0 - kt
where:
- y is the concentration at time t
- y_0 is the baseline concentration
- k is the rate of decline over time, so the slope is -k (k is most likely positive, as all biomarkers should show some degree of degradation over the time period)
As volunteers have different baseline values, I’ve decided to use a hierarchical modelling approach (for this model and for the two classes of model that follow), where y_0 could be modelled as:
y_0 \sim normal(\mu_{y_0}, \sigma_{y_0}). I’d like to compare a model where k is fixed with a model where k is allowed to vary between subjects. Both models will have y_0 varying between subjects.
My first question is: what kind of prior do you think is appropriate for y_0? Does it make sense to have a Gaussian prior even though y_0 cannot take negative values? In real life, steroid concentrations are lognormally distributed. Should the prior reflect this distribution?
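One way to judge whether a Gaussian prior on y_0 is a problem in practice is to check how much prior mass it places below zero. The following is a minimal sketch using only the Python standard library; the function name and the numbers in the usage line are hypothetical placeholders, not values from this study.

```python
from statistics import NormalDist


def prior_mass_below_zero(mean, sd):
    """Probability that a Normal(mean, sd) prior assigns to negative values."""
    return NormalDist(mu=mean, sigma=sd).cdf(0.0)


# Hypothetical example: prior centred on 100 (arbitrary units) with SD 50.
print(prior_mass_below_zero(100.0, 50.0))
```

If the mass below zero is negligible, the Gaussian prior may be harmless; if it is not, a prior restricted to positive values (such as a lognormal) is the more natural choice.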
2) Exponential Decay (Single-First Order or SFO)
Of course, it’s more likely that samples with higher concentrations of a biomarker will lose it at a higher absolute rate. The simplest model for this is:
\frac{dy}{dt} = -ky
i.e. the rate of degradation is proportional to the concentration of biomarker. This translates to the model:
y = y_0e^{-kt}
where:
- y is the concentration at time t
- y_0 is the baseline concentration
- k is the rate constant (per unit of time)
Again, I’d like to use a hierarchical approach and compare a model where k is fixed with a model where k is allowed to vary between subjects. Both models will have y_0 varying between subjects.
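To think about a prior for k in the SFO model, it can help to translate k into quantities that are easier to reason about, such as the half-life or the fraction remaining after a given storage time. This is a minimal sketch under that assumption; the helper names and the "80% remaining after 2 years" figure are hypothetical and only illustrate the arithmetic.

```python
import math


def sfo(t, y0, k):
    """Concentration at time t under first-order decay: y = y0 * exp(-k * t)."""
    return y0 * math.exp(-k * t)


def half_life(k):
    """Time for the concentration to halve under first-order decay."""
    return math.log(2.0) / k


def k_from_fraction_remaining(frac, t):
    """Rate constant implied by a fraction `frac` remaining after time t."""
    return -math.log(frac) / t


# Hypothetical example: 80% of the biomarker remaining after 2 years.
k = k_from_fraction_remaining(0.8, 2.0)
print(k, half_life(k))
```

Working backwards from a plausible range of "fraction remaining" values gives a plausible range for k, which can then be turned into a weakly informative prior.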
3) Indeterminate order rate equation (IORE)
y = \left[ y_0^{(1-N)} - (1-N)k_{IORE}t \right]^{\frac{1}{1-N}}
Here there is an extra parameter to estimate: N, the rate order of the degradation reaction. N is likely to lie between 0 and 3. A value of 4 is possible, but I think it is next to impossible (I don’t have a reference for that statement!).
I think N would be fixed across subjects, of course, but I’d like to compare a model where the rate constant k_{IORE} is fixed with a model where it varies between subjects.
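For reference, the IORE expression follows from a rate law in which the rate of loss is proportional to the N-th power of the concentration. The derivation below is a sketch that assumes N \neq 1 and writes k for k_{IORE}:

```latex
\frac{dy}{dt} = -k\,y^{N}
\;\Longrightarrow\;
\int_{y_0}^{y} u^{-N}\,du = -k \int_{0}^{t} ds
\;\Longrightarrow\;
\frac{y^{1-N} - y_0^{1-N}}{1-N} = -k\,t
\;\Longrightarrow\;
y = \left[\, y_0^{1-N} - (1-N)\,k\,t \,\right]^{\frac{1}{1-N}}
```

Setting N = 0 gives the linear model y = y_0 - kt, and taking the limit N \to 1 recovers the exponential model y = y_0 e^{-kt}, so models (1) and (2) are special cases of model (3).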
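Because the IORE formula is undefined exactly at N = 1 and can produce a negative base when N < 1, it is worth checking how the curve behaves numerically before fitting it. The sketch below is one possible way to evaluate it; the function name, the tolerance, and the choice to return 0 once the concentration has reached zero are assumptions made for illustration.

```python
import math


def iore(t, y0, k, n, tol=1e-8):
    """Concentration at time t under an indeterminate-order rate equation.

    Falls back to first-order (exponential) decay when n is within `tol`
    of 1, where the general formula is undefined.
    """
    if abs(n - 1.0) < tol:
        return y0 * math.exp(-k * t)
    base = y0 ** (1.0 - n) - (1.0 - n) * k * t
    if base <= 0.0:
        # For n < 1 the concentration reaches zero in finite time;
        # treating it as 0 afterwards is an assumption of this sketch.
        return 0.0
    return base ** (1.0 / (1.0 - n))


# n = 0 reproduces the linear model; n close to 1 approaches SFO.
print(iore(1.0, 10.0, 2.0, 0.0), iore(1.0, 10.0, 0.3, 1.000001))
```

The same edge cases matter when the model is written for a sampler: a prior on N with mass near 1, or parameter values that make the base negative, can cause numerical problems unless they are handled explicitly.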
The HELP that I need is with setting priors and model selection
PRIORS
For y_0, I am just going to set a weakly informative prior: a wide Gaussian with the baseline mean as the location parameter, although this would allow negative values in my prior. Should I use a lognormal prior instead? And how would I decide the parameters for this?
I think I can set sensible weakly informative priors for k for models (2) and (3), but what would be a sensible prior for N?
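On the question of choosing lognormal parameters for y_0: one common option is moment matching, where the log-scale parameters are chosen so that the lognormal has a given mean and standard deviation on the original concentration scale. This is a minimal sketch of that calculation; the function names and the example values are hypothetical, and the resulting prior should still be checked against what is plausible for each hormone.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, sd):
    """Log-scale (mu, sigma) of a lognormal with the given mean and SD."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)


def lognormal_quantile(p, mu, sigma):
    """Quantile of a lognormal with log-scale parameters mu and sigma."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Hypothetical example: baseline mean 100 and SD 50 (arbitrary units).
mu, sigma = lognormal_params(100.0, 50.0)
print(mu, sigma)
print(lognormal_quantile(0.025, mu, sigma), lognormal_quantile(0.975, mu, sigma))
```

Looking at the implied 2.5% and 97.5% quantiles is a quick way to see whether the prior covers the range of baseline concentrations that would be considered realistic.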
APPROACH TO MODEL SELECTION
Here I was simply going to use PSIS-LOO (approximate leave-one-out cross-validation) and compare the outputs as per the vignette for brms, along with posterior predictive checks. I’d be comparing:
- linear model, fixed k
- linear model, k varies between subjects
- SFO, fixed k
- SFO, k varies between subjects
- IORE, fixed k
- IORE, k varies between subjects
Any other thoughts would be much appreciated! For instance, would it be better to use a semi-parametric or non-parametric approach as well?
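For intuition about what a PSIS-LOO comparison reports, the difference in expected log predictive density (elpd) between two models is the sum of the pointwise differences, and its standard error is based on the spread of those differences. The sketch below reproduces that arithmetic in plain Python; the function name is hypothetical and the pointwise values are made-up placeholders, not results. In practice these values would come from the fitted models rather than being typed in by hand.

```python
import math
from statistics import variance


def elpd_difference(pointwise_a, pointwise_b):
    """Difference in total elpd (model A minus model B) and its standard error.

    Both inputs are lists of pointwise elpd values, one per observation,
    in the same observation order.
    """
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    total = sum(diffs)
    se = math.sqrt(n * variance(diffs))  # sqrt(n) * sample SD of the differences
    return total, se


# Made-up pointwise elpd values for two hypothetical models.
model_a = [-1.0, -2.0, -3.0]
model_b = [-1.5, -2.0, -3.5]
print(elpd_difference(model_a, model_b))
```

A difference that is small relative to its standard error suggests that the data do not clearly distinguish the two models, which seems quite possible with 16 subjects and 6 timepoints.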
I don’t really know of anyone in my research group who can help me on this - so any advice would be great. THANK YOU SO MUCH.
Alex