# Modelling long-term longitudinal biochemical degradation - a Bayesian approach?

Hi everyone - I am pretty new to this site, but I’ve been enjoying reading some of these discussions. Thank you to everyone who is contributing - I’m learning so much.

I’m currently trying to solve a problem. I am a PhD student studying puberty and mental health in a large adolescent cohort. We have ~2000 saliva samples that were kept in storage for several years at -20 °C before being analysed for steroid hormones by our collaborators. I’ve conducted a stability analysis to estimate their degradation in storage, and I’d like advice on analysing the data from a Bayesian perspective. I’ve been motivated towards this approach because:

• I’m interested in estimating the actual rate of degradation, not just null hypothesis testing.
• I want to use hierarchical modelling and regression for non-linear relationships. I’ve had some success using brms for this purpose previously.

Our approach was to collect saliva from 16 healthy volunteers; these samples have now been assayed for our biomarkers of interest at 6 timepoints over 2 years of storage under identical conditions to the adolescent cohort. I now want to know the probable degradation for each biomarker in this smaller sample of 16.

After some research on this topic, I think there are 3 plausible kinetic processes by which biomarkers could degrade (there are more, of course, but here are three):

1) Linear degradation (zero order)

y = y_0 - kt
where:

• y is the concentration at time t
• y_0 is the baseline concentration
• k is the rate of loss per unit time (with this sign convention k should be positive, since all biomarkers should show some degree of degradation over the time period)

As volunteers have different baseline values, I’ve decided to use a hierarchical modelling approach (for both this model and the following two classes of model; see later), where y_0 could be modelled as:
y_0 \sim normal(\mu_{y_0}, \sigma_{y_0}). I’d like to compare models where k is fixed against models where k is allowed to vary between subjects. Both models will have y_0 varying between subjects.
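To check that this parameterisation behaves as intended, here is a minimal simulation sketch in Python (all numerical values are made up for illustration, not estimates from my data):

```python
import random

random.seed(1)

# Illustrative values only -- not estimates from any real data
mu_y0, sigma_y0, k = 10.0, 2.0, 0.5
times = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]   # 6 timepoints over 2 years

# Subject-level baselines: y_0 ~ normal(mu_y0, sigma_y0)
y0 = [random.gauss(mu_y0, sigma_y0) for _ in range(16)]

# Linear (zero-order) decay with a shared k: y_it = y0_i - k * t
trajectories = [[b - k * t for t in times] for b in y0]

# With k > 0, every simulated trajectory declines monotonically
print(all(a > b for traj in trajectories for a, b in zip(traj, traj[1:])))  # True
```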

My first question is: what kind of prior do you think is appropriate for y_0? Does it make sense to have a Gaussian prior (even though y_0 cannot take on negative values?) In real life, steroid concentrations are lognormally distributed. Should the prior reflect this distribution?

2) Exponential decay (single first-order or SFO)

Of course, it’s more likely that participants with higher concentrations of a biomarker will have higher absolute rates of decay. The simplest model for this is:

\frac{dy}{dt} = -ky

i.e. the rate of degradation is proportional to the concentration of biomarker. This translates to the model:

y = y_0e^{-kt}

where:

• y is the concentration at time t
• y_0 is the baseline concentration
• k is the rate constant (per unit time)

Again I’d like to use a hierarchical approach, comparing models where k is fixed against models where k varies between subjects; both models will have y_0 varying between subjects.
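As a sanity check on the SFO maths, this short Python sketch (with arbitrary values of y_0 and k) confirms that the analytic solution matches a direct numerical integration of dy/dt = -ky, and that the implied half-life is ln(2)/k:

```python
import math

y0, k = 12.0, 0.8          # arbitrary illustrative values
t_end, dt = 2.0, 1e-5

# Euler integration of dy/dt = -k * y
y = y0
t = 0.0
while t < t_end:
    y -= k * y * dt
    t += dt

# Analytic SFO solution: y = y0 * exp(-k * t)
analytic = y0 * math.exp(-k * t_end)
print(abs(y - analytic) < 1e-3)        # True: the two agree closely

# Half-life implied by the rate constant: t_half = ln(2) / k
t_half = math.log(2) / k
print(abs(y0 * math.exp(-k * t_half) - y0 / 2) < 1e-9)  # True
```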

3) Indeterminate order rate equation (IORE)

y = \left[{y_0}^{(1-N)}-(1-N)k_{IORE}t\right]^{\frac{1}{1-N}}

Here there is an extra parameter to estimate: N, the rate order of the degradation reaction. It is likely to be between 0 and 3, possibly 4 at the absolute maximum, although I think values that high are next to impossible (I don’t have a reference for that statement!).

I think N would be fixed, of course, but I’d like to compare models that vary the rate constant k_{IORE} between subjects.
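One appealing property of the IORE equation (which corresponds to the rate law dy/dt = -k_{IORE} y^N) is that it nests the other two models: N = 0 gives the linear model, and N → 1 recovers exponential decay in the limit. A quick numerical check, with arbitrary values:

```python
import math

def iore(y0, k, N, t):
    """IORE solution: y = [y0^(1-N) - (1-N)*k*t]^(1/(1-N)), valid while the bracket is positive."""
    base = y0 ** (1 - N) - (1 - N) * k * t
    return base ** (1 / (1 - N))

y0, k, t = 10.0, 0.5, 1.5   # arbitrary illustrative values

# N = 0 recovers the linear (zero-order) model
print(abs(iore(y0, k, 0.0, t) - (y0 - k * t)) < 1e-12)            # True

# N -> 1 approaches single first-order (exponential) decay
print(abs(iore(y0, k, 0.999, t) - y0 * math.exp(-k * t)) < 0.01)  # True
```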

The HELP that I need is with setting priors and with model selection.

PRIORS

For y_0, I was just going to set a weakly informative prior: a wide Gaussian with the baseline mean as the location parameter, although this would allow negative values in my prior. Should I use a lognormal prior instead? And how would I decide on its parameters?
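One option I’ve been considering is to parameterise a lognormal prior via quantiles: set the log-scale location to the log of a plausible median, and choose the log-scale SD so that an upper quantile lands at a concentration I’d consider extreme. Would something like this be reasonable? (The concentration numbers below are placeholders, not real assay values.)

```python
import math

# Placeholder numbers: suppose a typical baseline is ~8 units and
# values above ~40 units would be considered extreme.
prior_median = 8.0
extreme_high = 40.0

mu = math.log(prior_median)                   # lognormal location (log scale)
sigma = (math.log(extreme_high) - mu) / 1.96  # puts ~2.5% of prior mass above 40

# Check the implied quantiles: median and 97.5th percentile on the natural scale
print(round(math.exp(mu), 3))                 # 8.0  (median)
print(round(math.exp(mu + 1.96 * sigma), 3))  # 40.0 (97.5% quantile)
# A lognormal prior puts zero mass on negative concentrations by construction.
```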

I think I can set sensible weakly informative priors for k for models (2) and (3), but what would be a sensible prior for N?

APPROACH TO MODEL SELECTION

Here I was simply going to use PSIS-LOO (Pareto-smoothed importance sampling leave-one-out cross-validation) and compare the outputs as per the brms vignette, along with posterior predictive checks. I’d be comparing:

1. linear model, fixed k
2. linear model, k varies between subjects
3. SFO, fixed k
4. SFO, k varies between subjects
5. IORE, fixed k
6. IORE, k varies between subjects

…and I’d be comparing the outputs from PSIS-LOO. Any other thoughts would be much appreciated, though! For instance, would a semi- or nonparametric approach be better?

I don’t really know of anyone in my research group who can help me on this - so any advice would be great. THANK YOU SO MUCH.

Alex


Hi Alex,

Here are a few thoughts:

Model Specification

I am a little confused by how you are specifying your models. What you have written seems to imply that only the baseline is measured with error. A more traditional form might be:

E(Y_{it}|X_i) = X_i\beta - kt, with an appropriate within-subject correlation structure (note that typically y_0 is modeled as a function of baseline covariates, X_i\beta).
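For instance, in the usual nonlinear-regression setup every timepoint gets observation noise around a latent trajectory, along the lines of this quick simulation sketch (all values arbitrary, and SFO used just as an example):

```python
import math
import random

random.seed(2)

# Arbitrary illustrative values for one subject
y0_i, k, sigma_obs = 10.0, 0.8, 0.4
times = [0.0, 0.4, 0.8, 1.2, 1.6, 2.0]

# Observed concentration at each timepoint = latent SFO trajectory + noise,
# i.e. y_it ~ Normal(y0_i * exp(-k * t), sigma_obs) -- error at every t,
# not only at baseline
observed = [y0_i * math.exp(-k * t) + random.gauss(0.0, sigma_obs) for t in times]

print(len(observed))  # 6
```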

Prior Selection
I am unsure if brms allows you to specify parameter bounds. If it doesn’t, you could try coding the model in Stan directly. But yes, generally if a parameter cannot physically take on negative values you want a prior that represents that belief.

In your case you have a couple of options for prior elicitation. Generally, one way to do this is to set your priors (before looking at the data) such that you exclude physically impossible or highly improbable scenarios. You can use prior predictive checks (28.4 Prior predictive checks | Stan User’s Guide) to see how your choice of prior interacts with your data model. For simple models, or for parameters that have an intuitive physical interpretation, you can reason directly. You mention for the IORE model that N is very unlikely to be greater than 4, so a first pass at a prior might be a half-normal distribution with the standard deviation fixed so that Pr(N>4) is some small value (say 0.025 or 0.05).
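For example, you can solve for that half-normal scale in a few lines (Python standard library only; the 0.025 tail probability is just the example value above):

```python
from statistics import NormalDist

std_normal = NormalDist()

# Half-normal(sigma): Pr(X > x) = 2 * (1 - Phi(x / sigma)).
# Solve for sigma so that Pr(N > 4) = 0.025:
tail = 0.025
z = std_normal.inv_cdf(1 - tail / 2)    # Phi^{-1}(0.9875)
sigma = 4.0 / z
print(round(sigma, 3))                  # ~1.785

# Verify: tail probability above 4 under Half-normal(sigma)
p_above_4 = 2 * (1 - std_normal.cdf(4.0 / sigma))
print(abs(p_above_4 - tail) < 1e-9)     # True
```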

Model Selection
PSIS-LOO is probably a good start, though it is entirely possible that your data may not contain enough information to strongly distinguish between candidates absent any prior reason to favour one model over another. An interesting alternative would be to stick with the IORE model and treat the rate order N as a parameter to be estimated: the zero-order (linear) model is the special case N = 0 and SFO is the limit as N approaches 1, so you’d end up with a sort of prior-and-data-averaged trajectory and avoid a forced choice between linear and exponential decay.


@js592 This is really useful, thank you. Yes, I haven’t been very clear regarding my error structure. You’ve given me great food for thought. Thanks again!