Hi all,

I have a question about a modelling strategy for a pharmacokinetic application, potentially using model averaging. Would appreciate any tips or suggestions!

In my field, we have a series of measures of a radioactive label which is chemically attached to our compound of interest, and we take a series of measurements of radioactivity. However, our compound is also being metabolised over time, so our measured signal contains both the parent compound and the metabolised fraction too. As such, we also measure a set of typically 6-10 measurements during our time course describing the parent fraction of our compound. We aim to interpolate our measurements of the parent fraction in order to optimally correct for the effects of metabolism.

For different compounds, there are different pharmacokinetic profiles. There are a set of 3/4 main models (each of which have 2-3 parameters) which are used to model these profiles, and choosing which of these families best corresponds to the data is usually quite clear, as the curves match up to the points pretty well. We don’t really care about the actual biology, or why any one model is chosen: just that it does a good job of interpolating the data well.

We also sometimes even need to extrapolate a little for some curves, where measurements have failed. And some measurements which don’t “fail” are still bad measurements, and can really mess up a fit, but these are usually quite easy to spot.

Our usual strategy has been to use nonlinear least squares curve fitting to fit each of the 3/4 models to each individual’s measurements, and then select the model which wins out most frequently on AIC or BIC (and it’s usually ~90% for one or another model) between individuals. So, that’s not usually an issue.

However, problems arise when the selected model used shows a consistent bias in particular time ranges. This is also usually quite clear, and the field has typically just created new ad-hoc variants of each of the model families to account for these differences. So this is where things start getting a little bit iffy. Now, we have about 20 different model variants, many of which are very similar to one another, and which start being really hard to choose between (described nicely here: https://journals.sagepub.com/doi/pdf/10.1177/0271678X15610585). And it gets very easy to start overfitting too, when one has a new compound.

I am planning to write an R package to make it easier to fit the main 3/4 models in a multilevel framework, which I’ve successfully applied for a couple of these models, and I’ve found that it does a brilliant job of automatically taking care of the nasty outliers, and seems to produce better estimates. Our sample sizes are usually around 20-100, and it seems to work in small samples too. However, the model itself will still be biased over certain portions of the time course. I don’t want to implement every model I can find and just throw them at every problem - this feels very inelegant.

So, I was hoping to extend this into some kind of model averaging approach. Now, we know that one model usually describes the data best for most of the curve, but if it is biased in a certain time range, maybe one of the other curves does a better job. So I don’t really want to do regular model averaging just because model 1 works better in the first half, and model 2 works better in the second half. So here was my first draft of a plan (and I realise there are some red flags here):

- Fit each model using NLME across individuals
- Interpolate the squared residuals into a consistent grid of maybe ~1000 times (the measurements are taken at slightly different times between individuals, so this puts everything into the same time course)
- Calculate mean (or trimmed mean) squared residuals across individuals for each time point, for each model. (it should be noted that these are all on percentage scale, and the curves are usually in a fairly similar range between individuals for any given time point)
- Calculate Akaike weights for model comparison of the models, for each time point.
- Perform model averaging for each time point, to get a time series of model weights for each model.

Now, I realise that there are some issues here!

- There is a lot of potential for overfitting: however I figure that the hierarchical models used for the individual models should hopefully do most of the hard lifting for accounting for this.
- I believe model averaging relies on the fact that the residuals should add up to zero, which they do not for individual time points. This is probably the major flaw here, I think.
- We are not giving any priority for the direction of the bias: our main curve may have a negative bias, so we would optimally perform model averaging with another curve that had some positive bias over the same section of the curve.
- We run into problems if we need to extrapolate for everyone.

I don’t really know where to go from here. The optimal approach would probably be to fit them all together, and have some kind of model averaging spline fit through the lot of them, but I don’t know of of anything like this. This could maybe be done with a big MCMC model, but we don’t want this taking too long to calculate: this is just something we want to do for preprocessing of our measurements. I considered the possibility of just actually fitting a GAM with smoothes over the predicted values of all three models to just come to an optimal balance of the three which changes across time, this way, I could start having e.g. negative coefficients for some periods of time for some models, which just feels a bit too crazy. I’ve also considered fitting a hierarchical GAM to just model the bias over time, to make a de-biaser, but this feels a bit like it really does become a little bit too flexible in ways which might overfit the data.

I would love any and all suggestions for how I might attack this problem. It feels like there are probably a multitude of good ways to go about it - from fairly easy to very hard - but I also feel that it shouldn’t be too hard to at least come up with a strategy which outperforms our current approach.

Thanks so much in advance for any help!