Internal variables with heavy mass at the zero boundary

Hi all,

First post! (rms and this website have been an oasis as I strive to become a better modeler.)

I do PK modeling and I have a bear of a project where the data are really poor. Different assays, different disease states, you name it; the data are a mess, but I am still tasked with analyzing them.

I am doing an exposure-response analysis with overall survival as the key endpoint. I built a PK model that I do not think is very good because I have so many BQL values (roughly half the data are BQL, due to really poor clinical study design). My fits with the zeros included are terrible; they are better without them, but I have high random-effect shrinkage. I then folded this into a time-to-event (TTE) analysis, fitting a Cox model with the average dose-interval AUC as the exposure variable.

So I know my model is pretty much junk, and I am really looking for a post-mortem from people more skilled in the art.

My three major questions are as follows:

  1. How would one deal with data where there are very large portions of zeros? It skews the entire analysis. I saw some mention of joint modeling, but I did not see anything like that in the RMS sections, so I am wondering if there is a better way. And if joint modeling is the way, is there a way to achieve such a model using rms?

  2. I know I lose a lot of information by averaging, but the AUC is constantly changing and I was not sure how it could be integrated into the model; there are not a lot of simple examples of internal variables being used in a Cox proportional hazards model to get me started. I was concerned because I assume tmerge would need an entry for each time the AUC changed, which would make each subject take up many, many rows. Is this the only way to do it? Is there a better way?

  3. What do you do when your continuous covariate is so absurdly noisy (too many zeros, or just noisy data) that one cannot discern what's going on? Do you bother burning degrees of freedom fitting a spline to a point cloud?
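On question 2: the many-rows concern is real but benign. The counting-process (start, stop] format that `survival::tmerge` builds in R is simply one row per interval over which the covariate is constant, and Cox software handles tables with many rows per subject routinely. A minimal Python sketch of that same layout, with made-up subject IDs, change times, and AUC values purely for illustration:

```python
# Expand one subject's piecewise-constant AUC history into
# counting-process (start, stop] rows -- the same long format
# that survival::tmerge produces in R.

def expand_subject(sid, change_times, auc_values, followup, event):
    """change_times[i] is when auc_values[i] starts to apply;
    change_times[0] must be 0.  Returns one dict per interval."""
    rows = []
    for i, (t, auc) in enumerate(zip(change_times, auc_values)):
        stop = change_times[i + 1] if i + 1 < len(change_times) else followup
        if stop <= t:               # change after end of follow-up: skip
            continue
        rows.append({
            "id": sid,
            "start": t,
            "stop": stop,
            "auc": auc,
            # the event can only occur in the subject's last interval
            "event": int(event and stop == followup),
        })
    return rows

# Hypothetical subject: AUC re-estimated at days 0, 30, 60; died at day 75.
for row in expand_subject("S01", [0, 30, 60], [12.4, 15.1, 9.8],
                          followup=75, event=1):
    print(row)
```

A table like this is exactly what `coxph(Surv(start, stop, event) ~ auc)` expects in R (or `CoxTimeVaryingFitter` in Python's lifelines), so "many rows per subject" is the intended representation, not a workaround.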

Thanks in advance

Matt

Lots of zeros in an X don’t necessarily ruin that part of the model, but often the residual variance needs to be allowed to be different if the zeros come from detection limit truncation. Also you might add a discontinuity for the X effect at zero.
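To make the discontinuity idea concrete, here is a sketch with simulated data (all numbers hypothetical): give the exact zeros their own indicator column alongside the continuous effect, so the point mass at zero gets its own intercept shift instead of dragging the smooth fit toward the origin. In rms this would correspond to adding an `x == 0` term next to `rcs(x)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
# ~40% exact zeros (e.g. detection-limit truncation), rest continuous
x = np.where(rng.random(n) < 0.4, 0.0, rng.uniform(1, 10, n))
is_zero = (x == 0).astype(float)

# True model: slope 0.5 for the continuous part, separate shift -3 for zeros
y = 1.0 + 0.5 * x - 3.0 * is_zero + rng.normal(0, 0.3, n)

# Design matrix with a discontinuity at zero: intercept, zero indicator, slope
X = np.column_stack([np.ones(n), is_zero, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # approximately [1.0, -3.0, 0.5]
```

Allowing a different residual variance for the zero group, as suggested above, would extend this to a weighted or heteroscedastic fit; the design-matrix idea is the same.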

My apologies, I left out the detail that the prevalence of zeros is in my Y! I got this data after the work had been completed, and the timing of the sampling was such that they missed the drug concentrations completely (basically half my data are below the detection limit).

My literature search suggests joint modeling. Is that something that can be accomplished with the rms package?

Formally, one would say BQL values are censored. Bayesian methods handle such data coherently and efficiently, and of course there is a long history of Bayesian methods in PKPD.

One nice thing about PKPD is that at least you have a rich underlying theory upon which to build realistic models, and you don’t have to make recourse to theory-free empirical curve-fitting. This paper of mine shows how even a very paltry data set can be approached meaningfully, using methods for censored data.
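The censoring idea can be sketched in a few lines. This is a frequentist toy version, assuming lognormal concentrations and a single made-up LOQ (a Bayesian version would put priors on the same parameters but use the same likelihood): each BQL observation contributes its probability of falling below the limit, rather than being dropped or set to zero.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
true_mu, true_sigma, loq = 1.0, 0.8, 1.5   # log-scale parameters, hypothetical LOQ
conc = rng.lognormal(true_mu, true_sigma, 300)
observed = conc[conc >= loq]
n_bql = int(np.sum(conc < loq))            # for BQL values only the count is known

def neg_log_lik(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)              # keep sigma positive
    # Observed values contribute the lognormal density;
    # each BQL value contributes P(conc < LOQ) = Phi((log LOQ - mu) / sigma).
    ll_obs = (stats.norm.logpdf(np.log(observed), mu, sigma)
              - np.log(observed)).sum()
    ll_cens = n_bql * stats.norm.logcdf((np.log(loq) - mu) / sigma)
    return -(ll_obs + ll_cens)

fit = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

naive_mu = np.log(observed).mean()         # dropping BQL biases the mean upward
print(mu_hat, sigma_hat, naive_mu)
```

The censored likelihood recovers the true parameters even with a quarter of the data BQL, while the naive estimate that discards BQL values is visibly biased. The same likelihood construction carries over to nonlinear PK models and to Stan.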

No, it’s best to do that with joint Bayesian models using Stan.

Thanks for the tip, I will look into that.

@davidcnorrismd

Thanks for the reference. I will be reading it and may circle back with some questions.