RMS Discussions

I don’t know of a paper that has laid this out. This is akin to the fact that you can write principal component analysis as a kind of penalized estimation. I wish I had more.

A great question. I don’t have experience with that “new baseline” design. It may be covered in @Stephen 's amazing Statistical Issues in Drug Development book or in Steve Piantadosi’s clinical trials book.

Age is right-censored at 90 years. You could use a model that deals with censoring.

If you are interested in the Propensity Score, you might find our paper (by Erika Graf, Angelika Caputo and me) of interest: https://pubmed.ncbi.nlm.nih.gov/18058851/
It is also treated in Section 7.2.13 of my book Statistical Issues in Drug Development and is the subject of this slideshare https://www.slideshare.net/StephenSenn1/confounding-politics-frustration-and-knavish-tricks

We don’t have as many models for censoring of independent variables.

I was hoping you would chime in Stephen.

If I am modeling data across multiple clinical trials, would it make sense to perform cross-validation by leaving out one trial at a time as the test set (in this context it wouldn’t be repeated CV), or would the bootstrap still be preferred? If CV is the preferred approach in this context, is there still a way to adjust for trial in the model? Should I assume a compound-symmetric correlation matrix for each trial?

Trial may be too large a unit for the bootstrap to work well. If you can treat trials as exchangeable (as is done in most meta-analyses; the same assumption as compound symmetry) you can use random effects for trials.
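For concreteness, here is a minimal sketch of treating trial as a random effect, using lme4 and a simulated data frame (all variable names — `y`, `x`, `trial` — are hypothetical, invented for illustration):

```r
# Sketch: trial as a random effect, with simulated data for illustration
library(lme4)

set.seed(1)
d <- data.frame(trial = factor(rep(1:6, each = 40)),
                x     = rnorm(240))
# outcome with a per-trial random shift plus a common covariate effect
d$y <- 1 + 0.5 * d$x + rnorm(6, sd = 0.7)[d$trial] + rnorm(240)

# A random intercept for trial treats trials as exchangeable,
# which induces a compound-symmetric correlation within trial
fit <- lmer(y ~ x + (1 | trial), data = d)
fixef(fit)["x"]   # pooled covariate effect across trials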

Thank you very much for your response. Is there a reason you suggest treating trial as a random effect instead of imposing a compound-symmetric correlation matrix for trial via Gls? I was under the impression from the course that GLS was the preferred approach. Also, is there a function in the rms package that allows for random effects?


GLS (generalized least squares) is a good approach for continuous longitudinal outcomes. It provides nice ways to handle within-subject serial correlation as well as compound symmetry (which typically fits less well than say an AR(1) serial correlation pattern). The correlations we’re talking about are continuous in time, which is much different from how one models a categorical variable such as clinical site.
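A minimal sketch of the comparison described above, using rms::Gls with the nlme correlation structures on simulated longitudinal data (all variable names are hypothetical):

```r
# Sketch: GLS with serial correlation vs. compound symmetry,
# on simulated longitudinal data (hypothetical variables y, time, treat, id)
library(rms)
library(nlme)   # supplies the correlation structures

set.seed(2)
times <- c(0, 1, 2, 3, 4, 6, 8, 12)
d <- expand.grid(id = 1:40, time = times)
d$treat <- factor(ifelse(d$id <= 20, "A", "B"))
d$y <- 10 + 0.3 * d$time + 0.5 * (d$treat == "B") +
       rnorm(40)[d$id] + rnorm(nrow(d))

# Continuous-time AR(1): correlation decays with the time difference
f.ar1 <- Gls(y ~ rcs(time, 4) * treat, data = d,
             correlation = corCAR1(form = ~ time | id))

# Compound symmetry: equal correlation at all lags, for comparison
f.cs  <- Gls(y ~ rcs(time, 4) * treat, data = d,
             correlation = corCompSymm(form = ~ 1 | id))

AIC(f.ar1, f.cs)   # lower AIC indicates the better-fitting structure
```

Comparing AIC between the two fits (same fixed effects, different correlation structures) is one simple way to check whether serial correlation fits better than compound symmetry.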

Hi Dr. Harrell. Sorry for the very basic question. I was wondering how to run model validity and calibration after using multiple imputation (using Hmisc and rms).

A good question with no easy answer. There are some notes from Ewout Steyerberg and possibly some references on the website of papers I pointed class participants to.

I found this link on stack exchange. I also found the following paper which seems consistent with the response in stack exchange: Wahl, Simone, et al. “Assessment of predictive performance in incomplete data by combining internal validation and multiple imputation.” BMC medical research methodology 16.1 (2016): 144.
However, intuitively, I’d think one should be able to perform multiple imputation on the full dataset prior to validation and not worry about information leakage, just as we aren’t concerned about using Y to impute X in training set. Am I missing something (don’t mind the pun)? I would love to hear your insights.
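For reference, the basic Hmisc/rms multiple-imputation workflow being discussed looks roughly like the sketch below (simulated data, hypothetical variable names); how to bolt validation onto this is the open question in the thread:

```r
# Sketch: multiple imputation with aregImpute + fit.mult.impute
# (simulated data; x1, x2, x3 are hypothetical predictors)
library(rms)

set.seed(3)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1 + 0.5 * d$x2))
d$x2[sample(n, 60)] <- NA   # missingness in a predictor

# Impute using all variables, including the outcome y
a <- aregImpute(~ y + x1 + x2 + x3, data = d, n.impute = 10)

# Fit the model on each completed dataset; coefficients and
# variances are combined with Rubin's rules
f <- fit.mult.impute(y ~ rcs(x1, 4) + x2 + x3, lrm, a, data = d)
```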

Thank you for the referral. I have ordered @Stephen 's book and look forward to reading up more about this topic.
I am wondering about a different scenario: let’s say there were only 2 doses (instead of daily dosing) in the above schedule (Days 1 and 20, with samples collected at several intervals in between, as above). Is it necessary to treat Day as a separate categorical variable with hours-post-recent-dose nested within Day, or can I just use hours-post-initial-dose (let’s call it “time”) as a single time variable and allow additional knots in “time” to accommodate the extra fluctuation after the subsequent dose at 480 hours post original dose? What I am not clear about is: are there reasons to prefer treating time as a continuous spline with many knots rather than as a categorical variable, for the sake of flexibility?

At the end of the course we had a brief discussion about different ways of handling time-dependent covariates. One involved extensions to the Cox model, and I think there are plenty of resources available on that. Dr. Harrell also mentioned an alternative idea which I believe he called a moving landmark design (or something similar). I believe he made reference to a study of his that used this kind of approach. In this case, I am thinking about a prediction model (no inference) that would have a moving window where predictions from a logistic or ordinal regression are made forward in time using covariates from a look-back period that moves along with the window. If Dr. Harrell or anyone else could point me to any papers using this kind of approach I would appreciate it. I do have one example of the kind of design I am referring to here (https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2728625).

Time is a simple fixed effect (the only complexity with time is allowing for a within-subject correlation pattern that is a function of time differences) so model it flexibly as a spline with more knots than usual if there are many distinct time points. Use the actual measurement times, not intended measurement windows.
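A sketch of that advice on simulated PK-like data (all variable names and the sampling schedule are invented for illustration): time stays continuous, and extra knots let the fitted curve bend again around the second dose rather than forcing time to be categorical.

```r
# Sketch: continuous time modeled flexibly with a restricted cubic
# spline, using actual measurement times (simulated data)
library(rms)

set.seed(4)
# hypothetical sampling times in hours post first dose; second dose at 480 h
times <- c(1, 2, 4, 8, 24, 48, 480, 481, 482, 484, 488, 504)
d <- expand.grid(id = 1:30, time = times)
d$conc <- exp(-d$time / 100) +
          (d$time >= 480) * exp(-pmax(d$time - 480, 0) / 100) +
          rnorm(nrow(d), sd = 0.05)

# More knots than usual (here 7) give the spline enough flexibility
# to capture the rise after the 480 h dose; for real data, add a
# within-subject correlation structure via Gls as discussed above
f <- ols(conc ~ rcs(time, 7), data = d)
```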

See https://pediatrics.aappublications.org/content/116/5/1070

Thank you for the references Dr. Harrell.


Hi Dr. Harrell. Can varclus (Hmisc) accommodate a mixture of categorical and continuous data (with the bothpos similarity option)? If not, would you use ClustOfVar instead?

varclus doesn’t officially support that but we tend to forge ahead anyway. ClustOfVar is made for this but check if it is still being supported.
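The “forge ahead” usage might look like the following sketch on simulated data (hypothetical variable names); factors in the formula get expanded to indicator variables, and the bothpos similarity mentioned above is intended for binary indicators:

```r
# Sketch: variable clustering with mixed variable types (simulated data)
library(Hmisc)

set.seed(5)
n <- 100
d <- data.frame(age      = rnorm(n, 60, 10),
                chol     = rnorm(n, 200, 30),
                sex      = factor(sample(c("m", "f"), n, replace = TRUE)),
                diabetes = rbinom(n, 1, 0.3))
d$sbp <- 100 + 0.5 * d$age + rnorm(n, sd = 10)

# Factors are expanded to dummy indicators; the default Spearman
# similarity handles mixed types; similarity = "bothpos" would
# instead apply to all-binary variables
v <- varclus(~ age + chol + sbp + sex + diabetes, data = d)
plot(v)   # dendrogram of variable clusters
```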

In the first lecture, about a half hour in, Frank mentions how specifying a simple model and then assessing its deficiencies via diagnostic plots increases uncertainty in the modeling process. Interestingly, this process sounds very similar to Gelman’s continuous model expansion, where a model is specified and gradually improved through posterior predictive checks or similar.

Does the Bayesian perspective get around these concerns, or might we still be worried about this as Bayesians?


Great question. The Bayesian approach only gets around this if you set priors before looking at the data, the priors cover all aspects of what you are about to model, and you don’t change the priors after looking at the data. In general, Bayesian modeling has a problem similar to frequentism: if you try a lot of models and the priors are not well pre-specified, the posterior distributions will be too wide. If you add a parameter to the model for what you don’t know (e.g., normality or equal variances), the posterior gets a little wider.
