I have a large dataset (300k patients) with several relevant predictors of mortality, collected over the past 9 years. The outcome is mortality status at hospital discharge (a binary classification problem).
My hypothesis is that the “importance” of the mortality predictors has changed over time, that is, some predictors have become weaker between 2011 and 2019, while others have become more important.
I have considered several approaches, including fitting a series of logistic regression models (say, one for every six months of data) using rms::lrm, each followed by plot(anova(model), what = "proportion chisq").
However, the performance of the model as a whole may not be constant over time, so I am unsure whether changes in the proportion of chi-squared explained can be interpreted as a trend in variable importance.
The same would be true of other variable importance methods, including those from tree-based models.
Does anyone have a better idea?
I’m doing something similar, though I’m not sure it is what you’re asking about. This is the treatment effect over the years. It is estimated by interacting the treatment variable with the year of treatment, with year modeled through a spline function.
In this case, aggressive therapy is less effective now than it used to be, since more options are available.
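To make the construction concrete, here is a minimal sketch in base R, not the code behind my own analysis. It uses glm with splines::ns as a stand-in for rms::lrm with a restricted cubic spline, and the data are simulated: the names death, x and time, and the way the effect of x fades over the years, are all assumptions made up for illustration.

```r
library(splines)  # ships with base R

set.seed(1)
n <- 20000
# Hypothetical data: admission time as a decimal year, one continuous predictor
time <- runif(n, 2011, 2020)
x    <- rnorm(n)

# Simulated truth (an assumption, for illustration only): the log odds ratio
# per unit of x shrinks linearly from 1.0 in 2011 to 0.2 in 2020, while
# baseline mortality also declines over time
beta_x <- 1.0 - 0.8 * (time - 2011) / 9
lp     <- -2 - 0.1 * (time - 2011) + beta_x * x
death  <- rbinom(n, 1, plogis(lp))
d      <- data.frame(death, x, time)

# Main effects only: the effect of x is forced to be constant over time
fit0 <- glm(death ~ x + ns(time, df = 3), family = binomial, data = d)
# Interaction with the spline in time: the effect of x may vary smoothly
fit1 <- glm(death ~ x * ns(time, df = 3), family = binomial, data = d)

# Likelihood ratio test of the interaction terms (3 d.f. here)
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)
```

The test compares a model where the effect of x is constant with one where it is allowed to change with time, so a small p-value is evidence that the effect has changed. With about 10 predictors you would add one such interaction per predictor, which costs degrees of freedom but is usually affordable with 300k patients.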
I am more interested in prediction than in estimating effect sizes, but the same concept should apply.
The issue is that I have about 10 important predictors to assess, and all of them could (theoretically) interact with year. I will give it a shot and see how it goes.
Minor point about interacting with year: use year + fraction of a year to preserve the time information.
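One way to build such a variable from admission dates, as a rough sketch in base R. The function name decimal_year and the example dates are hypothetical, not taken from the dataset discussed here.

```r
# Convert a Date to year + fraction of the year elapsed, so that the time
# variable stays continuous instead of being rounded to calendar year
decimal_year <- function(date) {
  date  <- as.Date(date)
  yr    <- as.integer(format(date, "%Y"))
  start <- as.Date(paste0(yr, "-01-01"))        # first day of that year
  end   <- as.Date(paste0(yr + 1L, "-01-01"))   # first day of the next year
  # Dividing by the actual year length handles leap years
  yr + as.numeric(date - start) / as.numeric(end - start)
}

admit <- as.Date(c("2011-01-01", "2015-07-02", "2016-07-02"))  # hypothetical dates
print(decimal_year(admit))
```

The result can then be used directly as the time variable inside the spline.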
Thank you Dr. Harrell.
I ended up using 3-month batches, with good results. But still, how do I answer the question of whether the importance of a predictor changed over time?
Overall mortality decreased over time, so I cannot use “decrease” in odds as a marker of reduced importance. Any suggestions?
I’d stick with recommendations given previously. Estimate time interactions and plot effects vs. time.
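A rough sketch of what that could look like, under stated assumptions: simulated data, hypothetical variable names (death, x, time), and base R glm with splines::ns standing in for rms::lrm with a restricted cubic spline. The main effect of time absorbs the overall decline in mortality, so the curve below tracks only how the odds ratio for the predictor changes.

```r
library(splines)  # ships with base R

set.seed(2)
n    <- 20000
time <- runif(n, 2011, 2020)   # decimal year of admission (simulated)
x    <- rnorm(n)               # one hypothetical predictor
# Assumed truth for the simulation: effect of x fades, baseline risk declines
beta_x <- 1.0 - 0.8 * (time - 2011) / 9
death  <- rbinom(n, 1, plogis(-2 - 0.1 * (time - 2011) + beta_x * x))
d      <- data.frame(death, x, time)

fit <- glm(death ~ x * ns(time, df = 3), family = binomial, data = d)

# Design matrix for new data, reusing the spline knots stored in the fit
design <- function(fit, newdata) {
  tt <- delete.response(terms(fit))
  model.matrix(tt, model.frame(tt, newdata))
}

grid <- seq(2011.5, 2019.5, by = 0.5)
# Contrast x = 1 vs. x = 0 at each time point: the log odds ratio for x
D      <- design(fit, data.frame(x = 1, time = grid)) -
          design(fit, data.frame(x = 0, time = grid))
log_or <- drop(D %*% coef(fit))
se     <- sqrt(rowSums((D %*% vcov(fit)) * D))   # sqrt of diag(D V D')
effect <- data.frame(time = grid, log_or,
                     lower = log_or - 1.96 * se,
                     upper = log_or + 1.96 * se)
print(round(effect, 3))

# Effect vs. time with pointwise 95% limits (drawn in interactive sessions)
if (interactive()) {
  plot(effect$time, effect$log_or, type = "l", ylim = range(effect[, -1]),
       xlab = "Year", ylab = "Log odds ratio for x (1 vs. 0)")
  lines(effect$time, effect$lower, lty = 2)
  lines(effect$time, effect$upper, lty = 2)
}
```

In rms the analogous steps would be a fit such as lrm(death ~ x * rcs(time, 4)) followed by contrast() over a grid of times. A curve that drifts toward zero suggests the predictor has lost strength, regardless of what happened to overall mortality.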