Parametric survival models for prediction and non-proportional hazards

Pavel_Roshanov · November 16, 2020, 4:27am

Basic question here: I am trying to avoid the pitfall of ignoring violations of the proportional hazards (PH) assumption, but I don’t have the necessary practical experience.

I want to develop a parametric survival model for the purpose of prediction. Several predictors will violate the PH assumption, so I want to model their interactions with some function of time. Ideally I want to use rms facilities. Are there code examples for how to do this? And for competing risks, ideally with Fine & Gray type models?

I’m aware of the stpm2 command in Stata and the competing risks version of that command, but I’d rather leverage the flexibility of R.

f2harrell · November 16, 2020, 12:16pm

Take a good look at this. You can do some of that with rms::cph but it may be a bit better to stick within the survival package for time dependent covariates.

Note that learning which covariates need the PH assumption relaxed is quite difficult and perhaps non-replicable so you may want to look into Robert Gray’s general penalization approach on all the covariates or find another survival model that fits most of the covariates well.

Pavel_Roshanov · November 17, 2020, 11:08pm

Thank you.

I wonder if the most interpretable approach is to split follow-up time into a small number of periods, selected a priori based on clinical expectation or prior data, and model the interaction of follow-up period with a prespecified set of covariates. I recognize the marginal disadvantage relative to splines, but I could then use cph to fit the model.
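To make the period-splitting idea concrete: each subject's follow-up is cut into one row per period, so that a covariate can be given a separate coefficient in each period (in R, `survival::survSplit` performs this step). Below is a minimal Python sketch of the data step only, not a model fit. The function name `split_followup` and the cut points at 30 and 90 days are hypothetical.

```python
def split_followup(time, event, cuts):
    """Split one subject's follow-up into (start, stop, event, period) rows.

    time  : observed follow-up time (event or censoring)
    event : 1 if the event occurred at `time`, 0 if censored
    cuts  : increasing period boundaries, e.g. [30, 90] (hypothetical)
    """
    rows = []
    start = 0.0
    for period, cut in enumerate(list(cuts) + [float("inf")]):
        if time <= start:
            break  # follow-up ended before this period began
        stop = min(time, cut)
        # The event is credited only to the period in which it occurred.
        rows.append((start, stop, int(bool(event) and time <= cut), period))
        start = cut
    return rows


# A subject who has the event on day 45 contributes two rows.
for row in split_followup(45, 1, [30, 90]):
    print(row)
```

A model fitted to the stacked rows can then include a covariate-by-period interaction. The step changes in the hazard ratio at days 30 and 90 come from the chosen cuts, not from the data.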

f2harrell · November 18, 2020, 12:09am

I think the choice of time cuts is too hard to determine and too arbitrary. And things are seldom that discontinuous.

Pavel_Roshanov · November 18, 2020, 4:06pm

Thank you.

Another question that I have not been able to straighten out:

If the interest is purely in estimating probability of outcome occurring before specific times, say, 60 days and 180 days after hospital discharge, what is the disadvantage of simply using logistic regression? And if you are interested in predicting 2 separate but competing outcomes (where the proportional odds assumption is likely to be violated)?

I assume the disadvantage is that you drastically cut down the number of events by stopping at 60 days compared with what is available over the entire follow-up period, but I’m not sure how important that is if you have ample events.

f2harrell · November 18, 2020, 5:06pm

Some other disadvantages are loss of efficiency (higher standard errors, lower power) and inability to handle censoring before the cutoff.
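The censoring point can be illustrated with a small simulation. This is only a sketch under assumed conditions: exponential event and censoring times with made-up daily rates, and a 60-day horizon. It compares two naive binary-outcome codings (the kind of outcome a logistic model would be fitted to) with a Kaplan-Meier estimate that uses the censored follow-up.

```python
import math
import random


def km_event_prob(times, events, horizon):
    """Kaplan-Meier estimate of P(T <= horizon) from right-censored data."""
    # Sort by time; at tied times, events come before censorings.
    order = sorted(zip(times, events), key=lambda p: (p[0], not p[1]))
    at_risk = len(order)
    surv = 1.0
    for t, e in order:
        if t > horizon:
            break
        if e:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return 1.0 - surv


rng = random.Random(2020)
n, horizon = 20000, 60.0
event_rate, censor_rate = 0.005, 0.01  # hypothetical daily rates

times, events = [], []
for _ in range(n):
    t_event = rng.expovariate(event_rate)
    t_censor = rng.expovariate(censor_rate)
    times.append(min(t_event, t_censor))
    events.append(t_event <= t_censor)

truth = 1.0 - math.exp(-event_rate * horizon)
seen = sum(1 for t, e in zip(times, events) if e and t <= horizon)
lost = sum(1 for t, e in zip(times, events) if not e and t <= horizon)

naive_nonevent = seen / n          # censored-before-60 counted as "no event"
naive_dropped = seen / (n - lost)  # censored-before-60 excluded
km = km_event_prob(times, events, horizon)

print(f"true 60-day risk            {truth:.3f}")
print(f"naive, censored = no event  {naive_nonevent:.3f}")
print(f"naive, censored dropped     {naive_dropped:.3f}")
print(f"Kaplan-Meier                {km:.3f}")
```

Under these assumed rates the true 60-day risk is about 0.26. In expectation, coding early-censored patients as non-events gives roughly 0.20, and dropping them gives roughly 0.33. The size and direction of the bias depend on how much censoring occurs before the cutoff.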

Pavel_Roshanov · November 21, 2020, 6:15am

OK, so I have now read all about the evils of period-specific hazard ratios. The problem is essentially selection bias: the “susceptibles” in the group experiencing the event sooner have already left the risk set, which makes that group look relatively better in later time periods, and you see the HR start to reverse course.
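One way to see this depletion-of-susceptibles mechanism is a gamma frailty model, which is an assumption introduced here for illustration and not something from the readings. Suppose each person's hazard is Z × h0 if unexposed and Z × HR × h0 if exposed, where Z is an unmeasured frailty with mean 1 and variance theta. The individual-level HR is constant, yet the population-level HR at time t is HR × (1 + theta·H0(t)) / (1 + theta·HR·H0(t)), which drifts toward 1. A minimal sketch with made-up values:

```python
def marginal_hazard_ratio(t, hr_individual, theta, base_hazard):
    """Population-level hazard ratio at time t under gamma frailty.

    Individual hazards are Z * base_hazard (unexposed) and
    Z * hr_individual * base_hazard (exposed), with Z ~ Gamma(mean 1,
    variance theta) and a constant baseline hazard. All inputs are
    hypothetical illustration values.
    """
    cum = base_hazard * t  # baseline cumulative hazard H0(t)
    return hr_individual * (1.0 + theta * cum) / (1.0 + theta * hr_individual * cum)


for day in (0, 30, 90, 180, 365):
    hr_t = marginal_hazard_ratio(day, hr_individual=4.0, theta=1.0, base_hazard=0.01)
    print(f"day {day:3d}: population HR = {hr_t:.2f}")
```

With these values the population HR starts at 4 and falls to about 1.4 by day 180, even though every individual's HR stays at 4 throughout. Under this particular model the ratio attenuates toward 1 but does not cross it.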

Imagine now that there is a bad event, a heart attack, and we follow people out from that heart attack and compare them to other people who were in hospital at the same time but did not have a heart attack. Initially the folks who had the heart attack start to die at a much more rapid rate than those who did not have the heart attack. The aHR associating heart attack with death is 4 in the first 30 days. Thereafter you clearly see the survival curves start to become more parallel, and as you would expect, the aHR begins to decrease to 3, then to 2, then to 1.5…

Is it really a biased interpretation to say that the susceptible patients who were going to die soon due to the heart attack have mostly died up to some time period and those who survived thereafter are clearly not as susceptible?

The problem with the period-specific hazard ratios is obvious with estimation of treatment effects: if you give half the patients a medication that kills those susceptible to its harmful effects who may be older and more frail, you only leave healthier people after time t to compare against a control group in whom mortality follows a natural history so the HR goes from harm to benefit.

(Putting prediction issues aside and focusing just on questions of etiology: I recognize the issue of arbitrary discontinuities in time, but the reality is that hazard ratios for specific periods make things a heck of a lot more interpretable, especially when they make sense given what you see in the survival curves.)

Pavel_Roshanov · December 20, 2020, 5:41pm

To close the loop on this:

Ultimately I think flexible parametric models (fit via stpm2 in Stata, for example) with adjustment by regression standardization are a much better approach than period-specific hazard ratios from a Cox model.
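For readers unfamiliar with regression standardization, here is a minimal sketch of the idea: predict each person's risk by time t from the fitted model, once with everyone set to exposed and once with everyone set to unexposed, then average over the observed covariate distribution. To keep the example short it uses a plain Weibull proportional hazards model, not the spline-based model that stpm2 fits. All coefficients and the small cohort of ages are made up.

```python
import math


def surv(t, exposed, age, shape=1.2, rate=0.002,
         log_hr_exposed=0.7, log_hr_age=0.03):
    """Survival at time t from a hypothetical Weibull PH model.

    Every coefficient here is invented for illustration; in practice
    they would come from the fitted parametric model.
    """
    lp = log_hr_exposed * exposed + log_hr_age * (age - 60)
    return math.exp(-rate * math.exp(lp) * t ** shape)


def standardized_risk(t, exposed, ages):
    """Average predicted risk by time t if everyone had the given exposure."""
    return sum(1.0 - surv(t, exposed, a) for a in ages) / len(ages)


ages = [45, 52, 60, 67, 71, 80]  # hypothetical cohort

for t in (60, 180):
    r1 = standardized_risk(t, 1, ages)
    r0 = standardized_risk(t, 0, ages)
    print(f"day {t}: risk exposed {r1:.3f}, unexposed {r0:.3f}, "
          f"difference {r1 - r0:.3f}")
```

The output is a pair of marginal risks and their difference at each horizon. These are on the probability scale and do not condition on having survived to a given period, unlike period-specific hazard ratios.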