I am currently struggling with a plot I have in mind, which I can't produce in the setting of Cox models.
My idea is to compute hazard ratios for one of my variables (let's say blood pressure). I want to compute the HR in different subgroups of my cohort, to see whether any subgroup stands out in the "performance"/"importance" of blood pressure for predicting the outcome.
My problem now is also the usage of splines. Maybe there is an error in my thinking:
- I fit my best model on the whole data set with clinically relevant variables and restricted cubic splines (rcs) with sufficient knots, as described in Regression Modeling Strategies.
- I use summary(model, bloodpressure=c(100,120)) to get the HR for a specific interval of interest.
- I fit the same model on a subset, let's say only males and only females separately, and use step 2 to compute the HR.
- Error, because I had sex as a variable in my model, and if I only use males or only females it is singular.
How can I proceed here?
So my main aim is to compare the HRs between males and females, for example, to know whether sex makes a difference in the predictive performance of blood pressure.
Hopefully someone can help me here; I have been working on this for days and can't get my head around it.
Just to clarify, is the aim to estimate sex-stratified hazard ratios comparing given levels of blood pressure (e.g., a separate HR comparing, let's say, a BP of 100 to 120 for males and for females)?
Yes, correct.
The final aim would be a forest plot for all the subgroups, for example:

Sex
  male    HR 1.2 (1.1-1.5)
  female  HR 1.6 (1.4-1.8)
Age
  <40     HR 0.8 (0.7-0.9)
  40-60   HR 1.0 (0.9-1.2)
  >=60    HR 1.5 (1.4-1.6)
…

All HRs are for blood pressure, comparing 120 and 140.
And then I could check whether the HRs are significantly different (with a t-test?).
In the above setting I could then argue that blood pressure is a risk marker especially in females and in those >=60, for example.
(Data are just examples, no medical context)
I think what you can do is:
- First, build a simpler model (no interaction): outcome ~ rcs(bp, 5) + sex
- Then fit a model with an interaction between BP and sex, so something like outcome ~ rcs(bp, 5) + sex + rcs(bp, 5) %ia% sex
- Compare the two models using a likelihood ratio test: lrtest(model2, model1). This will give you a p-value for the statistical significance of the interaction. (Does the effect of BP differ by sex?)
- To get sex-stratified hazard ratios, you can do something like contrast(model2, a = list(bp = 100, sex = "Male"), b = list(bp = 120, sex = "Male")) to get a HR comparing a BP of 120 vs 100 for males. Then do the same for females. This will get you two HRs.
- If you want a p-value for a male-female difference for this particular comparison (rather than for the interaction as a whole, as the likelihood ratio test in step 3 does), you can use a double contrast. I haven't done this before using rms, but I believe it's something like: contrast(model2, a = list(bp = 100, sex = "Male"), b = list(bp = 120, sex = "Male"), a2 = list(bp = 100, sex = "Female"), b2 = list(bp = 120, sex = "Female")). This should (I think) first calculate b - a (that is, the effect of 100 vs 120 BP among males), then calculate b2 - a2 (the same effect among females), and then the difference between them, so that you are able to say "the effect of a 100 vs 120 BP is this much higher in females than in males".
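Putting the steps above together, a minimal `rms` sketch might look as follows. This is only an illustration: the data frame `d` and the column names `time`, `status`, `bp`, `sex` are assumed, not taken from the thread, and I order the contrast lists so that the exponentiated estimate is the 120-vs-100 HR.

```r
library(rms)   # provides cph, rcs, %ia%, lrtest, contrast

dd <- datadist(d); options(datadist = "dd")

# Step 1: main-effects model
m1 <- cph(Surv(time, status) ~ rcs(bp, 5) + sex,
          data = d, x = TRUE, y = TRUE)

# Step 2: add the BP-by-sex interaction (restricted interaction via %ia%)
m2 <- cph(Surv(time, status) ~ rcs(bp, 5) + sex + rcs(bp, 5) %ia% sex,
          data = d, x = TRUE, y = TRUE)

# Step 3: likelihood ratio test for the interaction
lrtest(m2, m1)

# Step 4: sex-specific log HRs for BP 120 vs 100 (first list minus second)
k <- contrast(m2, list(bp = 120, sex = c("Male", "Female")),
                  list(bp = 100, sex = c("Male", "Female")))
print(k, fun = exp)   # report as hazard ratios

# Step 5: double contrast -- difference between the two log HRs
contrast(m2, a  = list(bp = 120, sex = "Male"),
             b  = list(bp = 100, sex = "Male"),
             a2 = list(bp = 120, sex = "Female"),
             b2 = list(bp = 100, sex = "Female"))
```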
Well put. It's important not to think of this as a subgrouping analysis.
Very helpful, thanks!!
I will try it this way! One additional question: how can I proceed with age instead of sex? I need to analyse these age groups for clinical relevance, but I don't want to fall into dichotomania, so I want to use the continuous values of age in the model but analyse the groups <40, 40-60 and >60.
One thought that came to my mind is to use the median of the subgroup as the reference value, so something like:
summary(model2, bp = c(100,120), age = 33)?
And another point: contrast gives me the log hazard ratio, right? What about the sign of the contrast value? Because I got -0.34 with contrast, but with a summary call like the one above I got 0.34 and then HR = 1.40?
One point that came up later on my problem list is the following:
I want to look at glomerular filtration rate as one "subgroup", but in my model I use creatinine to avoid the age-GFR dependency that arises from the computation of GFR (which uses age, creatinine and race).
How could I proceed here with computing contrasts, since I don't have the "variable of interest" directly in my model?
(Same case for subgroups for which there isn't any variable in the model, like "eating disorder": eating disorder isn't relevant enough to put into the prediction model, but it would be interesting to look at whether there are differences in the HRs of blood pressure.)
These are all good questions, but they are best separated to avoid "responder fatigue".
To deal with the first part, the desire for age groups does not make sense to me, and only creates complexity and difficulty interpreting the result. I'd rather see fully age-specific estimates and uncertainty intervals.
contrast provides the log hazard ratio, and the anti-log is the hazard ratio. The sign is interpreted from the fact that contrast takes the estimated log hazard at the first covariate setting listed and then subtracts from it the log hazard for the second covariate combination.
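As a small sketch of that sign convention (variable names illustrative, reusing model2 from earlier in the thread): if the first list holds bp = 100 and the second bp = 120, the estimate is log h(100) - log h(120), i.e. the negative of the 120-vs-100 log HR that summary(model2, bp = c(100, 120)) reports.

```r
# contrast computes (first setting) minus (second setting),
# so this is the log HR for bp 100 vs bp 120, e.g. -0.34:
co <- contrast(model2, list(bp = 100, sex = "Male"),
                       list(bp = 120, sex = "Male"))

# Negate (or swap the two lists) to match summary()'s 120-vs-100
# direction before exponentiating, e.g. exp(0.34) ~= 1.40:
exp(-co$Contrast)
```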
First of all, thanks so much for your reply and time. I am so happy to learn every day via this forum and your incredible resources! And yes, it's all too true that responder fatigue can happen with my overstacked questions; I am just too curious about how to deal with these things and don't want to do them the wrong way.
Regarding the age groups: it's more of an example for the general question: are there any differences between specific "areas"/intervals of a continuous variable? I thought of something like a sine-shaped "interaction term", where specific intervals of, for example, age show an interaction and others don't, so that overall the interaction isn't significant although there is some interval where it is. But it makes sense to me that one HR and the p-value of the interaction are enough information to argue about this, perhaps with additional estimates just for fixed age values rather than intervals, which can be computed via contrast/summary.
May I ask for your expert knowledge regarding the last part of "my worrying questions"?
- How to analyse a variable (e.g. eGFR) that is directly connected with other predictors (creatinine, age), so that inclusion in the model is impossible/difficult
- How to analyse a variable that isn't a predictor itself but whose interaction could be of interest (maybe just include the variable in the main model? → only problems with degrees of freedom if data are limited)
To me this is about well specifying the model, and checking it against a model with more relaxed assumptions to make sure you can't do drastically better. Then you form contrasts using your model. For example, you can pick a reference value for a predictor and use contrast() to compare a bunch of other values (of the same predictor) with it. You can get simultaneous confidence intervals using that approach, which I think is in line with your question. It will detect redundant contrasts and not penalize you for them when confidence limits are computed. Regarding interactions, it's still about carefully specifying contrasts. See the many examples by typing ?contrast.rms at the R command line once rms is attached.
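A small sketch of that idea (purely illustrative: `fit` is assumed to be a cph model that includes rcs(age, ...), and the reference age of 50 is arbitrary): contrast a grid of ages against the reference and request simultaneous intervals.

```r
# Compare log hazards at ages 30..70 against a reference age of 50,
# other covariates held at their datadist defaults.
co <- contrast(fit, list(age = seq(30, 70, by = 10)),
                    list(age = 50),
               conf.type = "simultaneous")  # simultaneous confidence intervals
print(co, fun = exp)  # show as hazard ratios
```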
Your response and @Ahmed_Sayed helped very much to get a better idea of what I could do.
- Would you recommend putting all interaction terms in at once, or adding them one by one and computing contrasts step by step?
- And would you recommend Wald tests, i.e. anova(model), or LRTs, i.e. anova(model, test="LR"), to analyse whether an interaction term is significant?
Findings: I found in my case that including all interaction terms of interest is viable regarding degrees of freedom, but it leads to much wider CIs compared with adding only one at a time.
When computing the LRT a warning popped up, "Loglik converged before var xy", but p-values are comparable between Wald and LRT in all but one case.
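The two testing options above can be sketched as follows (a minimal illustration; `fit` is assumed to be a cph model fitted with x = TRUE, y = TRUE):

```r
# Wald chi-square tests for each term (the default in anova.rms)
anova(fit)

# Likelihood ratio tests instead; these refit reduced models, so they
# are slower but generally more accurate in small samples
anova(fit, test = "LR")
```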
A Bayesian model where priors are put on all interaction terms at once would be more satisfying. But in the frequentist world I'd lean towards one-at-a-time interaction inclusion, and use LR tests throughout. You can also easily get likelihood profile intervals for any contrast with rms::contrast.rms.
To be honest, I am not yet really familiar with the Bayesian thinking/framework, and my main resource for doing (survival) analysis is your Regression Modeling Strategies book. Therefore I am using a cph() model. My thought was that it is kind of important to have the other interactions in the model while estimating the effect of one interaction term, because the different interaction effects could interfere and would yield wrong estimates if analysed only one at a time.
At the moment I am using contrast simply as
contrast(model, list(sbp=120, sex=c(0,1)), list(sbp=100, sex=c(0,1)))
What would be the information gain if I used conf.type='profile'? It takes very long to compute in my case (maybe because I am working with cell data and have around 900,000 samples and 90,000 events), and the estimates seem very much the same.
For your sample size, Wald measures are fine for everything.
Concerning whether to adjust for all interactions simultaneously in a frequentist setting that allows all interactions to be simultaneously important: it is a very tough call. As you mentioned, if you look at A \times B while adjusting for B \times C, the power for A \times B will be affected.
Machine learning advocates effectively have all possible interactions in the model. That's why the sample size for many machine learning procedures is tenfold higher than the sample size needed for statistical models.