Calculating fraction of new information in Cox proportional hazard models


In continuation to the topic we discussed today in our workshp, in regard to the blog post about calculation of the added predictive information by a biomarker:

The example mentioned in the blog post, calculates the added predictive information of a biomarker in a Binary Logistic Model.
The example given, tries to assess the added information of a new biomarker (cholesterol) to a base model.

The formula to calculate the added information is:
fraction of new information= 1- variance(pre)/variance(post)

I will attach here the code used to calculate the variance of the model with the biomarker:

g <- lrm(sigdz ~ rcs(age,4) * sex + rcs(choleste,4) + rcs(age,4) %ia%
         rcs(choleste,4), data=acath) # This is the basemodel+biomarker
post <- predict(g, type='fitted')   # post-test probability
variance <- round(var(post), 3)


This text will be hidden

I tried to do the same calculations for a Cox proportional hazard model, would like to get feedback on the code/approach if that is possible:

I am attaching a hypothetical example, I am using a different package (that is the one I used at the time, would be happy to learn other ways of doing the calculations using the rms package)

Model <- coxph(Surv(surv_time, event) ~ biomarker+ variable1+variable2, data =mydata) # The base model+biomarker

# In to the predict the survival probability I am using exp(-expected), ref: (page 86-87)

model_prediction <- predict(Model , type="expected")  
model_prob <- exp(-model_predicton )

model_variance <- round(var(model_prob), 3)

This text will be hidden

I think my question is, how can I calculate the probability for the outcome in cox model?

I hope I am being clear enough!

I’m not clear that expected survival is the right metric, and why \exp(-x) is used.

A general issue is choosing a scale for the calculation of the variance of predictions. For binary logistic model I’ve seen more expected results by computing on the probability scale, i.e., running the expit transformation on X\hat{\beta}. For the Cox model it is very easy to compute the variance on X\hat{\beta} which is also nice in not requiring you to choose a time. For a probability scale you have to choose a time u and estimate \Pr(Y \geq u | X) for every subject. This is easy with the rms package cph and survest functions and also with Predict(..., time=).

It’s worth also taking the simpler approach of using R^{2}_\text{adj} using this.

Thank you for the reply.
I am revisiting the post for a paper I am working on.
We are trying to compare the performance of two different biomarkers.
The biomarkers are supposed to be measuring the same things.

The comparison is of the cox proportional hazard models.
I calculated the traditional metrics (c statisic, NRI, IDI)
I wanted to calculate the fraction of added information and produce the back to back histogram.

I noticed the link you provided in your reply for R^2 adj is not working, is there a way to link it again?

Thank you

I fixed the link. Please try again.

Thank you! it works.
I have few more questions to make sure I am applying the metrics correctly, if that is ok
(I am sorry in advance if the questions are basic)

From my understanding, and from the blog post, there are different metrics that can be used to asses the fraction of new information (other than the traditionally used ones):

  1. The liklehood ratio (the frequenst metric)- I managed to calculate this.

  2. The adj R^2 (related to LR ratio) it allows comparison between two or more different models.

  3. The variance in pre vs post test probability and visualising with histograms: this is where I am a bit struggling.
    I took your advice in using survest function to compute the survival probabilities. I was wondering about choosing the time u, how do I make that decision. (do I choose a unified time point for all the subjects, is it the median/mean time?, or do I provide a vector with the follow up time for each subject)

Again, I am sorry if my questions are too basic, I am still learning.

Thank you

There is no unique way of choosing them time so I’d do it for several possible times, separately.

Thank you for clarifying this!


An advantage of sticking to the linear predictor scale is that it is time-invariant. But I seen a logistic regression example where the variance on the logit scale was a bit counter-intuitive.

how can I calculate the variance while sticking to the linear predictor scale?
h(t|X) = h0(t) * exp(b1X1 + b2 X2 + … + bp*Xp)

The linear predictor is what’s inside \exp() so doesn’t require t or h(t).

Hi Prof. Harrell,

Sorry for asking some simple questions, does the “X^β” refer to the β parameter in cox model?
For example, β for AGE in the following cox model.
Model <- cph(Surv(py, d) ~ AGE,data = dt)

How can I obtain the variance of β? The following code seems not correct.

predict(Model , type="expected") # for cph

In general linear models, it seems that a higher R-squared value is better, such as above 0.5. However, in Cox models, the pseudo R-squared values I calculate through the cph are usually very low, such as 0.2 or 0.1 or even lower. Is a higher pseudo R2 better like that in general linear models,

And, I have another fundamental question. Is the R2 in cph the adjusted R2?

Thank you.

                       Model Tests     Discrimination    
Obs       618    LR chi2    111.51     R2       0.208    
Events     85    d.f.           18    R2(18,618)0.140    
Center 7.9482    Pr(> chi2) 0.0000     R2(18,85)0.667    
                 Score chi2 117.34     Dxy      0.602    
                 Pr(> chi2) 0.0000

My preferred pseudo adjusted (for no. of predictors) R^2 is the 0.667 here. This and the D_{xy} are quite high. Times to events are usually hard to predict. There are two adjusted R^2 in the output. See here.

To get the variance-covariance matrix of \beta use vcov(fit object). To get the variance of the linear predictor X\beta use var(predict(fit object)).

Thank you very much, Prof. Harrell.

I want to confirm my understanding。 Is the following expression correct?
R2(18,618) is the R2(p, n): adjusted for estimating p regression coefficients (non-intercepts)

R2(18,85) is the R2(p, m): use effective instead of actual sample size, don’t penalize for overfitting
!!! This was a mistake I made while copying, and Prof. Harrel corrected it. The correct expression is
R2(18,85) is the R2(p, m): effective sample size adjusted for estimated regression coefficients !!!

About the terminology of R2(0.667) that you recommend, is “Pseudo adjusted R-squared based on an effective sample size” an appropriate expression?
(I want to make sure that I use the correct terminology when referring to the R2)

Prof. Harrell,

If possible, could you please help me review my analysis results and confirm whether my interpretations are correct?

I am currently examining whether there is additional value in predicting diabetes by observing the levels of triglycerides in non-fasting and in fasting blood samples.
I have two datasets—one for non-fasting and another for fasting blood samples.
I have constructed two simple models to observe the impact of triglyceride levels.

The following is the non-fasting code. The fasting code is omitted due to limited space. By the way, No. of fasting is 5830, which is lower than non fasting (7217).

discb <-  cph(
        Surv(dm_py, diabetes) ~  AGE + SEXC + strat(AREA) +
            BMIndex + # not use rcs() for age and BMI because no apparent nonlinear trends
            rcs(TIMEm, 4) * rcs(GLU, 4) # TIMEm is time after meals(mins)
        x = TRUE,
        y = TRUE,
        data = dtnfp
# Obs      7217    LR chi2    576.28      R2       0.097    
# Events    819    d.f.           18    R2(18,7217)0.074    
# Center 3.6175    Pr(> chi2) 0.0000     R2(18,819)0.494    
# Score chi2 862.85      Dxy      0.494    
# Pr(> chi2) 0.0000                  

discn <-  cph(
        Surv(dm_py, diabetes) ~  AGE + SEXC + strat(AREA) +
            rcs(TIMEm, 4) * rcs(GLU, 4) +
            rcs(TIMEm, 4) * rcs(TGmmol, 4),
        x = TRUE,
        y = TRUE,
        data = dtnfp
# Obs      7217    LR chi2    678.72      R2       0.113    
# Events    819    d.f.           30    R2(30,7217)0.086    
# Center 2.7732    Pr(> chi2) 0.0000     R2(30,819)0.547    
# Score chi2 987.62      Dxy      0.524    
# Pr(> chi2) 0.0000                        



Table. Model discrimination from including serum triglycerides.
Model 1(age+ sex + bmi +glu) Model 2 (further + tg)
Likelihood ratio test statistic 576.3 678.7
fraction of new information Ref. 15.1
Variance-covariance 0.63 0.77
fraction of new information Ref. 18.2
Pseudo adjusted R-squared 0.49 0.55
Somers’ D 0.49 0.52
Likelihood ratio test statistic 812.2 867.7
fraction of new information Ref. 6.4
Variance-covariance 1.01 1.15
fraction of new information Ref. 12.2
Pseudo adjusted R-squared 0.64 0.66
Somers’ D 0.58 0.59

For the above results, my interpretation is:

  1. non-fasting glucose combined with triglycerides provides a better prediction of diabetes.
  2. Fasting glucose is sufficient for a good prediction of diabetes, and adding triglycerides has a relatively slight improvement.

R2 in fasting in model 1 is 0.64, which is higher than that (0.49) in non-fasting. Can these R2s be compared?
Also, can these variance-covariance (0.6/0.7 in non-fasting and 1.0/1.1 in fasting) be compared?

That’s all correct except that any R^{2}_\text{adj} that adjusts for the regression degrees of freedom is penalizing for overfitting. Not as much as AIC would (it has a 2p penalty instead of p), but trying to get an unbiased estimate of R^2 just like the ordinary R^{2}_\text{adj} does with ordinary linear regression.

I understood my mistake.
I made an error when copying the explanation for R²(p, m). R²(p, m) represents the effective sample size adjusted for estimated regression coefficients. I mistakenly copied the definition of R²(m). I will add comments on this mistake in my previous response.
Thank you very much!!

I also want to confirm the following (sorry for being verbose): Is it valid to compare the R2 calculated on different datasets (fasting and nonfasting) for the same model? What should be considered when comparing R2 for model evaluation?

Some metrics, such as AUC, are not suitable for comparison, Likelihood ratio can be used for comparing nested models, but it seems that comparing the same model across different datasets may not be appropriate. R2 is a good metric for comparison in most situation.

Is my understanding correct?"

Comparing any R^2 measure on different datasets is risky, as R^2 will be lower when predictors have a narrower range. The same goes for other measures such as c and D_{xy}. It is still useful but you have to take the comparisons with a big grain of salt.

1 Like

I see. I will avoid comparing them in different datasets. Thank you for your kind assistance. Your comments are very valuable.