Calibration of Cox model

I am using a Cox model for risk prediction, and I am somewhat lost regarding calibration. I have been reading about the Nam–D'Agostino test, but I cannot work out which time period to select for calculating predicted probabilities. The predicted probabilities vary with the time period I choose, which directly affects my observed-vs-predicted probabilities/rates across deciles. I was considering using the mean follow-up time in the data to calculate predicted probabilities, but a post on Stack Exchange recommended against it. I am not sure how to proceed.

(I am using administrative claims data with substantial right censoring. Because the enrollment information is recorded month to month, my predicted probabilities are monthly.)

Thank you.


It is not appropriate to use deciles or any other binning of continuous predicted values. My RMS book and course notes go into detail on obtaining smooth semiparametric calibration curves for survival models at a single time horizon; this is implemented in the calibrate.cph function of the R rms package. For examining the entire range of times rather than selecting a single time point, there are residual-based methods you may want to look at; see, e.g., the val.surv function in the rms package.


Dear Professor,

I developed an AFT log-normal model with the help of your excellent RMS book. However, I am stuck trying to externally validate the model using a dataset from another country. I think I should use the val.surv function for calibration, but I cannot find any examples illustrating external validation with val.surv. Could you please give me a hand? Many thanks!

The documentation gives the building blocks but doesn't put them all together in an obvious way. You specify either a fit object or est.surv, along with a time point u and the new observed survival times and event indicators as a Surv object in S. If you supply the actual predicted survival probabilities in est.surv, they must all be for time u. You'll get a smooth external calibration curve estimated by hazard regression from the polspline package.
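Putting those building blocks together, a minimal external-validation call might look like the sketch below. The dataset names `train` and `test` and the covariates are placeholders, not from any post in this thread:

```r
library(rms)

# Fit the model on the training data; x=TRUE, y=TRUE retain the
# design matrix and response so downstream functions can use them
f <- psm(Surv(futime, event) ~ age + sex, data = train,
         dist = "lognormal", x = TRUE, y = TRUE)

# External calibration at the single horizon u = 60 months, using the
# new sample's observed times and event indicators as a Surv object
v <- val.surv(f, newdata = test, u = 60,
              S = Surv(test$futime, test$event))
plot(v)   # smooth calibration curve via polspline hazard regression
```

Alternatively, if predicted survival probabilities at time u were computed outside of R's fit object, they can be passed via est.surv instead of a fit.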


Dear Professor, thanks a lot for your reply. I tried with the codes below:

w1 <- val.surv(g, newdata = zzunonsurgical, u = 60,
               S = Surv(zzunonsurgical$TimeToGe, zzunonsurgical$Generalization))

g represents the log-normal model I developed. Then I get a calibration curve at a specific time point of 60 months with plot(w1), right?

Thanks again for your kind help! feng

I think so but it’s fastest to try it yourself. Note I edited your post to format the code.


Many many thanks, Professor!

Sorry, I do not know how to format the code the way you did. I got a calibration curve which is far from my expectation:


But the c-index is acceptable to me:

       C Index            Dxy           S.D.              n        missing     uncensored
  7.698844e-01   5.397687e-01   5.563388e-02   1.720000e+02   0.000000e+00   7.000000e+01

Relevant Pairs     Concordant      Uncertain
  1.694800e+04   1.304800e+04   1.230600e+04

For the calculation of the c-index, I used your Hmisc::rcorr.cens function; the code is below.
###Determine concordance
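(The code itself did not survive the forum export. A typical call, assuming `g` is the fitted psm model from earlier in the thread, might look like this:)

```r
library(rms)    # also loads Hmisc, which provides rcorr.cens

# rcorr.cens expects predictions oriented so that larger values mean
# longer survival; for an AFT model the linear predictor (the default
# of predict.psm) has exactly that orientation
pred <- predict(g, newdata = zzunonsurgical)
rcorr.cens(pred,
           Surv(zzunonsurgical$TimeToGe, zzunonsurgical$Generalization))
```

The first element of the returned vector is the C index, matching the output printed above.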

I still doubt whether I am doing this the right way. May I have your opinion on this, please? Thank you!

Edit your post to include the code used to get the calibration curve, and if you haven't already, try passing your computed estimates when calling val.surv. Since your validation apparently failed with respect to calibration, you can ignore the c-index except to know there is signal in your predictions.


Dear Professor,

Sorry if I am being dim; I re-did the external validation as far as I understand it:

### calibration for external validation
fitlognorm <- psm(Surv(TimeToGe, Generalization) ~ MaleGender + AChRValue +
                    AntiMuSKAb + OAID + AgeAtOnset + SteroidsUse,
                  data = , dist = "lognormal")

g <- update(fitlognorm, x = TRUE, y = TRUE)

w1 <- val.surv(g, newdata = zzunonsurgical,
               S = Surv(zzunonsurgical$TimeToGe, zzunonsurgical$Generalization),
               est.surv = estimates, u = 36)



Output of print(w1):

Output of plot(w1)

Then I tried to get the c-index:
###Determine concordance

The output reads as below:

If my validation failed, I will start over. Thanks a lot for your time, patience, and kind help.

I think you did everything right, other than possibly unreasonable linearity assumptions for the covariates. What is the difference in how the training and test datasets were sampled? How many events were in each of the two samples? Why are you using external validation instead of strong internal validation?

One way to check the code is to use val.surv on the training data to make sure you get a (meaningless) perfect calibration plot.
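That sanity check might look like the following, where `trainingdata` stands in for whatever dataset the model was fitted on:

```r
# Validating a model on its own training data should yield an essentially
# perfect (and therefore meaningless) calibration curve; anything else
# signals a coding problem rather than a modeling one
w0 <- val.surv(g, newdata = trainingdata, u = 36,
               S = Surv(trainingdata$TimeToGe, trainingdata$Generalization))
plot(w0)
abline(a = 0, b = 1, lty = 2)  # reference line of perfect calibration
```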

Note that x=TRUE, y=TRUE can be included in psm().


Dear Professor, thanks for your reply.

A non-linear effect was not found for any of the variables included in the analyses. I tried adding knots to the continuous variables, but the model's AIC was smallest with no knots. I did find a time-varying effect for the age variable, which is why I chose the log-normal AFT model instead of Cox regression.

My colleague collected the training data in a German hospital (n=253, events=113), and I then collected the validation data with the help of a colleague in a Chinese hospital (n=194, events=78). I chose external validation because the two samples are quite different (race, treatment, even diagnostic tests), and the sample size is actually not small given the rarity of the disease.

As you suggested, I did indeed get a perfect (and useless) calibration plot using val.surv on the training data. I suppose the model I developed is a failure. One easy solution might be to combine the two samples, re-develop the model, and then try strong internal validation? But I also found a useful paper from Janssen, which recommended "recalibration" methods for updating the model.

BTW, my friends told me that the rms::calibrate function can also be used for external validation like this:

set.seed(717)
calv1 <- calibrate(fit, cmethod = 'KM', newdata = external_dataset, m = 60,
                   method = 'boot', u = 12, B = 400)

But I tried several times in vain. May I have your opinion, please? Many thanks.

calibrate only works for internal validation, and as you'll see in the documentation, there is no newdata argument.

What you may be observing is a very strong region effect. It is usually better to do a combined analysis and include region as a variable, hoping that region acts in proportional hazards. If using an AFT model, do the stringent residual plot shown in my RMS course notes to check fit.
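A sketch of that combined-sample approach, assuming the two datasets share variable names and using hypothetical names (`germandata`, `chinesedata`) for the two samples:

```r
library(rms)

# Stack the two samples and record their origin as a factor
combined <- rbind(transform(germandata,  region = "Germany"),
                  transform(chinesedata, region = "China"))
combined$region <- factor(combined$region)

dd <- datadist(combined); options(datadist = "dd")

# Region enters as a covariate alongside the original predictors,
# with the same log-normal AFT specification used earlier in the thread
f <- psm(Surv(TimeToGe, Generalization) ~ MaleGender + AChRValue +
           AntiMuSKAb + OAID + AgeAtOnset + SteroidsUse + region,
         data = combined, dist = "lognormal", x = TRUE, y = TRUE)
```

The coefficient on region then quantifies the between-sample shift, and strong internal validation (e.g., validate/calibrate with bootstrapping) can be applied to the combined fit.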


Dear Professor,

Thanks a lot for guiding me through this. Have a nice weekend!

With best regards

Dear Professor,

I was surfing the internet over the weekend hoping to find possible solutions for my external validation problem. I found some code for external validation:

fev <- psm(s ~ predict(fitlognorm, newdata = validatingdata),
           x = TRUE, y = TRUE, data = validatingdata, dist = "lognormal")
cal <- calibrate(fev, cmethod = "KM", method = "boot", u = 36, m = 60, B = 100)

It seems to work for me, but I am a layman when it comes to coding and am not sure whether it makes sense for external validation. Could you please give me your impression of this?

Thanks a lot and have a good one!

That code is not correct; it does internal validation on the external sample. You want val.surv for external validation, as we went over earlier.


Hello one and all,
I have a quick question about extracting the actual value of the mean absolute error (0.052) shown on the calibration plot. Is there an easy way to get this value into my knitr reproducible report? Since the source data will change, I want to avoid cutting and pasting.
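One possible route, assuming the column names of the matrix returned by rms's calibrate for an ols/lrm fit (`predy`, `calibrated.corrected`), is to recompute the error from the calibrate object itself:

```r
set.seed(1)                      # reproducible bootstrap
cal <- calibrate(fit, B = 200)   # fit: the ols model

# The plotted "Mean absolute error" compares predicted values with the
# overfitting-corrected calibrated values; recomputing it here lets
# knitr insert the current value automatically
mae <- mean(abs(cal[, "predy"] - cal[, "calibrated.corrected"]))
round(mae, 3)
```

Inspect colnames(cal) first to confirm the column names for your rms version.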


Hello Steve!

That looks like the output from a binary logistic model. Are you sure it’s from a Cox model fit (i.e., from the cph function in the R rms package)?

Questions about RMS are best posted in

It is actually from an ols fit; I apologize for asking under Cox model calibration. I searched for calibration and posted without looking at the topic. Thank you for your quick response.

Dear Professor,
I'm writing this post to ask for your help and wisdom regarding the external validation of an existing coxph model in a population different from the one it was originally developed in.

  1. We started with our cohort of 300 individuals (a dataset of time/event data and covariates). The original article provides the function for the linear predictor, which we used to calculate the linear predictor for each individual (thus creating the vector "LinearPredictor" in our dataset). The authors also provide the baseline survival function S0(t) as a function of time t alone.

  2. For discrimination, we first estimated the regression coefficient on the linear predictor (the calibration slope) by fitting a Cox proportional hazards model with the linear predictor as the only covariate:

cox.mod <- coxph(Surv(time = Time, event = Event) ~ LinearPredictor, data = dataset)

However, the β for the linear predictor is 0.4 instead of a value close to 1. Does that mean there is something wrong with our data or approach? Next we calculated the C-statistic (Harrell's C; it is an honour to be addressing the person who conceived it) and the R²_D statistic to evaluate model performance. The results were within acceptable limits.

  3. The most challenging part: calibration, as described in Royston et al.

3.1) A baseline survival function is reported and was obtained directly from the publication. How can we, in R, create a Kaplan–Meier-like estimate of the baseline survival function in our data by fitting a Cox model with the linear predictor as the only covariate and its regression coefficient constrained to 1, while also comparing it to the published baseline survival function in the same graph?
Our attempt in code was:

t <- 0:284
ti <- (t + 0.1) / 100
Sot <- 1.0003754 - 0.1131641 * ti^2 + 0.0964763 * ti^2 * log(ti)
# Constrain the coefficient on the linear predictor to 1 via offset()
fit0 <- coxph(Surv(time = Time, event = Event) ~ offset(LinearPredictor), data = dataset)
kmcurve <- survfit(fit0)
plot(kmcurve)
lines(t, Sot, col = 'purple')

3.2) We categorized the subjects into risk groups according to quantiles (16th, 50th, 84th) of their respective LP and obtained their KM curves. However, how can we add the mean predicted survival curve for each group to the same graph? Every attempt using survest() from both the survival and rms packages has ended in failure.
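One workable route is to compute each subject's predicted curve directly from the published model, S(t | LP) = S0(t)^exp(LP), and average within groups. A sketch using the S0(t) polynomial quoted in 3.1) (the four-band grouping and plotting choices are assumptions):

```r
library(survival)

# Published baseline survival over the observed time range
t  <- 0:284
ti <- (t + 0.1) / 100
S0 <- 1.0003754 - 0.1131641 * ti^2 + 0.0964763 * ti^2 * log(ti)

# Four risk groups from the 16th/50th/84th percentiles of the LP
grp <- cut(dataset$LinearPredictor,
           breaks = c(-Inf, quantile(dataset$LinearPredictor,
                                     c(0.16, 0.50, 0.84)), Inf))

# Observed KM curves per group, with mean predicted curves overlaid
plot(survfit(Surv(Time, Event) ~ grp, data = dataset), col = 1:4)
for (k in seq_along(levels(grp))) {
  lp <- dataset$LinearPredictor[grp == levels(grp)[k]]
  # S0[i]^exp(lp[j]) for every subject j in the group, averaged over j
  lines(t, rowMeans(outer(S0, exp(lp), `^`)), col = k, lty = 2)
}
```

Averaging the individual predicted curves (rather than predicting at the group-mean LP) is the appropriate comparison with a group KM curve.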

3.3) Lastly, the hardest part: val.surv, in order to compare predicted survival probabilities with observed percentages stratified by deciles and to calculate the Kaplan–Meier statistics.

We used the survest to calculate the predicted survival:

u <- 60
S <- Surv(time = dataset$Time, event = dataset$Event)
# Avoid naming the fit "coxph", which masks the survival function of that name
f <- cph(S ~ LinearPredictor, data = dataset, surv = TRUE, x = TRUE, y = TRUE)
est <- survest(f, newdata = dataset, times = u)$surv
val <- val.surv(fit = f, newdata = dataset, S = S, u = u)
plot(val, lim = c(0.2, 1),
     xlab = "Observed Survival Probability at 5 years",
     ylab = "Predicted Survival Probability at 5 years")
groupkm(est, S = S, m = 30, u = u, pl = TRUE, add = TRUE)
We used m=30 because of the cohort size of 300, rather than stratifying by deciles.

Have we used val.surv correctly? And how can we overcome the obstacles in 3.1) and 3.2)?

I deeply thank you for your time and for inspiring me and the newer generations of data scientists.