External Validation and Calibration of Cox model

Dear Professor and everyone who may help,

I’m writing this post in order to ask for your help and experience regarding the external validation of an existing coxph model for a different population than it was originally made.

  1. We started with our cohort, of 300 individuals (a dataset of Time/Event data and covariates). In the original article the authors provide the function for the linear predictor which was used to calculate the linear predictor for each individual (thus, creating the vector “LinearPredictor” in our dataset). Secondly, the authors provide with the So(t) as a function depending only on time (t).

  2. For discrimination, we first estimated the regression coefficient on the linear predictor coefficient by fitting a Cox proportional hazards model with only the linear predictor as follows:

cox.mod ← coxph(Surv(time = Time,event = Event) ~ LinearPredictor, data=dataset)

However, the β of the linear predictor is 0.4 instead of a value close to 1. Does that means that there’s something wrong with our data or approach? Next we calculated the C-Statistic and the (R^2 D) to evaluate fit performance. The results were within acceptable limits.

The most challenging part, calibration. As mentioned in Royston et al..

3.1) There is a reported baseline survival function and obtained directly from the publication . How is it possible in r to create a Kaplan- Meier‒like estimate of the baseline survival function in our data by fitting a Cox model with no covariates other than the linear predictor, with regression coeffect constrained to 1. while also comparing it to the baseline survival function in the same graph in r?
Our attempt in code was:

t ← seq(0:284)
ti ← (t+0.1)/100
Sot ← 1.0003754 - 0.1131641 * (ti)^2 + 0.0964763 * (ti^2 * log(ti))
kmcurve ← survfit(Surv(time = Time,event = Event) ~ offset(1*LinearPredictor), data=dataset)
lines(Sot, col=‘purple’)

3.2) We’ve categorized the subjects in risk groups according to quantiles (16th, 50th, 84th) of their respective LP and obtained their KM curves. However, how is it possible to provide in the same graph the mean predicted survival curve for each group? Every attempt using both survival and rms package using survest() has ended in a failure.

3.3) Lastly, the hardest part. val.surv. in order to compare the predicted survival probability with the observed percentage stratified by deciles and calculate the Kaplan-Meier statistics.

We used the survest to calculate the predicted survival:

u ← 60
coxph ← cph(S~ dataset$LinearPredictor, data = dataset, surv=TRUE, time.inc=u, x=TRUE, >y=TRUE)
est ← survest(coxph , newdata=dataset, times = 60)$surv
val ← val.surv(fit=coxph, newdata=dataset, S=Surv(time = dataset$Time,event = dataset$Event), >u=60)
plot(val2, lim= c(0.2, 1), xlab=“Observed Survival Probability at 5-year”, ylab = “Predicted Survival >Probability at 5-year”)
groupkm(est, S=Surv(time = dataset$Time, event = dataset$Event), m=30,
u=60, pl=TRUE, add=TRUE)
we used the m=30 due to the cohort size, being 300, regardless of their risk stratified by deciles.

Have we used the val.surv correctly? How can we overcome the obstacles of 3.1), 3.2) ?

I deeply thank you for your time.

but they specified a parametric form for So(t) and should provide values for the shape and scale parameters? edit: ie i dont understand mention of KM, unless assessing fit or something?

The idea of “risk groups” is not a valid one because of heterogeneity within each group and not taking advantage of individualized risk estimates.

When you do an independent relative validation by fitting the pre-specified linear predictor against the outcome and \hat{\beta} is far from 1.0, you can stop there. Calibration has already been demonstrated to be inadequate.

1 Like

Yes, in order to assess fit.

Undeniably, calibration curves for survival probabilities are the preferred method. The idea of risk groups as proposed from Roystol et al was used as a technical device to indicate how well the model fits and predicts in a validation sample, for example by graphing or tabulating performance. In order to create a mean survival curve for a KM survival curve, is So(t)^ exp(mean(LP)) a valid way?

Could “β being far from 1.0”, mean we had inadequate sample size (~300) or was something wrong with our usage of R in estimating it? If not, what can be done next?

As always, thank you for your insight and time.

i believe so, based on dave collett’s book on survival analysis - sectn 3.8 Estimating the survivor and hazard functions. I’m not familiar with R though and how you’d obtain estimates from it

Your validation sample is what it is. The overall \hat{\beta} is the most precise estimate of relative global calibration. The model failed. Stop.

Thank you mr. Brown for the reply!

Thank you Professor for your time to answer.