External Validation and Calibration of Cox model

Dear Professor and everyone who may help,

I’m writing this post to ask for your help and experience regarding the external validation of an existing coxph model in a population different from the one it was originally developed in.

  1. We started with our cohort of 300 individuals (a dataset of time/event data and covariates). The original article provides the function for the linear predictor, which we used to calculate the linear predictor for each individual (thus creating the vector “LinearPredictor” in our dataset). The authors also provide the baseline survival So(t) as a function of time t alone.

  2. For discrimination, we first estimated the regression coefficient of the linear predictor by fitting a Cox proportional hazards model with the linear predictor as the only covariate:

cox.mod <- coxph(Surv(time = Time, event = Event) ~ LinearPredictor, data = dataset)
summary(cox.mod)

However, the β of the linear predictor is 0.4 instead of a value close to 1. Does that mean there is something wrong with our data or approach? Next we calculated the C-statistic and the R²_D to evaluate performance; the results were within acceptable limits.
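A slope of 0.4 is not necessarily a coding error; it is what you would see if the published linear predictor spreads risk less well in the new population. A minimal sketch of how one might formalize the check, reusing `cox.mod` from above, is a Wald test of the hypothesis β = 1 rather than the default β = 0:

```r
## Wald test that the calibration slope equals 1 (not the usual 0).
## Assumes cox.mod was fitted as above with the external linear predictor.
b  <- coef(cox.mod)[1]
se <- sqrt(vcov(cox.mod)[1, 1])
z  <- (b - 1) / se          # z-statistic for H0: beta = 1
p  <- 2 * pnorm(-abs(z))    # small p: slope clearly differs from 1
```

A confidence interval for the slope (`confint(cox.mod)`) tells the same story: if it excludes 1, the model's relative calibration is inadequate in your cohort.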

The most challenging part is calibration. As mentioned in Royston et al.:

3.1) A baseline survival function is reported and was obtained directly from the publication. How can we, in R, create a Kaplan–Meier-like estimate of the baseline survival function in our data by fitting a Cox model with no covariates other than the linear predictor, with its regression coefficient constrained to 1, and compare it with the published baseline survival function in the same graph?
Our attempt in code was:

t   <- seq(0, 284)                       # follow-up times (seq(0:284) would give 1:285)
ti  <- (t + 0.1) / 100
Sot <- 1.0003754 - 0.1131641 * ti^2 + 0.0964763 * (ti^2 * log(ti))
## Constrain the coefficient of the linear predictor to 1 via an offset,
## then request the curve at LinearPredictor = 0, i.e. the baseline.
fit0    <- coxph(Surv(time = Time, event = Event) ~ offset(LinearPredictor), data = dataset)
kmcurve <- survfit(fit0, newdata = data.frame(LinearPredictor = 0))
plot(kmcurve)
lines(t, Sot, col = 'purple')

3.2) We categorized the subjects into risk groups according to quantiles (16th, 50th, 84th) of their respective LP and obtained their KM curves. However, how can we add the mean predicted survival curve for each group to the same graph? Every attempt using survest() from the survival and rms packages has failed.
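One way to do this without survest() is to compute each patient's predicted curve directly from the published baseline and average within groups. A sketch, assuming `t` and `Sot` (the published baseline survival evaluated on a time grid, as in 3.1) and columns `Time`, `Event`, `LinearPredictor` in `dataset`:

```r
## Overlay KM curves per risk group with the mean predicted curve per group.
library(survival)

qs    <- quantile(dataset$LinearPredictor, c(0.16, 0.50, 0.84))
group <- cut(dataset$LinearPredictor, c(-Inf, qs, Inf), labels = 1:4)

plot(survfit(Surv(Time, Event) ~ group, data = dataset), col = 1:4)
for (g in levels(group)) {
  lp <- dataset$LinearPredictor[group == g]
  ## Average the individual predicted curves, not the curve at the mean LP
  pred <- rowMeans(sapply(lp, function(b) Sot^exp(b)))
  lines(t, pred, col = as.integer(g), lty = 2)
}
```

The dashed lines are the model's mean predicted survival per risk group; close agreement with the solid KM curves indicates good calibration in that region of risk.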

3.3) Lastly, the hardest part: val.surv, in order to compare the predicted survival probabilities with the observed proportions stratified by deciles and calculate the Kaplan–Meier statistics.

We used survest() to calculate the predicted survival:

u <- 60
cox.fit <- cph(Surv(Time, Event) ~ LinearPredictor, data = dataset,
               surv = TRUE, time.inc = u, x = TRUE, y = TRUE)
est <- survest(cox.fit, newdata = dataset, times = u)$surv
val <- val.surv(fit = cox.fit, newdata = dataset,
                S = Surv(time = dataset$Time, event = dataset$Event), u = u)
plot(val, lim = c(0.2, 1), xlab = "Observed Survival Probability at 5 years",
     ylab = "Predicted Survival Probability at 5 years")
groupkm(est, S = Surv(time = dataset$Time, event = dataset$Event), m = 30,
        u = u, pl = TRUE, add = TRUE)
We used m = 30 because of the cohort size (300), rather than stratifying by deciles of risk.

Have we used val.surv correctly? How can we overcome the obstacles in 3.1) and 3.2)?

I deeply thank you for your time.

But they specified a parametric form for So(t), so shouldn't they provide values for the shape and scale parameters? Edit: i.e., I don't understand the mention of KM, unless it is for assessing fit or something similar?

The idea of “risk groups” is not a valid one because of heterogeneity within each group and not taking advantage of individualized risk estimates.

When you do an independent relative validation by fitting the pre-specified linear predictor against the outcome and \hat{\beta} is far from 1.0, you can stop there. Calibration has already been demonstrated to be inadequate.


Yes, in order to assess fit.

Undeniably, calibration curves for survival probabilities are the preferred method. The idea of risk groups proposed by Royston et al. was used as a technical device to indicate how well the model fits and predicts in a validation sample, for example by graphing or tabulating performance. To create a mean predicted survival curve to overlay on a KM survival curve, is So(t)^exp(mean(LP)) a valid way?
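Note that because So(t)^exp(LP) is nonlinear in LP, the curve evaluated at the mean LP is generally not the mean of the individual curves; averaging the individual predicted curves is the safer choice. A tiny sketch of the difference, assuming a published baseline vector `Sot` and a vector `lp` of linear predictors:

```r
## Two candidate "group" curves; they differ whenever lp varies within the group.
curve_at_mean_lp <- Sot^exp(mean(lp))                         # plug-in mean LP
mean_of_curves   <- rowMeans(sapply(lp, function(b) Sot^exp(b)))  # mean of curves
```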

Could “β being far from 1.0” mean we had an inadequate sample size (~300), or that something was wrong with our use of R in estimating it? If not, what can be done next?

As always, thank you for your insight and time.

I believe so, based on Dave Collett's book on survival analysis, Section 3.8, “Estimating the survivor and hazard functions”. I'm not familiar with R, though, or how you'd obtain the estimates from it.

Your validation sample is what it is. The overall \hat{\beta} is the most precise estimate of relative global calibration. The model failed. Stop.

Thank you, Mr. Brown, for the reply!

Thank you Professor for your time to answer.

Dear Professor and everyone who may help,

I’m writing this post to ask for your help and experience regarding the external validation of the nhs.predict model in an Indian population.

In nhs.predict, the os.predict function provides 5-year overall survival for each patient along with breast-cancer-related mortality. I have calculated PI (the breast cancer mortality prognostic index) and MI (the other-cause mortality prognostic index) for our data, and using os.predict I also obtain 5-year survival. So I now have two Cox models, one with beta coefficients related to PI and the other for MI. But I do not have breast-cancer-related mortality in our dataset. It would be a great help if someone could explain how to deal with both Cox PH models, and how I could get the predicted number of events from this model.

Please precisely define PI, nhs.predict, and os.predict. And tell us more about the dataset: how patients got into it and which endpoints were accurately collected.

PI and MI were obtained using the formulas

pi <- age.beta.1*age.mfp.1 + age.beta.2*age.mfp.2 + size.beta*size.mfp +
      nodes.beta*nodes.mfp + grade.beta*grade.val + screen.beta*screen +
      her2.beta + ki67.beta
mi <- 0.0698252 * ((age.start/10)^2 - 34.23391957)

from the os.predict function in the nhs.predict package.
This data is for Indian breast cancer patients with screening, tumor size, grade, nodal status, ER, and HER2 details. I calculated 5-year breast cancer survival using the code:
os.predict(validation,year = 5,age.start = age.start,screen = screen,size = size,
grade = grade, nodes = nodes, er = er, her2 = her2,
ki67 = ki67, generation = generation, horm = horm,
traz = tras, bis = bis.t)
Now I want to validate this data. I have tried to do that using this link.

The main challenges are:

  1. We have events recorded only as alive or dead. We do not have details of which deaths were due to breast cancer, whereas the nhs.predict package was validated on breast-cancer-related survival.
  2. As mentioned in the previous link, we only have validation data; we do not have the derivation data to compare calibration against.
  3. Using os.predict, I get 5-year survival for each patient that differs massively from the values given by
    est <- survest(cox.fit, newdata = validation, times = 60)$surv
    where cox.fit <- cph(Surv(osmonths, osevent) ~ age.mfp.1 + age.mfp.2 + size.mfp + nodes.mfp + grade.val + offset(pi), data = validation, surv = TRUE, time.inc = 60, x = TRUE, y = TRUE)
  4. I am also not sure how to deal with the two Cox PH models related to PI and MI.

How can we overcome these obstacles? I deeply thank you for your time.
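On point 4, one common way to combine two cause-specific models is to multiply the cause-specific survival probabilities, under the (strong, and here assumed) assumption that the two causes of death act independently. The expected number of deaths then follows by summing the predicted event probabilities. A toy sketch with hypothetical per-patient values (the vectors `s_bc` and `s_oth` are illustrative, not from nhs.predict):

```r
## Hypothetical 5-year survival probabilities per patient:
s_bc  <- c(0.95, 0.90, 0.80)   # breast-cancer-specific survival (PI model)
s_oth <- c(0.98, 0.93, 0.90)   # other-cause survival (MI model)
s5 <- s_bc * s_oth             # overall survival, assuming independent causes
expected_deaths <- sum(1 - s5) # expected number of deaths by 5 years
```

The expected count can then be compared with the observed number of deaths by 5 years in the validation data. Since your dataset records all-cause death only, validating against overall survival (the product) rather than breast-cancer-specific survival is the comparison your data can actually support.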

Just a gentle reminder to Professor @f2harrell and everyone who may help.

All I can think of is that if you have a truly external validation of a “frozen” predictive model, you can use rms::val.surv.