# Interpreting and visualizing a Cox model for prognostic factors

I’m analyzing TCGA survival data to understand the prognostic importance of biomarkers from genes G1 and G2. Two different markers are measured for each gene: copy number (cnv) and expression (rna). I’ve fitted a pre-specified model which includes the genetic markers and other prognostic factors available from the data:

test.model <- fit.mult.impute(Surv(OS.time, OS) ~ rcs(age, 3) + gender + tobacco_smoking_history +
                                radiation_therapy + G1.cnv + G2.cnv + rcs(G2.rna, 3) * rcs(G1.rna, 3),
                              cph, imp, data = patients)


Checking the ANOVA for the model yields:

               Wald Statistics          Response: Surv(OS.time, OS)

Factor                                          Chi-Square d.f. P
age                                              2.89       2   0.2359
 Nonlinear                                       2.71       1   0.0999
gender                                           3.73       1   0.0533
tobacco_smoking_history                          2.38       4   0.6670
G1.cnv                                           0.70       4   0.9519
G2.cnv                                           0.54       4   0.9700
G2.rna  (Factor+Higher Order Factors)           14.06       6   0.0290
 All Interactions                               13.89       4   0.0077
 Nonlinear (Factor+Higher Order Factors)         6.42       3   0.0930
G1.rna  (Factor+Higher Order Factors)           14.36       6   0.0259
 All Interactions                               13.89       4   0.0077
 Nonlinear (Factor+Higher Order Factors)         2.34       3   0.5040
G2.rna * G1.rna  (Factor+Higher Order Factors)  13.89       4   0.0077
 Nonlinear                                       6.91       3   0.0749
 Nonlinear Interaction : f(A,B) vs. AB           6.91       3   0.0749
 f(A,B) vs. Af(B) + Bg(A)                        0.03       1   0.8723
 Nonlinear Interaction in G2.rna vs. Af(B)       5.80       2   0.0550
 Nonlinear Interaction in G1.rna vs. Bg(A)       0.85       2   0.6534
TOTAL NONLINEAR                                  9.65       6   0.1402
TOTAL NONLINEAR + INTERACTION                   15.57       7   0.0294
TOTAL                                           27.56      24   0.2792


The small p-values for G1.rna, G2.rna, and the interaction (0.0259, 0.0290, and 0.0077, respectively) suggest to me that the expression markers for each gene may have prognostic value, but I’m concerned that the model’s overall p-value is rather large (0.2792).

I’ve further assessed the importance of G1 and G2 expression with the likelihood ratio test and

subset.model <- fit.mult.impute(Surv(OS.time, OS) ~ rcs(G2.rna, 3) * rcs(G1.rna, 3),
                                cph, imp, data = patients)
lrtest(subset.model, test.model)
# note: Harrell's adequacy index is the ratio LR_subset / LR_full itself;
# this computes its complement
A <- 1 - subset.model$stats['Model L.R.'] / test.model$stats['Model L.R.']


The lrtest p-value = 0.1 and A = 0.42, providing some (though weaker) evidence that G1 and G2 expression may be prognostic.
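As a hand check, the calculation above can be reproduced from the LR chi-square statistics that cph prints (31.80 for the full model and 18.35 for the rna-only model, from the printouts later in the thread); note that Harrell’s adequacy index is the ratio LR_subset/LR_full itself, so the quantity computed here is its complement:

```r
# Hand check of the adequacy computation, using the LR chi-square values
# that cph prints (31.80 for the full model, 18.35 for the rna-only model)
lr_full   <- 31.80
lr_subset <- 18.35
adequacy  <- lr_subset / lr_full   # Harrell's adequacy index
A         <- 1 - adequacy          # complement, as computed above
round(c(adequacy = adequacy, A = A), 2)   # 0.58 and 0.42
```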

How should I interpret these contrary results? Is it possible to reconcile the overall model’s large p-value with the individual variables’ smaller p-values, or must I conclude that more data are needed before drawing any conclusions? In general, if the overall model’s p-value is large, should any individual p-values be ignored?

Additionally, I’m trying to visualize the relationship between G1.rna and G2.rna. When I visualize all the predictors, the plots seem rather flat:

ggplot(Predict(test.model))


Visualizing only G1.rna at certain values of G2.rna is much less flat:

ggplot(Predict(test.model, G1.rna, G2.rna = c(0, 3, 6, 9)))


I see that the second plot says it is adjusted to several factors while the first plot doesn’t. Are the curves from ggplot(Predict(test.model)) averaged across all conditions, while ggplot(Predict(test.model, G1.rna, G2.rna = c(0, 3, 6, 9))) shows the curves only for specific conditions? If so, which plot should I be showing?


I wouldn’t look at the overall model p-value. Regarding the change in p-values for G1 and G2: is it due to Type I vs. Type II sums of squares? I’m not familiar with the software you’re using (I guess it’s R). Something is certainly wrong with the plot; I would calculate it by hand if you don’t know what R is doing.

edit: I just saw the final plot; it looks better!


Tomas, you’re looking at this in an excellent way. A complication is that not all of the statistics coming from cph are worked out for multiple imputation: the LR test and the measures of predictive discrimination unfortunately come only from the last imputation. It’s best to stick with the anova Wald tests, which use all the imputations. You might do a chunk test for the combined effect of both rna factors, e.g. anova(test.model, G1.rna, G2.rna), and also an anova combining these with the cnv variables to get an overall multiplicity-adjusted test of all the G variables.

How many events are in the dataset?

In terms of the large overall p-value: this is a bit of a concern, but it needs to be addressed more directly by computing some measures of total model predictive discrimination. You might compute the variance of X\hat{\beta} for each imputation and average those variances, and do likewise for c-indexes using rcorr.cens or the faster function for this in the survival package.
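A minimal sketch of that averaging, using simulated data and survival::coxph in place of the real imputed fits; in the actual analysis the simulated data frame would be replaced by the i-th completed dataset from imp:

```r
# Sketch only: averaging var(X*beta-hat) and the c-index over imputations,
# since cph/fit.mult.impute report discrimination for the last imputation only.
# Simulated data stands in for the completed datasets; in the real analysis,
# replace the data.frame() call with
#   impute.transcan(imp, imputation = i, data = patients, list.out = TRUE)
library(survival)

set.seed(1)
m    <- 5                      # number of imputations
vars <- numeric(m)
cs   <- numeric(m)
for (i in seq_len(m)) {
  d  <- data.frame(x1 = rnorm(185), x2 = rnorm(185),
                   time = rexp(185), event = rbinom(185, 1, 0.4))
  f  <- coxph(Surv(time, event) ~ x1 + x2, data = d)
  vars[i] <- var(predict(f, type = "lp"))   # variance of the linear predictor
  cs[i]   <- concordance(f)$concordance     # c-index for this fit
}
mean(vars)   # averaged var(X*beta-hat)
mean(cs)     # averaged c-index
```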


I wasn’t aware of those cph limitations; thanks for the heads-up. I’m working with a very small dataset of 185 patients and 77 events. I realize that with so few events, large p-values are to be expected unless the effect size is huge.

Both chunk tests offer some evidence for G1 and G2 rna expression as useful predictive markers:

anova(test.model, G2.rna, G1.rna)

              Wald Statistics          Response: Surv(OS.time, OS)

Factor                                     Chi-Square d.f. P
G2.rna  (Factor+Higher Order Factors)      13.74      6    0.0327
 All Interactions                          13.40      4    0.0095
 Nonlinear (Factor+Higher Order Factors)    6.24      3    0.1003
G1.rna  (Factor+Higher Order Factors)      13.69      6    0.0333
 All Interactions                          13.40      4    0.0095
 Nonlinear (Factor+Higher Order Factors)    1.93      3    0.5874
TOTAL NONLINEAR                             6.92      5    0.2263
TOTAL                                      13.99      8    0.0820

anova(test.model, G1.rna, G2.rna, G1.cnv, G2.cnv, G2.rna * G1.rna)

     Wald Statistics          Response: Surv(OS.time, OS)

Factor                                          Chi-Square d.f. P
G1.rna  (Factor+Higher Order Factors)           13.69       6   0.0333
 All Interactions                               13.40       4   0.0095
 Nonlinear (Factor+Higher Order Factors)         1.93       3   0.5874
G2.rna  (Factor+Higher Order Factors)           13.74       6   0.0327
 All Interactions                               13.40       4   0.0095
 Nonlinear (Factor+Higher Order Factors)         6.24       3   0.1003
G1.cnv                                           0.73       4   0.9475
G2.cnv                                           0.53       4   0.9705
G2.rna * G1.rna  (Factor+Higher Order Factors)  13.40       4   0.0095
 Nonlinear                                       6.24       3   0.1006
 Nonlinear Interaction : f(A,B) vs. AB           6.24       3   0.1006
 f(A,B) vs. Af(B) + Bg(A)                        0.00       1   0.9774
 Nonlinear Interaction in G2.rna vs. Af(B)       5.22       2   0.0734
 Nonlinear Interaction in G1.rna vs. Bg(A)       0.82       2   0.6650
TOTAL NONLINEAR                                  6.92       5   0.2263
TOTAL NONLINEAR + INTERACTION                   13.93       6   0.0305
TOTAL                                           17.79      16   0.3366


I’ve managed to use concordance to compute the c-index for the full and subset models:

concordance(Surv(OS.time, OS) ~ test.model$linear.predictors)
n= 185
Concordance= 0.3497 se= 0.03639
concordant discordant     tied.x     tied.y    tied.xy
      2672       4968          0          2          0

concordance(surv.obj ~ subset.model$linear.predictors)
n= 185
Concordance= 0.3573 se= 0.03735
concordant discordant     tied.x     tied.y    tied.xy
      2730       4910          0          2          0


I would assume that subset.model's slightly better c-index is due to the c-index not being sensitive enough to compare performance between models. Neither c-index from concordance seems to match the D_{xy} from the model output:

full.model

                  Model Tests           Discrimination Indexes
Obs        185    LR chi2     31.80     R2       0.162
Events      77    d.f.           24     Dxy      0.294
Center 11.0456    Pr(> chi2) 0.1336     g        1.028
                  Score chi2  31.21     gr       2.797
                  Pr(> chi2) 0.1490

Using D_{xy} = 2(c - \frac{1}{2}), I would get D_{xy} = 2(0.3497 - 0.5) = -0.3006, not 0.294.

subset.model

               Model Tests           Discrimination Indexes
Obs     185    LR chi2     18.35     R2       0.097
Events   77    d.f.           16     Dxy      0.270
Center 6.58    Pr(> chi2) 0.3099     g        0.798
               Score chi2  17.44     gr       2.229
               Pr(> chi2) 0.3634


Similarly, D_{xy} = 2(0.3573 - 0.5) = -0.2854, not 0.270. Is there a reason these numbers don’t match?

After doing some reading on Type I vs. Type II sums of squares, I’m assuming anova in rms uses Type II, although I couldn’t find any information about this in the documentation. What’s your intuition for not looking at the overall model?

Note: I used concordance instead of survConcordance because of a warning that survConcordance has been deprecated, with concordance suggested instead. Interestingly, they yield different c-indexes:

survConcordance(surv.obj ~ test.model$linear.predictors)
Call:
survConcordance(formula = surv.obj ~ test.model$linear.predictors)

n= 185
Concordance= 0.6502618 se= 0.03804693
concordant discordant  tied.risk  tied.time   std(c-d)
  4968.000   2672.000      0.000      2.000    581.357

concordance(surv.obj ~ test.model$linear.predictors)
Call:
concordance.formula(object = surv.obj ~ test.model$linear.predictors)

n= 185
Concordance= 0.3497 se= 0.03639
concordant discordant     tied.x     tied.y    tied.xy
      2672       4968          0          2          0


It looks like concordance = 1 - survConcordance.
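Plugging in the numbers from the outputs above confirms the flip and approximately recovers the reported D_{xy}; the remaining gap is plausibly the last-imputation issue noted earlier:

```r
c_surv <- 0.6502618   # survConcordance estimate for the full model
c_new  <- 0.3497      # concordance() with the formula written as above
stopifnot(abs(c_surv - (1 - c_new)) < 1e-3)   # concordance = 1 - survConcordance
2 * (c_surv - 0.5)    # 0.3005, close to the reported Dxy of 0.294
```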


I have never looked at the p-value for a full model, but my background is mostly RCTs. I suspect you want something else, i.e., the ‘predictive discrimination’ mentioned by @f2harrell. Although I’m not sure you’re building a model for prediction; you just want to evaluate the predictive ability of the two markers(?)


Ah yes, I missed that. It turns out my concordance call was incorrectly formatted: calling concordance(surv.obj ~ test.model$linear.predictors) must flip the interpretation, because concordance(test.model) gives the same c-index as survConcordance. This only works with non-imputed models, though; concordance doesn’t seem to know how to handle an imputed model.

Yes, I’m not interested in building a prediction model per se; there aren’t enough data or covariates to build a model that will validate externally and be useful in the clinic. Given these limitations, I’m more interested in knowing whether there is any evidence to suggest these genes be studied further as potential prognostic factors that could be included in future models.

In that regard, I’m interested in showing that these genes improve predictive discrimination, but I’ve read that measures like the c-index aren’t sensitive enough and that it’s better to use likelihood ratio p-values and adequacy indexes. I’m open to other suggestions, though, if you think there are better ways to evaluate predictive ability.

You can’t use an interaction term in anova() with an asterisk. anova automatically pools all the needed terms, i.e., if you say anova(a, b) you’ll get any interactions involving either a or b.

Your pooled test with 8 d.f. for the G*.rna variables has P = 0.08. This is a perfect multiplicity adjustment, so your model is not ready for prime time. You might even argue that you should pool the cnv variables into this chunk test.


That makes sense, thank you everyone for all the help!