I’m analyzing TCGA survival data to understand the prognostic importance of biomarkers from genes G1 and G2. Two different markers are measured for each gene: copy number (cnv) and expression (rna). I’ve fitted a pre-specified model which includes the genetic markers and other prognostic factors available from the data:
test.model <- fit.mult.impute(Surv(OS.time, OS) ~ rcs(age, 3) + gender + tobacco_smoking_history + radiation_therapy + G1.cnv + G2.cnv + rcs(G2.rna, 3) * rcs(G1.rna, 3), cph, imp, data = patients)
Checking the ANOVA for the model yields:
Wald Statistics Response: Surv(OS.time, OS) Factor Chi-Square d.f. P age 2.89 2 0.2359 Nonlinear 2.71 1 0.0999 gender 3.73 1 0.0533 tobacco_smoking_history 2.38 4 0.6670 radiation_therapy 1.42 1 0.2330 G1.cnv 0.70 4 0.9519 G2.cnv 0.54 4 0.9700 G2.rna (Factor+Higher Order Factors) 14.06 6 0.0290 All Interactions 13.89 4 0.0077 Nonlinear (Factor+Higher Order Factors) 6.42 3 0.0930 G1.rna (Factor+Higher Order Factors) 14.36 6 0.0259 All Interactions 13.89 4 0.0077 Nonlinear (Factor+Higher Order Factors) 2.34 3 0.5040 G2.rna * G1.rna (Factor+Higher Order Factors) 13.89 4 0.0077 Nonlinear 6.91 3 0.0749 Nonlinear Interaction : f(A,B) vs. AB 6.91 3 0.0749 f(A,B) vs. Af(B) + Bg(A) 0.03 1 0.8723 Nonlinear Interaction in G2.rna vs. Af(B) 5.80 2 0.0550 Nonlinear Interaction in G1.rna vs. Bg(A) 0.85 2 0.6534 TOTAL NONLINEAR 9.65 6 0.1402 TOTAL NONLINEAR + INTERACTION 15.57 7 0.0294 TOTAL 27.56 24 0.2792
The small p-values for G1.rna, G2.rna, and the interaction(0.0259, 0.0290, 0.0077 respectively) suggest to me that the expression markers for each gene may have prognostic value, but I’m concerned that the model’s overall p-value is rather large (0.2792).
I’ve further assessed the importance of G1 and G2 expression with the likelihood ratio test and
the adequacy index:
subset.model <- fit.mult.impute(Surv(OS.time, OS) ~ rcs(G2.rna, 3) * rcs(G1.rna, 3), cph, imp, data = patients) lrtest(subset.model, full.model) A <- 1 - subset.model$stats['Model L.R.'] / full.model$stats['Model L.R.']
lrtest p-value = .1 and A = .42, providing some (although weaker) evidence that G1 and G2 expression may be prognostic.
How should I interpret these contrary results? Is it possible to reconcile the overall model’s large p-value with the individual variables smaller p-values, or must I conclude more data is needed to make any conclusions? In general, if the overall model’s p-value is large, should any individual p-values be ignored?
Additionally, I’m trying to visualize the relationship between G1.rna and G2.rna. When I visualize all the predictors the plots seem rather flat:
Visualizing only G1.rna with certain values of G2.rna is much less flat:
ggplot(Predict(test.model, G1.rna, G2.rna = c(0, 3, 6, 9)))
I see that the the second plot says adjusted to several factors while the first plot doesn’t. Are the curves from
ggplot(Predict(test.model)) averaged across all conditions while
ggplot(Predict(test.model, G1.rna, G2.rna = c(0, 3, 6, 9))) shows the curve only for specific condition? If so, which plot should I be showing?