Estimating mean survival from marginal and conditional models

My question is:

What is the appropriate estimate of mean survival in a target population defined by a health care provider?

The context is:

Typically, evidence about the relative effects of treatments comes from one or more RCTs, and mean survival for each treatment is estimated from a parametric model fitted to the data, i.e. with extrapolation to patients' lifetimes and without an assumption of proportional hazards. When there is only one RCT, it is common to use only the study data and to fit models either separately to each treatment arm or jointly with a treatment effect included for each parameter in the model.

When there are multiple RCTs, the studies are typically used to estimate relative effects (e.g. using a network meta-analysis of fractional polynomials for the hazard function), which are “added” to a baseline hazard function in order to generate treatment-specific survival functions and mean survival.

A common feature of both situations is that the analysis uses marginal models, taking no account of the sampling scheme, known prognostic factors, or treatment effect modifiers. If we accept the principle that clinical trials should be analysed as randomised, then the model should include any stratification factors and also any known prognostic factors.

My questions are:

  • How should mean survival for the target population be estimated from a conditional model?

  • What is the relevance of the joint distribution of the covariates in the target population and how might this be estimated?

  • What can be said about the interpretation of estimates of mean survival from marginal and conditional models, and about the extent to which estimates from a marginal model are biased and have invalid estimates of error?


Welcome to datamethods, John. These are wonderful questions.

I’d like to start the discussion by noting that one of the most common misunderstandings of clinical trialists is that because treatment is randomized one can ignore within-treatment patient outcome heterogeneity. This leads to sample size inflation, and this attitude is ironically displayed by clinical investigators who claim to be interested in precision medicine. A great opportunity for precision and patient specificity has been lost by avoiding covariate adjustment, and treatment effects are underestimated as detailed in the Analysis of Covariance chapter in BBR. Related to this issue is the fact that proportional hazards is more likely to hold upon covariate adjustment than when treatment is the only variable in the model, i.e., only marginal results are reported.

When only marginal results are reported, e.g. unadjusted hazard ratios and survival curves, the results are a function of the covariate distribution over which marginalization was done. That makes the results not fully transportable.
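To make that dependence concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: a single binary prognostic factor `x` with prevalence `p`, exponential survival within each covariate level, a conditional hazard ratio of 0.5 for treatment, and the baseline hazard `h0` and covariate effect `gamma`. The function names are hypothetical. It computes the marginal (covariate-averaged) hazard in each arm and shows that the marginal hazard ratio changes with both follow-up time and the covariate mix, even though the conditional hazard ratio is constant.

```python
import math


def marginal_hazard(t, trt, p, h0=0.1, beta=math.log(0.5), gamma=math.log(4.0)):
    """Marginal hazard at time t in arm trt (0 or 1).

    Assumed conditional model: hazard = h0 * exp(beta * trt + gamma * x),
    with x binary and P(x = 1) = p. All parameter values are illustrative.
    """
    rates = [h0 * math.exp(beta * trt + gamma * x) for x in (0, 1)]
    weights = [1.0 - p, p]
    # Marginal survival is a mixture of the covariate-specific survival curves.
    surv = sum(w * math.exp(-lam * t) for w, lam in zip(weights, rates))
    # Marginal density is the same mixture of covariate-specific densities.
    dens = sum(w * lam * math.exp(-lam * t) for w, lam in zip(weights, rates))
    return dens / surv


def marginal_hazard_ratio(t, p, **params):
    """Treated vs control ratio of marginal hazards at time t."""
    return marginal_hazard(t, 1, p, **params) / marginal_hazard(t, 0, p, **params)


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        ratios = [round(marginal_hazard_ratio(t, p), 3) for t in (0.0, 5.0, 10.0)]
        print(f"P(x=1)={p}: marginal HR at t=0, 5, 10 -> {ratios}")
```

In this assumed setup the marginal hazard ratio equals the conditional one only at time zero and then drifts, so proportional hazards fails marginally while holding conditionally, and the size of the drift depends on `p`.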

To answer at least one of your questions, I don’t know how to do any of this without getting patient-level data and reanalyzing in a fully conditional way. To your first question “How should mean survival for the target population be estimated from a conditional model”, I think “target population” is not well defined, and conditional models should be used to estimate conditional quantities. To uncondition on covariates we need to know the multivariate covariate distribution in the population (good luck getting that!). Terry Therneau and Patricia Grambsch, I believe, cover a simple case in their book Extending the Cox Model (perhaps giving an example of adjusting to a certain age distribution). My personal preference is to just keep providing covariate-specific estimates when applying results from a clinical trial.
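For what unconditioning would look like if a covariate sample from the target population were available, here is a minimal sketch in Python. It assumes a Weibull accelerated failure time model whose coefficients have already been estimated from patient-level data; the coefficient values, the covariates (age and a severity indicator), and the two small populations are all made up for illustration, and the function names are hypothetical. Each row keeps a patient's covariates together, so averaging over rows uses the joint covariate distribution rather than the separate margins. The sketch ignores parameter uncertainty entirely.

```python
import math

# Hypothetical fitted Weibull AFT model: log(scale) is linear in the covariates.
COEF = {"intercept": 3.0, "trt": 0.4, "age": -0.02, "severe": -0.6}
SHAPE = 1.3  # hypothetical Weibull shape parameter


def conditional_mean(trt, age, severe, coef=COEF, shape=SHAPE):
    """Mean survival for one covariate pattern: scale * Gamma(1 + 1/shape)."""
    log_scale = (coef["intercept"] + coef["trt"] * trt
                 + coef["age"] * (age - 60) + coef["severe"] * severe)
    return math.exp(log_scale) * math.gamma(1.0 + 1.0 / shape)


def standardized_mean(trt, population, coef=COEF, shape=SHAPE):
    """Average the conditional means over the rows of a covariate sample."""
    means = [conditional_mean(trt, age, severe, coef, shape)
             for age, severe in population]
    return sum(means) / len(means)


# Made-up (age, severe) rows standing in for the trial and the target population.
trial_like = [(55, 0), (60, 0), (62, 1), (58, 0)]
target = [(70, 1), (75, 1), (68, 0), (80, 1)]

if __name__ == "__main__":
    for name, pop in (("trial-like", trial_like), ("target", target)):
        gain = standardized_mean(1, pop) - standardized_mean(0, pop)
        print(f"{name}: mean survival gain from treatment = {gain:.2f}")
```

Note that this averages the conditional means; plugging the average covariate values into the model would give a different number, because mean survival is a nonlinear function of the covariates.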

A general answer to your second question, “What is the relevance of the joint distribution … and how might this be estimated?”, is that I don’t think you will find the needed data. You will find age distributions, sex distributions, and age x sex distributions, but not age x severity of disease distributions. An exception might be the SEER registry in oncology.