Random effects model for multicenter prognostic markers

In their text Medical Risk Prediction, Gerds and Kattan note that "Including the cohort variable as a random effect in a logistic or Cox regression model would make it possible to predict new patients from other centers. However, doing this is not well motivated. There are two problems. The first is that adding a random effect corresponds to conditioning on another predictor variable (the random effect), although this variable has not been observed. Hence, if the random effect turns out to be important this means that an important predictor variable is not available and hence the predicted risk may be systematically too high or too low. The second problem is related to the non-collapsability of logistic regression models and Cox regression models."
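To make both problems concrete, here is a minimal sketch that assumes a random-intercept logistic model, logit P(Y=1 | x, b) = η(x) + b with b ~ N(0, σ²). Plugging in b = 0 (a "typical" center) gives a different risk from the risk averaged over centers, E[expit(η + b)], and the population-averaged odds ratio for a binary predictor is attenuated relative to the center-conditional one. The function names and the values η = -2, β = 1, σ = 1 are hypothetical choices for illustration. The integral uses a plain trapezoidal rule from the Python standard library.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_risk(eta, sigma, n=2001, width=8.0):
    """Average of expit(eta + b) over b ~ N(0, sigma^2), by the trapezoidal rule."""
    if sigma == 0:
        return expit(eta)
    lo = -width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        b = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += weight * expit(eta + b) * density
    return total * h


# Hypothetical model: intercept -2, conditional log odds ratio 1 for a
# binary predictor x, random-intercept SD 1 across centers.
intercept, beta, sigma = -2.0, 1.0, 1.0

conditional_risk = expit(intercept)              # risk with b set to 0
averaged_risk = marginal_risk(intercept, sigma)  # risk averaged over centers
marginal_log_or = logit(marginal_risk(intercept + beta, sigma)) - logit(averaged_risk)

print(f"risk at b = 0:          {conditional_risk:.3f}")
print(f"risk averaged over b:   {averaged_risk:.3f}")
print(f"conditional log OR:     {beta:.3f}")
print(f"population-avg log OR:  {marginal_log_or:.3f}")
```

With these inputs the center-averaged risk is higher than the b = 0 risk, and the population-averaged log odds ratio is smaller than β. The same fitted model gives different answers depending on whether one conditions on the unobserved center effect.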

If this is accurate, what is the appropriate method for developing prognostic factor models using data from multiple centers?

Here’s my understanding.

  • Estimating random effects reliably requires a lot of data from many centers
  • Unless a Bayesian nonparametric random effects distribution is used, hierarchical models tend to make restrictive assumptions of normality and a single common variance for the random effects, and violations of these assumptions can hurt overall inference and invalidate predictions for future centers
  • Random effects models assume centers are exchangeable
  • Sometimes it is more meaningful and useful, and easier to extrapolate, to model center characteristics rather than the actual center attended
  • Prediction of outcomes at future or unsampled centers depends heavily on the centers being exchangeable when center characteristics are not modeled as in the previous bullet point
  • When there is large unexplained variation in outcomes across centers, it’s often better to include random effects than to exclude them
  • Don’t ever make the mistake of paying a lot of attention to centers while refusing to model all-important patient-specific baseline characteristics such as age and extent of disease
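Several of these points (few centers, the normal single-variance assumption, exchangeability, prediction for a new center) already show up in the simplest random-effects calculation. Below is a minimal sketch, assuming each center supplies an estimate y_j (for example a center-level log odds) with known within-center variance v_j, and that the true center effects are exchangeable draws from N(μ, τ²). It uses the DerSimonian-Laird moment estimator for τ² (a swapped-in choice; REML is also common), shrinks each center toward μ, and forms a normal-approximation interval for a new, unsampled center. The center data and function names are hypothetical.

```python
import math


def dersimonian_laird(y, v):
    """Moment estimates for y_j ~ N(theta_j, v_j), theta_j ~ N(mu, tau^2)."""
    w = [1.0 / vj for vj in v]
    mu_fixed = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    q = sum(wj * (yj - mu_fixed) ** 2 for wj, yj in zip(w, y))
    c = sum(w) - sum(wj ** 2 for wj in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # truncated at zero
    w_star = [1.0 / (vj + tau2) for vj in v]
    mu = sum(wj * yj for wj, yj in zip(w_star, y)) / sum(w_star)
    se_mu = math.sqrt(1.0 / sum(w_star))
    return mu, tau2, se_mu


def shrink(y, v, mu, tau2):
    """Empirical Bayes (posterior mean) estimate of each center's effect."""
    return [mu + tau2 / (tau2 + vj) * (yj - mu) for yj, vj in zip(y, v)]


# Hypothetical center-level log odds and sampling variances;
# small centers have large variances.
y = [-1.2, -2.0, -1.6, -0.4, -2.6]
v = [0.05, 0.10, 0.30, 0.80, 1.00]

mu, tau2, se_mu = dersimonian_laird(y, v)
shrunk = shrink(y, v, mu, tau2)

# Approximate 95% interval for the effect at a new, exchangeable center:
# it must include the between-center variance tau^2, not just se(mu)^2.
half_width = 1.96 * math.sqrt(tau2 + se_mu ** 2)
print(f"mu = {mu:.3f}, tau^2 = {tau2:.3f}")
print("shrunken center effects:", [round(s, 3) for s in shrunk])
print(f"new-center interval: ({mu - half_width:.3f}, {mu + half_width:.3f})")
```

Smaller centers (larger v_j) are pulled further toward μ. With only five centers τ² is itself very imprecise, and this simple normal interval ignores that uncertainty; a t-based interval is often used when there are few centers.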

Thank you. How would you determine whether the prognostic model or prognostic factors apply to the average individual or to the average center?

It may be best to give an example of what you are referring to.

I don’t have a specific example, but I just looked at this paper and was wondering whether, with random effects models, defining the estimand is essential.
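One way to see why the estimand matters: with unequal center sizes, a patient-weighted average (the average individual) and an unweighted average over centers (the average center) answer different questions and can differ noticeably. A minimal sketch with hypothetical center sizes and risks:

```python
# Hypothetical centers: (number of patients, observed risk of the outcome).
centers = [(800, 0.10), (150, 0.25), (50, 0.40)]

n_total = sum(n for n, _ in centers)

# Average individual: each patient counts once, so large centers dominate.
patient_weighted = sum(n * risk for n, risk in centers) / n_total

# Average center: each center counts once, regardless of its size.
center_weighted = sum(risk for _, risk in centers) / len(centers)

print(f"average individual: {patient_weighted:.3f}")
print(f"average center:     {center_weighted:.3f}")
```

A random-intercept model adds a third candidate, the risk at a center with random effect 0, which in general equals neither of these averages.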

Cluster randomized trials require random effects to get the right correlation structure. That’s somewhat different.
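The correlation-structure point can be quantified with the usual design effect for equal cluster sizes, DE = 1 + (m - 1)ρ, where m is the cluster size and ρ the intracluster correlation. A random intercept implies this exchangeable within-cluster correlation, and ignoring it understates variances by roughly the factor DE. A minimal sketch with hypothetical values (50 patients per cluster, ρ = 0.05, 2000 patients):

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


# Hypothetical trial: 40 clusters of 50 patients, intracluster correlation 0.05.
n_patients = 2000
de = design_effect(50, 0.05)
effective_n = n_patients / de

print(f"design effect = {de:.2f}")
print(f"effective sample size ~ {effective_n:.0f} of {n_patients}")
```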