Suppose one wants to estimate a treatment effect in a covariate-adjusted model. When the treatment does not interact with the other covariates, the treatment effect is independent of all covariates as long as one stays on the linear predictor scale. For a Cox proportional hazards model or a logistic model, treatment effect ratios (hazard ratio or odds ratio) are also covariate-independent. But when we want to estimate other quantities, such as the absolute risk reduction due to treatment or the difference in median survival times, the nonlinear transformations involved make the result covariate-dependent.
I am seeking an approach to finding covariate settings that yield representative results for general nonlinear transformations. For example, what is a vector X of covariate values for 4 covariates, with each value near its marginal covariate median, such that the predicted probability that Y=1 | X equals the non-covariate-adjusted predicted probability (i.e., dependent only on treatment)?
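To make the covariate dependence concrete, here is the logistic case written out. The notation ($\alpha$, $\beta_T$, $\gamma$, and expit) is introduced here for illustration and is not from any particular fitted model; $T$ is 1 for treatment and 0 for control, and there is no treatment-by-covariate interaction.

$$
\Pr(Y=1 \mid T, X) = \operatorname{expit}(\alpha + \beta_T T + X\gamma), \qquad \operatorname{expit}(u) = \frac{1}{1+e^{-u}}
$$

$$
\text{OR} = e^{\beta_T}, \qquad \text{ARR}(X) = \operatorname{expit}(\alpha + X\gamma) - \operatorname{expit}(\alpha + \beta_T + X\gamma)
$$

The odds ratio is free of $X$, but the absolute risk reduction changes with $X\gamma$ because expit is nonlinear.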
Note that it doesn’t work to set all the covariates to the median or mean, as these combinations may not occur in the data for various reasons including collinearity.
Here is one algorithm. It also can create covariate settings that don’t occur, but allows you to stay somewhat close to the marginal medians and has an adjustable parameter. I wonder if anyone has a better one.
- Consider all quantiles q from 0.35, 0.36, 0.37, …, 0.65
- Compute the q’th marginal quantile of each covariate that has an increasing relationship with Y, and the (1-q)’th quantile of each covariate that has a decreasing relationship with Y
- Evaluate the covariate-adjusted predicted risk at these covariate quantiles, for the control treatment arm
- Compute the absolute difference between this predicted risk and the marginal (non-covariate-adjusted) risk
- Choose the value of q that minimizes this absolute difference
- Save the q and (1-q) quantiles (depending on covariate direction)
- Use these covariate values to get desired example estimates
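The quantile search above can be sketched as follows. This is a rough illustration in Python using only the standard library, not a reference implementation: the covariate names, the logistic coefficients, and the simulated data are all made up, and the target risk is taken to be the average predicted control-arm risk as a stand-in for the marginal estimate one would get from a treatment-only model.

```python
import math
import random

def quantile(values, q):
    """Sample quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def search_quantile(columns, directions, predict_risk, target_risk):
    """Return (q, covariate settings, |risk - target|) minimizing the gap.

    columns:    dict of covariate name -> list of observed values
    directions: dict of covariate name -> +1 (risk increases) or -1 (decreases)
    """
    best = None
    for i in range(31):  # q = 0.35, 0.36, ..., 0.65
        q = 0.35 + 0.01 * i
        # q'th quantile for increasing covariates, (1-q)'th for decreasing ones
        x = {name: quantile(vals, q if directions[name] > 0 else 1 - q)
             for name, vals in columns.items()}
        gap = abs(predict_risk(x) - target_risk)
        if best is None or gap < best[2]:
            best = (round(q, 2), x, gap)
    return best

# --- Illustrative data and model (all hypothetical) ---
random.seed(1)
n = 500
columns = {
    "age": [random.gauss(60, 10) for _ in range(n)],
    "sbp": [random.gauss(130, 15) for _ in range(n)],
    "bmi": [random.gauss(27, 4) for _ in range(n)],
    "egfr": [random.gauss(75, 15) for _ in range(n)],  # assumed protective
}
directions = {"age": 1, "sbp": 1, "bmi": 1, "egfr": -1}
coef = {"age": 0.04, "sbp": 0.02, "bmi": 0.05, "egfr": -0.03}

def predict_risk(x):
    """Control-arm risk from a made-up logistic model."""
    lp = -4.5 + sum(coef[k] * x[k] for k in coef)
    return 1 / (1 + math.exp(-lp))

# Stand-in for the marginal control-arm risk: the mean predicted risk
rows = [{k: columns[k][i] for k in columns} for i in range(n)]
target_risk = sum(predict_risk(r) for r in rows) / n

q_best, x_best, gap = search_quantile(columns, directions, predict_risk, target_risk)
print(q_best, round(gap, 4), {k: round(v, 1) for k, v in x_best.items()})
```

Because every covariate is moved in the direction that raises risk as q grows, the predicted risk is monotone in q, so the grid search is really locating a single crossing point. The grid spacing of 0.01 limits how close the match can get.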
Here is another possible algorithm—one that will find “real” combinations:
- Compute predicted absolute risk for all covariate combinations occurring in the data
- Find all of the combinations with absolute difference between predicted risk and target marginal risk less than $\epsilon$ for some $\epsilon$
- Choose the single covariate combination that is in some sense near the center of the data
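Here is a minimal sketch of this second algorithm, again in standard-library Python with simulated data and a made-up logistic model. "Near the center of the data" is not defined above, so the sketch assumes one possible reading: the smallest Euclidean distance to the vector of marginal medians after scaling each covariate by its standard deviation. Other choices (e.g., Mahalanobis distance) would be equally defensible.

```python
import math
import random
import statistics

def pick_observed_representative(rows, predict_risk, target_risk, epsilon):
    """Among observed rows whose predicted risk is within epsilon of the
    target, return the one closest to the marginal medians (SD-scaled
    Euclidean distance, an assumed notion of 'center'). None if no match."""
    names = list(rows[0].keys())
    med = {k: statistics.median([r[k] for r in rows]) for k in names}
    sd = {k: statistics.pstdev([r[k] for r in rows]) or 1.0 for k in names}

    candidates = [r for r in rows
                  if abs(predict_risk(r) - target_risk) < epsilon]
    if not candidates:
        return None

    def distance_to_center(r):
        return math.sqrt(sum(((r[k] - med[k]) / sd[k]) ** 2 for k in names))

    return min(candidates, key=distance_to_center)

# --- Illustrative data and model (all hypothetical) ---
random.seed(2)
rows = [{"age": random.gauss(60, 10),
         "sbp": random.gauss(130, 15),
         "egfr": random.gauss(75, 15)} for _ in range(500)]
coef = {"age": 0.04, "sbp": 0.02, "egfr": -0.03}

def predict_risk(x):
    """Control-arm risk from a made-up logistic model."""
    lp = -3.2 + sum(coef[k] * x[k] for k in coef)
    return 1 / (1 + math.exp(-lp))

# Stand-in for the marginal control-arm risk: the mean predicted risk
target_risk = sum(predict_risk(r) for r in rows) / len(rows)
epsilon = 0.02

chosen = pick_observed_representative(rows, predict_risk, target_risk, epsilon)
print(chosen, round(predict_risk(chosen), 3), round(target_risk, 3))
```

The returned row is an actual observation, so the combination is guaranteed to occur in the data. A smaller epsilon tightens the risk match at the cost of fewer candidates near the center.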
When the covariates are categorical and few in number, the following approach could be tried:
- Find covariate combinations for which more than 10 patients have that combination
- Sort the remaining combinations in descending order of frequency
- For each combination show the absolute difference between that combination’s predicted risk and the target average risk
- Choose the combination that is the “biggest bang for the buck”, i.e., that has the best tradeoff between absolute difference and cell frequency.
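For the categorical case, the table described above might be built like this. The patient data, the predicted risks, and the target risk are invented for illustration. The "bang for the buck" tradeoff is not pinned down above, so the final selection rule here (the most frequent combination whose risk is within a chosen tolerance of the target) is only one hypothetical way to operationalize it.

```python
from collections import Counter

def combination_table(patients, predicted_risk, target_risk, min_count=10):
    """Rows of (combination, frequency, |risk - target|) for combinations
    seen in more than min_count patients, in descending order of frequency."""
    counts = Counter(patients)
    table = []
    for combo, count in counts.most_common():  # already sorted by frequency
        if count > min_count:
            gap = abs(predicted_risk[combo] - target_risk)
            table.append((combo, count, gap))
    return table

def most_frequent_within(table, tolerance):
    """Hypothetical tradeoff rule: first (most frequent) row within tolerance."""
    for row in table:
        if row[2] <= tolerance:
            return row
    return None

# --- Illustrative data (all hypothetical): (sex, diabetic) per patient ---
patients = ([("F", "no")] * 40 + [("M", "no")] * 30 +
            [("F", "yes")] * 15 + [("M", "yes")] * 5)
predicted_risk = {("F", "no"): 0.10, ("M", "no"): 0.18,
                  ("F", "yes"): 0.25, ("M", "yes"): 0.40}
target_risk = 0.17  # assumed marginal control-arm risk

table = combination_table(patients, predicted_risk, target_risk)
for combo, count, gap in table:
    print(combo, count, round(gap, 3))

choice = most_frequent_within(table, tolerance=0.05)
print("chosen:", choice[0])
```

In practice one would probably look at the printed table and judge the tradeoff by eye rather than rely on a fixed tolerance.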
Keep in mind that for covariate-dependent quantities we can generate a series of estimates over varying covariate settings. When curves do not cross, statistical evidence for a treatment effect will probably not vary much across the different choices of X. But for the moment I am seeking a single vector X that would be useful for summarizing covariate-adjusted results for derived parameters.
Terry Therneau et al. have some related thoughts. See also this, which discusses averaging over covariate distributions rather than making predictions for individual covariate vectors.