Variable selection when there is a covariate of main interest



I have a dataset where the main question of interest concerns the association between the use of a particular medication prior to surgery and the occurrence of a particular surgical adverse event - both are binary.

Unsurprisingly, there are some potential confounders that could be associated with both the medication use and the outcome (e.g. age) that I would like to account for. These have been selected by the clinicians. Even though there are not very many (10 covariates with 400+ observations), the event rate is quite low (< 5%), and if I just fit a standard logistic regression model that includes all of the covariates, I get complete separation.

If this were a ‘normal’ modelling / variable selection problem I would use some regularisation method, but I’m not sure that is the best approach when I am specifically interested in the adjusted effect of one particular covariate, and it is the set of adjustment covariates that I want to ‘select’ (in whatever sense).

Some of the covariates (e.g. type of surgery) are likely to be associated with the outcome but not with the medication use. Is it necessary to account for these in the regression model?

Any advice would be much appreciated!


This is a good question that should be discussed more often than it has been in the literature. I think that penalization/shrinkage/regularization is the preferred approach: leave the exposure variable unpenalized and apply a ridge (quadratic) penalty to the 10 covariates only. This procedure and studies of its performance are covered here. You’ll see in the paper that performance is better than propensity adjustment.
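To make the idea of penalizing only the adjustment covariates concrete, here is a minimal sketch in Python with NumPy. It is not the implementation from the paper mentioned above. The function names, the simulated data, and the penalty value `lam=10.0` are all assumptions made for illustration; in practice the penalty would need to be tuned, and the covariates should be on comparable scales before a ridge penalty is applied.

```python
import numpy as np


def fit_partial_ridge_logistic(X, y, penalized, lam, n_iter=100, tol=1e-8):
    """Logistic regression with a ridge penalty on selected columns only.

    X         : (n, p) design matrix, including an intercept column.
    penalized : boolean mask of length p; True where the coefficient is shrunk.
    lam       : ridge penalty strength (illustrative value, not a recommendation).
    Maximizes  loglik(beta) - (lam / 2) * sum(beta[penalized] ** 2)
    by Newton-Raphson.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    P = np.diag(np.where(penalized, lam, 0.0))  # zero rows: no penalty
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p) - P @ beta          # penalized score
        hess = X.T @ (X * w[:, None]) + P        # penalized information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate(n=400, n_cov=10, seed=0):
    """Hypothetical data: a binary exposure confounded by the first covariate."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, n_cov))
    exposure = rng.binomial(1, 1.0 / (1.0 + np.exp(-Z[:, 0])))
    eta = -2.5 + 0.8 * exposure + 0.5 * Z[:, 0] + 0.5 * Z[:, 1]
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    X = np.column_stack([np.ones(n), exposure, Z])
    return X, y


X, y = simulate()
# Intercept (column 0) and exposure (column 1) are left unpenalized.
penalized = np.array([False, False] + [True] * 10)
beta = fit_partial_ridge_logistic(X, y, penalized, lam=10.0)
print("adjusted log odds ratio for exposure:", beta[1])
```

Because the exposure coefficient carries no penalty, its estimate is not shrunk directly; only the 10 adjustment coefficients are pulled toward zero.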

Competing methods are propensity adjustment (which still requires adjusting for pre-specified ‘major’ predictors) and data reduction, which collapses the 10 covariates into a smaller number of summary scores so that fewer regression coefficients need to be estimated. Nonlinear principal components is one of many methods available, as described in detail in my RMS course notes. I also have a case study using sparse principal components, and many examples using the clinically intuitive approach of variable clustering. In all of these, the exposure variable would be treated as ‘special’ and not part of the adjustment variables being reduced.
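As a rough illustration of the data reduction idea, the sketch below uses ordinary linear principal components, which is a simpler stand-in for the nonlinear and sparse variants named above and not the method from the course notes. The helper name `pc_scores`, the choice of 3 components, and the simulated data are assumptions for illustration only. Note that the reduction never looks at the outcome, and the exposure is kept out of it.

```python
import numpy as np


def pc_scores(Z, k):
    """Return the first k principal component scores of the covariates Z.

    Z is standardized column by column first, so every covariate
    contributes on the same scale. Also returns the share of total
    variance explained by each retained component.
    """
    Z = np.asarray(Z, dtype=float)
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)
    _, s, Vt = np.linalg.svd(Zs, full_matrices=False)
    scores = Zs @ Vt[:k].T
    var_explained = s[:k] ** 2 / np.sum(s ** 2)
    return scores, var_explained


rng = np.random.default_rng(1)
covariates = rng.normal(size=(400, 10))   # hypothetical adjustment covariates
exposure = rng.integers(0, 2, size=400)   # hypothetical binary exposure

scores, var_explained = pc_scores(covariates, k=3)

# The outcome model then needs 5 coefficients instead of 12:
# intercept, exposure (kept separate), and 3 component scores.
design = np.column_stack([np.ones(400), exposure, scores])
print(design.shape, var_explained.round(3))
```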

Regardless of the path you choose, I would develop a propensity model to better understand which types of patients are getting the pre-op drug. And note that there are many papers in the literature claiming benefits of pre-hospital-admission drugs (statins are a notorious example) that were later found by randomized trials to have been completely biased by the healthy patient effect.


With respect strictly to separation, I’ve found Firth’s penalized maximum likelihood useful.
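For readers who want to see why this helps with separation, here is a minimal NumPy sketch of Firth-type penalized likelihood for logistic regression. It follows the standard modified-score iteration, where each residual is adjusted by h_i(1/2 - p_i) with h_i the diagonal of the weighted hat matrix. It is a teaching sketch under those assumptions, not the logistf implementation; the function name, the step cap, and the toy data are illustrative.

```python
import numpy as np


def fit_firth_logistic(X, y, n_iter=200, tol=1e-8, max_step=5.0):
    """Firth-type penalized logistic regression via the modified score.

    Solves  X' (y - p + h * (0.5 - p)) = 0,  where h holds the diagonal
    of the hat matrix  W^(1/2) X (X' W X)^(-1) X' W^(1/2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))
        # h_i = w_i * x_i' (X' W X)^(-1) x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        biggest = np.max(np.abs(step))
        if biggest > max_step:               # cap very large steps
            step = step * (max_step / biggest)
        beta = beta + step
        if biggest < tol:
            break
    return beta


# Toy data with complete separation: y is 1 exactly when x > 0, so the
# ordinary maximum likelihood slope estimate does not exist (it diverges).
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y_sep = np.array([0, 0, 0, 1, 1, 1])
X_sep = np.column_stack([np.ones_like(x), x])

beta_firth = fit_firth_logistic(X_sep, y_sep)
print("Firth estimates (intercept, slope):", beta_firth)
```

The penalty term keeps the slope finite even though the data are perfectly separated.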


Thanks Prof Harrell, that paper is very relevant. I’ve followed your advice and seem to get sensible results (in fact the fear was that the drug was responsible for surgical morbidity, confounded by the fact that sicker patients were prescribed it).

Thanks, I had looked into this but for some reason or another R would hang indefinitely when I tried to use the logistf package/function on my data.


I’ve had good luck with the logistf package in R, but feel that the amount of penalization that Firth’s method provides is not heavy enough.