Predicting an exposure using variables also related to the outcome

I have built a prediction model using lasso regression for a continuous variable X. The predictors of X include age, sex, and several other covariates.

Now I want to examine the relation between predicted X (Xpred) and mortality and compare it to the relation of observed X (Xobs) with mortality.

The association between age and Xpred is stronger than the association between age and Xobs. Therefore, when I run a Cox model to examine the association between X and mortality, adjusting for age has a stronger effect for Xpred than for Xobs.

I would like to keep age among the predictors of X because it plays a strong role. I could, for instance, residualize Xobs and Xpred with respect to age before including them in the Cox model, but I am not sure this would help.
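To make the residualization idea concrete, here is a minimal sketch on simulated data. The variable names (`x_obs`, `x_pred`, `age`) and the data-generating numbers are made up for illustration and are not from my study. It only removes the linear dependence on age, and it says nothing about whether this is the right thing to do before the Cox model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins (assumptions for illustration only):
# x_pred is built to track age more tightly than x_obs does.
age = rng.normal(60, 10, n)
x_obs = 0.05 * age + rng.normal(0, 1.0, n)
x_pred = 0.05 * age + rng.normal(0, 0.3, n)

def residualize(x, z):
    """Residuals of x after an OLS regression on z with an intercept."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return x - design @ coef

x_obs_res = residualize(x_obs, age)
x_pred_res = residualize(x_pred, age)

# Correlation with age before and after residualizing.
corr_before = (np.corrcoef(age, x_obs)[0, 1], np.corrcoef(age, x_pred)[0, 1])
corr_after = (np.corrcoef(age, x_obs_res)[0, 1], np.corrcoef(age, x_pred_res)[0, 1])
print("before:", corr_before)
print("after: ", corr_after)
```

By construction, the residuals are linearly uncorrelated with age for both Xobs and Xpred; any nonlinear age dependence would remain.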

I was wondering whether others have faced this situation and would welcome any advice or references.

Thank you.

In the spirit of “once lasso always lasso”, lasso depends on carrying its heavy shrinkage forward to downstream analyses, i.e., it is not proper to re-fit a model on just the variables lasso “selected”. How are you doing this?

Thank you.

We used two different methods to build the prediction model; both yielded very similar results:

  • Lasso. After running the lasso, we refit a standard linear regression model including the variables that were retained by the lasso. I understand from your response that this is not correct. Are there alternatives?
  • XGBoost. In this case, we did not refit a model and applied the XGBoost model to predict Xpred.

However, my question was more related to how to include Xpred in a model that predicts an outcome when age and sex are included in the model used to predict Xpred and are also related to the outcome.

Mainly remember that once you choose a method, you need to use that method for all subsequent analyses. A possible exception is the relaxed lasso, which has a second-stage analysis with a different penalty function from the one used in the first-stage lasso. But importantly, it still retains a penalty function.
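As a rough illustration of the difference, here is a sketch on simulated data comparing the first-stage lasso, a relaxed second stage with a smaller penalty, and an unpenalized refit on the selected variables. The hand-rolled coordinate-descent solver `lasso_cd`, the penalty `lam`, and the relaxation factor `phi` are all assumptions chosen for the example, not a recommended implementation or tuning.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate-descent lasso for (1/2n)*||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    residual = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            residual += X[:, j] * beta[j]      # partial residual without x_j
            rho = X[:, j] @ residual / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_scale[j]
            residual -= X[:, j] * beta[j]
    return beta

# Simulated, centered data (illustrative only).
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[:2] = [1.0, 0.5]
y = X @ true_beta + rng.normal(size=n)
y -= y.mean()

lam, phi = 0.1, 0.5                            # hypothetical tuning values
beta_stage1 = lasso_cd(X, y, lam)
selected = np.flatnonzero(beta_stage1)         # variables kept by stage 1

def second_stage(penalty):
    """Refit on the selected variables only, with the given penalty."""
    beta = np.zeros(p)
    beta[selected] = lasso_cd(X[:, selected], y, penalty)
    return beta

beta_relaxed = second_stage(phi * lam)         # relaxed lasso: smaller penalty
beta_refit = second_stage(0.0)                 # zero penalty: plain OLS refit

l1_stage1 = np.abs(beta_stage1).sum()
l1_relaxed = np.abs(beta_relaxed).sum()
l1_refit = np.abs(beta_refit).sum()
print("L1 norms (stage 1, relaxed, refit):", l1_stage1, l1_relaxed, l1_refit)
```

The unpenalized refit discards the shrinkage entirely, while the relaxed second stage keeps some of it, which is the sense in which it still retains a penalty.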