Residual-based dependent variables

Jeff · June 1, 2021, 8:52pm

I have noticed an increasing number of studies using residual-based dependent variables (DV) and I do not understand the purpose of creating a DV in this manner. What am I missing?

For example, I have seen a study of the effect of the number of a firm’s alliances (A) on the firm’s capabilities. The study measures a firm’s capabilities (C) by regressing revenues on the firm’s investment (I) to create predicted values (R) and then subtracting those from the actual values (R*). That is, if R = B1 x I + e1, then C = R* - B1 x I - e1. Thus, if C = B2 x A + e2, then R* - B1 x I - e1 = B2 x A + e2.

Would not R* = B1 x I + B2 x A + e1 + e2 ? Why not just include investment as a control variable and alliances as an independent variable in a single regression model instead of using two regression models - one to create the DV and one to test the hypothesis?

f2harrell · June 1, 2021, 9:01pm

This is done a good deal in econometrics, but I question this practice. It only works in very special cases. When the model is nonlinear, e.g., if you switch from a parametric to a semiparametric model, there is no way to use the residuals approach. Instead, go with well-specified models that involve only one analysis stage.
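
To make the "very special cases" point concrete for the linear setup in the question, here is a minimal simulation sketch. It is not taken from the study under discussion: the data are simulated, and the variable names, the coefficients (true B1 = 2, true B2 = 1), and the 0.8 dependence of alliances on investment are all assumptions chosen for illustration. It compares (a) the two-stage approach, where revenue is residualized on investment and the residual is regressed on alliances, (b) a single regression of revenue on both, and (c) a two-stage version in which alliances are also residualized on investment, as in the Frisch-Waugh-Lovell result.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulated (hypothetical) data: alliances are correlated with investment.
invest = rng.normal(size=n)
alliances = 0.8 * invest + rng.normal(size=n)
revenue = 2.0 * invest + 1.0 * alliances + rng.normal(size=n)  # true B1 = 2, B2 = 1


def ols(y, *cols):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# (a) Two-stage: residualize revenue on investment, then regress on alliances.
stage1 = ols(revenue, invest)
capability = revenue - (stage1[0] + stage1[1] * invest)  # residual-based DV
b2_two_stage = ols(capability, alliances)[1]

# (b) One stage: investment as a control, alliances as the predictor of interest.
b2_one_stage = ols(revenue, invest, alliances)[2]

# (c) Two-stage with alliances also residualized on investment (Frisch-Waugh-Lovell).
a_fit = ols(alliances, invest)
alliances_resid = alliances - (a_fit[0] + a_fit[1] * invest)
b2_fwl = ols(capability, alliances_resid)[1]

print(f"two-stage (DV residualized only): {b2_two_stage:.3f}")
print(f"one-stage multiple regression:    {b2_one_stage:.3f}")
print(f"two-stage, both residualized:     {b2_fwl:.3f}")
```

Under these assumptions the one-stage estimate should land near the true value of 1, while the residual-DV estimate is pulled toward zero (about 1/1.64, or roughly 0.61, for this particular setup), because the first stage hands investment the part of the alliance effect that the two variables share. Version (c) reproduces the one-stage coefficient, but only because the model is linear and both sides were residualized; its naive second-stage standard errors would still not account for the first stage.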