Nature Biotechnology has just published “An electroencephalographic signature predicts antidepressant response in major depression” by Wei Wu, Yu Zhang, et al. Gustav Nilsonne has posted a series of tweets about it, pointing out what seem to be major flaws in the analytic approach. I hope that others can elaborate on them.
It is ironic, but not uncommon, that a paper using advanced machine learning methods fails to get the simplest things right. Key to predicting response to depression therapy is using the response variable correctly. In antidepressant drug trials, when one plots, say, the 12-week Hamilton D depression scale (HamD) against baseline HamD, the resulting plot is extremely nonlinear. This nonlinear relationship is caused by patients with severe depression (high HamD) having a much larger drug response than those with lower HamD. This implies that change from baseline in HamD is meaningless as a response variable. The baseline needs to be kept in context. This is easily done by fitting a proportional odds ordinal logistic model that is tailored to this situation:
Follow-up HamD = restricted cubic spline in baseline HamD + treatment effect
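To make the right-hand side of that model concrete, here is a minimal sketch of how the design matrix could be built: a restricted cubic spline basis in baseline HamD (truncated power form, constrained to be linear beyond the outer knots) plus a treatment indicator. The function name rcs_basis, the knot locations, and the example values are all hypothetical, and fitting the proportional odds model to these columns is left to whatever ordinal regression routine one prefers.

```python
import numpy as np


def _pos_cube(u):
    """Truncated cubic: u**3 where u > 0, otherwise 0."""
    return np.maximum(u, 0.0) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis, linear beyond the outer knots.

    Returns an array of shape (len(x), len(knots) - 1): x itself
    plus len(knots) - 2 nonlinear terms.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")
    span = t[-1] - t[-2]
    # Dividing by (last knot - first knot)^2 is only a scaling convention
    # that keeps the nonlinear terms on roughly the scale of x.
    scale = (t[-1] - t[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (
            _pos_cube(x - t[j])
            - _pos_cube(x - t[-2]) * (t[-1] - t[j]) / span
            + _pos_cube(x - t[-1]) * (t[-2] - t[j]) / span
        )
        cols.append(term / scale)
    return np.column_stack(cols)


# Hypothetical values, for illustration only (not data from any trial).
baseline = np.array([8, 12, 17, 21, 26, 30, 35], dtype=float)
treatment = np.array([0, 1, 0, 1, 0, 1, 1], dtype=float)
knots = [10, 20, 30]  # assumed knot locations

# Columns: baseline HamD, one nonlinear spline term, treatment indicator.
X = np.column_stack([rcs_basis(baseline, knots), treatment])
print(X.shape)  # (7, 3)
```

With three knots the spline contributes two columns, so the baseline effect costs two parameters and the treatment effect one. In practice knots are usually placed at quantiles of the observed baseline values rather than at fixed numbers like these.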
In the Nature Biotechnology paper, the authors improperly used ordinary change from baseline HamD, so they failed to notice that the most important predictor of change in HamD is the baseline HamD. And failure to take baseline HamD into account distorts their analysis in unknown ways.
Since the response variable used in the paper is defective, it is possible that what the machine “learned” is how to predict baseline Hamilton D. But we do not need to predict what is observable.
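A small simulation can illustrate how this might happen. The sketch below uses purely synthetic data in which every number is an assumption: baseline HamD is drawn uniformly, sicker patients improve more, and a hypothetical marker carries information about baseline severity and nothing else. HamD is treated as continuous for simplicity. It is not a reanalysis of the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data; all values below are assumptions for illustration only.
baseline = rng.uniform(10, 35, n)
# Hypothetical marker: tracks baseline severity and nothing else.
marker = baseline + rng.normal(0, 5, n)
# Follow-up: patients with higher baseline improve more (nonlinear in baseline).
followup = baseline - 0.02 * baseline**2 + rng.normal(0, 3, n)
change = followup - baseline


def residualize(y, x, degree=2):
    """Residuals of y after a polynomial least-squares fit on x."""
    coefs = np.polyfit(x, y, degree)
    return y - np.polyval(coefs, x)


# Marker versus change score, ignoring baseline.
raw_corr = np.corrcoef(marker, change)[0, 1]
# Same comparison after removing what baseline explains from both.
adj_corr = np.corrcoef(
    residualize(marker, baseline), residualize(change, baseline)
)[0, 1]

print(f"marker vs change, unadjusted: {raw_corr:.2f}")
print(f"marker vs change, adjusted for baseline: {adj_corr:.2f}")
```

In this setup the marker appears to predict the change score strongly, yet the association essentially vanishes once baseline HamD is accounted for. The marker has learned the baseline, which we already observe.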
I note that no biostatisticians were involved in the paper. In this biostatistician’s opinion, it shows. I am also wondering about the peer review system at Nature Biotechnology.
General problems with the use of change scores are detailed in BBR Section 14.4.