Critique of paper on generalizability of oncology trials

Frank, one part of your response I don’t understand is the rationale for using R^2, by which I presume you mean the squared correlation of the observed and fitted outcomes (if not, what are you using it for?). Hence I must ask you to please explain why and how you are using R^2; after all, there is a fair bit of literature on its scaling defects as well as its detachment from contextually relevant measures of effect, model fit, and predictive loss, e.g., Cox DR, Wermuth N, A comment on the coefficient of determination for binary responses, Am Statist 1992;46:1–4; Rosenthal R, Rubin DB, A note on percent variance explained as a measure of the importance of effects, J Appl Soc Psychol 1979;9:395–396; Greenland S, Schlesselman JJ, Criqui MH, The fallacy of employing standardized regression coefficients and correlations as measures of effect, Am J Epidemiol 1986;123:203–208; Greenland S, A lower bound for the correlation of exponentiated bivariate normal pairs, Am Statist 1996;50:163–164.

Hi Sander - I was referring to the deviance-based R^2 measures listed here. My favorite one uses the effective sample size and penalizes for the number of parameters in much the same way that the ordinary linear model R^{2}_\text{adj} does.
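
To make the idea concrete, here is a minimal sketch of one such deviance-based index. It assumes the form 1 - exp(-(LR - p)/m), where LR is the model likelihood ratio chi-square, p is the number of parameters, and m is an effective sample size; for a binary outcome it further assumes m = 3 n q (1 - q), with q the proportion of events. Those formulas, the function names, and all the numbers are illustrative assumptions and may not match the linked list exactly.

```python
import math


def lr_r2(lr_chi2, n_eff, n_params=0):
    """Deviance-based R^2 sketch: 1 - exp(-(LR - p) / m).

    lr_chi2  : model likelihood ratio chi-square (null deviance minus model deviance)
    n_eff    : sample size used for scaling (raw n or an effective sample size m)
    n_params : number of parameters to penalize for (0 means no adjustment)
    """
    return 1.0 - math.exp(-(lr_chi2 - n_params) / n_eff)


def binary_effective_n(n, prop_events):
    """Assumed effective sample size for a binary outcome: 3 * n * q * (1 - q)."""
    return 3.0 * n * prop_events * (1.0 - prop_events)


# Hypothetical inputs, not taken from any real analysis
n = 400            # number of observations
prop_events = 0.25  # proportion with the event
lr = 60.0          # likelihood ratio chi-square
p = 5              # number of model parameters

m = binary_effective_n(n, prop_events)

r2_raw_n = lr_r2(lr, n)          # scaled by raw n, no penalty
r2_eff_m = lr_r2(lr, m)          # scaled by effective sample size, no penalty
r2_eff_m_adj = lr_r2(lr, m, p)   # effective sample size plus parameter penalty

print(f"m = {m:.1f}")
print(f"R2 (n)           = {r2_raw_n:.3f}")
print(f"R2 (m)           = {r2_eff_m:.3f}")
print(f"R2 (m, adjusted) = {r2_eff_m_adj:.3f}")
```

Subtracting p from LR plays the same role as the degrees-of-freedom correction in the linear model R^{2}_\text{adj}: a model whose LR does not exceed its parameter count gets an index at or below zero.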

Thanks, Frank, for the explanation. The measures you cite can be viewed as generalizations of the original Gaussian R^2 I mentioned, and they inherit all of the original R^2's scaling defects as well as its detachment from contextually relevant measures of effect, model fit, and predictive loss.
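
One of those scaling defects can be illustrated with a small simulation. This is only a sketch under assumed settings (a linear model with slope 1, unit-variance Gaussian noise, and two hypothetical spreads for the predictor), not an example taken from the cited papers. The effect per unit of x is identical in both scenarios, yet R^2 changes substantially because it depends on the spread of x in the data source.

```python
import random


def slope_and_r2(x, y):
    """Return the least-squares slope and squared correlation of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def simulate(sd_x, n=5000, slope=1.0, seed=0):
    """Simulate y = slope * x + noise with x ~ N(0, sd_x^2), noise ~ N(0, 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    y = [slope * a + rng.gauss(0.0, 1.0) for a in x]
    return slope_and_r2(x, y)


# Same true slope, different spread of the predictor (hypothetical values)
slope_narrow, r2_narrow = simulate(sd_x=0.5)
slope_wide, r2_wide = simulate(sd_x=2.0)

print(f"narrow x: slope = {slope_narrow:.2f}, R^2 = {r2_narrow:.2f}")
print(f"wide x:   slope = {slope_wide:.2f}, R^2 = {r2_wide:.2f}")
```

Under these assumptions the population R^2 is var(x) / (var(x) + 1), so roughly 0.2 for the narrow spread and 0.8 for the wide one, while the estimated slopes should both be close to 1.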

Those defects have not stopped me (or anyone I know) from looking at such measures as part of a suite of model diagnostics, and the adjustments your article mentions (such as penalization against overfitting) are definite improvements. Even if used alone, the measures are better than no check at all (which I think is the norm in the research literature I see). But presumably you would agree that, if the purpose of the model is to come up with reliable individual predictions for clinical decisions (as opposed to just estimating associations or effects averaged over the data source), much more detailed and contextually relevant checks are needed.

Yes, agree with all. I use generalized R^2 indexes primarily as measures of predictive discrimination / predictive information and not as goodness-of-fit measures.
