Recent NEJM paper on DM: matching and relative importance of predictors

A recent NEJM paper: Risk Factors, Mortality, and Cardiovascular Outcomes in Patients with Type 2 Diabetes

Design: cohort study based on registry data
Exposed: patients with diabetes; non-exposed: people without diabetes
Follow-up: ~5 years
Outcomes: all-cause mortality and specific cardiovascular outcomes

  1. They matched ~270,000 patients with diabetes to ~1.3 million controls without diabetes.
    What could be the reasons for matching? Confounding control can easily be done with regression. Was matching needed for statistical efficiency and precision, given such a large sample size?

  2. They report the “relative importance of risk factors in predicting outcomes”, as measured by R² and “explainable log-likelihood”.
    What are the relative merits of standardized coefficients vs. R² vs. likelihood-ratio tests (LRT)?

Thank you for your thoughts.


“Confounding control can easily be done with regression.”

I’m having trouble accessing the paper, but medications would be confounded with diabetes, and some of these medications have been linked with cardiac outcomes (some apparently confer a protective effect, e.g. liraglutide? [ref]). If treatment is inextricably tangled with exposure in this way, is it really so easy to handle with regression?

Thank you for your reply.
I agree that medications can potentially play a role in either direction. While the type and intensity of treatment would be interesting to know, the paper did not explore how control of the various risk factors was achieved. The aim was to compare outcomes across the number of controlled risk factors, regardless of how they were controlled.

My question was meant to invite a discussion of the relative merits of matching (1:5) vs. regression adjustment in a cohort study with such an impressive sample size.

I’m not well-versed in this topic, but are the comments in this thread relevant?: propensity-score-methods-vs-penalized-regression

e.g. @f2harrell: "too many researchers jump to propensity scores when direct adjustment using ANCOVA works fine."


Yes, thank you. The thread touches on this. It seems to me that, for some, matching has reached the status of a fetish: it makes a study look “better” without real benefits, but with the potential to introduce bias. The BBR book has a discussion of this.


The first thing I look for is whether the matching process discarded subjects who would have been relevant to the analysis. Then I look to see if the matching algorithm provides the same matches regardless of how the dataset was sorted. If either subjects were excluded or the matching process does not represent reproducible research, then matching should be avoided. There are also other reasons. I’ve detailed much of this in Sections 10.1 and 17.2.1 in BBR.
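The sort-order point can be demonstrated with a tiny example. A greedy nearest-neighbor matcher without replacement (a common default; the scores below are made up) returns different matched pairs depending on the order in which treated subjects are processed:

```python
# Greedy 1:1 matching without replacement is order-dependent (toy scores).
treated  = [0.30, 0.50]   # hypothetical propensity scores, treated group
controls = [0.40, 0.90]   # hypothetical propensity scores, control group

def greedy_match(treated_order):
    """Match each treated score, in order, to its nearest unused control."""
    used, pairs = set(), {}
    for t in treated_order:
        j = min((j for j in range(len(controls)) if j not in used),
                key=lambda j: abs(controls[j] - t))
        used.add(j)
        pairs[t] = controls[j]
    return pairs

print(greedy_match([0.30, 0.50]))  # 0.30 claims 0.40; 0.50 is left with 0.90
print(greedy_match([0.50, 0.30]))  # 0.50 claims 0.40; 0.30 is left with 0.90
```

Whichever treated subject is processed first claims the good control, so the matched set (and hence the analysis) depends on how the dataset was sorted.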