Proper analysis and presentation of the association between a continuous exposure and a binary outcome

noambard · November 8, 2020, 7:30pm

Dear colleagues, we would greatly appreciate your assistance with an issue we are facing.

We are studying the relationship between a continuous exposure and a binary outcome (to be concrete, a lab test and yes/no severity of a disease), adjusted for certain other variables.
The few previous studies of this association discretized the exposure into several bins and, using the lowest bin as the baseline in a regression model, found that the association becomes significant only above a certain threshold. They took this to mean that below said threshold, the exact level of the exposure does not matter.
This is very clearly an artifact of the discretization. Using a cubic regression spline, we see that the risk rises smoothly across the relevant clinical range. There is (obviously) no threshold.

The question is, having done the non-discretized analysis, how best to present the continuous dose-response relationship in a paper in the medical literature.
We want the main conclusion to be (and this is novel): “There is no threshold in the relevant clinical range, less is better”.

Three comments, and then some options we’ve considered.
Comment 1: Despite the many shortcomings of discretization, the medical literature is used to discretized exposures.
Comment 2: We are aiming for a clinical journal, so the narrative needs to be understandable.
Comment 3: The analysis is, in R, mgcv::gam(binary_disease ~ s(lab, bs = "cs") + covariates, family = "binomial") or, alternatively, a Bayesian analysis with brms::brm and a similar formula.

Options we’ve considered:

1. Show that the derivative of the spline (calculated approximately using (f(x+\epsilon) - f(x))/\epsilon) is significant across the entire relevant range. This is problematic because the choice of epsilon impacts the significance.
2. Choose a reference value at the bottom of the relevant range, x_0, and do something similar to the discretized analysis, just on a continuous spline. That is, present f(x_1) - f(x_0), then f(x_2) - f(x_0), etc. This has the disadvantage of not emphasizing the dose-response relationship.
3. Settle for showing the figure, which is clearly up-sloping with a tight confidence/credible interval. This will probably not get past a statistical reviewer, and does not really conform to the STROBE guidelines.
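
As a rough illustration of the first two options, the sketch below fits the mgcv model from Comment 3 to simulated data and computes both quantities from the linear-predictor matrix returned by predict(..., type = "lpmatrix"). The data, the covariate age, the grid, and the reference value are all made up for illustration. Because the fitted spline is linear in its coefficients, the finite-difference slope and its standard error should settle down as eps shrinks, so the choice of eps need not drive the result. This is a sketch under those assumptions, not a recommended analysis.

```r
library(mgcv)

# Simulated data (hypothetical): a lab value, one covariate, a binary outcome
set.seed(1)
n   <- 2000
lab <- runif(n, 0, 10)
age <- rnorm(n, 60, 10)
d   <- data.frame(lab = lab, age = age,
                  y = rbinom(n, 1, plogis(-3 + 0.3 * lab + 0.02 * (age - 60))))

fit <- gam(y ~ s(lab, bs = "cs") + age, family = binomial, data = d)
b   <- coef(fit)
V   <- vcov(fit)

# Grid over an assumed relevant range; the covariate is held fixed
grid <- data.frame(lab = seq(1, 9, by = 1), age = 60)
ref  <- data.frame(lab = 1, age = 60)          # reference value x_0

X  <- predict(fit, newdata = grid, type = "lpmatrix")
X0 <- predict(fit, newdata = ref,  type = "lpmatrix")

# Option 2: contrasts f(x) - f(x_0) on the logit scale, with pointwise SEs
D   <- sweep(X, 2, X0[1, ])                    # subtract the reference row
est <- drop(D %*% b)
se  <- sqrt(rowSums((D %*% V) * D))
or_table <- data.frame(lab   = grid$lab,
                       OR    = exp(est),
                       lower = exp(est - 1.96 * se),
                       upper = exp(est + 1.96 * se))

# Option 1: finite-difference slope of the smooth, with pointwise SEs
eps <- 1e-4
X1  <- predict(fit, newdata = transform(grid, lab = lab + eps),
               type = "lpmatrix")
Xd       <- (X1 - X) / eps                     # approximate derivative of the basis
slope    <- drop(Xd %*% b)
slope_se <- sqrt(rowSums((Xd %*% V) * Xd))

print(or_table)
print(data.frame(lab = grid$lab, slope = slope, se = slope_se))
```

The exponentiated contrasts read as odds ratios relative to the reference value, which keeps the tabular format familiar from discretized analyses while using the continuous fit.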

Reading my question now, I realize this is a basic issue of how best to present a spline as the primary exposure, but I could find no answer.

Thanks in advance,
Noam Barda

f2harrell · November 8, 2020, 10:48pm

This is a really good question and thanks for setting it up so well.

Statistical tests should account for how many opportunities you gave the relationship to be non-flat. This is easy to think about with a parametric spline function: the number of opportunities (the degrees of freedom) equals the number of knots minus a constant that depends on which type of spline you fit. From this you get a k-degree-of-freedom composite (aka “chunk”) test for association (flatness). This is for a frequentist analysis. For Bayesian models you can compute all kinds of posterior probabilities and provide uncertainty intervals for predicted probabilities, logits, or means. Getting a global assessment that is akin to the chunk test is more challenging.
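
One way to make the chunk test concrete in a frequentist setting is sketched below, assuming a natural cubic spline from splines::ns and a likelihood-ratio comparison against the model without the exposure. The simulated data and variable names are hypothetical, and a Wald version of the same test would be an equally reasonable choice.

```r
library(splines)

# Simulated data (hypothetical): a lab value, one covariate, a binary outcome
set.seed(2)
n   <- 2000
lab <- runif(n, 0, 10)
age <- rnorm(n, 60, 10)
d   <- data.frame(lab = lab, age = age,
                  y = rbinom(n, 1, plogis(-3 + 0.3 * lab + 0.02 * (age - 60))))

# ns(lab, df = 3) uses 2 interior knots plus 2 boundary knots,
# so the exposure gets 3 degrees of freedom (4 knots minus 1)
fit0 <- glm(y ~ age,                   family = binomial, data = d)
fit1 <- glm(y ~ ns(lab, df = 3) + age, family = binomial, data = d)

# Composite ("chunk") test: are all spline terms jointly zero, i.e. is the curve flat?
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)
```

The Df column of the comparison shows the 3 degrees of freedom spent on the exposure, which is the number the test has to account for.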

The main presentation is the fitted equation with uncertainty bands. We have published these types of graphs in medical journals (JAMA, NEJM, Annals of Internal Medicine, …) since the early 1980s, so don’t worry about publishing such graphs.