Association of continuous exposures with binary endpoints; and "optimal" cut-off points

Hello, everyone. A frequent type of problem I deal with in my work can be characterized as follows:

1. I have observational data obtained from various hospital records.
2. The primary exposure of interest is some intraoperative continuous measure, such as heart rate or blood pressure.
3. The outcome is some sort of postoperative complication (e.g. postoperative delirium) and is usually binary.
4. The researchers are usually interested in characterizing the association between the two, and also in finding an "optimal" cut-point.

The sample size is usually quite large, so I suggest using regression adjustment (instead of propensity-score methods) and restricted cubic splines to allow for non-linear relationships. One problem here is that it is not obvious how the exposure variable should be summarized: should I use minimum mean arterial pressure (MAP), mean MAP, maximum MAP, etc.? (Or perhaps all of them?)
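As a minimal sketch of the spline approach, here is a logistic regression with a restricted (natural) cubic spline on the exposure, using patsy's `cr()` inside a statsmodels formula. The data, variable names (`min_map`, `delirium`), and the data-generating rule are entirely made up for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data standing in for the hospital records
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "min_map": rng.normal(65, 12, n),   # minimum intraoperative MAP (mmHg), made up
    "age": rng.normal(60, 10, n),
})
# Invented rule: risk rises as minimum MAP falls below ~60 mmHg
lin = -2 + 0.08 * np.clip(60 - df["min_map"], 0, None) + 0.02 * (df["age"] - 60)
df["delirium"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# Natural (restricted) cubic spline on the exposure via patsy's cr();
# 4 df allows a flexible non-linear association without choosing any cut-point
model = smf.logit("delirium ~ cr(min_map, df=4) + age", data=df).fit(disp=0)
print(model.summary())
```

The spline coefficients are not individually interpretable, but the fitted curve (e.g. predicted risk across the observed MAP range) shows the clinicians the whole dose-response relationship rather than a single dichotomy.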

Another problem arises when dealing with "optimal cut-points". For example, the researchers might want to say that time-weighted-average (TWA) MAP < 50 mmHg is associated with increased risk of postoperative kidney injury. As I understand it, optimal cut-points don't really make sense in a multivariable setting. Furthermore, you're necessarily throwing away a lot of information, because most patients might never fall below 50 mmHg. So what do you do when a researcher requests an optimal cut-point?

To summarize, I have two main questions: 1) What is the best way to summarize the exposure variable and characterize its association with the binary outcome? 2) How should I deal with requests for optimal cut-points?

Apologies if this has already been answered satisfactorily elsewhere (my Google search yielded many useful threads, but perhaps none that exactly answered my questions). General advice, links to previous discussions, exemplary papers dealing with such analyses, etc. are all welcome. Thank you!

Have you found Dr. Harrell's Biostatistics for Biomedical Research (aka BBR - link)?

He addresses some of your concerns in chapter 18 – Information Loss.


As shown in that chapter, it is not valid to seek a cutpoint in that context. Your job is to talk an investigator out of that by getting them what they need and not what they want.

The other problem is more interesting: how to deal with multiple baselines. With such a large sample size, and assuming you want to make use of historical data within patients, I suggest a landmark analysis. For example, take all patients who had systolic blood pressure (SBP) measured at least once within each of the past 4 years. Compute the 4 yearly averages. Use those as 4 predictors. Analyze the slopes of those predictors (assuming linearity for now). You'll typically find that the most recent SBP is the most important, but this approach can handle any trajectory of impact of SBP on future complications.


Hello stat_guy - I have exactly the same conversations (including on kidney injury!).
I begin by trying to get the researchers to define what they mean by "optimal" in a clinical sense. Sometimes I find they are just caught up in the idea of "optimising" some trade-off between sensitivity and specificity, yet the threshold derived that way is of no clinical relevance at all, i.e. it is not really optimal.
What is more likely to be relevant is something like a very high sensitivity or a very high specificity (or both, i.e. two thresholds), which may be used to guide clinical decision making. So some of my first questions are aimed at finding out what is clinically relevant. E.g., with acute kidney injury there isn't a lot "positive" that can be done, but for those at greater risk, nephrotoxins may be avoided. If the alternatives are less efficacious for other reasons, there is a balance between efficacy for condition X and harm to the kidneys. Find out how the researchers express this trade-off, and try to find the relevant statistic to target for a threshold (or thresholds).
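If the clinicians do articulate such a target, picking the corresponding threshold from a fitted model's predicted risks is mechanical. A sketch with made-up risks and outcomes (the 0.95 sensitivity target is just an example of a clinically chosen figure, not a recommendation):

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical predicted risks and outcomes, standing in for a fitted model
rng = np.random.default_rng(2)
y = rng.binomial(1, 0.2, 1000)
risk = np.clip(0.2 + 0.3 * y + rng.normal(0, 0.15, 1000), 0, 1)

fpr, tpr, thresholds = roc_curve(y, risk)

# Instead of a statistically "optimal" cut-point, find the threshold meeting
# a clinically chosen target, e.g. sensitivity >= 0.95 for rule-out use
target_sens = 0.95
idx = np.argmax(tpr >= target_sens)   # first threshold reaching the target
print(f"threshold={thresholds[idx]:.3f}, "
      f"sensitivity={tpr[idx]:.3f}, specificity={1 - fpr[idx]:.3f}")
```

The key point is that the threshold lives on the model's risk scale, applied after a full multivariable model, not as a dichotomization of one raw exposure.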

If you can get to this point, then perhaps discuss how a multivariable model that generates probabilities is going to be better than focusing on a single cut-point of one variable. If you run into the "but it must be simple" argument, point out that they all have computers in their pockets or on their desks that can generate the output of a model. They no longer have to rely on pen and paper :).

As for summarizing an exposure variable, I tend to do this graphically, with the y-axis showing various metrics clinicians are familiar with (sensitivity etc.) and the x-axis showing all possible thresholds. I also produce graphs of the number of subjects below/above each threshold, as this too is relevant to decision making. Decision curves are good too (though I am not very good at explaining them yet).
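A minimal sketch of building that per-threshold summary (the exposure values, outcome rule, and threshold grid are all invented for illustration); the resulting table is what one would plot against the threshold axis:

```python
import numpy as np
import pandas as pd

# Hypothetical TWA MAP values and binary outcomes
rng = np.random.default_rng(3)
map_twa = rng.normal(65, 10, 800)                              # mmHg, made up
y = rng.binomial(1, 1 / (1 + np.exp(0.15 * (map_twa - 60))))   # lower MAP -> higher risk

# For every candidate threshold, compute the metrics clinicians recognise,
# plus how many patients actually fall below it (relevant to decision making)
rows = []
for t in np.arange(45, 85, 1.0):
    below = map_twa < t                 # "exposed" if below threshold t
    sens = (below & (y == 1)).sum() / (y == 1).sum()
    spec = (~below & (y == 0)).sum() / (y == 0).sum()
    rows.append({"threshold": t, "sensitivity": sens,
                 "specificity": spec, "n_below": int(below.sum())})
metrics = pd.DataFrame(rows)
print(metrics.head())   # plot each column against 'threshold' for the clinicians
```

Laying sensitivity, specificity, and the patient counts on the same threshold axis usually makes it obvious that no single "optimal" point exists, only trade-offs.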

Sorry if this is all obvious … just some musings from my experience.

(By the way, I prefer not to put composite variables like MAP or eGFR into models, but rather the individual components, if possible.)