Article(s): Coffee Benefits and Continuous vs. Categorical Variable Analysis

Saturday morning, feeling guilty about making a second (or was it third?) cup of coffee, I was mucking around on PubMed to see whether we really know if coffee benefits health. You can’t read the news without hearing someone extolling the supposed health virtues of consuming luxury goods like coffee, tea, dark chocolate, red wine… Except not so much red wine anymore, since the medical community seems to have finally agreed that ethanol causes seven types of cancer, and yes, a little poison is still bad for you – it just looked possibly beneficial in moderation for a long time due to confounding (see also: luxury goods).

So I had in mind a recent Guardian piece on the possible health benefits of coffee-drinking (1). (But I can only include two links in this post as a new user, so I’ll number the missing ones in the text and post them in a comment later if people care and that’s allowed.) It’s on this paper: ACP Journals. The authors claim their analysis is novel because it distinguishes unsweetened, naturally sweetened, and artificially sweetened coffee consumption. They find a U-shaped association between coffee-drinking without artificial sweetener and lower risk of death; unsurprisingly, black coffee beats sugared coffee for health benefit.

This raises the question of how the other literature on coffee and health cuts up the data, and whether we can see these sweetener sub-stories plausibly hiding in it. Here’s an analysis that uses continuous and then categorical analyses: Association between coffee drinking and telomere length in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial - PMC. It seems fine to call the continuous results null (OR = 1.01, 95% CI = 0.99-1.03) — which could be consistent with the sweetener sub-stories, if a lot of people in the sample used artificial sweeteners. Those do seem to get double-digit use in the relevant (U.S.) population (2). So it seems plausible that sweetener sub-stories have produced some false null results.

But then why also run the categorical analyses here? And what to make of the substantial possible effects they show? As in, OR <3 cups/day vs. none = 1.37, 95% CI = 0.71-2.65; OR ≥3 cups/day vs. none = 1.47, 95% CI = 0.81-2.66… and “in the largest of the four contributing studies, moderate (<3 cups/day) and heavy coffee drinkers (≥3 cups/day) were 2.10 times (95% CI = 1.25, 3.54) and 1.93 times as likely (95% CI = 1.17, 3.18) as nondrinkers to have above-median telomere length, respectively.”

In light of those findings, the abstract’s dismissal of the categorical results as null (“no evidence that coffee drinking is associated with telomere length”) seems wrong. But should we be asking why categorical analyses were done after continuous in the first place?

There’s apparently a possible mechanism for anti-aging effects of coffee/caffeine (3). But, other research says their anti-aging effects work in opposite directions, with coffee lengthening telomeres and caffeine shortening them (4). This is apparently an area of contention, as other studies do suggest anti-aging effects of caffeine itself (5).

It seems we have enough evidence about possible mechanisms to doubt the null that coffee doesn’t affect aging or otherwise have longevity-relevant benefits, but not enough to know which way a caffeine effect should then go. So what do we make of the null continuous findings, the choice to also analyze categorically, and the resultant substantial possible effects?

Does this point to broader possible consistency in the literature, where we need to analyze smaller subgroups (e.g., black/sugar/artificially sweetened, # of cups…) to see real effects, and the sweetener sub-story is part of that striding toward truth with more precision? Or does this look more like a possible sparse-cell-count issue, where the more you slice the data into analytical categories, the less power and precision you have – possibly accounting for the divergent results within and across analyses? And what else is there to learn here about continuous versus categorical variable analysis choices? I’m looking at F2Harrell’s post on “Categorizing Continuous Variables,” thinking that lots of these points seem to apply here… PubMed returns 32 results relevant to #15 there, for coffee and restricted cubic spline. The first ten of those analyses suggest possible benefit.
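As a toy illustration of the power-and-precision point (a sketch, not the telomere paper’s actual analysis: the U-shaped curve, cutpoints, sample size, and continuous stand-in outcome here are all made up), here is what categorizing a continuous exposure can cost relative to a restricted cubic spline fit:

```python
import numpy as np

rng = np.random.default_rng(0)

def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization):
    k knots -> k-1 columns (the linear term plus k-2 nonlinear ones),
    constrained to be linear in the tails."""
    t = np.asarray(knots, float)
    cube = lambda a: np.clip(a, 0, None) ** 3
    denom = t[-1] - t[-2]
    cols = [x]
    for j in range(len(t) - 2):
        cols.append(cube(x - t[j])
                    - cube(x - t[-2]) * (t[-1] - t[j]) / denom
                    + cube(x - t[-1]) * (t[-2] - t[j]) / denom)
    return np.column_stack(cols)

n = 2000
cups = rng.uniform(0, 6, n)               # hypothetical cups/day
true = (cups - 2.5) ** 2 / 4              # made-up U-shaped dose-response
y = true + rng.normal(0, 0.5, n)          # noisy continuous stand-in outcome

# (1) Categorical analysis: none-ish / <3 / >=3 cups as dummy variables
cat = np.digitize(cups, [1, 3])
Xc = np.column_stack([np.ones(n)] + [(cat == b).astype(float) for b in (1, 2)])
fit_cat = Xc @ np.linalg.lstsq(Xc, y, rcond=None)[0]

# (2) Continuous analysis: restricted cubic spline with 4 knots
knots = np.quantile(cups, [0.05, 0.35, 0.65, 0.95])
Xs = np.column_stack([np.ones(n), rcs_basis(cups, knots)])
fit_rcs = Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]

mse_cat = np.mean((fit_cat - true) ** 2)  # step function misses the U
mse_rcs = np.mean((fit_rcs - true) ** 2)  # spline tracks it closely
```

With a U-shaped truth, the three-level step function is forced to average over rising and falling segments within each bin, so its error against the true curve is much larger than the spline’s; that is the flavor of information loss point #15 in Harrell’s post describes.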

PubMed returns null results for searches like “coffee experiment anti-aging” and “coffee experiment telomeres.” So I guess not enough people are committed enough to finding out what the telomere (or other) effects of six months of coffee consumption are to go on or off the stuff at random. It’s almost like this stuff is addictive…


It’s a wonderful question, Vera. I think the categorical analysis is a kind of admission by the authors that they mistrust their main analysis. Or they are totally incompetent. The categorical analysis is virtually uninterpretable, and it is certainly unhelpful and misleading for the reasons you identified.

Were one to have good data on the amount of coffee consumed and the amount of (1) real and (2) artificial sweetener added, one could do an “unbiased analysis” with a tensor spline in coffee & sweetener (i.e., include cross-products of the individual splines of coffee and sweetener consumed). A 3-dimensional graph with risk on the z-axis would be very informative.
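A rough numpy sketch of what that tensor spline might look like (everything here is simulated and hypothetical; the outcome is a continuous stand-in for risk rather than a proper survival or logistic model, and the knot counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

def rcs_basis(x, knots):
    """Restricted cubic spline columns: the linear term plus
    len(knots)-2 nonlinear terms, linear in the tails."""
    t = np.asarray(knots, float)
    cube = lambda a: np.clip(a, 0, None) ** 3
    denom = t[-1] - t[-2]
    cols = [x]
    for j in range(len(t) - 2):
        cols.append(cube(x - t[j])
                    - cube(x - t[-2]) * (t[-1] - t[j]) / denom
                    + cube(x - t[-1]) * (t[-2] - t[j]) / denom)
    return np.column_stack(cols)

n = 3000
coffee = rng.uniform(0, 6, n)   # cups/day, simulated
sweet = rng.uniform(0, 4, n)    # tsp sweetener/day, simulated
# made-up "risk" that includes a coffee-by-sweetener interaction
true = (coffee - 2.5) ** 2 / 4 + 0.3 * sweet + 0.1 * coffee * sweet
y = true + rng.normal(0, 0.5, n)

# spline basis (with an intercept column) for each covariate
Bc = np.column_stack(
    [np.ones(n), rcs_basis(coffee, np.quantile(coffee, [.05, .35, .65, .95]))])
Bs = np.column_stack(
    [np.ones(n), rcs_basis(sweet, np.quantile(sweet, [.05, .5, .95]))])

# tensor spline: all pairwise products of the two bases, which yields
# the intercept, both sets of main-effect spline terms, and every
# spline-by-spline cross-product
X_tensor = np.einsum('ni,nj->nij', Bc, Bs).reshape(n, -1)
# additive comparison model: main effects only, no cross-products
X_add = np.column_stack([Bc, Bs[:, 1:]])

fit_t = X_tensor @ np.linalg.lstsq(X_tensor, y, rcond=None)[0]
fit_a = X_add @ np.linalg.lstsq(X_add, y, rcond=None)[0]

mse_tensor = np.mean((fit_t - true) ** 2)
mse_add = np.mean((fit_a - true) ** 2)
```

Evaluating the fitted tensor model over a grid of (coffee, sweetener) values and putting predicted risk on the z-axis gives the 3-D surface described above. Because the simulated truth contains an interaction, the tensor model can recover it while the additive model, by construction, cannot.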


Thanks! Makes sense.

What are the limitations of an “unbiased analysis” with a tensor spline accounting for possible differential effects of dairy/non-dairy creamers and natural/artificial sweeteners? I had to look up tensor (it seems like a 3D axis, with some beautiful IRL wooden models so you can really “see” it) and spline (“a special function defined piecewise by polynomials”). I’m having trouble visualizing what this analysis would look like with this addition.

I ask because I was wondering if drinking cream in coffee could hurt cognition (guzzling fat seeming like a bad idea perhaps), looked it up, and it turns out there is some evidence of the opposite: higher dairy consumption and especially dairy fat consumption is possibly associated with lower heart disease risk and better cognition. On the basis of that literature, one would then want ideally to cut dairy consumption categories further between full- and low-fat — unless something (like common presence of “bad fats”/additives in non-dairy creamers) tells us it’s more important to separate non-dairy, dairy, and black categories, and we only have enough power to differentiate that many categories.

It seems like this raises degrees-of-freedom issues statistically (more categories, fewer observations in each cell, less power); sparse-theoretical-framework issues analytically (we should know more about what we’re doing before analyzing, especially if we have to pick between analyzing creamer or sweetener and both might matter for various outcomes of interest); and this visual-thinking limitation for me in seeing how the dimensions would extend. Am I wanting a better way to visualize multi-dimensional data, and if so, where could I find it?

The effective sample size (e.g., the number of outcome events) will dictate whether you can flexibly model two covariates at a time or three, or how much penalization needs to be used. See the graphics chapter of BBR for an example tensor spline looking at diastolic and systolic blood pressure jointly to predict hemorrhagic stroke.
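To put rough numbers on that df budget (the 15-events-per-candidate-parameter figure below is a commonly cited rule of thumb, not something stated in this thread, and the knot counts are arbitrary):

```python
# Rough sketch of budgeting degrees of freedom against effective
# sample size for tensor splines of restricted cubic splines.

def tensor_spline_df(*knots):
    """Non-intercept parameters for a tensor spline of several
    covariates. An RCS with k knots contributes k-1 columns;
    adding the intercept gives k per covariate, the tensor
    product multiplies, and we drop the single shared intercept."""
    p = 1
    for k in knots:
        p *= k
    return p - 1

def affordable(events, df, per_param=15):
    """Heuristic: can this many outcome events support df parameters,
    at roughly `per_param` events per candidate parameter?"""
    return events >= per_param * df

# Two covariates with 4 knots each: 4*4 - 1 = 15 parameters,
# needing on the order of 225 events; three covariates with
# 4 knots each: 4**3 - 1 = 63 parameters, roughly 945 events.
two_way = tensor_spline_df(4, 4)
three_way = tensor_spline_df(4, 4, 4)
```

The multiplicative growth is the point: going from two flexibly modeled covariates to three roughly quadruples the event count required, which is why effective sample size (or penalization) governs how many covariates one can spline jointly.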