Comparing continuous outcome in case-control-style design

mdonoghoe · February 25, 2020, 10:49pm

A colleague has collected data in a case-control(-style) design:

They approached adults who had experienced a particular illness in childhood (“cases”), and separately recruited a similar number of ‘healthy’ adults (“controls”). The controls were recruited so that the two groups had similar distributions of ages and locations, although this was very rough and nowhere close to one-to-one matching.

All participants completed a survey, and this investigator is interested in one particular quality-of-life outcome, to see if there is evidence of a difference between the two groups.

My first thought was that if the outcome is reasonably well-behaved, this could be analysed with a two-sample t-test, and perhaps some efficiency could be gained by adjusting for age and location in a linear regression model.
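A minimal sketch of that comparison, assuming purely simulated data: the variable names (`case`, `age`, `location`, `qol`), the effect sizes, and the helper functions `welch_t` and `ols` are all hypothetical and not taken from the actual survey. It uses the Welch (unequal-variance) form of the two-sample t-test, which is one choice among several, and fits the adjusted model by ordinary least squares with NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated, illustrative data only (NOT the colleague's survey).
case = np.repeat([1, 0], n // 2)      # 1 = childhood illness, 0 = control
age = rng.normal(40, 10, n)           # hypothetical age in years
location = rng.integers(0, 3, n)      # three hypothetical regions: 0, 1, 2
qol = (70 - 5 * case - 0.2 * (age - 40)
       + 2 * (location == 1) + rng.normal(0, 8, n))


def welch_t(x, y):
    """Mean difference, its standard error, and the Welch t statistic."""
    diff = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    return diff, se, diff / se


def ols(X, y):
    """Least-squares coefficients and their model-based standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


# Unadjusted comparison: cases minus controls.
unadj_diff, unadj_se, t_stat = welch_t(qol[case == 1], qol[case == 0])

# Adjusted comparison: group indicator plus centred age and location dummies.
X = np.column_stack([
    np.ones(n), case, age - age.mean(), location == 1, location == 2
]).astype(float)
beta, se = ols(X, qol)
adj_diff, adj_se = beta[1], se[1]     # coefficient on the group indicator

print(f"unadjusted difference: {unadj_diff:.2f} (SE {unadj_se:.2f})")
print(f"adjusted difference:   {adj_diff:.2f} (SE {adj_se:.2f})")
```

The coefficient on `case` is the adjusted difference in mean outcome between the groups. Comparing its standard error with the unadjusted one gives a rough sense of whether adjustment buys any efficiency in a given dataset.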

Does that sound reasonable, or does the sampling design need to be accounted for somehow? Would there be any advantage in adjusting for other factors that may be related to the outcome? They are not necessarily interested in making a causal claim about the impact of the childhood illness, and are aware of the limitations of the cross-sectional data, especially the likelihood of selection bias in this sample.

Thanks in advance for any thoughts.

leonardof · March 6, 2020, 6:12pm

@mdonoghoe, I agree with your first thought. Because you can’t tell exactly which control belongs to which case, you don’t need a one-sample t-test of the paired differences. Whether to adjust for other factors (beyond those used for recruitment) depends on subject knowledge, and on how well the research question matches the origin of the controls.

Maybe you have already considered this but, just in case: please refrain from calling the design “case-control”. In the case-control design, being a case (or not) is the outcome, not the exposure. Having “caseness” as the exposure is more or less specific to clinical epidemiology, and classical epidemiology terminology may not apply.

pmbrown · March 6, 2020, 8:30pm

Maybe consider: “Modeling continuous response variables using ordinal regression”