Spurious relation between biomarker and outcome

Is it possible to induce a correlation between a biomarker level and an outcome if the subjects who are selected have an increased probability of the outcome? For instance, suppose the cohort is patients undergoing abdominal surgery and the outcome is survival. Patients with a lower probability of survival are selected for surgery, and patients with higher levels of the biomarker are also selected for surgery. Do the subjects all have to start the study with the same chance of survival?

If the selection condition is a common effect of the intervention and the outcome (a collider), then yes.
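
A minimal simulation sketch of this, assuming (purely for illustration) a standard-normal biomarker and an independent standard-normal "frailty" score standing in for outcome risk (higher frailty means a lower chance of survival), with selection for surgery whenever their sum exceeds 1. The distributions, the threshold, and the function names are hypothetical. The two variables are unrelated in the full population, but they become correlated among the selected subjects.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_selection(n=20000, threshold=1.0, seed=1):
    """Return (correlation in full population, correlation among selected)."""
    rng = random.Random(seed)
    # Hypothetical model: biomarker and frailty are generated independently,
    # so there is no causal or statistical link between them in the population.
    biomarker = [rng.gauss(0, 1) for _ in range(n)]
    frailty = [rng.gauss(0, 1) for _ in range(n)]
    # Selection is a common effect of both: a higher biomarker and a higher
    # frailty (lower chance of survival) each make selection more likely.
    selected = [i for i in range(n) if biomarker[i] + frailty[i] > threshold]
    corr_all = pearson(biomarker, frailty)
    corr_sel = pearson([biomarker[i] for i in selected],
                       [frailty[i] for i in selected])
    return corr_all, corr_sel


corr_all, corr_sel = simulate_selection()
print(f"full population: {corr_all:.3f}, selected only: {corr_sel:.3f}")
```

In the full population the correlation is close to zero; among the selected subjects it is strongly negative, even though no relationship between biomarker and frailty was simulated.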


Thank you. The case I am looking at is L-lactate as a predictor of survival in horses undergoing abdominal surgery. Since I'm interested in prediction, is a DAG useful?

In prognostic studies, confounding and collider bias are not relevant, because it does not matter whether your biomarker is a proxy for something else. So in this case you need not worry about a DAG.

Consider including having surgery (T/F) as a covariate.

It's a study that is being referenced in an editorial. Looking at the study more closely, I realized that the outcome, survival, had only 7 events (deaths) and 63 non-events. It is amazing that the results are being treated as useful. In my field, no one does sample size calculations for prediction models.
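
To put 7 events in perspective, here is a rough sketch of two commonly cited checks for binary-outcome prediction models: the traditional rule of thumb of about 10 events per candidate predictor (EPV), and the Riley et al. criterion for estimating the overall outcome proportion to within a margin δ, n = (1.96/δ)² φ(1 − φ). The choice of δ = 0.05 and the helper names are assumptions for illustration; a full calculation (e.g. with the pmsampsize software) also involves criteria on overfitting and model fit.

```python
import math


def max_predictors_by_epv(events, epv=10):
    """Number of candidate predictor parameters allowed by an events-per-variable rule."""
    return events / epv


def min_n_for_overall_risk(phi, delta=0.05, z=1.96):
    """Smallest n that estimates the outcome proportion phi to within +/- delta."""
    return math.ceil((z / delta) ** 2 * phi * (1 - phi))


events, non_events = 7, 63  # counts reported in the thread
n = events + non_events
phi = events / n  # observed outcome proportion, 7/70 = 0.1

print(max_predictors_by_epv(events))  # 0.7 candidate predictors
print(min_n_for_overall_risk(phi))    # 139 subjects, versus the 70 available
```

Even the simplest of these checks suggests the data cannot support estimating a single predictor's effect reliably, let alone several.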

I presume the wide CI for the highlighted predictor results from sparse data bias, since the exposure exceeded the threshold only 4 times. The entire methodology seems like a mess.
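
As a rough illustration of why so few exposed subjects produce a wide interval, here is a sketch of the crude odds ratio with a Woolf (log-scale Wald) 95% CI. The 2x2 counts below are hypothetical, chosen only to match 4 exposed horses and 7 deaths out of 70; they are not the study's data. The Woolf standard error, √(1/a + 1/b + 1/c + 1/d), is dominated by the smallest cells.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Crude OR and Woolf CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical table: 4 exposed horses (2 deaths), 66 unexposed (5 deaths).
or_, lower, upper = odds_ratio_woolf(2, 2, 5, 61)
print(f"OR = {or_:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

With only two horses in each exposed cell, the upper limit ends up many times the lower limit, so the interval says very little about the size of the effect.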

Deleted based on feedback from @James_Stanley

@s_doi the counts in Table 3 (as posted above), e.g. n=4/34 and n=30/34 for peripheral venous lactate (Post), are the distribution of the dichotomised covariates (which is why the pattern looks complementary across the two rows). They don't show the count of outcomes (deaths), so the crude OR can't be calculated from the presented table.
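
To see why the exposure distribution alone is not enough, here is a sketch with two hypothetical cross-tabulations that share the exposure split reported in Table 3 (4 of 34 above the threshold, 30 of 34 below) and the same total number of deaths, but place the deaths differently. The death counts are invented for illustration. The crude OR depends on the joint counts of exposure and outcome, which the table does not report.

```python
def crude_odds_ratio(exp_deaths, exp_n, unexp_deaths, unexp_n):
    """Crude OR from deaths and group sizes in the exposed and unexposed groups."""
    exp_surv = exp_n - exp_deaths
    unexp_surv = unexp_n - unexp_deaths
    return (exp_deaths * unexp_surv) / (exp_surv * unexp_deaths)


# Same exposure margins as Table 3 (4/34 exposed, 30/34 unexposed) and
# 4 deaths in total, but two hypothetical ways the deaths could fall.
scenario_1 = crude_odds_ratio(exp_deaths=1, exp_n=4, unexp_deaths=3, unexp_n=30)
scenario_2 = crude_odds_ratio(exp_deaths=2, exp_n=4, unexp_deaths=2, unexp_n=30)
print(scenario_1, scenario_2)  # 3.0 14.0
```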

I also thought it would be useful to add a reference to the paper in question, whose author's version of the manuscript is open access on PubMed (I didn't know PubMed covered veterinary material), in case people want to comment on the methodology in detail.

Do the low number of events and the fact that only 4 subjects had peripheral venous lactate (Post) above the threshold suggest that sparse data bias may be an issue?
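
One concrete way to see the problem: with only 4 exposed subjects and an outcome risk of around 10% (7/70 overall), the exposed group will often contain no deaths at all. In that case the unpenalized maximum-likelihood OR is 0 (or infinite if all 4 die), assuming the unexposed group contains both deaths and survivors, and only corrected or penalized methods give a finite estimate. A minimal sketch, assuming for illustration that each exposed horse independently has the overall 10% risk of death:

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k deaths among n subjects, each with risk p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_exposed, risk = 4, 0.1  # 4 subjects above threshold; assumed risk of ~7/70
dist = {k: binomial_pmf(k, n_exposed, risk) for k in range(n_exposed + 1)}
for k, p in dist.items():
    print(f"{k} deaths among exposed: {p:.4f}")

# k = 0 or k = n_exposed leaves an empty cell in the exposed row of the 2x2
# table, so the unpenalized maximum-likelihood OR is 0 or infinite.
p_empty_cell = dist[0] + dist[n_exposed]
print(f"P(empty cell in exposed row) = {p_empty_cell:.4f}")
```

Under these assumptions, roughly two samples in three would have no deaths among the exposed horses, and the number of exposed deaths can only take five values, so the OR estimate is both coarse and unstable.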


Thanks for posting the paper; it's an interesting problem. It's worth considering what sort of research a group of experienced surgeons might undertake with the sole aim of improving their own practice, and without regard to the perverse demands of formal publication. The collaboration of Kirklin and Blackstone (@f2harrell turned me on to this at one of his short courses years ago) seems like a model. See e.g. "Decision-making in repair of tetralogy of Fallot based on intraoperative measurements of pulmonary arterial outflow tract" (PubMed).

Well spotted! I automatically assumed they were cases/n, but they are reporting horses over total horses in each category. That is a very odd presentation, so please ignore my comments on the OR and the symmetry.
It seems only 4 horses had high lactate, so that is definitely very sparse data.