Choice of variables to include in regression model specification and risk of bias assessments

The objective of this (revised) blog is to ask three questions:

  1. What published studies provide exemplary rationale to support choices of variables for a conceptual framework to support regression analysis?
  2. Given definitions of confounding [1-3], why don’t observational studies provide support for both confounder-outcome associations and confounder-exposure associations in their rationale?
  3. Does any study provide a risk of bias assessment of the studies used to define its conceptual framework?

Other authors [4] have advocated using the Risk of Bias in Systematic Reviews (ROBIS) tool [5] to assess the quality of systematic reviews that support variable choices. While they indicate that assessing studies for bias can be time-intensive, should these assessments be essential? If the evidence used to construct a conceptual framework is biased, then the resulting framework will also be biased.

My research question is set in a lower-middle-income country, where I found only eight published studies to justify my conceptual framework. However, they were at high risk of bias for several reasons.

What criteria inform variable selection?
Failing to prespecify variables risks drawing inappropriate conclusions from regression models [6, 7] (see [2] for an example). Where available [8], several sources of information can inform decisions about which variables to choose [4, 9-11]:

  • Published literature (meta-analyses, primary studies, grey literature)
  • Similar datasets
  • Accepted theories
  • Current hypotheses
  • Expert opinion (clinicians, statisticians, and health systems experts)
  • Known constraints on model parameters

Diagram-based Analysis of Causal Systems (DACS) provides a useful framework and practical considerations for this identification process [11], and Directed Acyclic Graphs (DAGs) help ensure that confounders are included and mediators are excluded when estimating the full effect of an intervention [1-3].
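
To make the confounder versus mediator distinction concrete, here is a minimal sketch that reads a DAG as a list of directed edges and labels each covariate. The variable names and edges are hypothetical and chosen only for illustration. The rule used is a deliberate simplification (a common cause of exposure and outcome that is not caused by the exposure), not a full implementation of the back-door criterion.

```python
from collections import defaultdict

# Hypothetical DAG as (cause, effect) pairs; names are illustrative only.
EDGES = [
    ("household_wealth", "clean_fuel_use"),
    ("household_wealth", "child_pneumonia"),
    ("fuel_subsidy", "clean_fuel_use"),
    ("clean_fuel_use", "indoor_air_pollution"),
    ("indoor_air_pollution", "child_pneumonia"),
    ("clean_fuel_use", "child_pneumonia"),
]


def descendants(edges, node):
    """Return every variable reachable from `node` along directed edges."""
    children = defaultdict(set)
    for cause, effect in edges:
        children[cause].add(effect)
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def classify(edges, exposure, outcome):
    """Label covariates as mediator, candidate confounder, or other.

    Simplified rules (an assumption of this sketch):
    - mediator: caused by the exposure and a cause of the outcome
    - candidate confounder: not caused by the exposure, a cause of the
      exposure, and a cause of the outcome by a path avoiding the exposure
    """
    nodes = {n for edge in edges for n in edge} - {exposure, outcome}
    caused_by_exposure = descendants(edges, exposure)
    edges_without_exposure = [e for e in edges if exposure not in e]
    roles = {}
    for node in sorted(nodes):
        reach = descendants(edges, node)
        if node in caused_by_exposure and outcome in reach:
            roles[node] = "mediator"
        elif (node not in caused_by_exposure
              and exposure in reach
              and outcome in descendants(edges_without_exposure, node)):
            roles[node] = "candidate confounder"
        else:
            roles[node] = "other"
    return roles


roles = classify(EDGES, "clean_fuel_use", "child_pneumonia")
for name, role in roles.items():
    print(f"{name}: {role}")
```

In this toy graph, the variable that affects the outcome only through the exposure is labelled "other", which is why the sketch checks for a path to the outcome that avoids the exposure.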

What examples exist of well-documented rationale for pre-specified variable selection?
Historically, few observational studies provided a rationale for their conceptual framework [12], although more are now including a rationale for confounding [13] (see also [14] for randomized controlled trials prespecifying adjusted analyses). Between 2010 and 2012, 25% of one set of studies provided a rationale for selecting potential confounders, and 40% gave reasons for including confounders in the final model. However, only 0.9% of the studies from this period included a causal diagram [13]. Individual studies provide examples of various strengths, such as reporting the universe of considered variables [11], soliciting expert opinion [15], applying a DAG framework [2], supporting confounder-outcome relationships [16, 17], and combining a priori statements of model structure with a posteriori testing for model building [11]. In my opinion, the following elements together provide a complete rationale for a pre-specified regression model:

  • Prespecified hypotheses for each exposure-outcome relationship
  • Evidence supporting that each confounder causes the outcome
  • Evidence supporting that each confounder is associated with the exposure, without being caused by the exposure
  • A DAG showing the relationships between variables
  • A risk of bias assessment of the studies used to support the DAG
  • A list of variables that researchers considered but chose not to include with rationale
  • Rationale for sensitivity analyses on alternative model specifications
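
One way to keep track of these elements is to record, for each confounder, the studies supporting each association along with their risk of bias rating, and then flag the gaps. The sketch below is only an illustration: the class names, the rating labels, and the "Study A" style citations are placeholders I have made up, not part of any published tool.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    citation: str      # placeholder label, not a real reference
    risk_of_bias: str  # assumed labels: "low", "some concerns", "high"


@dataclass
class ConfounderRationale:
    name: str
    outcome_evidence: list = field(default_factory=list)   # confounder -> outcome
    exposure_evidence: list = field(default_factory=list)  # confounder ~ exposure

    def gaps(self):
        """List missing or weak support for either association."""
        problems = []
        for label, studies in (
            ("confounder-outcome", self.outcome_evidence),
            ("confounder-exposure", self.exposure_evidence),
        ):
            if not studies:
                problems.append(f"no evidence for {label} association")
            elif all(s.risk_of_bias == "high" for s in studies):
                problems.append(f"all {label} evidence at high risk of bias")
        return problems


# Hypothetical entries for two candidate confounders.
wealth = ConfounderRationale(
    name="household_wealth",
    outcome_evidence=[Evidence("Study A", "low")],
)
education = ConfounderRationale(
    name="maternal_education",
    outcome_evidence=[Evidence("Study B", "high")],
    exposure_evidence=[Evidence("Study C", "high"),
                       Evidence("Study D", "some concerns")],
)

for item in (wealth, education):
    print(item.name, item.gaps())
```

A table like this makes it obvious when a variable has support for only one of the two associations, or when all of its support comes from studies at high risk of bias.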

Given the pace of current research, concerns about the time investment required to produce these components are legitimate [4]. Resources are not infinite for any research study and researcher talent improves as our careers develop. However, we should consider whether the prevalence of biased research in publication is acceptable [18, 19], and whether second-order peer review is efficient [20].

Why is risk of bias assessment necessary?
Risk of bias assessments are an essential component of systematic review and meta-analysis [21], and have been applied to evaluations of high-profile journal publications [12, 13]. However, it is difficult to find similar assessments of the studies used to define conceptual frameworks for regression analysis. Why not? If the rationale used to define relationships within a DAG is inaccurate, then the resulting conceptual framework will be biased.
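
As a worked illustration of how a mis-specified framework propagates into the estimate, consider the standard omitted-variable result for a linear model. The symbols below are my own notation for this sketch: Y is the outcome, X the exposure, C a confounder that biased supporting evidence led us to leave out, and the relationships are assumed to be linear.

```latex
\begin{align}
Y &= \beta X + \gamma C + \varepsilon, \\
C &= \delta X + \nu, \qquad \delta = \frac{\operatorname{Cov}(X, C)}{\operatorname{Var}(X)}, \\
\hat{\beta}_{\text{omit}} &\xrightarrow{\;p\;} \beta + \gamma\,\delta .
\end{align}
```

The bias term vanishes only if the confounder has no effect on the outcome (gamma = 0) or is not associated with the exposure (delta = 0). These are the same two associations, confounder-outcome and confounder-exposure, that the rationale needs to support.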

What tools are available?
Risk of bias assessment tools are proliferating for non-randomized studies [21, 22], randomized trials [23, 24], quasi-experimental studies [25], predictive modeling [26, 27], systematic reviews [5], and other applications. These tools assess the quality of study conduct, whereas the EQUATOR group provides useful checklists for assessing reporting quality [28].

Importance of sensitivity analysis
Explicitly reporting the rationale for pre-specified associations, together with the risk of bias of the supporting studies, highlights that evidence for some associations will be stronger than for others. For example, evidence to support a variable may be equivocal, with some studies supporting an association but not others. This circumstance increases the importance of sensitivity analysis [9], although few studies report alternative specifications. My strategy is to test:

  • All variables well-supported by theory (full effect)
  • A full model, adding variables that are equivocal to the previous model
  • A full model subtracting variables that are measured by proxy
  • All variables well-supported by theory (direct effect)
  • A full model, adding variables that are equivocal (direct effect)
  • All variables and targeted interaction terms of interest
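
The list above can be generated from a few named variable sets, so that every specification is written down before any model is fitted. This is a minimal sketch under my own assumptions: the variable names are hypothetical, "full effect" is read as excluding mediators and "direct effect" as including them, and the output is plain R-style formula strings that could be passed to whatever fitting routine you use.

```python
def build_specifications(exposure, well_supported, equivocal,
                         proxy_measured, mediators, interactions):
    """Return the six sensitivity specifications as lists of model terms."""
    full = well_supported + equivocal
    no_proxy = [v for v in full if v not in proxy_measured]
    return {
        "1_supported_full_effect": [exposure] + well_supported,
        "2_plus_equivocal_full_effect": [exposure] + full,
        "3_minus_proxy_full_effect": [exposure] + no_proxy,
        # Assumption: direct-effect models additionally adjust for mediators.
        "4_supported_direct_effect": [exposure] + well_supported + mediators,
        "5_plus_equivocal_direct_effect": [exposure] + full + mediators,
        "6_all_plus_interactions": [exposure] + full + mediators + interactions,
    }


def to_formula(outcome, terms):
    """Join terms into an R-style formula string."""
    return f"{outcome} ~ " + " + ".join(terms)


# Hypothetical variable sets for illustration only.
specs = build_specifications(
    exposure="clean_fuel",
    well_supported=["age", "wealth"],
    equivocal=["season"],
    proxy_measured=["wealth"],
    mediators=["indoor_air_pollution"],
    interactions=["clean_fuel:age"],
)
formulas = {name: to_formula("pneumonia", terms) for name, terms in specs.items()}

for name, formula in formulas.items():
    print(f"{name}: {formula}")
```

Keeping the variable sets separate from the specifications means that reclassifying a variable, for example from well-supported to equivocal after a risk of bias assessment, updates every affected model at once.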

Harrell [29] has provided several useful ideas for which interaction terms to consider, and I have ordered a copy of Chatterjee and Hadi for other ideas [30].

Researchers should check the strength of rationale for specifications used by previous research before testing them in sensitivity analysis. Caution should guide conclusions about consistency as ‘claimed research findings may often be simply accurate measures of the prevailing bias’ [18].

A step forward
It will be interesting to see if methods to automate elements of risk of bias assessments proliferate and make this process more efficient [31].

Some useful resources
Miguel Hernán provides several free resources on his website:

Thank you for any feedback you can offer.


  1. Greenland S, Pearl J, Robins JM (1999) Causal diagrams for epidemiologic research. Epidemiology 10:37-48

  2. Hernán MA, Hernández-Díaz S, Werler MM, et al (2002) Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 155:176-184

  3. Rothman KJ, Greenland S, Lash T, Modern epidemiology (2008) Wolters Kluwer, Baltimore

  4. Bero L, Chartres N, Diong J, et al (2018) The risk of bias in observational studies of exposures (ROBINS-E) tool: concerns arising from application to observational studies of exposures. Syst Rev 7:242

  5. Whiting P, Savović J, Higgins JP, et al (2016) ROBIS: a new tool to assess risk of bias in systematic reviews was developed. J Clin Epidemiol 69:225-234

  6. Harrell Jr FE What are some of the problems with stepwise regression? Stata 1996.

  7. Harrell Jr FE, Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis (2015) Springer-Verlag, New York

  8. Royston P, Sauerbrei W, Multivariable model‐building. A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables (2008) John Wiley & Sons, Ltd., Chichester

  9. Chatfield C (1995) Model uncertainty, data mining and statistical inference. J Royal Stat Soc A 158:419-444

  10. Burnham KP, Anderson DR, Model selection and multimodel inference: a practical information-theoretic approach (2003) Springer Science & Business Media

  11. Rehfuess EA, Best N, Briggs DJ, et al (2013) Diagram-based Analysis of Causal Systems (DACS): elucidating inter-relationships between determinants of acute lower respiratory infections among children in sub-Saharan Africa. Emerg Themes Epidemiol 10:13

  12. Pocock SJ, Collier TJ, Dandreo KJ, et al (2004) Issues in the reporting of epidemiological studies: a survey of recent practice. BMJ 329:883

  13. Pouwels KB, Widyakusuma NN, Groenwold RH, et al (2016) Quality of reporting of confounding remained suboptimal after the STROBE guideline. J Clin Epidemiol 69:217-224

  14. Ciolino JD, Palac HL, Yang A, et al (2019) Ideal vs. real: a systematic review on handling covariates in randomized controlled trials. BMC Med Res Methodol 19:136

  15. Connors AF, Speroff T, Dawson NV, et al (1996) The effectiveness of right heart catheterization in the initial care of critically ill patients. JAMA 276:889-897

  16. Chesson H, Owusu-Edusei Jr K (2008) Examining the impact of federally-funded syphilis elimination activities in the USA. Soc Sci Med 67:2059-2062

  17. Chesson HW, Harrison P, Scotton CR, et al (2005) Does funding for HIV and sexually transmitted disease prevention matter? Evidence from panel data. Evaluation Review 29:3-23

  18. Ioannidis JP (2005) Why most published research findings are false. PLoS Med 2:e124

  19. Smith R (2006) Peer review: a flawed process at the heart of science and journals. J Royal Soc Med 99:178-182

  20. Haynes RB, Cotoi C, Holland J, et al (2006) Second-order peer review of the medical literature for clinical practitioners. JAMA 295:1801-1808

  21. Sterne J, Hernan M, McAleenan A, et al, Chapter 25: Assessing risk of bias in a non-randomized study (2019) In: Higgins J, Green S (eds) Cochrane handbook for systematic reviews of interventions

  22. Sterne JA, Hernán MA, Reeves BC, et al (2016) ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355:i4919

  23. Sterne JA, Savović J, Page MJ, et al (2019) RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ 366:1-8

  24. Higgins JP, Savović J, Page MJ, et al, Assessing risk of bias in a randomized trial (2019) In: Higgins JP, Thomas J (eds) Cochrane Handbook for Systematic Reviews of Interventions, p. 205-228

  25. Waddington H, Aloe AM, Becker BJ, et al (2017) Quasi-experimental study designs series—paper 6: risk of bias assessment. J Clin Epidemiol 89:43-52

  26. Wolff RF, Moons KG, Riley RD, et al (2019) PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 170:51-58

  27. Moons KG, Wolff RF, Riley RD, et al (2019) PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 170:W1-W33

  28. EQUATOR Enhancing the QUAlity and Transparency Of health Research, 2019.

  29. Harrell Jr FE, Lee KL, Mark DB (1996) Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361-387

  30. Chatterjee S, Hadi AS, Sensitivity analysis in linear regression (2009) John Wiley & Sons

  31. Marshall IJ, Kuiper J, Wallace BC (2015) Automating risk of bias assessment for clinical trials. IEEE Journal of Biomedical and Health Informatics 19:1406-1412