Handling non-response in confounders measured by questionnaire

In cohort studies that collect information on confounding variables by questionnaire, response options such as ‘prefer not to answer’ are occasionally provided. For example, when asked about their income (a set of ordinal responses), some participants will choose ‘prefer not to answer’. In this case, it’s most likely that choosing ‘prefer not to answer’ is associated with true (unreported) income.

My question is: how should one handle these responses when adjusting for variables such as these (via regression)? That is, where the goal is to control confounding as completely as possible.

I can think of a few possible options:

  1. Include ‘prefer not to answer’ as its own category. This has the unfortunate consequence that ordinal variables must be modeled as categorical (rather than, e.g., monotonic). In addition, because these responses are relatively infrequent, sparse-data bias and positivity violations could occur.
  2. Treat these responses as missing data and impute them from answers to other questions. For example, income could be imputed from education and neighbourhood socioeconomic status.
  3. Treat these responses as missing and perform a complete-case analysis. Probably the least compelling of the options, but one I’ve seen used.
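
To make the three options concrete, here is a minimal Python sketch of how the income variable might be recoded under each one. The toy records, the category labels, and the function names are all made up for illustration. The imputation step is a crude single (modal) imputation within education group, standing in for a proper imputation model, so treat it as a sketch of the idea rather than a recommended procedure.

```python
from collections import Counter, defaultdict

PNA = "prefer not to answer"
INCOME_LEVELS = ["low", "middle", "high"]  # hypothetical ordinal categories

# Hypothetical toy records: (participant id, education, reported income)
records = [
    (1, "secondary", "low"),
    (2, "secondary", "low"),
    (3, "secondary", PNA),
    (4, "tertiary", "high"),
    (5, "tertiary", "middle"),
    (6, "tertiary", "high"),
    (7, "tertiary", PNA),
]


def own_category(rows):
    """Own-category option: indicator coding with PNA as an extra level.

    'low' is the reference level, so the ordinal structure is no longer used.
    """
    levels = INCOME_LEVELS[1:] + [PNA]
    return [(pid, {lvl: int(inc == lvl) for lvl in levels}) for pid, _, inc in rows]


def impute_from_education(rows):
    """Imputation option: replace PNA with the modal observed income in the
    same education group, then return an ordinal score (0 = low).

    Assumes every education group has at least one observed income.
    """
    observed = defaultdict(Counter)
    for _, edu, inc in rows:
        if inc != PNA:
            observed[edu][inc] += 1
    scored = []
    for pid, edu, inc in rows:
        if inc == PNA:
            inc = observed[edu].most_common(1)[0][0]
        scored.append((pid, INCOME_LEVELS.index(inc)))
    return scored


def complete_case(rows):
    """Complete-case option: drop PNA participants, keep the ordinal score."""
    return [(pid, INCOME_LEVELS.index(inc)) for pid, _, inc in rows if inc != PNA]
```

Only the first recoding keeps everyone and uses what they actually reported, at the cost of the ordinal structure. The other two keep the ordinal score but either fill in values from education alone or lose the participants who declined to answer.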

I would appreciate any suggestions or references on this topic. Thanks!

It might depend on how much missingness there is. For example, if it is no more than 10%, say, then maybe you do option 3 (complete-case analysis) as the primary analysis and option 2 (imputation) as a sensitivity analysis.
