Negative correlation, but positive effect in multilevel regression


I am doing a longitudinal multilevel analysis.

I am researching the impact of certain restrictive abortion policies in US states on the number of abortions. Furthermore, I control for characteristics of these states.

One of these characteristics, namely the poverty rate of a state, has a significant negative correlation with the number of abortions (so a higher poverty rate in a state correlates with a lower abortion rate).
Yet when it is introduced as a control variable in the multilevel regression, together with others such as median income and % Black, its regression coefficient is significant and positive, contradicting my correlation result.
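A negative marginal correlation combined with a positive coefficient once correlated covariates enter the model is the classic signature of suppression (also called a masked relationship). A minimal sketch with made-up data, not the poster's: the hypothetical variables `poverty`, `income` and `abortions` are simulated so that poverty has a small positive direct effect but is strongly tied to a covariate with a larger effect. It uses plain OLS rather than a multilevel model, so it only illustrates the mechanism.

```python
import random


def centered_cross(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation coefficient."""
    return centered_cross(a, b) / (centered_cross(a, a) * centered_cross(b, b)) ** 0.5


def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ 1 + x1 + x2, solved from the centered normal equations."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(1)
n = 2000
# Hypothetical data-generating process, chosen only to show the mechanism:
# poverty raises the outcome directly (+1.0) but is strongly negatively tied
# to income, which raises the outcome even more (+3.0).
poverty = [random.gauss(0, 1) for _ in range(n)]
income = [-0.8 * p + random.gauss(0, 0.6) for p in poverty]
abortions = [1.0 * p + 3.0 * m + random.gauss(0, 1) for p, m in zip(poverty, income)]

r = correlation(poverty, abortions)
b_poverty, b_income = ols_two_predictors(poverty, income, abortions)
print(f"correlation(poverty, abortions):    {r:+.2f}")
print(f"poverty coefficient, income fixed:  {b_poverty:+.2f}")
```

Neither number is "wrong": the correlation summarizes every path linking poverty and the outcome, while the coefficient answers a narrower question with income held fixed.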

I don't really know what to think of this result, as it contradicts all my other conclusions.
Namely, that poorer states will have fewer abortions because they are more prone to vote Republican, and Republican administrations are more prone to implement restrictive abortion policies. So the effect of the abortion policies does not stand on its own but is embedded in the social structure of the state (because when controlling for characteristics of a state, the negative effect of restrictive abortion policies on the number of abortions disappears).

How come my effect changes from negative to positive?
Tell me if you need more information, I’ll add it!

Thank you!!!

What are the cluster factors? Are any covariates on the cluster level? This may not be relevant, but I recall this interesting (and mildly entertaining) exploration of an error in an analysis (editorial: broken hearts): “… According to Nichols and Schaffer, “when fixed effects and clustering are specified at the same level, tests that involve the fixed effects themselves are inadvisable (the standard errors on fixed effects are likely to be substantially underestimated, though this will not affect the other variance estimates in general).””


I don't really understand what you mean, unfortunately.

My first level is state-year and my second level is state in the multilevel analysis (longitudinal study).

I thought it might be because of suppression, but then I am not sure which output I should follow. Does poverty have a negative effect on the number of abortions, but a positive effect when controlling for the other factors?
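One way to think about "which output to follow": in OLS, the simple slope of the outcome on poverty decomposes exactly into the adjusted poverty coefficient plus the adjusted income coefficient times the slope of income on poverty (the omitted-variable-bias identity). The sketch below checks this numerically on simulated, hypothetical data; it is plain OLS, and in a multilevel model the decomposition does not hold exactly, although the logic is the same.

```python
import random


def centered_cross(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def slope(x, y):
    """Simple OLS slope of y on x."""
    return centered_cross(x, y) / centered_cross(x, x)


def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ 1 + x1 + x2."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(2)
n = 1000
# Hypothetical variables, simulated for illustration only.
poverty = [random.gauss(0, 1) for _ in range(n)]
income = [-0.8 * p + random.gauss(0, 0.6) for p in poverty]
abortions = [1.0 * p + 3.0 * m + random.gauss(0, 1) for p, m in zip(poverty, income)]

b_marginal = slope(poverty, abortions)                                # unadjusted view
b_poverty, b_income = ols_two_predictors(poverty, income, abortions)  # adjusted view
delta = slope(poverty, income)                                        # income ~ poverty

# Marginal slope = adjusted poverty slope + the part that moves with income.
decomposed = b_poverty + b_income * delta
print(f"marginal slope:            {b_marginal:+.3f}")
print(f"adjusted poverty slope:    {b_poverty:+.3f}")
print(f"part via income (b2 * d):  {b_income * delta:+.3f}")
```

So the question is less which number is correct than which question is being asked: the total association (including whatever runs through income) or the association with income held fixed.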

You're using a GEE model? I'd first plot poverty rates vs. abortion rates, just to get a feel for the data and the variation among states and try to understand it; as you say, the effect may be an artefact of policy. I wouldn't be quick to produce a p-value, but I'm not very familiar with these complex sociological studies.
Edit: are poverty rate and median income correlated variables?
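A quick diagnostic for the correlated-predictors question is the correlation between poverty rate and median income and the variance inflation factor (VIF) it implies; with exactly two predictors, VIF = 1 / (1 - r²). This sketch uses simulated, hypothetical standardized values; with the real data you would use the actual columns, and with more than two predictors VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the others.

```python
import random


def correlation(a, b):
    """Pearson correlation coefficient."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)


random.seed(3)
# Hypothetical stand-ins for poverty rate and median income (standardized).
poverty = [random.gauss(0, 1) for _ in range(2000)]
income = [-0.8 * p + random.gauss(0, 0.6) for p in poverty]

r = correlation(poverty, income)
vif = vif_two_predictors(poverty, income)
# sqrt(VIF): factor by which each coefficient's standard error is inflated
# relative to uncorrelated predictors.
print(f"r = {r:+.2f}, VIF = {vif:.2f}, SE inflation = {vif ** 0.5:.2f}x")
```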

Maybe Simpson’s paradox can explain it.
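In a state-year panel, one form Simpson's paradox can take is that states with higher average poverty have fewer abortions (a between-state pattern), while within a given state, years with higher poverty have more (a within-state pattern). A minimal sketch with simulated, hypothetical data (50 states × 10 years) that separates the two slopes by centering poverty on each state's mean:

```python
import random
from collections import defaultdict


def slope(x, y):
    """Simple OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(4)
rows = []  # (state, poverty, abortions); all values are hypothetical
for state in range(50):
    state_poverty = random.gauss(0, 1)
    state_level = -2.0 * state_poverty + random.gauss(0, 0.5)  # poorer states: fewer
    for _year in range(10):
        p = state_poverty + random.gauss(0, 0.5)
        y = state_level + 1.0 * (p - state_poverty) + random.gauss(0, 1)  # poorer years: more
        rows.append((state, p, y))

# State means of poverty and of the outcome.
by_state = defaultdict(list)
for state, p, y in rows:
    by_state[state].append((p, y))
means = {s: (sum(p for p, _ in v) / len(v), sum(y for _, y in v) / len(v))
         for s, v in by_state.items()}

pooled = slope([p for _, p, _ in rows], [y for _, _, y in rows])
between = slope([m[0] for m in means.values()], [m[1] for m in means.values()])
within = slope([p - means[s][0] for s, p, _ in rows],
               [y - means[s][1] for s, _, y in rows])
print(f"pooled {pooled:+.2f} | between states {between:+.2f} | within states {within:+.2f}")
```

In a multilevel model, a common remedy is to enter both the state mean of poverty and the within-state deviation as separate predictors (often called a within-between or hybrid model), so the two slopes are estimated separately instead of being blended into one coefficient.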


I would highly recommend reading chapters 5 and 6 of Richard McElreath’s Statistical Rethinking. It’s a great introduction to causal inference and the unexpected things that can happen when certain variables are included or not included in a multivariable regression model.

Additionally, this paper describes why interpreting the regression coefficients for variables which are not your exposure of interest (in this case restrictive abortion policies) can be unreliable. The regression coefficients for your ‘control’ variables often don’t have the same interpretation as the coefficient for your exposure variable.
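A small simulation of why a control variable's coefficient can mean something different from the exposure's. Suppose, hypothetically, that poverty affects both policy and abortions, and policy affects abortions. In the model adjusted for both, the poverty coefficient estimates only poverty's direct effect, not its total effect, which also includes the path through policy. Policy is treated as a continuous index here for simplicity, and all numbers are made up.

```python
import random


def centered_cross(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))


def slope(x, y):
    """Simple OLS slope of y on x."""
    return centered_cross(x, y) / centered_cross(x, x)


def ols_two_predictors(x1, x2, y):
    """OLS slopes for y ~ 1 + x1 + x2."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(5)
n = 2000
# Hypothetical causal structure: poverty -> policy -> abortions, poverty -> abortions.
poverty = [random.gauss(0, 1) for _ in range(n)]
policy = [1.5 * z + random.gauss(0, 1) for z in poverty]
abortions = [2.0 * x + 1.0 * z + random.gauss(0, 1) for x, z in zip(policy, poverty)]

b_policy, b_poverty_direct = ols_two_predictors(policy, poverty, abortions)
b_poverty_total = slope(poverty, abortions)  # 1.0 + 2.0 * 1.5 = 4.0 in expectation
print(f"policy effect (adjusted for poverty): {b_policy:+.2f}")
print(f"poverty coefficient in that model:    {b_poverty_direct:+.2f}  (direct only)")
print(f"poverty total effect:                 {b_poverty_total:+.2f}")
```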


Came to support @lachlan.
If you are trying to answer multiple questions, you should think about the causal relationships between the variables separately for each question. This might result in different sets of adjustments being necessary to get a (hopefully) unbiased answer. You can use the online tool DAGitty to develop a visual model of the causal relationships between your variables and proceed from there to decide on adjustments for separate questions.
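DAGitty does this kind of reasoning for you; as a rough illustration of what it checks, here is a minimal back-door criterion test for a small, hypothetical DAG built from the mechanism described in the question (poverty → Republican vote → restrictive policy → abortions, plus a direct poverty → abortions arrow). It enumerates every path, so it is only a sketch for tiny graphs, not a substitute for DAGitty; other state characteristics would be added as further nodes.

```python
# Hypothetical DAG: each node maps to the list of its children.
DAG = {
    "poverty": ["republican_vote", "abortions"],
    "republican_vote": ["policy"],
    "policy": ["abortions"],
    "abortions": [],
}


def parents(dag, node):
    return {p for p, kids in dag.items() if node in kids}


def descendants(dag, node):
    seen, stack = set(), list(dag[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(dag[n])
    return seen


def all_paths(dag, start, end):
    """All simple paths from start to end, ignoring arrow direction."""
    paths = []

    def walk(path):
        last = path[-1]
        if last == end:
            paths.append(path)
            return
        for nb in set(dag[last]) | parents(dag, last):
            if nb not in path:
                walk(path + [nb])

    walk([start])
    return paths


def is_blocked(dag, path, z):
    """d-separation rule for a single path given conditioning set z."""
    for i in range(1, len(path) - 1):
        prev, mid, nxt = path[i - 1], path[i], path[i + 1]
        if mid in dag[prev] and mid in dag[nxt]:  # collider: prev -> mid <- nxt
            if mid not in z and not (descendants(dag, mid) & z):
                return True
        elif mid in z:  # conditioning on a non-collider blocks the path
            return True
    return False


def satisfies_backdoor(dag, x, y, z):
    z = set(z)
    if z & descendants(dag, x):  # never adjust for descendants of the exposure
        return False
    return all(is_blocked(dag, p, z)
               for p in all_paths(dag, x, y) if p[1] in parents(dag, x))


# Question 1: effect of policy. Question 2: total effect of poverty.
print(satisfies_backdoor(DAG, "policy", "abortions", {"poverty"}))
print(satisfies_backdoor(DAG, "poverty", "abortions", {"policy"}))
```

Under this hypothetical graph, adjusting for poverty is valid for the policy question, while adjusting for policy is not valid for the total effect of poverty, since policy lies on its causal path: the same variable plays different roles for different questions.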


Perhaps this paper is useful: Does Home Health Care Increase the Probability of 30-Day Hospital Readmissions? Interpreting Coefficient Sign Reversals, or Their Absence, in Binary Logistic Regression Analysis
“Data for 30-day readmission rates in American hospitals often show that patients that receive Home Health Care (HHC) have a higher probability of being readmitted to hospital than those that did not receive such services, but it is expected that when control variables are included in a regression we will obtain a “sign reversal” of the treatment effect.”
I can’t access the full paper though …
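The same kind of sign reversal with a binary outcome can be shown with invented counts (not from the paper): if sicker patients are both more likely to receive home health care and more likely to be readmitted, the crude odds ratio for HHC can exceed 1 while the odds ratio within every severity stratum is below 1. The Mantel-Haenszel estimator pools the stratum-specific odds ratios into one adjusted estimate; adjusting for severity in a logistic regression plays a similar role.

```python
# Hypothetical 2x2 tables per severity stratum (invented counts, for illustration):
# (HHC readmitted, HHC not readmitted, no-HHC readmitted, no-HHC not readmitted)
strata = {
    "low severity": (5, 95, 90, 810),
    "high severity": (270, 630, 40, 60),
}


def odds_ratio(a, b, c, d):
    """Odds of readmission with HHC divided by odds without HHC."""
    return (a / b) / (c / d)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Collapse the strata to get the crude (unadjusted) table.
crude = tuple(sum(t[i] for t in strata.values()) for i in range(4))

for name, table in strata.items():
    print(f"{name:>13}: OR = {odds_ratio(*table):.2f}")
print(f"{'crude':>13}: OR = {odds_ratio(*crude):.2f}")
print(f"{'MH adjusted':>13}: OR = {mantel_haenszel_or(strata.values()):.2f}")
```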
