Analyzing two different groups of subjects

This question is about which statistical model to choose. I have a small dataset with two distinct groups of participants.
Group 1:

  • Predominantly White
  • Not at risk (lower risk scores)
  • Equal male/female proportions

Group 2:

  • Mixture of Hispanic and Black participants
  • At risk (higher risk scores)
  • 75% female

This is a one-row-per-subject dataset with around 500 observations, 10 covariates of interest, and one exposure variable; both the exposure and the outcome are continuous.

The goal is to measure the association between
exposure [continuous]: a suite of chemicals in the subjects' urine samples
and
outcome [continuous]: risk of developing a specific cognitive disorder

At first I thought I could fit a simple linear random-effects model with a random intercept for each group. However, I am not sure whether a simple random-effects model would be adequate to capture all the covariate imbalances between the groups.
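For concreteness, the random-intercept model described above could be written as follows. The notation is an assumption, not taken from the post: subject i in group j, exposure x_{ij}, the vector z_{ij} of the 10 covariates, and a group-level intercept u_j.

$$
y_{ij} = \beta_0 + \beta_1 x_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{z}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
$$

Here j ∈ {1, 2} and β_1 is the exposure-outcome association of interest. With only two groups, the variance σ_u^2 is informed by just two group-level intercepts.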

So I am curious to know what would be a good model to capture the intricacies of such a dataset. Thanks as always.

-Sudhi

What’s the analysis goal?

Hi Frank, the goal is to measure the association between
exposure [continuous]: a suite of chemicals in the subjects' urine samples
and
outcome [continuous]: risk of developing a specific cognitive disorder


Without knowing all the details, I’ll say I’m not very bothered by this imbalance.
Here are some initial thoughts.

First, you will include covariates in your exposure-outcome model, and the regression is likely to adjust for those differences (this is what regression does).
Second, if you also include random intercepts, you additionally account for other unobserved group-level confounding that might bias your exposure estimates (or for correlated errors within groups).
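A minimal sketch of the first two points on simulated data (numpy only; every variable name and effect size here is hypothetical). A fixed group indicator is used as a simple stand-in for the random intercept, and the naive, unadjusted slope is shown for contrast.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares coefficients via lstsq."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 500
group = rng.integers(0, 2, size=n)               # hypothetical 0/1 group label
risk = rng.normal(loc=1.0 * group, scale=1.0)    # covariate imbalanced across groups
exposure = 0.8 * risk + rng.normal(size=n)       # exposure depends on the covariate
true_effect = 0.5
outcome = true_effect * exposure + 1.0 * risk + 0.7 * group + rng.normal(size=n)

ones = np.ones(n)
naive = ols(np.column_stack([ones, exposure]), outcome)
# Covariate adjustment plus a fixed group intercept (stand-in for a random intercept).
adjusted = ols(np.column_stack([ones, exposure, risk, group]), outcome)
print(f"naive exposure slope:    {naive[1]:.2f}")
print(f"adjusted exposure slope: {adjusted[1]:.2f}")
```

In this simulation the naive slope absorbs part of the covariate's effect, while the adjusted slope should land close to the simulated value of 0.5.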
Third, if you are still concerned about the difference between the groups, try to model it and characterize it. For example, treat it as you would treat a binary exposure in a causal analysis (or, more accurately, how you would approach transportability):

  • You could try to predict group assignment using a classification model (if you fail, that's good: the groups are relatively comparable with respect to your covariates).
  • Alternatively, estimate each subject's propensity score for being in a given group. Do the probability distributions for the two groups overlap? (If so, great: they are not that distinct.)
  • If you are still concerned by the observed differences, you could generate inverse propensity weights with respect to the groups and fit your GLMM/fixed-effects model weighted by those weights. This might further balance your estimation, on top of the covariate adjustment done in your exposure-outcome model.
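The propensity-score steps above can be sketched roughly as follows, assuming simulated data and numpy only (the covariates `risk` and `female`, the effect sizes, and the helper functions are all hypothetical). The logistic model is fit by Newton-Raphson, and a weighted least-squares fit with a fixed group term stands in for the weighted GLMM.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (IRLS); returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # X^T W X
        beta = beta + np.linalg.solve(hess, X.T @ (y - p))
    return beta


def weighted_ols(X, y, w):
    """Weighted least squares: scale each row by sqrt(w) and solve."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


rng = np.random.default_rng(1)
n = 500
risk = rng.normal(size=n)                          # hypothetical covariates
female = rng.integers(0, 2, size=n).astype(float)
# Simulated imbalance: group membership depends on the covariates.
lin = -0.2 + 1.0 * risk + 0.8 * female
group = (rng.random(n) < 1.0 / (1.0 + np.exp(-lin))).astype(float)

# Propensity score: modeled probability of being in group 1.
Z = np.column_stack([np.ones(n), risk, female])
ps = 1.0 / (1.0 + np.exp(-Z @ fit_logistic(Z, group)))

# Overlap check: compare the score ranges in the two groups.
for g in (0.0, 1.0):
    s = ps[group == g]
    print(f"group {g:.0f}: propensity range [{s.min():.2f}, {s.max():.2f}]")

# Inverse propensity weights with respect to group membership.
w = np.where(group == 1.0, 1.0 / ps, 1.0 / (1.0 - ps))

# Balance check on one covariate, before and after weighting.
g1, g0 = group == 1.0, group == 0.0
raw_diff = risk[g1].mean() - risk[g0].mean()
wtd_diff = np.average(risk[g1], weights=w[g1]) - np.average(risk[g0], weights=w[g0])
print(f"risk difference: raw {raw_diff:.2f}, weighted {wtd_diff:.2f}")

# Weighted exposure-outcome model that still adjusts for the covariates.
exposure = 0.5 * risk + rng.normal(size=n)
outcome = 0.5 * exposure + risk + 0.5 * group + rng.normal(size=n)
X = np.column_stack([np.ones(n), exposure, risk, female, group])
beta = weighted_ols(X, outcome, w)
print(f"weighted, adjusted exposure slope: {beta[1]:.2f}")
```

A classifier that cannot separate the groups corresponds here to propensity scores clustered near the overall group proportion, with heavily overlapping ranges.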

Fourth, try running a sensitivity analysis on this design decision. Maybe fit two different models, one for each group: does the effect of your exposure differ between groups? Does it differ from the effect in a naive model on the consolidated data? Does it differ from your random-intercept model?
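A rough sketch of that comparison on simulated data (numpy only; names and effect sizes are hypothetical). The exposure effect is the same in both groups by construction, so all four slopes should agree; with real data, the interest is in whether they do. A fixed group indicator again stands in for the random intercept.

```python
import numpy as np


def exposure_slope(exposure, covars, outcome):
    """OLS coefficient on exposure, adjusting for the given covariate arrays."""
    X = np.column_stack([np.ones(len(outcome)), exposure, *covars])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]


rng = np.random.default_rng(2)
n = 500
group = rng.integers(0, 2, size=n)
risk = rng.normal(loc=group, scale=1.0)          # covariate imbalanced by group
exposure = 0.5 * risk + rng.normal(size=n)
outcome = 0.5 * exposure + risk + 0.5 * group + rng.normal(size=n)

g0, g1 = group == 0, group == 1
fits = {
    "group 0 only": exposure_slope(exposure[g0], [risk[g0]], outcome[g0]),
    "group 1 only": exposure_slope(exposure[g1], [risk[g1]], outcome[g1]),
    "pooled, no group term": exposure_slope(exposure, [risk], outcome),
    "pooled + group term": exposure_slope(exposure, [risk, group], outcome),
}
for name, slope in fits.items():
    print(f"{name:>22}: {slope:.2f}")
```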


Inverse probability weights increase the variance of the main effect estimator. Sensitivity analysis can sometimes add confusion. Whenever possible, I like to settle on one flexible model that is most likely to fit.
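One rough way to see the variance cost of weighting is Kish's effective sample size, an approximation (not something stated in the thread) that shrinks as weights become more unequal. The weight pattern below is hypothetical.

```python
import numpy as np


def kish_ess(w):
    """Kish's effective sample size: (sum of weights)^2 / sum of squared weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


equal = np.ones(500)                                          # unweighted analysis
uneven = np.concatenate([np.ones(450), np.full(50, 10.0)])    # hypothetical IPW pattern
print(f"{kish_ess(equal):.1f}")   # 500.0
print(f"{kish_ess(uneven):.1f}")  # 165.6
```

Under this approximation, 500 subjects with those uneven weights carry roughly the information of 166 equally weighted ones.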


@ehudk, thanks Ehud. @f2harrell, thanks Frank,