Survey Non-Response Weighting in Stata

Hello Everyone,

My question is very specific: it concerns adjusting for non-response in a survey that has no design weight (or any weight, for that matter). I need help working out how to solve this problem in Stata, and was wondering if any of you could kindly paste an example from your own work where you used Stata to adjust for unit non-response.

The dataset I have is from a government training program and has around 127000 observations in total. These 127000 trainees were then sampled using an SRS on gender, from which we drew around 6700 observations. When the team was sent out to survey these 6700 people, around 4200 responded, while the remainder were either unavailable or refused to respond.

I am really stuck and get more confused each time I browse the internet for help, as I can hardly find any worked example in Stata that shows how someone correctly solved this problem. I know this sounds desperate, but I really need help with this.

Hope to hear from the community.

PS: I know this is not health-related; however, the method I am looking for probably works in every case, which is why I thought of asking the question here.

SRS = stratified random sample?

And the core of your question is to adjust for non-response, right? So you are assuming your non-responders have different characteristics than the responders (which is likely true). As you don’t have any weights (yet), do you have any information on characteristics (e.g. sex, age, education, anything?) for both the responders and the non-responders? Or was the goal of the survey to collect various characteristics, so that you now only have them for the responders?

Hello scboone (not sure about your real name)

Yes, the sample was a stratified random sample (by gender).

The core is definitely to adjust for non-response. I do have information for both responders and non-responders, which includes age, sex, etc., but not marital status. I am trying to use Stata to construct these weights and have been searching online for a concrete method. If you want, I can send you the data file so you can have a look, but at the moment I need to understand exactly how to go about adjusting for non-response.

I have read about post-stratification, which is a calibration method for non-response weights. The material I found talks about using a logit model to estimate a propensity score and then taking the inverse of the predicted value, but the exact steps are not provided. It would be wonderful if you could kindly help me out.

Thank you

Sorry for bumping this post, I just thought I would ask whether you might have figured something out.

I guess post-stratification is a good option. All you have to do is use response (No=0, Yes=1) as the outcome in a logistic regression model fitted to the full sample of 6700. The model should include all the variables you have for both the responders and the non-responders (age, sex, etc.). After fitting the model, predict the probability of response (P) for each individual, then take 1/P as the weight for each responder. Non-responders get no weight, because they have no survey answers to analyse. Once you have the weights, verify that they sum to roughly the size of your sample (6700). To do the analysis, svyset your data and use svy: [whatever command you need here] [var: whatever variable you need here]. I haven’t done this in a long time, so I suggest you check my definition of the weights.
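
In case it helps, here is a rough sketch of those steps in Stata. The variable names (responded, age, female, outcome_var) are my own placeholders, not names from your dataset, so swap in whatever you actually have. It weights responders only, since they are the only ones with survey answers, and it ignores any design weight. Please treat it as a starting point to check, not a finished solution.

```stata
* Placeholder names (assumptions): responded = 1 if surveyed, 0 if not;
* age and female are known for all 6700 sampled people;
* outcome_var is any survey variable observed for responders only.

* 1. Model the probability of response on the full sample
logit responded c.age i.female

* 2. Predicted probability of response (P) for each sampled person
predict double p_resp, pr

* 3. Non-response weight = 1/P, defined for responders only
generate double w_nr = 1 / p_resp if responded == 1

* 4. Check: the weights should sum to roughly the sample size (about 6700)
summarize w_nr
display "Sum of weights: " r(sum)

* 5. Keep the responders, declare the weights and strata, then analyse
keep if responded == 1
svyset [pweight = w_nr], strata(female)
svy: mean outcome_var
```

If a few people have a very small predicted P, their weights will be very large, so it is worth looking at the minimum and maximum in the summarize output before going further.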

If this seems too complicated, and you have the distribution of age, gender, or whatever other variable(s) for the whole population (the 6700 or the 127000), you can just do a direct standardization. Actually, that’s what you are doing above.
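
A simple version of that idea, standardizing the responders back to the 6700 sampled people within gender-by-age cells, might look like the sketch below. Again, responded, female and agegrp are placeholder names I am assuming (agegrp being age cut into a few bands), and every cell needs at least one responder for this to work.

```stata
* Placeholder names (assumptions): responded (0/1), female (0/1),
* agegrp = age grouped into a few bands, known for all sampled people.

* Number of people sampled and number responding in each cell
bysort female agegrp: egen n_sampled = count(responded)
bysort female agegrp: egen n_resp = total(responded)

* Each responder stands in for n_sampled / n_resp sampled people in the cell
generate double w_cell = n_sampled / n_resp if responded == 1

* Check: over responders the weights sum to the number sampled
summarize w_cell
display "Sum of weights: " r(sum)
```

This is the logistic model with the covariates entered as fully interacted categories, so the two approaches should give similar weights when the cells are not too sparse.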

I hope this helps.