I appreciate the time you took to write up that simulation code to demonstrate the most charitable case for the linear probability model.
My criticism is similar to my objection to the simulations used to justify applying metric models to ordinal data: it is the least favorable distribution that is relevant, because the assumption of normality (of the confounder, in your example) cannot be verified in practice.
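To make that concrete, here is a minimal sketch in the same spirit as your simulation (this is not your code; the lognormal confounder and every parameter value below are hypothetical choices of mine). Everything else stays as charitable as possible; only the confounder’s distribution is made heavily skewed, and the LPM coefficient on treatment is compared against the true average marginal effect and against a correctly specified logit:

```python
# A minimal sketch, not the original simulation: same charitable setup, but
# with a heavily skewed (lognormal) confounder. All names and parameter
# values are hypothetical choices for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000

c = rng.lognormal(mean=0.0, sigma=1.0, size=n)   # skewed confounder
t = rng.binomial(1, 1 / (1 + np.exp(1.5 - c)))   # treatment depends on c
p = 1 / (1 + np.exp(-(-2.0 + 0.5 * t + c)))      # true logistic DGP
y = rng.binomial(1, p)

# True average marginal effect (AME) of t, computed from the known DGP
p1 = 1 / (1 + np.exp(-(-2.0 + 0.5 + c)))
p0 = 1 / (1 + np.exp(-(-2.0 + c)))
print("true AME of t:       ", (p1 - p0).mean())

X = sm.add_constant(np.column_stack([t, c]))

# LPM: the OLS coefficient on t is supposed to estimate that AME directly
print("LPM coefficient on t:", sm.OLS(y, X).fit().params[1])

# Correctly specified logit; AME taken as the discrete difference t=1 vs t=0
logit = sm.Logit(y, X).fit(disp=0)
X1, X0 = X.copy(), X.copy()
X1[:, 1], X0[:, 1] = 1.0, 0.0
print("logit AME of t:      ", (logit.predict(X1) - logit.predict(X0)).mean())
```

How far the LPM coefficient drifts from the true average marginal effect depends on the distribution you pick, which is precisely the point: the method’s defense should survive the least favorable pick, not the friendliest one.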
Even proponents concede that the LPM generates impossible predictions (fitted “probabilities” outside [0, 1]; see the sketch below) and fall back on the “interpretable inference” argument. Think about trying to perform a research synthesis on a set of papers that use a method that does not even claim to generate accurate predictions. No scientific community that instituted incentives for reproducibility (i.e., betting on future replications) would converge on the linear probability model.
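The impossible-predictions complaint takes only a few lines to exhibit; the data-generating process here is again a hypothetical choice of mine, but nothing about it is exotic:

```python
# A minimal sketch of the impossible-predictions point: with a strong
# continuous predictor, an LPM's fitted values routinely escape [0, 1].
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-2.5 * x))   # true logistic probabilities
y = rng.binomial(1, p)

lpm = sm.OLS(y, sm.add_constant(x)).fit()
fitted = lpm.fittedvalues
print("share of fitted values below 0:", (fitted < 0).mean())
print("share of fitted values above 1:", (fitted > 1).mean())
```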
In other posts, Giles points out that even if one desires only a marginal estimate, the LPM cannot identify the correct parameters if even a single observation of Y is misclassified (a sketch after the quote below illustrates the mechanism).
Quote:
So, ask yourself the following question:
“When I have binary choice data, can I be absolutely sure that every one of the observations has been classified correctly into zeroes and ones?”
If your answer is “Yes”, then I have to say that I don’t believe you. Sorry!
If your answer is “No”, then forget about using the LPM. You’ll just be trying to do the impossible - namely, estimate parameters that aren’t identified.
And that’s not going to impress anybody!
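The mechanism behind that quote can be shown in its simplest form. In the sketch below (all numbers are hypothetical choices of mine), the true model is exactly linear in probability, so this is once more the most charitable case for the LPM; each recorded Y is then flipped with probability alpha:

```python
# A minimal sketch of the misclassification point: flipping each recorded Y
# with unknown probability alpha attenuates the LPM slope by (1 - 2*alpha),
# so the true slope cannot be recovered without knowing alpha.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200_000
x = rng.uniform(0, 4, size=n)
p = 0.2 + 0.1 * x                       # true linear probability model
y = rng.binomial(1, p)                  # correctly classified outcomes

alpha = 0.05                            # unknown-in-practice flip rate
flips = rng.binomial(1, alpha, size=n)
y_obs = np.where(flips == 1, 1 - y, y)  # observed, misclassified outcomes

X = sm.add_constant(x)
print("slope, clean Y:  ", sm.OLS(y, X).fit().params[1])      # ~0.10
print("slope, flipped Y:", sm.OLS(y_obs, X).fit().params[1])  # ~(1 - 2*alpha) * 0.10 = 0.09
```

What OLS actually estimates here is E[Y_obs | x] = alpha + (1 - 2*alpha)(b0 + b1*x), so the unknown flip rate is absorbed into both the intercept and the slope. No amount of data separates alpha from b0 and b1, and that is the identification failure Giles is pointing at.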
Further Reading