I had a ‘quick’ question about bootstrapping in logistic regression. From my very basic understanding (which may be wrong), it's a method of resampling your data to give you more accurate confidence intervals. If so, why is it not the default option in SPSS for regression modelling (rather than an optional extra), and should I be doing it routinely? I’m doing some work on predicting patient outcomes from trauma triage criteria.
That’s a great question, Dr Nnajiuba, and one I have pondered over myself. I do hope someone from the community can shed some light on this.
Hello Henry - welcome to the discussion board.
I can’t help you with SPSS; I “converted” to R many moons ago. However, I regularly use bootstrapping in logistic regression and in other settings, i.e. almost “routinely”.
I see you also work with emergency department data. The clinicians I work with are very risk-averse (which is good), so I often want to calculate CIs for statistics (sensitivity, NPV, etc.) that are near 100%. I have a bit more confidence in bootstrapping in such situations than in other methods of calculating CIs. However, I’ve noticed once or twice in the literature that when people have a point estimate of 100% sensitivity (say) and use bootstrapping, they report the CI as 100-100%, which I think is just nonsense: if every observed case is a success, every resample is also 100%, so the bootstrap interval collapses to a single point. In this case the Wilson score interval or other methods may be more appropriate.
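To illustrate the 100-100% problem, here is a minimal sketch using only the Python standard library. It assumes a hypothetical set of 40 positive patients who were all flagged by the triage rule (observed sensitivity 40/40), and compares a simple percentile bootstrap with the Wilson score interval. The function names are illustrative, not from any particular package.

```python
import math
import random

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)

def percentile_bootstrap_ci(outcomes, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the mean of a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and record the proportion in each resample.
    stats = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo = stats[round((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical data: 40 true positives, all detected (sensitivity 100%).
detected = [1] * 40
print("bootstrap:", percentile_bootstrap_ci(detected))  # every resample is all 1s
print("Wilson:   ", wilson_ci(sum(detected), len(detected)))
```

The bootstrap returns (1.0, 1.0) because no resample can contain a missed case, whereas the Wilson interval still has a lower bound of roughly 91%, which is a more honest statement of the uncertainty from 40 patients.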
Thank you very much for your quick response. So are you essentially saying that bootstrapping gives you more accurate confidence intervals?
Certainly better than p ± z·sqrt(p(1 − p)/n) when considering very high or low probabilities. I think there are many people on this board better placed than me to say whether it is more accurate than other methods. My understanding is that bootstrapping reflects the “real” randomness, and hence the uncertainty in the results that we are trying to convey with a confidence interval, more closely than other techniques do.
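A quick sketch of why the simple Wald formula p ± z·sqrt(p(1 − p)/n) struggles near 100%. The counts (49/50 and 50/50) are hypothetical, chosen only to show the behaviour at the boundary.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Simple Wald interval p +/- z*sqrt(p(1-p)/n), deliberately not truncated."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# 49/50: the upper limit goes above 1, which is impossible for a proportion.
print(wald_ci(49, 50))
# 50/50: the standard error is zero, so the interval collapses to (1.0, 1.0).
print(wald_ci(50, 50))
```

The first interval has an upper bound of about 1.019, and the second has zero width, the same degenerate behaviour as the bootstrap at 100%.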
When the predicted risk comes not from a simple proportion (for which I’d use the Wilson CI) but from a logistic model, the bootstrap is often no more accurate than the regular Wald-based CI from the model. This involves getting the standard error of the predicted logit, forming a CI on the logit scale, and then applying the logistic (inverse-logit) transform to the two CI endpoints.
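The logit-scale Wald approach above can be sketched as follows. This is a minimal illustration, assuming you already have the fitted coefficients and their variance-covariance matrix; the numbers below are hypothetical (an intercept plus two 0/1 triage criteria), not from any real model. In R, `predict(fit, newdata, type = "link", se.fit = TRUE)` returns the predicted logit and its standard error directly.

```python
import math

def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1 / (1 + math.exp(-x))

def predicted_risk_ci(beta, cov, x, z=1.96):
    """Wald CI for a predicted probability from a logistic model.

    beta: coefficients (intercept first)
    cov:  variance-covariance matrix of beta, as a list of lists
    x:    covariate vector for one patient, with a leading 1 for the intercept
    """
    eta = sum(b * xi for b, xi in zip(beta, x))  # predicted logit
    k = len(x)
    # Var(x'beta) = x' V x
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    # Build the CI on the logit scale, then transform both endpoints.
    return expit(eta), expit(eta - z * se), expit(eta + z * se)

# Hypothetical fitted model (illustrative values only).
beta = [-3.0, 1.2, 0.8]
cov = [[0.04, -0.01, -0.01],
       [-0.01, 0.05, 0.00],
       [-0.01, 0.00, 0.06]]
patient = [1, 1, 0]  # meets criterion 1 but not criterion 2
risk, lo, hi = predicted_risk_ci(beta, cov, patient)
print(f"predicted risk {risk:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Because the interval is built on the logit scale, the endpoints always stay inside (0, 1) and the interval is naturally asymmetric around the predicted risk.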