RMS Titanic Binary Logistic Case Study

Regression Modeling Strategies:

This is the 12th of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.

Overview | Course Notes

Elinder & Erixson (2012, PNAS) provide data on 18 maritime disasters since 1855, including the Titanic: over 15,000 cases and 17 variables (free at PNAS: https://www.pnas.org/content/109/33/13220). The Titanic is one of two disasters (the HMS Birkenhead is the other) in which female survival was greater than male survival. From the abstract: “Women have a distinct survival disadvantage compared with men. Captains and crew survive at a significantly higher rate than passengers….Taken together, our findings show that human behavior in life-and-death situations is best captured by the expression ‘every man for himself.’”

Additional links

RMS12

Q&A From May 2021 Course

  1. Table 9.2 lines: can you explain how to properly interpret each of the following lines, and what information each gives for understanding the model?
Factor                                      Chi-Square  d.f.  P
age x sibsp (Factor+Higher Order Factors)        10.99     4  0.0267
  Nonlinear                                       1.81     3  0.6134
  Nonlinear Interaction : f(A,B) vs. AB           1.81     3  0.6134

The 1.81 is the chunk test with 3 d.f. for the nonlinear interaction effects; in this case it measures how far the interaction departs from being a simple product. The 10.99 is the chunk test with 4 d.f. for all interaction terms involving age and sibsp. The anova print method has an option that makes it report exactly which parameters are being tested.
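As a rough check on how the chi-square, d.f., and P columns relate, each P-value can be recomputed as an upper-tail chi-square probability. The sketch below is plain Python and is independent of the rms package; `chi2_sf` is a hypothetical helper name introduced here for illustration. The inputs are the rounded statistics quoted above, so agreement with the printed P-values is only to about three decimals.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with h = x / 2.
    """
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df a positive integer")
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # exact value for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # exact value for df = 1
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Rounded Wald statistics taken from the table lines quoted in the question
p_interaction = chi2_sf(10.99, 4)   # all age x sibsp interaction terms
p_nonlinear = chi2_sf(1.81, 3)      # nonlinear interaction terms only

print(round(p_interaction, 3), round(p_nonlinear, 3))
```

Both values come out close to the printed 0.0267 and 0.6134; the small difference for the 3 d.f. test is what one would expect from starting with a statistic rounded to two decimals.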

→ extra Q: What are the hypotheses here, for Nonlinear and Nonlinear Interaction (and also for TOTAL NONLINEAR and TOTAL INTERACTION)? How should we interpret the p-values?

  2. Suppose I had a calibration plot looking like this:

Should I be concerned? Or is the lack of calibration at higher predicted probabilities OK, since there are few data points there? Is there anything that can be done to improve the situation?

Not too concerned. Mainly be concerned about miscalibration for predicted probabilities between about 0.55 and 0.7, where you still have some data.
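To see why a sparse region of a calibration plot deserves less weight, it can help to tabulate how many observations sit behind each part of the curve. The following is a minimal sketch in plain Python, not the way the rms package estimates calibration: it is a crude binned summary, `binned_calibration` is a hypothetical helper name, and the predictions and outcomes are made-up numbers chosen only to mimic a plot with very few cases at high predicted probabilities.

```python
import math

def binned_calibration(pred, outcome, n_bins=5):
    """Summarize predicted vs. observed risk in equal-width bins over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(pred, outcome):
        i = min(int(p * n_bins), n_bins - 1)    # p == 1.0 falls in the last bin
        bins[i].append((p, y))
    rows = []
    for i, members in enumerate(bins):
        n = len(members)
        if n == 0:
            continue                            # skip empty bins
        mean_pred = sum(p for p, _ in members) / n
        observed = sum(y for _, y in members) / n
        # Simple binomial standard error; it is 0 when observed is 0 or 1,
        # which understates the uncertainty in very small bins.
        se = math.sqrt(observed * (1.0 - observed) / n)
        rows.append({"bin": i, "n": n, "mean_pred": mean_pred,
                     "observed": observed, "se": se})
    return rows

# Hypothetical data: many cases at low predicted risk, only 4 near 0.9
pred = [0.1] * 40 + [0.3] * 30 + [0.5] * 20 + [0.9] * 4
outcome = ([1] * 4 + [0] * 36) + ([1] * 9 + [0] * 21) \
        + ([1] * 10 + [0] * 10) + ([1] * 2 + [0] * 2)

rows = binned_calibration(pred, outcome)
for r in rows:
    print(r["bin"], r["n"], round(r["mean_pred"], 2),
          round(r["observed"], 2), round(r["se"], 3))
```

In this made-up example the top bin shows a large gap between predicted and observed risk, but it rests on four observations and its standard error is several times that of the well-populated bins, so the apparent miscalibration there is weak evidence.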