RMS Titanic Binary Logistic Case Study

Regression Modeling Strategies:

This is the 12th of several connected topics organized around chapters in Regression Modeling Strategies. The purposes of these topics are to introduce key concepts in the chapter and to provide a place for questions, answers, and discussion around the chapter’s topics.

Overview | Course Notes

Elinder & Erixson (2012, PNAS) provide the data on 18 maritime disasters since 1855, including the Titanic. They provide the data, over 15,000 cases & 17 variables (free at PNAS: https://www.pnas.org/content/109/33/13220). The Titanic is one of two disasters (the HMS Birkenhead is the other) in which female survival is greater than males From the abstract: “Women have a distinct survival disadvantage compared with men. Captains and crew survive at a significantly higher rate than passengers….Taken together, our findings show that human behavior in life-and-death situations is best captured by the expression “every man for himself.””

Additional links


Q&A From May 2021 Course

  1. Table 9.2 Lines: can you comment/explain how to properly interpret each line and what information these give to understand the model?
age x sibsp (Factor+Higher Order Factors) 10.99 4 0.0267

Nonlinear 1.81 3 0.6134

Nonlinear Interaction : f(A,B) vs. AB 1.81 3 0.6134

1.81 = 3 d.f. Chunk test for the nonlinear interaction effects; In this case this measures the departure of interactions from being a simple product. 10.99 is the chunk test with 4 d.f. For all interaction terms involving age and sibsp. There is an option for the anova print method to have it tell you exactly which parameters are being tested.

→ extra Q: What are the hypotheses here? For nonlinear and nonlinear interaction (and also for TOTAL NONLINEAR, TOTAL INTERACTION). How should we interpret the p-values here?

  1. Suppose I had a calibration plot looking like this:

Should I be concerned? Or is the lack of calibration at higher predicted probs OK as there are few data points there? Is there anything that can be done to improve the situation? Not too concerned. Mainly be concerned when prob = .55 .7 where you still have some data.

dear prof
I would like to know the option for that in anova print method.
thanks in advance

?print.anova.rms    # shows help file; see the argument called which
1 Like

Dear Professor Harrell,

I am trying to reproduce the code from cap 8 of your amazing book (case study on Titanic binary logistic regression).
Unfortunately, when I try to run this code chunk:
runmi ← function()
fit.mult.impute(survived ~ sex * pclass * rcs(age, 4) + rcs(age, 4) * (sibsp + parch),
lrm, mi, data=t3, pr=FALSE, lrt=TRUE)
seed ← 17
f.mi ← runifChanged(runmi, seed, mi, t3)

I get the following error:
Error in runifChanged(runmi, seed, mi, t3) :
attempt to run runifChanged without file= outside a knitr chunk.

could you help me? I am actually interest in fitting a cox regression model. In this latter case, would it work to change lrm with cph and swapping ‘survived’ with a Surv object?

Let me thank you for your help. I am apologize in advance in case the question is not posted correctly

This is one of those cases where my error message actually tells you what’s going on :slight_smile:

You are running the code interactively and not using knitr or quarto to run the whole document. For that cast you need to specify file= to runifChanged, e.g., file="my.rds".

The Titanic dataset does not have survival time; it only has survival as yes/no, so a survival model is not appropriate.

1 Like

Dear Professor,
Thank you very much, that worked perfectly for the titanic dataset!
Yes, my goal was indeed to translate the procedure applied in the titanic case study to a survival analysis. I would have used the case study on cox regression but in the latter only single imputation via transcan is used and in my dataset probably multiple imputation is advised.

That being said, I have tried to adjust the code for a survival analysis (like the prostate database) using cph instead of lrm.
To be more clear:
Model: 10 predictors (1 continuous and 9 factors) + time + binary outcome
S1 is the surv object from (time, outcome =1)
Example code:
set.seed(17) # so can reproduce random aspects
mi ← aregImpute(~ predictors + I(time) + outcome,
data=t3, n.impute=20, nk=3, pr=FALSE)

runmi ← function(){
fit.mult.impute(S1 ~ predictors,
cph, mi, data=t3, fitargs = list(x=TRUE, y=TRUE), pr=FALSE)
seed ← 17
f.mi ← runifChanged(runmi, seed, mi, t3, file = “my.rds”)

could this be ok?, should i specify surv=true?

I can move this post to another tread if required
Thank you very much

That looks OK I think but my functions have only thought through the \beta part of the process, not the survival curve part. You may want to estimate the underlying survival curve by stacking all the imputed datasets. See this.

Thank you very much! Am I getting it right that I should use vClus to stack the imputed data sets and then draw the survival curve? Is this made within vClus or giving vClus output to survplot?

Thank you very much

You’ll probably have to look inside the functions I’m calling to see how it’s done.