I’ve been working with Kaplan–Meier (KM) survival estimates and ran into something that seems paradoxical at first:
If I split my sample into strata, compute the KM survival separately in each stratum, and then combine those stratum-specific survivals, I don’t get the same result as when I compute the KM on the whole dataset.
My motivation is to extract the distribution of TP-FN / FP-TN for time-to-event predictions, splitting them into bins for a given p.threshold / ppcr. It feels more natural to count real-positives / real-negatives within each bin and then add them up, but for each p.threshold the overall estimate of real-positives / real-negatives differs from the sum of the bin-level estimates!
Toy example
- Stratum A: 2 subjects
- Stratum B: 2 subjects
Events/censoring:
- t=1: event in A
- t=2: censor in B
- t=3: event in B
4 individuals, 2 strata. The second subject in A has no event during follow-up and is still at risk at t=3.
Wrong Calculation: Stratum-specific KMs
- Stratum A: 1 event at t=1 with n=2 → survival drops to 1 - 1/2 = 0.5. No further events → S_A(3)=0.5.
- Stratum B: censor at t=2 (no effect on survival), then 1 event at t=3 with n=1 → survival goes to 0. So S_B(3)=0.0.
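A minimal Python sketch of the stratum-specific product-limit calculation above. The post does not give the follow-up time of the second subject in A, so it is assumed here (hypothetically) to be censored at t=4; any time after t=3 gives the same numbers. The helper name `km_survival` is illustrative, not from any library.

```python
def km_survival(times, events, t):
    """Kaplan-Meier S(t): product of (1 - d/n) over distinct times u <= t.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        n_at_risk = sum(1 for ti in times if ti >= u)  # still under observation just before u
        d = sum(e for ti, e in zip(times, events) if ti == u)  # events observed at u
        surv *= 1.0 - d / n_at_risk  # factor is 1 at pure censoring times
    return surv


# Stratum A: event at t=1; second subject assumed censored at t=4 (hypothetical).
s_a = km_survival([1, 4], [1, 0], 3)
# Stratum B: censored at t=2, event at t=3.
s_b = km_survival([2, 3], [0, 1], 3)
print(s_a, s_b)  # 0.5 0.0
```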
Baseline-weighted average:
With baseline weights 0.5 each (2/4 per stratum):
0.5 * 0.5 + 0.5 * 0 = 0.25.
Correct Calculation: Pooled KM
Ignore strata and compute KM on all 4 subjects.
- At t=1: 1 event / 4 at risk → decrement factor = 1 - 1/4 = 0.75. Survival = 0.75.
- At t=3: 1 event / 2 at risk → decrement factor = 1 - 1/2 = 0.5. Survival = 0.75 * 0.5 = 0.375.
So S_pooled(3) = 0.375.
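The pooled calculation can be sketched the same way, listing the risk set at each event time. As before, the second A subject is assumed (hypothetically) to be censored at t=4; only the fact that it is still at risk at t=3 matters. `km_table` is an illustrative helper name.

```python
def km_table(times, events):
    """Return (time, n_at_risk, n_events, survival) rows at each event time."""
    rows = []
    surv = 1.0
    for u in sorted(set(times)):
        n_at_risk = sum(1 for ti in times if ti >= u)
        d = sum(e for ti, e in zip(times, events) if ti == u)
        if d > 0:  # survival only steps down at event times
            surv *= 1.0 - d / n_at_risk
            rows.append((u, n_at_risk, d, surv))
    return rows


# All four subjects, strata ignored: A1, A2 (assumed censored at t=4), B1, B2.
times = [1, 4, 2, 3]
events = [1, 0, 0, 1]
for row in km_table(times, events):
    print(row)
# (1, 4, 1, 0.75)
# (3, 2, 1, 0.375)
```

The censoring at t=2 produces no row of its own; it only shrinks the risk set from 3 to 2 before the event at t=3.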
- Weighted average of stratum KMs at t=3: 0.25
- Pooled KM at t=3: 0.375
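To see where the two estimates part ways, the sketch below evaluates both on the grid t = 0..4, using the baseline weights 2/4 per stratum from above and the same hypothetical censoring time t=4 for the second A subject. They agree at t=1 and t=2 and only diverge at t=3, the first event after the censoring in B.

```python
def km_survival(times, events, t):
    """Kaplan-Meier S(t) as the product of (1 - d/n) over distinct times <= t."""
    surv = 1.0
    for u in sorted(set(times)):
        if u > t:
            break
        n_at_risk = sum(1 for ti in times if ti >= u)
        d = sum(e for ti, e in zip(times, events) if ti == u)
        surv *= 1.0 - d / n_at_risk
    return surv


# (times, events) per stratum; A's second subject assumed censored at t=4.
strata = {"A": ([1, 4], [1, 0]), "B": ([2, 3], [0, 1])}
all_times = [t for ts, _ in strata.values() for t in ts]
all_events = [e for _, es in strata.values() for e in es]
n_total = len(all_times)

weighted, pooled = {}, {}
for t in range(5):
    # Fixed baseline weights: stratum size / total sample size.
    weighted[t] = sum(len(ts) / n_total * km_survival(ts, es, t)
                      for ts, es in strata.values())
    pooled[t] = km_survival(all_times, all_events, t)
    print(t, weighted[t], pooled[t])
# 0 1.0 1.0
# 1 0.75 0.75
# 2 0.75 0.75
# 3 0.25 0.375
# 4 0.25 0.375
```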
If I understand correctly, it has something to do with the way KM borrows information about censoring: the information about censored observations only comes into play when a new event occurs.
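One way to make this concrete is Efron's redistribute-to-the-right construction, which reproduces the KM estimate: every subject starts with mass 1/n, and each censored subject passes its mass in equal shares to everyone after it; S(t) is 1 minus the mass sitting on events at or before t. In the pooled analysis the censored B subject's 1/4 is split between the remaining A subject and the remaining B subject, so the t=3 event carries 3/8 and S(3) = 1 - 1/4 - 3/8 = 0.375. Within stratum B, the censored subject's whole share goes to the other B subject, so the t=3 event carries all of stratum B's mass (1/2 of the total after weighting) and the weighted estimate is 1 - 1/4 - 1/2 = 0.25. The sketch assumes events precede censorings at tied times and, as above, a hypothetical censoring time t=4 for the second A subject; the function names are illustrative.

```python
def redistribute_to_the_right(times, events):
    """Efron's redistribute-to-the-right: return the final probability mass per subject index."""
    # Sort by time; at tied times place events before censorings (usual KM convention).
    order = sorted(range(len(times)), key=lambda i: (times[i], -events[i]))
    mass = {i: 1.0 / len(times) for i in order}
    for pos, i in enumerate(order):
        later = order[pos + 1:]
        if events[i] == 0 and later:
            # A censored subject hands its mass, in equal shares, to everyone after it.
            share = mass[i] / len(later)
            for j in later:
                mass[j] += share
            mass[i] = 0.0
    return mass


def rtr_survival(times, events, t):
    """S(t) = 1 - total mass on events observed at or before t."""
    mass = redistribute_to_the_right(times, events)
    return 1.0 - sum(m for i, m in mass.items() if events[i] == 1 and times[i] <= t)


# Pooled: A1 event t=1, A2 assumed censored t=4, B1 censored t=2, B2 event t=3.
pooled_mass = redistribute_to_the_right([1, 4, 2, 3], [1, 0, 0, 1])
print(pooled_mass)  # keys are input indices: {0: 0.25, 2: 0.0, 3: 0.375, 1: 0.375}
print(rtr_survival([1, 4, 2, 3], [1, 0, 0, 1], 3))  # 0.375

# Stratified, then weighted 2/4 per stratum.
s_a = rtr_survival([1, 4], [1, 0], 3)
s_b = rtr_survival([2, 3], [0, 1], 3)
print(0.5 * s_a + 0.5 * s_b)  # 0.25
```

The only difference between the two analyses is who is allowed to inherit the censored B subject's mass: everyone still at risk (pooled) or only subjects in the same stratum (stratified).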
Any thoughts?