Yup. You’ve bumped into a classic: the Kaplan–Meier (KM) estimator is non-collapsible. A baseline-weighted average of stratum-specific survivals will not equal the pooled KM, except in special (rare) cases. The reason is that KM is a product of time-local risk‐set fractions, and those risk sets change differently by stratum as censoring and events accumulate. Linear averaging can’t reconstruct that product.
For clinicians we provide an oncology example of this in Section 2 and Appendix A here. In the methodology literature, great references here and here. It is also at the heart of this very vigorous DataMethods discussion.
This mathematical artifact whereby the whole is not the sum of its parts (or what we think are its parts) vexes our intuition because such strong emergent properties are typically found in magic but not observed in the scientific world. What we do know about the laws of nature to date is consistent with weakly emergent phenomena (like conductivity or temperature), but the whole is always the sum of its parts. Hence why we need to be so very extra careful when looking at forest plots etc on the hazards ratio scale trying to fish for subgroup effects.