Using robcov() to adjust for donor-level clustering in kidney transplant outcomes?

Samuel_Tingle · July 6, 2025, 7:27am

I am doing a study to assess the association between transplant factors and kidney transplant outcomes (specifically survival of the transplanted kidney) using US registry data. I’m using cph().

A key issue is that most recipients are not truly independent, because they are clustered by donor. Specifically, our analysis includes 57,153 donors whose two kidneys were both transplanted (into different recipients) and 39,389 donors from whom only one kidney was transplanted. So our analysis of recipient outcomes has n = 153,695 recipients in 96,542 donor clusters (each with either 1 or 2 recipients).
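As a quick sanity check on the cluster structure described above, the counts can be reproduced in R with a hypothetical `donor_id` vector (one entry per recipient). The IDs are made up; only the cluster sizes follow the post.

```r
# Hypothetical donor IDs: 57,153 donors contribute two recipients each,
# 39,389 donors contribute a single recipient.
n_pair   <- 57153
n_single <- 39389
donor_id <- c(rep(seq_len(n_pair), each = 2),
              n_pair + seq_len(n_single))

length(donor_id)          # recipients: 153695
length(unique(donor_id))  # donor clusters: 96542

# Distribution of cluster sizes: sizes 1 and 2 occur 39389 and 57153 times
table(table(donor_id))
```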

The focus of the study is whether the method of kidney preservation affects graft survival, and I adjust for several donor and recipient factors in the model.

This is how I am currently using robcov() to get cluster-robust standard errors (from which I calculate 95% confidence intervals):

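```r
library(rms)
# x = TRUE, y = TRUE keep the design matrix and response that robcov() needs
fit <- cph(Surv(time, event) ~ var1 + var2 + var3, data = data, x = TRUE, y = TRUE)
# Huber-White sandwich variance, treating each donor as a cluster
fit_robust <- robcov(fit, cluster = data$donor_id)
```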

My understanding is that this uses the Huber-White (sandwich) method to update the variance-covariance matrix, which accounts for the within-donor correlation and will typically increase the variance. I’ve seen other papers in the field use frailty terms (or other mixed-effects models for non-survival outcomes). However, I think that robust SEs from robcov() are probably a better solution here.
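To make the Huber-White calculation concrete, here is a minimal sketch on simulated data (all values hypothetical) using the survival package rather than rms. Each recipient's dfbeta residual (score residual multiplied by the model-based variance) is summed within donor, and the cross-product of those donor totals gives the cluster-robust variance. As far as I understand, this is the same sandwich that coxph() reports when given a cluster() term and that robcov() computes for a cph() fit; the last lines compare the hand-computed version with coxph()'s.

```r
library(survival)

set.seed(1)
# Hypothetical toy data: 300 donors giving 1 or 2 recipients each, with a
# shared donor effect u that makes recipients of the same donor correlated.
n_donor  <- 300
size     <- sample(1:2, n_donor, replace = TRUE)
donor_id <- rep(seq_len(n_donor), times = size)
u        <- rnorm(n_donor, sd = 0.7)[donor_id]
x        <- rnorm(length(donor_id))
t_event  <- rexp(length(donor_id), rate = exp(0.5 * x + u))
t_cens   <- rexp(length(donor_id), rate = 0.3)
d <- data.frame(time = pmin(t_event, t_cens),
                event = as.integer(t_event <= t_cens),
                x = x, donor_id = donor_id)

fit <- coxph(Surv(time, event) ~ x, data = d)

# dfbeta residuals = score residuals %*% model-based variance (one row per recipient)
db <- as.matrix(residuals(fit, type = "dfbeta"))
# Sum within donor, then cross-product: V (sum_g U_g U_g') V
v_clust <- crossprod(rowsum(db, d$donor_id))

# survival's own cluster-robust fit, for comparison
fit_cl <- coxph(Surv(time, event) ~ x + cluster(donor_id), data = d)

cbind(naive    = sqrt(diag(vcov(fit))),
      by_hand  = sqrt(diag(v_clust)),
      coxph_cl = sqrt(diag(vcov(fit_cl))))
```

Note that the coefficient itself is untouched by this procedure; only the variance changes.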

Is this a valid approach given the large number of clusters but small cluster size (1 or 2 recipients per cluster)? Would the same approach be valid for logistic regression and for linear regression (fit with ols())?
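On the second question: robcov() also accepts lrm() and ols() fits (again stored with x = TRUE, y = TRUE). To show the underlying calculation without rms, here is a base-R sketch for glm() fits. The function name `cluster_vcov_glm` and the simulated data are hypothetical, and the sketch applies no small-sample correction. For canonical links (logit, identity) each observation's score is x_i (y_i − μ_i), and a singleton cluster simply contributes its own score outer product.

```r
# Hypothetical helper: cluster-robust (Huber-White) variance for a stats::glm() fit.
# `cluster` must have one entry per row used in the fit.
cluster_vcov_glm <- function(fit, cluster) {
  X <- model.matrix(fit)
  # Per-observation scores: x_i * working residual * working weight
  # (for canonical links this equals x_i * (y_i - mu_i))
  score <- X * (residuals(fit, type = "working") * weights(fit, type = "working"))
  bread <- summary(fit)$cov.unscaled           # (X' W X)^{-1}
  meat  <- crossprod(rowsum(score, cluster))   # sum over clusters of U_g U_g'
  bread %*% meat %*% bread
}

set.seed(2)
# Hypothetical donor-clustered data with a shared donor effect u
n_donor  <- 400
size     <- sample(1:2, n_donor, replace = TRUE)
donor_id <- rep(seq_len(n_donor), times = size)
u        <- rnorm(n_donor)[donor_id]
x        <- rnorm(length(donor_id))
y_bin    <- rbinom(length(donor_id), 1, plogis(-0.5 + 0.8 * x + u))
y_cont   <- 1 + 0.5 * x + u + rnorm(length(donor_id))

fit_logit <- glm(y_bin ~ x, family = binomial)
fit_lin   <- glm(y_cont ~ x, family = gaussian)

sqrt(diag(cluster_vcov_glm(fit_logit, donor_id)))  # cluster-robust SEs, logistic
sqrt(diag(cluster_vcov_glm(fit_lin, donor_id)))    # cluster-robust SEs, linear
```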

davidcnorrismd · July 6, 2025, 9:20am

How many of the 39,389 were living donors? What reasons explain transplanting only 1 kidney from cadaveric donors?

f2harrell · July 6, 2025, 11:35am

The cluster sandwich estimator works best when there is a large number of small clusters. So no problem there. But the frailty approach is probably slightly better as it adjusts the fixed effects for cluster heterogeneity.
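A sketch of this distinction on simulated donor-clustered data (hypothetical parameters, with roughly 40% singleton donors as in the original post): the sandwich leaves the Cox coefficient exactly as in the naive fit, while a frailty model re-estimates it conditional on the donor effect. survival's penalized frailty() term is used here as a stand-in for coxme, which is a separate package.

```r
library(survival)

set.seed(3)
# Hypothetical data: about 40% of donors give one recipient, the rest two;
# log-normal donor effect u; true conditional log hazard ratio 0.5.
n_donor  <- 500
size     <- sample(1:2, n_donor, replace = TRUE, prob = c(0.4, 0.6))
donor_id <- rep(seq_len(n_donor), times = size)
u        <- rnorm(n_donor, sd = 0.8)[donor_id]
x        <- rnorm(length(donor_id))
t_event  <- rexp(length(donor_id), rate = exp(0.5 * x + u))
t_cens   <- rexp(length(donor_id), rate = 0.3)
d <- data.frame(time = pmin(t_event, t_cens),
                event = as.integer(t_event <= t_cens),
                x = x, donor_id = donor_id)

fit_naive   <- coxph(Surv(time, event) ~ x, data = d)
fit_sand    <- coxph(Surv(time, event) ~ x + cluster(donor_id), data = d)
fit_frailty <- coxph(Surv(time, event) ~ x +
                       frailty(donor_id, distribution = "gaussian"), data = d)

# First coefficient is x in each fit; naive and sandwich are identical
c(naive    = unname(coef(fit_naive)[1]),
  sandwich = unname(coef(fit_sand)[1]),
  frailty  = unname(coef(fit_frailty)[1]))
```

The frailty estimate targets the donor-conditional hazard ratio, so it will typically sit further from zero than the marginal estimate the naive and sandwich fits share.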

Related to David’s excellent question, having donor alive/dead as a fixed effect in the model seems necessary.

Samuel_Tingle · July 6, 2025, 1:20pm

Thank you for the responses! All of the donors in the study are deceased donors, there are just a number of donors where only one kidney was transplanted (lots of reasons for this).

I did try to build some frailty models (which I don’t think are supported in rms), but really struggled with convergence… I think because of the large number of clusters and the fact that many clusters have only a single observation.

f2harrell · July 6, 2025, 2:47pm

I assume you used coxme::coxme, which is well tested, so I’m surprised you have trouble in this ideal situation. It may be worth writing to Terry Therneau.

Samuel_Tingle · July 7, 2025, 7:54am

Thank you for the suggestion.

Part of my reason for moving to the robcov() approach was concern about having so many clusters with a single participant. I’ve also read much of your work, which often argues against frailty/random-intercept models, and others have recommended avoiding such models when there are many singleton clusters (here, 39,389 of 96,542 clusters have only a single recipient).

I think I am missing some nuance?

f2harrell · July 7, 2025, 11:22am

We know that singleton clusters don’t harm sandwich estimators. I’d like someone to find a reference where singletons have been studied in random effects models. As long as there are many non-singletons, I’m 0.9 sure that random effects models work very well in this setting. They have the advantage of adjusting β rather than just doing an after-fitting standard error calculation. My disdain for random effects models pertained to a different setting: longitudinal data.
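In the absence of a reference, one way to probe how singleton clusters affect a random-effects fit is a small simulation. The sketch below uses hypothetical settings throughout (300 donors, 10 replicates per setting, true conditional log hazard ratio 0.5) and again uses survival's frailty() in place of coxme. It varies the share of singleton donors and averages the naive and frailty estimates of the log hazard ratio; a real check would use many more replicates and the registry's actual cluster structure.

```r
library(survival)

# Hypothetical simulator: donors give 1 or 2 recipients; shared log-normal
# donor effect with SD sd_u; true conditional log hazard ratio beta.
sim_once <- function(n_donor, p_single, beta = 0.5, sd_u = 0.8) {
  size     <- ifelse(runif(n_donor) < p_single, 1L, 2L)
  donor_id <- rep(seq_len(n_donor), times = size)
  u        <- rnorm(n_donor, sd = sd_u)[donor_id]
  x        <- rnorm(length(donor_id))
  t_event  <- rexp(length(donor_id), rate = exp(beta * x + u))
  t_cens   <- rexp(length(donor_id), rate = 0.3)
  d <- data.frame(time = pmin(t_event, t_cens),
                  event = as.integer(t_event <= t_cens),
                  x = x, donor_id = donor_id)
  fit_naive <- coxph(Surv(time, event) ~ x, data = d)
  fit_frail <- coxph(Surv(time, event) ~ x +
                       frailty(donor_id, distribution = "gaussian"), data = d)
  c(naive = unname(coef(fit_naive)[1]), frailty = unname(coef(fit_frail)[1]))
}

set.seed(4)
p_grid <- c(0, 0.4, 0.8)   # 0.4 is close to 39,389 / 96,542 in the post
res <- sapply(p_grid, function(p) {
  reps <- replicate(10, sim_once(n_donor = 300, p_single = p))
  rowMeans(reps)
})
colnames(res) <- paste0("p_single=", p_grid)
res   # mean estimated log HR per setting; the true conditional value is 0.5
```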