Censored binomial models

Hi all,

Suppose I have the following problem:

  1. I have 100 surgeons, each of whom performs N_i procedures in a given year to treat a particular condition.
  2. For this condition, they can choose between two procedures: A or B.
  3. I am interested in modelling the probability that a given physician would choose to perform A rather than B based on a given set of predictor variables.
  4. I cannot simply use a binomial model/classic logistic regression because the database censors any procedure count of 10 or fewer for a given surgeon in a year. For example, if a surgeon performs operation A 16 times and operation B 7 times, I would be able to see that they did operation A 16 times, but I would only know that they did operation B 10 times or fewer. Only surgeons with N of 11 or more are included, so you always know at least one of A or B. This sort of censoring is often done when there are confidentiality concerns with public databases.

One sensible approach I’ve found is outlined here. The approach basically consists of using the standard binomial likelihood for all exact observations (in this case, anything ≥11) and using the cumulative distribution function for all censored observations (≤10).
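
To make that concrete, here is a minimal sketch of such a likelihood in plain Python. It is not the code from the linked approach. It assumes that N is known for every surgeon and that a suppressed count of A is known only to be 10 or fewer. The function names, the single predictor x, and the toy records are all made up for illustration.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability P(A = k) for n procedures with probability p of choosing A."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def loglik_term(n, p, a=None, threshold=10):
    """Log-likelihood contribution of one surgeon.

    a is the exact count of procedure A when it was reported.
    a=None means the count was suppressed, i.e. known only to be <= threshold.
    """
    if a is not None:
        # Exact observation: ordinary binomial pmf.
        return math.log(binom_pmf(a, n, p))
    # Censored observation: binomial CDF, i.e. the sum of pmfs from 0 to the threshold.
    upper = min(threshold, n)
    cdf = sum(binom_pmf(k, n, p) for k in range(upper + 1))
    return math.log(cdf)

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_loglik(beta0, beta1, data):
    """Sum the contributions over surgeons; data holds (n, a_or_None, x) tuples."""
    total = 0.0
    for n, a, x in data:
        p = inv_logit(beta0 + beta1 * x)
        total += loglik_term(n, p, a)
    return total

# Hypothetical records: (N, A or None if suppressed, predictor x)
toy = [(23, 16, 1.0), (30, None, 0.0), (15, 12, 1.0), (40, None, 0.0)]
print(total_loglik(0.0, 0.5, toy))
```

Passing the negative of `total_loglik` to a general-purpose optimiser would give maximum likelihood estimates. If it is B rather than A that is suppressed, the same idea should apply with the sum running over the corresponding range of A instead.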

Another intuition that I had (which I suspect is wrong) is to just use an ordinal logistic model to directly model the count of procedure A done (where the outcome categories are ≤10, 11, 12, etc.) while controlling for the total number (A + B = N) of procedures performed by a given surgeon. I suspect this is not quite right because adjusting for N doesn’t account for the fact that A is always ≤ N.

I’m wondering if anyone has tackled a similar issue before or whether they have any alternative suggestions on how to tackle the problem.

Excellent question. Side comment: the censoring used in the health system reflects a poor understanding of the random effects models commonly used in quality outcomes research whereby providers with small case counts are discounted / shrunk towards the mean.

The derivation of the censored likelihood looks correct, but I wonder whether in this case, because the physicians have to choose, it might be best to approach it via a Bradley-Terry/discrete choice type model. I know that in the uncensored case Bradley-Terry ends up being equivalent to logistic regression, provided the winner/binary coding is kept consistent. I wonder if the consistency holds for the “win m of n” trials when censoring is present.
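
For reference, a short sketch of that equivalence. The notation is introduced here for illustration only: \lambda_A and \lambda_B are the Bradley-Terry "strengths" of the two procedures, p_i is surgeon i's probability of choosing A, and c = 10 is the censoring threshold.

```latex
\[
P(\text{A chosen over B})
  = \frac{e^{\lambda_A}}{e^{\lambda_A} + e^{\lambda_B}}
  = \frac{1}{1 + e^{-(\lambda_A - \lambda_B)}}
\]
\[
P(A_i \le c) = \sum_{k=0}^{\min(c,\, N_i)} \binom{N_i}{k}\, p_i^{k} (1 - p_i)^{N_i - k}
\]
```

The first line is the logistic function of \lambda_A - \lambda_B, which is why the uncensored model reduces to logistic regression once that difference is given a linear predictor. The censored term in the second line depends on the parameters only through p_i, which suggests (without proving it) that the same reparameterisation carries over.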

This might be one of the cases where it’s worth it to go through the combinatorics and derive the appropriate pmf probabilities directly.

Thanks! My understanding is that the censoring is usually not done for analytic purposes per se but more so for confidentiality (e.g., that we might be able to dig into who the patient is if they’re the only one who underwent a procedure with a given physician).

Oh, this is actually the first time I’d heard of a Bradley-Terry model! My hunch is that it would likely end up being the same since the censoring in a censored binomial is handled by summing up PMFs (CDF) of censored observations, so it should be equivalent to a Bradley-Terry model in the same way standard logistic regression is.

One option that’s easier to use (since there isn’t much software support for censored binomial models) is a censored Poisson model with the log of N as an offset, which is implemented in more packages (mgcv, brms, INLA, and probably others as well). You would need robust standard errors to correct the CIs (which I was planning to use anyway since I have a few clusters), and you would need to be okay with RRs rather than ORs as the default contrast.
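
As a rough illustration of what the censored Poisson likelihood with an offset looks like, here is a hand-written sketch in plain Python. It does not use the API of mgcv, brms, or INLA, and the function names and the threshold argument are hypothetical. It assumes N is known and that a suppressed count of A is known only to be 10 or fewer.

```python
import math

def pois_logpmf(k, mu):
    """Log of the Poisson pmf with mean mu, evaluated at count k."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def censored_pois_loglik_term(n, eta, a=None, threshold=10):
    """Log-likelihood contribution of one surgeon under a Poisson model.

    eta is the linear predictor on the log scale; log(n) enters as an offset,
    so the mean count of procedure A is mu = n * exp(eta).
    a=None means the count was suppressed (known only to be <= threshold).
    """
    mu = n * math.exp(eta)  # same as exp(eta + log(n))
    if a is not None:
        # Exact observation: Poisson pmf.
        return pois_logpmf(a, mu)
    # Censored observation: Poisson CDF up to the threshold.
    cdf = sum(math.exp(pois_logpmf(k, mu)) for k in range(threshold + 1))
    return math.log(cdf)

# Hypothetical surgeon with N = 23 and an expected share of A of 0.6
print(censored_pois_loglik_term(23, math.log(0.6), a=16))
print(censored_pois_loglik_term(23, math.log(0.6), a=None))
```

Because exp(eta) is the expected share of A per procedure, differences in eta are log rate ratios, which is where the RR contrast comes from. Note that the Poisson distribution puts some probability on counts above N, so unlike the binomial it does not enforce A ≤ N.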
