Propensity matching and analysis of resultant data on a data set with nonuniform repeated measures



Good afternoon all, and thanks to the community for this amazing resource. The following question was posted to CrossValidated last summer; it received few views and no responses. To avoid burying the lede: we chose a Monte Carlo approach for the analysis (with the blessing of the statistician we were working with and, ultimately, of the journal’s statistical reviewer). However, I am curious whether there is a “better” way to approach the problem. Thanks in advance for your consideration!

Original Post

Propensity matching and analysis of resultant data on a data set with repeated measures

We have extracted retrospective case-level data collected over several years. We are using the administration of rescue antiemetic in the postanesthesia care unit as a proxy for postoperative nausea and vomiting (binary outcome). We’ve extracted data on many variables, including age, gender, ASA-PS score, and exposures, to serve as confounders (the matching variables). We want to know whether preoperative strategies are associated with differential outcomes.

The problem is that the 14k cases represent only 10k unique patients. It has been suggested to me that we discard all but, say, the first case for each patient. My main concern about choosing an arbitrary strategy for discarding cases is that there is no a priori reason to discard any given case. I ran a Monte Carlo simulation in which I randomly selected a single case per unique patient and then propensity matched; I found wide dispersion in the resulting confidence intervals for a test association.
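The resampling loop described above can be sketched roughly as follows. This is a minimal Python illustration, not our actual R/MatchIt pipeline: it assumes the cases sit in a list of dicts with hypothetical keys `patient_id`, `treated` and `rescue`, and the crude risk difference is only a stand-in for the real "propensity match, then model" step.

```python
import random
from collections import defaultdict
from statistics import quantiles


def one_case_per_patient(cases, rng):
    """Keep one randomly chosen case (surgery) for each unique patient_id."""
    by_patient = defaultdict(list)
    for case in cases:
        by_patient[case["patient_id"]].append(case)
    return [rng.choice(rows) for rows in by_patient.values()]


def risk_difference(cases):
    """Placeholder estimator: P(rescue | treated) - P(rescue | control).
    In the real analysis this step would be propensity matching plus a model."""
    treated = [c["rescue"] for c in cases if c["treated"]]
    control = [c["rescue"] for c in cases if not c["treated"]]
    return sum(treated) / len(treated) - sum(control) / len(control)


def monte_carlo(cases, n_iter=1000, seed=1):
    """Repeat the one-case-per-patient draw and collect the estimates."""
    rng = random.Random(seed)
    estimates = [risk_difference(one_case_per_patient(cases, rng))
                 for _ in range(n_iter)]
    cuts = quantiles(estimates, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1], estimates


# Toy data (hypothetical): patients 1 and 4 each contribute two cases.
cases = [
    {"patient_id": 1, "treated": True, "rescue": 1},
    {"patient_id": 1, "treated": False, "rescue": 0},
    {"patient_id": 2, "treated": True, "rescue": 0},
    {"patient_id": 3, "treated": False, "rescue": 1},
    {"patient_id": 4, "treated": False, "rescue": 0},
    {"patient_id": 4, "treated": True, "rescue": 1},
]
lo, hi, est = monte_carlo(cases, n_iter=500)
print(f"middle 95% of estimates: {lo:.3f} to {hi:.3f}")
```

Even in this toy set the estimate moves between two values depending only on which of a patient’s cases is drawn, which is the kind of dispersion the simulation exposed.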

In other words, discarding cases before propensity matching seems to discard valuable information as well. I’d also need to look at absolute standardized mean differences across the Monte Carlo iterations to see whether covariate balance in the resampled samples is worse.
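For reference, the absolute standardized mean difference of a covariate can be computed as |mean_t - mean_c| / sqrt((s_t^2 + s_c^2) / 2). The sketch below uses that pooled-SD convention with sample variances; other software may standardize differently (for example, by the treated-group SD when targeting the ATT), and the ages are made up. Running this for every covariate inside each Monte Carlo iteration would show whether balance deteriorates.

```python
from math import sqrt
from statistics import mean, variance


def abs_smd(treated, control):
    """Absolute standardized mean difference with the pooled SD
    sqrt((s_t^2 + s_c^2) / 2), using sample variances."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages in a treated and a control sample
age_treated = [54, 61, 47, 70, 58]
age_control = [49, 52, 45, 63, 51]
print(round(abs_smd(age_treated, age_control), 3))  # prints 0.783
```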

Specific questions I am looking for answers to:

  1. For propensity matching, would it be acceptable to match at the case level, stratify on the matched sets, combine strata for nonunique treated patients, and then use conditional logistic regression or generalized estimating equations to estimate the treatment effect on the treated? This would potentially leave nonunique patients unstratified in the non-treated group.

In a personal communication, the core issue was summarized to me as follows: there are two sources of clustering to account for: (a) matched pairs, because treated and control subjects who were matched would share some homogeneity in outcomes; and (b) the same subject having multiple records, and thus within-subject homogeneity in outcomes.

I’m wondering how to approach this problem without excessively discarding useful information.

  2. More broadly, can the same tools that account for repeated measures (e.g., conditional logistic regression or generalized estimating equations) also be used on the matched sample? In other words, if you have repeated measures within a matched sample, can you use these tools to analyze those data?
  3. Is there a mechanism in existing propensity matching packages to exclude a patient’s other cases from the nontreated (control) population once a case from that patient has been selected? (We are using MatchIt in R.) Is that a reasonable approach?
  4. What are your thoughts on propensity matching at the case level, allowing multiple cases per unique patient in the propensity-matched cohort, and then running a Monte Carlo simulation in which conditional logistic regression is performed on a match-stratified dataset where a single case per unique patient is randomly selected from among that patient’s cases in the matched cohort?
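The last question can be sketched in Python under two simplifying assumptions: 1:1 matching, and treatment as the only term in the conditional model. In that special case the conditional logistic regression estimate of the odds ratio has the closed form n10 / n01, the ratio of discordant pairs, because concordant pairs drop out of the conditional likelihood. With matching covariates in the model or 1:k matched sets, a full conditional logistic fit (e.g., `survival::clogit` in R) would be needed instead. The data layout and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def conditional_or(pairs, cases):
    """Conditional MLE of the odds ratio for 1:1 matched pairs when treatment
    is the only covariate: n10 / n01. Concordant pairs carry no information."""
    n10 = sum(1 for t, c in pairs if cases[t][1] == 1 and cases[c][1] == 0)
    n01 = sum(1 for t, c in pairs if cases[t][1] == 0 and cases[c][1] == 1)
    if n01 == 0:
        return float("inf") if n10 else float("nan")
    return n10 / n01


def mc_one_case_per_patient(pairs, cases, n_iter=1000, seed=1):
    """Each iteration: keep one random case per patient among the matched
    cases, drop pairs that lost a member, and re-estimate the odds ratio."""
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for case_id in sorted({cid for pair in pairs for cid in pair}):
        by_patient[cases[case_id][0]].append(case_id)
    results = []
    for _ in range(n_iter):
        keep = {rng.choice(ids) for ids in by_patient.values()}
        # With 1:1 matching, a pair missing either member is uninformative.
        kept = [(t, c) for t, c in pairs if t in keep and c in keep]
        results.append((len(kept), conditional_or(kept, cases)))
    return results


# Toy matched cohort: case_id -> (patient_id, rescue); pairs are (treated, control).
cases = {1: (101, 1), 2: (201, 0), 3: (101, 0), 4: (202, 1),
         5: (102, 1), 6: (203, 0), 7: (103, 0), 8: (203, 1)}
pairs = [(1, 2), (3, 4), (5, 6), (7, 8)]
print(conditional_or(pairs, cases))  # all pairs: 2 / 2 = 1.0
results = mc_one_case_per_patient(pairs, cases, n_iter=200)
print(Counter(odds_ratio for _, odds_ratio in results))
```

In this toy cohort every iteration keeps two pairs, yet the estimate jumps between 0, 1 and infinity depending on which cases are drawn, a small-scale version of the instability the question worries about.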

I’m wondering whether this or this might offer some insight. The clustering discussed in these examples seems to involve broader sources that apply to relatively large proportions of the sample. Here, the sources of clustering are at the patient level and the matched-set level.


I do not see justification for propensity matching in this context. Matching throws away good data, does not account for outcome heterogeneity (covariates), and does not provide a framework for interaction assessment. Throwing away data from repeated measurements is also usually suboptimal. I strongly recommend using a unified modeling approach that uses all available data and accounts for within-subject correlation.

Another problem with matching that is mentioned in your excellent post is that we don’t have a unique principled way to analyze the matched datasets.

One of the worst aspects of matching is the discarding of subjects who are comparable (with respect to baseline variables) to subjects who are retained. Another problem is that many matching algorithms are dependent on how you happened to sort the rows in your dataset.
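To make the row-order point concrete, here is a toy Python comparison of greedy 1:1 nearest-neighbor matching with optimal 1:1 matching (choosing the set of pairs with the smallest total distance). The propensity scores are hypothetical, and the brute-force search is only there to show the objective; real software solves it as an assignment or network-flow problem.

```python
from itertools import permutations


def greedy_match(treated, controls):
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without
    replacement, taking treated units in the order they appear."""
    available = set(range(len(controls)))
    pairs = {}
    for i, ps in enumerate(treated):
        j = min(available, key=lambda k: abs(ps - controls[k]))
        pairs[i] = j
        available.remove(j)
    return pairs


def optimal_match(treated, controls):
    """Optimal 1:1 matching: the assignment with the smallest total distance.
    Brute force over permutations, so only usable for toy examples."""
    best = min(permutations(range(len(controls)), len(treated)),
               key=lambda perm: sum(abs(t - controls[j])
                                    for t, j in zip(treated, perm)))
    return dict(enumerate(best))


def total_distance(treated, controls, pairs):
    return sum(abs(treated[i] - controls[j]) for i, j in pairs.items())


treated = [0.50, 0.52]     # hypothetical propensity scores
controls = [0.51, 0.60]
reordered = treated[::-1]  # same subjects, rows sorted differently
for label, t in [("original order", treated), ("reversed order", reordered)]:
    g = round(total_distance(t, controls, greedy_match(t, controls)), 2)
    o = round(total_distance(t, controls, optimal_match(t, controls)), 2)
    print(f"{label}: greedy total {g}, optimal total {o}")
# original order: greedy total 0.09, optimal total 0.09
# reversed order: greedy total 0.11, optimal total 0.09
```

Greedy matching produces different pairs when the rows are reversed, while the optimal solution does not depend on processing order (apart from exact ties). Note that this addresses only the sorting concern, not the loss of comparable unmatched subjects.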


@f2harrell, thanks for the comments. I completely agree that the approach we used wastes perfectly good data. While propensity matching has gained substantial popularity for these sorts of retrospective analyses, I’m sure there must be ways to make better use of the entire dataset. My hope was to better understand what options exist for accounting for within-subject correlation. Without formal training as a statistician, I know enough to be dangerous, primarily because I don’t know what I don’t know.


Thanks. The only method I’m familiar with for handling matched data is the conditional logistic model, but I don’t see why you couldn’t use a hierarchical random effects model. I hope that someone more experienced with matched data will respond.
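One way to write the kind of hierarchical (random-effects) model suggested here, offered as a sketch rather than an agreed analysis: a logistic model fit to all cases, with the confounders entered directly as covariates and a random intercept for each patient. Here i indexes patients, k indexes that patient’s surgeries, T_ik is the preoperative strategy of interest, and x_ik holds confounders such as age, gender and ASA-PS score.

```latex
\operatorname{logit}\Pr(Y_{ik}=1 \mid T_{ik}, \mathbf{x}_{ik}, u_i)
  = \beta_0 + \beta_1 T_{ik} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ik} + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

The shared u_i induces correlation among a patient’s repeated surgeries, so no cases need to be discarded. Note that β_1 is a patient-specific (conditional) log odds ratio, which is typically larger in magnitude than the population-averaged log odds ratio a GEE would estimate. If matched data were analyzed instead, a second, crossed random intercept for the matched set could be added. In R, models of this form can be fit with `lme4::glmer` using a `(1 | patient_id)` term; the variable names are hypothetical.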


I’m interested in the problem, but it’s unlikely that I can help. How are you matching, e.g., which method, and on how many factors? It seems you’re not using ‘case’ to mean ‘patient’ but instead ‘patient/visit’?


We are matching using a package called MatchIt, which can deploy a variety of strategies. We chose ‘optimal’ matching, which should obviate @f2harrell’s concern about the sort order of the list affecting the matched set. Further details on the number of variables matched, etc., can be found here.

‘Case’ refers to one patient undergoing one surgery. Within our data set, some patients had a one-to-many relationship between patient and surgery (i.e., the patient had more than one procedure meeting our inclusion criteria). The issue that arises is heterogeneity in care: the same patient, across multiple procedures, may have had a different set of exposures that mitigated or augmented their risk for the outcome (postoperative nausea and vomiting rescue). So, to account for within-subject correlation, you could either keep a single case per unique patient or, as we did, run a Monte Carlo simulation that excludes cases at random and examine the distribution of estimated effects for the treatment of interest. It would be better to develop a model that uses all the data rather than making these exclusions.

Thanks @f2harrell for the suggestion to look at a hierarchical random effects model. Time to do some reading…