When using propensity scores to adjust for confounding by indication in observational treatment comparisons, propensity matching has many disadvantages. Besides not accounting for outcome heterogeneity and the non-collapsibility of odds ratios and hazard ratios [I am never interested in average treatment effects but rather in conditional, i.e., patient-specific, effects], propensity matching loses the simple ability to assess whether the treatment effect is modified by a specific covariate or by the propensity itself.
Propensity scores are used when the list of possible confounders has a dimensionality that exceeds the number of covariates that can be safely adjusted for using regression. For example, if there are 100 outcome events and 60 degrees of freedom for 50 confounders, regression adjustment fails. By condensing the 50 confounders into a single number that reflects how they affect treatment selection, the propensity score acts as a data reduction technique. But to handle outcome heterogeneity one has to adjust not only for propensity but also for the major outcome predictors. A good default strategy is to adjust for a regression spline in logit propensity plus directly adjust for, say, 5 pre-specified major outcome predictors even though they are also included in the propensity score. Outcome predictors thought to modify the treatment effect can be pre-specified as simple interacting factors.
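As a rough illustration of the default strategy just described, here is a minimal sketch of how one row of the outcome-model design matrix might be assembled: treatment, a restricted cubic spline in the logit of the propensity score, and a handful of directly adjusted outcome predictors. The function names, knot locations, and covariate values are all hypothetical, and the spline uses one common truncated-power parameterization; it is not taken from any particular package.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def rcs_basis(x, knots):
    """Restricted cubic spline basis for scalar x: k knots give k - 1 columns.

    The first column is x itself; the remaining k - 2 columns are truncated
    cubics combined so the fitted curve is linear beyond the outer knots.
    """
    t_last, t_prev = knots[-1], knots[-2]
    scale = (t_last - knots[0]) ** 2  # keeps columns on roughly the scale of x

    def pos3(u):
        return max(u, 0.0) ** 3

    cols = [x]
    for t_j in knots[:-2]:
        term = (pos3(x - t_j)
                - pos3(x - t_prev) * (t_last - t_j) / (t_last - t_prev)
                + pos3(x - t_last) * (t_prev - t_j) / (t_last - t_prev))
        cols.append(term / scale)
    return cols

def design_row(ps, treated, outcome_predictors, knots):
    """One row: intercept, treatment, spline in logit PS, then direct covariates."""
    return [1.0, float(treated)] + rcs_basis(logit(ps), knots) + list(outcome_predictors)

# Hypothetical knots on the logit scale (in practice, quantiles of logit PS).
KNOTS = [-2.0, -0.5, 0.5, 2.0]

# Hypothetical patient: PS of 0.3, treated, five pre-specified outcome predictors.
row = design_row(ps=0.3, treated=1,
                 outcome_predictors=[62.0, 1.0, 0.0, 3.4, 140.0],
                 knots=KNOTS)
```

With 4 knots the propensity adjustment costs 3 degrees of freedom, so the 50 confounders enter the outcome model through 3 columns while the 5 major outcome predictors keep their own coefficients.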
Some studies have found that when physicians are more sure that a specific treatment should be used in their practice, the treatment is more effective. When the treatment interacts with the logit of the propensity score, and this is not explained by a readily pre-specified individual covariate interaction, is the conclusion as simple as saying the following: Physicians have some generalized non-specific knowledge of which treatment works best when their decisions are more certain (e.g., propensity to treat < 0.2 or > 0.8)? Or is this likely to instead signify an important treatment selection variable that is also an outcome predictor and is missing from the dataset? Or one that is in the dataset but was only reflected in the propensity model and should have appeared also as one of the separate outcome heterogeneity adjustors? In other words, is it likely we just missed an important interaction with a single covariate when pre-specifying the model?
We may want to have a more general PS topic for that question. My colleague Cindy Chen has led a paper showing the advantages of penalized regression. And if using PS I like regression adjustment for the logit PS, detailed in BBR Chapter 17.
Unpenalized regression adjustment fails. But you can penalize the model down to effectively 10 degrees of freedom, for example, by discounting all the adjustment variable regression coefficients.
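To make "penalize down to effectively 10 degrees of freedom" concrete, here is a minimal sketch for the simplest case, a ridge (quadratic) penalty in a linear model, where the effective d.f. is the sum of d / (d + lambda) over the eigenvalues d of X'X. The eigenvalues below are made up for illustration, and the function names are hypothetical; for logistic or Cox models the analogous quantity is computed from the information matrix rather than from X'X.

```python
def effective_df(eigenvalues, lam):
    """Effective d.f. of a ridge fit: sum of d / (d + lambda) over eigenvalues d of X'X."""
    return sum(d / (d + lam) for d in eigenvalues)

def penalty_for_target_df(eigenvalues, target_df, tol=1e-10):
    """Bisection for the lambda >= 0 whose effective d.f. equals target_df."""
    if not 0 < target_df <= len(eigenvalues):
        raise ValueError("target_df must be in (0, number of coefficients]")
    lo, hi = 0.0, 1.0
    while effective_df(eigenvalues, hi) > target_df:  # expand until bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if effective_df(eigenvalues, mid) > target_df:
            lo = mid  # still too many effective d.f.: penalize more
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical eigenvalues for 60 standardized adjustment columns.
EIGENVALUES = [100.0 / (j + 1) for j in range(60)]

lam = penalty_for_target_df(EIGENVALUES, target_df=10.0)
df_at_lam = effective_df(EIGENVALUES, lam)
```

With no penalty (lambda = 0) every coefficient counts fully and the effective d.f. is 60; increasing lambda shrinks all coefficients and lowers the effective d.f. monotonically, which is why a simple bisection suffices.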
Do you have pointers for references for "when physicians are more sure that a specific treatment should be used in their practice, the treatment has more effectiveness"?
We have worked on a slightly different problem, estimating optimal personalized treatment policies, based on inverse-propensity weighted nonparametric estimates of average outcome, and considered interaction between treatment and propensity to treat. (The advantage of considering the decision (treat or not-treat) problem specifically, instead of outcome estimation, is that intuitively one can imagine that you need to learn "less", and need not have point identification everywhere in the covariate space, in order to still be able to make good decisions if treatment interacts with propensity to treat. Granted, this is a very ML-classification style approach that loses inferential guarantees; however, I'm not aware of very much inference without unconfoundedness anyway.)
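For readers unfamiliar with the idea, here is a minimal sketch of an inverse-propensity-weighted estimate of the average outcome under a candidate treatment policy. It shows only the basic estimator, not the specific method referred to above; the record layout, the toy data, and the age-based policy are all hypothetical.

```python
def ipw_policy_value(records, policy):
    """Inverse-propensity-weighted estimate of the mean outcome under `policy`.

    Each record is (covariates, treated, outcome, propensity), where propensity
    is the estimated probability of treatment given the covariates.
    """
    total = 0.0
    for x, treated, y, ps in records:
        if policy(x) != treated:
            continue  # observed action disagrees with the policy: weight 0
        prob_observed = ps if treated == 1 else 1.0 - ps
        total += y / prob_observed
    return total / len(records)

# Toy data (made up): outcome 1.0 is good, 0.0 is bad.
RECORDS = [
    ({"age": 70}, 1, 1.0, 0.80),
    ({"age": 70}, 0, 0.0, 0.80),
    ({"age": 40}, 0, 1.0, 0.25),
    ({"age": 40}, 1, 0.0, 0.25),
]

def treat_older(x):
    """Hypothetical policy: treat patients aged 65 or over."""
    return 1 if x["age"] >= 65 else 0

def treat_all(x):
    """Hypothetical policy: treat everyone."""
    return 1

value_older = ipw_policy_value(RECORDS, treat_older)
value_all = ipw_policy_value(RECORDS, treat_all)
```

Only patients whose observed treatment matches the policy contribute, each weighted by the inverse probability of the treatment they actually received, so policies can be compared without modeling the outcome everywhere in the covariate space.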
Are you asking about what would be the most likely explanation for observed dependence between treatment and propensity score? Anecdotally, just from my experience from some simulation studies for our sensitivity analysis work, I'm more inclined to believe the kinds of confounding effects that are cause for concern are due to the first scenario, that physicians may be more informed about the outcome than just the covariates.
From the simulation studies I've run, I haven't found that missing an important interaction with a single (reasonable) unobserved covariate generates the kinds of confounding interactions that are of concern for decision-making. (I wasn't looking at effect estimates per se, but rather at treatment decisions based on the estimated effect size.) Intuitively one expects an estimate of the propensity score e(X) to simply average the full propensity score e(X,U) over the distribution of such a single unobserved covariate U. I haven't thought about how it might be different for regression adjustment-based approaches, however.
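The averaging intuition can be written out using the law of total expectation. Here T denotes the treatment indicator and F the distribution function of U; both symbols are notation introduced for this note, and the second line holds only under the added assumption that U is independent of X.

```latex
\begin{align*}
e(X) &= P(T = 1 \mid X)
      = E\left[\, e(X, U) \mid X \,\right]
      = \int e(X, u)\, dF(u \mid X) \\
     &= \int e(X, u)\, dF(u)
      \qquad \text{if } U \perp X .
\end{align*}
```

So the observable propensity e(X) is a weighted average of the full propensity e(X,U), with weights given by the conditional distribution of U given X, which reduces to the marginal distribution of U when U is independent of X.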
You've thought a great deal about this. Thanks for the input. I've seen the phenomenon in two datasets but haven't seen this published. In our published right heart catheterization analysis (Connors et al.) we found overall harm of the use of this medical procedure but physicians who used it often seemed to not be causing harm with it. Perhaps they knew something, or were better trained.
Regarding personalized treatment strategies, the approach you outlined seems indirect and will not find specific mechanism-related covariates. I would much prefer using a full regression model with penalization of treatment interaction terms using a quadratic penalty. In general I don't find it compelling to interact treatment with hard-to-define general effects.
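One way such a model could be written down, as a sketch rather than a definitive specification: let T be the treatment indicator, X the covariate vector, and let the penalty apply only to the treatment-by-covariate interaction coefficients. The symbols, the choice to leave main effects unpenalized, and the scaling of the penalty are assumptions made for illustration.

```latex
\begin{align*}
\eta(T, X) &= \beta_0 + \beta_T\, T + X^\top \beta + T \cdot X^\top \gamma , \\
\ell_P(\beta, \gamma) &= \ell(\beta, \gamma) - \frac{\lambda}{2} \sum_{j} \gamma_j^{2} .
\end{align*}
```

Here \ell is the ordinary log-likelihood of the outcome model with linear predictor \eta, and \lambda controls how strongly the interaction coefficients \gamma_j are shrunk toward zero, i.e., toward a common treatment effect on the linear predictor scale. Covariates would normally be put on a comparable scale before a single \lambda is applied.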
I think you're right, and it's possible that the stronger tendency to use a treatment in certain types of patients either can be fully understood by the physicians exhibiting that behavior, or the propensity score may be using variables that are correlated with the ones the physicians are actually using, allowing one to see better outcomes for patients where the treatment is used more frequently, even if the patterns of likely benefit are not consciously known by the treating physicians at the time.
Coming late to this convo, to note simply that the informational content of the "eyeball test" has been explicitly examined in the context of cardiac surgery. I think these are profoundly important questions, warranting close examination in their particulars.
Jain R, Duval S, Adabag S. How accurate is the eyeball test?: a comparison of physician's subjective assessment versus statistical methods in estimating mortality risk after cardiac surgery. Circ Cardiovasc Qual Outcomes. 2014;7(1):151-156. doi:10.1161/CIRCOUTCOMES.113.000329
I'm not sure if this is related, but I find that patients receiving a specific treatment end up with more follow-up health-care contacts, creating an informed presence bias. That is, the extra contacts give physicians more opportunities to make adjustments after an intervention to prevent potential harms, and they also bias the collection of EHR data.