Bayesian Dynamic Borrowing with Robust Mixture Priors: Fair or not?

Jesse_Helman · January 16, 2024, 2:59pm

I am a statistician on the sponsor side of trials. I recently graduated and work a good deal with Bayesian methods, and Bayesian Dynamic Borrowing has become a popular topic in my current work. For those unfamiliar, the “Dynamic” part comes from the idea that the model itself is updated somehow based on incoming data. For example, a “robust mixture prior” is a mixture of a vague component and an informative component, with weights that sum to 1. If the incoming data “agree” with the informative component, the weights are updated to favor it. If the data do not agree with the informative component, the vague component is given higher weight.
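A minimal sketch of that mechanism, assuming a binomial endpoint and Beta components so that everything is available in closed form. The prior weights, the Beta(30, 70) informative component, and the sample sizes are hypothetical values chosen for illustration, not taken from any trial.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def mixture_posterior(y, n, components):
    """Update a mixture-of-Betas prior with y successes in n binomial trials.

    components: list of (prior_weight, a, b) tuples.
    Returns a list of (posterior_weight, a + y, b + n - y) tuples.
    """
    # Log marginal likelihood of the data under each component; the binomial
    # coefficient is shared by all components and cancels in the weights.
    log_terms = [log(w) + log_beta(a + y, b + n - y) - log_beta(a, b)
                 for w, a, b in components]
    top = max(log_terms)
    unnorm = [exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [(u / total, a + y, b + n - y)
            for u, (_, a, b) in zip(unnorm, components)]

# Hypothetical prior: 80% on an informative Beta(30, 70), 20% on a vague Beta(1, 1).
prior = [(0.8, 30, 70), (0.2, 1, 1)]
agree = mixture_posterior(15, 50, prior)     # observed rate 0.30, matches history
conflict = mixture_posterior(35, 50, prior)  # observed rate 0.70, conflicts

print("informative weight, agreeing data:   ", round(agree[0][0], 3))
print("informative weight, conflicting data:", round(conflict[0][0], 6))
```

In this sketch the weight on the informative component rises above its prior value of 0.8 when the data match the historical rate and collapses toward zero when they conflict, which is the behavior described above.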

A few of Frank Harrell’s resources and posts on Bayesian methods have mentioned that one way to cheat as a Bayesian is to change the prior after seeing the data. Dynamic Borrowing seems like a textbook case of that. It feels wrong to me, like you’re counting the data twice: once to weight the prior, and once as the data itself (the likelihood). The defense is that there are no further “user choices” being made: the method is specified only once, at the beginning, and the weights are updated “automatically” as data come in. I don’t think I buy this. It still seems like the prior is technically being updated and the data are being counted twice. The problem is that I don’t have any specific examples, simulations, or references that can help show whether my intuition is correct.

I was wondering if anyone had any more thoughts on this topic, especially any specific examples that say Dynamic Borrowing is fair or not.

R_cubed · January 16, 2024, 3:49pm

The terminology is a bit confusing to me, and I may be missing something.

A prior distribution is specified before the data. In a sequential experiment, the distribution at any given stage can be seen as both a posterior (for the data points already seen) and a prior (for the future, unseen data points).

If you are computing a posterior from the initial prior (not re-weighted) that strikes me as coherent and fair. If you are computing a posterior from the re-weighted prior, that strikes me as cheating.
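The distinction can be sketched numerically. The example below assumes a binomial endpoint with a two-component Beta mixture prior (all numbers hypothetical). It applies Bayes' rule once to the initial prior, and then, to mimic the "cheating" case, feeds the resulting posterior back in as if it were a prior and applies the same data a second time.

```python
from math import exp, lgamma, log, sqrt

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def update(y, n, components):
    """One Bayes update of a mixture-of-Betas prior given y successes in n trials."""
    log_terms = [log(w) + log_beta(a + y, b + n - y) - log_beta(a, b)
                 for w, a, b in components]
    top = max(log_terms)
    unnorm = [exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [(u / total, a + y, b + n - y)
            for u, (_, a, b) in zip(unnorm, components)]

def mixture_sd(components):
    """Standard deviation of a mixture of Beta(a, b) distributions."""
    mean = sum(w * a / (a + b) for w, a, b in components)
    second = sum(w * a * (a + 1) / ((a + b) * (a + b + 1))
                 for w, a, b in components)
    return sqrt(second - mean ** 2)

# Hypothetical setup: informative Beta(30, 70) with weight 0.8, vague Beta(1, 1).
initial_prior = [(0.8, 30, 70), (0.2, 1, 1)]
y, n = 35, 50

once = update(y, n, initial_prior)  # coherent: data used exactly once
twice = update(y, n, once)          # double counting: same data applied again

print("posterior SD, data used once: ", mixture_sd(once))
print("posterior SD, data used twice:", mixture_sd(twice))
```

Under these assumptions the second pass produces a narrower posterior and a more extreme weight, which is the signature of counting the data twice. The single pass from the initial prior does not have this problem.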

If we look at this from a betting perspective (e.g., a poker game), “re-weighting the prior” seems like either reaching into the pot to remove money when the cards are unfavorable, or adding money when they are favorable, without permitting your opponents to do the same. I don’t see how it is a coherent procedure.

You might find this paper by Arnold Zellner interesting; he discusses Bayesian methods from an information-processing perspective.

An optimal information processing method satisfies the condition that input information equals output information. But there are certain contexts where dynamic updating may be appropriate.

The earlier Zellner paper discusses the information conservation principle that guides his proofs of Bayesian optimality. The later paper discusses some slightly more complicated scenarios, such as dynamic updating.

References
Zellner, A. (1988). Optimal Information Processing and Bayes’s Theorem. The American Statistician, 42(4), 278-280. DOI: 10.1080/00031305.1988.10475585

Zellner, A. (2002). Information processing and Bayesian analysis. Journal of Econometrics, 107(1-2), 41-50.

Jesse_Helman · January 16, 2024, 6:08pm

Yes, the term “reweight” is misleading. I corrected the original post. The weights in the prior are left as parameters, and these parameters are estimated during model fitting. So, for a single analysis, there is just one “weighting” of the parameters in the prior, but this is done with the current observed data.

Thank you for the papers, I will check them out.

R_cubed · January 16, 2024, 6:59pm

From your second post, this sounds like some blend of traditional Bayesian hierarchical models with Empirical Bayes methods.

For those interested, the following dissertation, published in July 2023, describes the method in more detail:

Ji, Z. (2023). Bayesian Dynamic Data Borrowing Methodologies for Source-Specific Inference (Doctoral dissertation, University of Minnesota).

The more I think about this, the more it reminds me of a model averaging procedure, but done in a sequential fashion. The initial weighting can be seen as a prior that is properly updated via Bayes’ rule.
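The model averaging reading can be written out for a two-component mixture. The notation here is an assumption introduced for illustration: \(\pi_I\) and \(\pi_V\) are the informative and vague components, \(w\) is the pre-specified prior weight, and \(m_k(y)\) is the marginal likelihood of the data under component \(k\).

```latex
\begin{align*}
\pi(\theta) &= w\,\pi_I(\theta) + (1-w)\,\pi_V(\theta), \\
m_k(y) &= \int p(y \mid \theta)\,\pi_k(\theta)\,d\theta, \qquad k \in \{I, V\}, \\
\pi(\theta \mid y) &= \frac{p(y \mid \theta)\,\pi(\theta)}{w\,m_I(y) + (1-w)\,m_V(y)}
  = \tilde{w}\,\pi_I(\theta \mid y) + (1-\tilde{w})\,\pi_V(\theta \mid y), \\
\tilde{w} &= \frac{w\,m_I(y)}{w\,m_I(y) + (1-w)\,m_V(y)}.
\end{align*}
```

The third line uses \(p(y \mid \theta)\,\pi_k(\theta) = m_k(y)\,\pi_k(\theta \mid y)\) for each component. The updated weight \(\tilde{w}\) has the same form as a posterior model probability in Bayesian model averaging, with \(w\) playing the role of the prior model probability.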

f2harrell · January 17, 2024, 5:55pm

I’m so glad this is being discussed. I’ve had the feeling that it’s a nomenclature problem and not double dipping. The phrase ‘dynamic borrowing’ is a terrible one if that’s correct, and we need to change that urgently. I don’t think it’s any more dynamic than using a mixture prior with pre-specified mixing weights. If I’m wrong about that I hope someone can correct me.
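One way to check this claim numerically, under the assumption of a binomial endpoint and Beta mixture components (all numbers hypothetical): compute the posterior by brute force from the fixed, pre-specified mixture prior on a grid, and compare it with the closed-form answer that uses "updated" weights.

```python
from math import exp, lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_pdf(x, a, b):
    return exp((a - 1) * log(x) + (b - 1) * log(1 - x) - log_beta(a, b))

def grid_posterior_mean(y, n, components, m=20000):
    """Bayes' rule applied once to the fixed mixture prior, on a midpoint grid."""
    num = den = 0.0
    for i in range(m):
        x = (i + 0.5) / m
        prior = sum(w * beta_pdf(x, a, b) for w, a, b in components)
        post = prior * exp(y * log(x) + (n - y) * log(1 - x))  # unnormalized
        num += x * post
        den += post
    return num / den

def closed_form_posterior_mean(y, n, components):
    """Conjugate route: update each component, reweight by marginal likelihood."""
    logs = [log(w) + log_beta(a + y, b + n - y) - log_beta(a, b)
            for w, a, b in components]
    top = max(logs)
    unnorm = [exp(v - top) for v in logs]
    total = sum(unnorm)
    return sum(u / total * (a + y) / (a + b + n)
               for u, (_, a, b) in zip(unnorm, components))

# Hypothetical prior: informative Beta(30, 70) with weight 0.8, vague Beta(1, 1).
prior = [(0.8, 30, 70), (0.2, 1, 1)]
grid_mean = grid_posterior_mean(35, 50, prior)
exact_mean = closed_form_posterior_mean(35, 50, prior)
print(grid_mean, exact_mean)
```

In this sketch the two routes agree up to grid error, which is consistent with the view that the weight update is just what Bayes' rule does to a mixture prior with pre-specified weights, not an extra use of the data.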

Jesse_Helman · January 17, 2024, 6:25pm

The Bayesian model averaging perspective helps it make sense to me; it just doesn’t look the same initially, I guess, because there aren’t distinctly different models you are averaging over. I talked with a colleague who has more experience with this, and they described it the same way. Everything is done in one go, not sequentially, as I mistakenly wrote in my earlier edit. The weights are prespecified, but they are updated during fitting, and the updated weights are used in the posterior. The above material has been very informative. I will be working in this area a lot moving forward, so I will use these as valuable references.

This paper helped me some too. https://onlinelibrary.wiley.com/doi/full/10.1111/biom.12242

Also, it led me to read more about the general validity of changing the prior or model after seeing data. There seem to be different camps of thought, and it gets very philosophical very quickly! Thank you both.

Pavlos_Msaouel · January 17, 2024, 8:55pm

Indeed. I have not read about or used dynamic borrowing, but the concern about counting the data twice reminded me of this excellent discussion between @Sander and Andrew Gelman on prior predictive versus posterior predictive checks.

f2harrell · January 18, 2024, 12:59pm

The @Sander - Gelman discussion is one of the most intelligent discussions I’ve read in a long while. To me, the proof of the pudding in the setting they discuss would come from final posterior distributions having the correct width even after posterior predictive checks were used to make the model fit better. I am somewhat certain that there is some double dipping going on, as Sander believes, and that depending on how you use the posterior predictive checks, the final posteriors will be too narrow, i.e., they will not convey sufficient uncertainty to the reader.