Bayesian Solutions for Challenges to Frequentist Statistics

f2harrell · September 22, 2020, 1:08pm

Ability to Use External Information

A skeptical prior distribution may be used when it is known that a therapy is incremental and not curative.
In the less common situation where one has excellent data about the same treatments from a previous study on the same patient population, the posterior distribution from the previous study can be used as a prior in the new study (with any desired degree of discounting) to increase the Bayesian power for the new study. This can apply to a pharmaceutical company’s Phase II development program, for example. The frequentist approach has no formal way of incorporating external information.
Data from adults can inform treatment effects about children.
The frequentist approach has problems even with the use of simple background knowledge. For example, suppose that one is told that 3/4 of the coins in a jar are fair and the other 1/4 are unfair with P(heads) = 0.6. Given the task of choosing one coin at random and flipping it 400 times to obtain evidence of its fairness, the background information is trivially incorporated as a Bayesian prior, but it is unclear how to use that information with the frequentist approach.
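
As a minimal sketch of how the jar information enters as a prior, Bayes' rule with a two-point prior gives the posterior probability that the chosen coin is fair. The function name and the example head counts are illustrative assumptions, not part of the original post.

```python
from math import exp, log

def posterior_prob_fair(heads, n=400, prior_fair=0.75, p_unfair=0.6):
    """P(coin is fair | `heads` heads in `n` flips) under a two-point prior."""
    # The binomial coefficient is common to both hypotheses and cancels,
    # so only the log-likelihood kernels are needed.
    loglik_fair = n * log(0.5)
    loglik_unfair = heads * log(p_unfair) + (n - heads) * log(1 - p_unfair)
    # Posterior log odds = prior log odds + log likelihood ratio.
    log_odds = log(prior_fair / (1 - prior_fair)) + loglik_fair - loglik_unfair
    return 1 / (1 + exp(-log_odds))

# Illustrative head counts (hypothetical data).
for heads in (200, 220, 240):
    print(heads, round(posterior_prob_fair(heads), 4))
```

With 220 heads, roughly halfway between what the two kinds of coin predict, the data are nearly uninformative and the posterior probability of fairness stays close to the prior value of 0.75.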

Avoiding Misinterpretation

Statisticians know that a large p-value does not validate the null hypothesis, but journal reviewers and editors still do not understand this, and the "absence of evidence is not evidence of absence" error abounds in medical journals. The Bayesian paradigm does not allow for this misinterpretation, because a posterior probability such as P(-2mmHg < true blood pressure reduction < 2mmHg) will reveal the great amount of uncertainty regarding a conclusion that the treatments have similar effects when the p-value is large and the sample size is not. In those cases the posterior probability of similarity of treatment effects will often be close to 0.5, correctly giving the impression that the small study is not informative.
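
A rough sketch of the similarity calculation, assuming a normal likelihood for the observed difference and a N(0, prior_sd^2) prior on the true difference. The prior SD of 10 mmHg, the observed difference, and the standard errors are all made-up numbers for illustration.

```python
from statistics import NormalDist

def prob_similar(diff, se, prior_sd=10.0, margin=2.0):
    """P(-margin < true difference < margin | data) under a normal model.

    Assumes a N(0, prior_sd^2) prior on the true difference and a normal
    likelihood for the observed difference `diff` with standard error `se`.
    """
    post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
    post_mean = post_var * diff / se**2
    post = NormalDist(post_mean, post_var**0.5)
    return post.cdf(margin) - post.cdf(-margin)

# Same observed difference of 1 mmHg in a small and a large study
# (hypothetical numbers; both would give a "non-significant" p-value
# only in the small study's case of se = 3).
print(round(prob_similar(diff=1.0, se=3.0), 3))  # small study
print(round(prob_similar(diff=1.0, se=0.5), 3))  # large study
```

Under these assumptions the small study gives a probability of similarity near 0.5, while the large study with the same point estimate gives a probability above 0.95.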

Accounting for Model Uncertainty

In a two-sample t-test, if one assesses normality and equality of variances instead of assuming them, frequentist operating characteristics are not preserved. Uncertainty about distributions, about transformations that might result in normality, and about equality of variances distorts frequentist inference. P-values and confidence limits are only correct if assumptions hold. The Welch t-test allows for unequal variances but does not allow for non-normality and yields only approximate results. A Bayesian t-test as discussed in BBR Section 5.9.3 uses a prior distribution for the variance ratio and a prior distribution for the amount of non-normality of the data distribution. The result is exact posterior inference for the difference in means that fully takes into account uncertainty about variances and the data distribution. The analysis even provides a posterior probability of normality. Unlike the frequentist t-test’s too-narrow confidence intervals for differences in means, the Bayesian posterior interval is properly a bit wider to take into account uncertainty about model assumptions.

Allowing for Interaction Without Destroying Power

Assume, for example, a 2-treatment randomized trial where allowance for a sex × treatment interaction is made. In the best of circumstances (equal numbers of males and females), estimation of the interaction effect requires 4 times as many patients to achieve the same precision as the estimate of the average treatment effect when interaction is absent from the model. In frequentist models, interactions need to be fully “in” or “out” of the model. In a Bayesian model, borrowing of information about treatment effects from females to males occurs while having the interaction “half in” the model. This is achieved by putting a skeptical prior on the interaction effect that encodes a belief that the treatment effects for males and females are more likely to be similar than to be different.
As detailed here, one can use the same approach to compute the probability that a treatment affects mortality differently than how it affects nonfatal outcomes. Most clinical trials that include nonfatal outcomes in the primary analysis do not have adequate frequentist or Bayesian power for assessing a pure effect on mortality. By using for example a partial proportional odds model one can put a prior on the differential effect (interaction between treatment and category of outcome) that is skeptical, in order to borrow information across outcome types without assuming a constant treatment effect across those types. If one assumes a low probability that the effect on mortality is drastically different from the effect on nonfatal outcomes, one is able to obtain a posterior probability that the treatment reduces mortality (or that it reduces mortality by a different amount than it reduces risk of nonfatal outcomes) that allows as much customization of treatment effects as the sample size allows. As the sample size grows, the prior wears off and any true mortality-specific effects will reveal themselves.
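
The "half in" idea can be sketched with a conjugate normal update, assuming the data-only interaction estimate is approximately normal and the skeptical prior is N(0, prior_sd^2). The function name and all numbers are hypothetical; a real analysis would fit the full model.

```python
def interaction_posterior(est, se, prior_sd):
    """Posterior summary of an interaction effect under a N(0, prior_sd^2) prior.

    `est` and `se` are the data-only interaction estimate and its standard
    error (normal approximation assumed). Returns the weight given to the
    data, the posterior mean, and the posterior SD.
    """
    data_prec = 1 / se**2
    prior_prec = 1 / prior_sd**2
    weight = data_prec / (data_prec + prior_prec)  # fraction of estimate retained
    post_mean = weight * est
    post_sd = (1 / (data_prec + prior_prec)) ** 0.5
    return weight, post_mean, post_sd

# A shrinking standard error mimics a growing sample size.
for se in (2.0, 1.0, 0.5):
    w, m, s = interaction_posterior(est=4.0, se=se, prior_sd=2.0)
    print(f"se={se}: weight={w:.2f}, posterior mean={m:.2f}, posterior sd={s:.2f}")
```

When the prior SD equals the standard error, the interaction is literally half in the model (weight 0.5); as the standard error shrinks the weight approaches 1, which is the prior wearing off.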

Compound Assertions About a Single Parameter

Using the Bayesian posterior distribution one can compute the probability that a treatment effect is > ε for any ε of interest. There is no reason to have different types of calculations for assessing efficacy vs. non-inferiority vs. similarity.
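
A minimal sketch of this point, using simulated draws as a stand-in for real posterior samples (the normal draws, thresholds, and function names are assumptions for illustration): efficacy, non-inferiority, and similarity are all just proportions of the same draws.

```python
import random

random.seed(1)
# Stand-in for MCMC output: hypothetical posterior draws of a treatment effect.
draws = [random.gauss(3.0, 2.0) for _ in range(20000)]

def prob_greater(draws, eps):
    """Posterior P(effect > eps), estimated as a proportion of draws."""
    return sum(d > eps for d in draws) / len(draws)

def prob_between(draws, lo, hi):
    """Posterior P(lo < effect < hi), estimated as a proportion of draws."""
    return sum(lo < d < hi for d in draws) / len(draws)

print("efficacy:        P(effect > 0)      =", prob_greater(draws, 0.0))
print("non-inferiority: P(effect > -1)     =", prob_greater(draws, -1.0))
print("similarity:      P(-1 < effect < 1) =", prob_between(draws, -1.0, 1.0))
```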

Compound Assertions Involving Multiple Treatments or Doses

In comparing three treatments, for example, one can easily compute the Bayesian posterior probability that group B is at least 3 units better than group A and that group C is at least 2 units better than group B.
One can compute P(B > A or D > C or E > average of A,B,C,D).
In a dose-response analysis one can compute the probability that the dose-response curve is monotonic.
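
Given joint posterior draws, each of the assertions above is the proportion of draws in which the assertion holds. The sketch below uses independent simulated normals for three group means purely as placeholders; real draws would come jointly from one fitted model, and the group means and helper name are hypothetical. Treating A, B, C as increasing doses gives a simple monotonicity check.

```python
import random

random.seed(2)
n = 20000
# Hypothetical posterior draws of three group means (placeholders only).
draws = [
    {"A": random.gauss(10, 1), "B": random.gauss(14, 1), "C": random.gauss(17, 1)}
    for _ in range(n)
]

def prob(event, draws):
    """Posterior probability of `event`, a true/false function of one joint draw."""
    return sum(bool(event(d)) for d in draws) / len(draws)

# B at least 3 units better than A AND C at least 2 units better than B.
p_both = prob(lambda d: d["B"] - d["A"] >= 3 and d["C"] - d["B"] >= 2, draws)
# An "or" assertion involving an average of groups.
p_any = prob(lambda d: d["B"] > d["A"] or d["C"] > (d["A"] + d["B"]) / 2, draws)
# Monotonic dose-response if A, B, C are increasing doses.
p_monotone = prob(lambda d: d["A"] <= d["B"] <= d["C"], draws)

print(p_both, p_any, p_monotone)
```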

Compound Assertions Involving Multiple Outcomes

With frequentist inference it is difficult to do risk/benefit tradeoffs for combined efficacy/safety analysis. With joint Bayesian modeling of multiple endpoints one can readily compute such quantities as the following:
- P(safe and effective) = P(efficacy > 0 and absolute risk increase for an SAE < 0.02)
- P(effective on endpoint 1 or effective on endpoint 2)
- P(effective on endpoint 1 and effective on endpoint 2)
- P(any mortality reduction or ≥ 4mmHg reduction in blood pressure)
- P(at least 10% improvement on at least 2 of 5 endpoints)
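
A sketch of how two of these quantities fall out of joint posterior draws. Everything here is simulated for illustration: the efficacy and SAE risk draws are given an assumed correlation of 0.5, the five endpoint effects are independent placeholders, and the helper name is hypothetical.

```python
import random

random.seed(3)
n = 20000
draws = []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    efficacy = 1.0 + 0.8 * z1  # hypothetical efficacy effect
    # Hypothetical absolute SAE risk increase, correlated 0.5 with efficacy.
    risk_inc = 0.01 + 0.01 * (0.5 * z1 + 0.75**0.5 * z2)
    # Hypothetical fractional improvements on five endpoints.
    endpoints = [random.gauss(0.12, 0.05) for _ in range(5)]
    draws.append((efficacy, risk_inc, endpoints))

def prob(event):
    """Proportion of joint draws for which `event(efficacy, risk, endpoints)` holds."""
    return sum(bool(event(*d)) for d in draws) / n

p_safe_effective = prob(lambda eff, risk, ends: eff > 0 and risk < 0.02)
p_two_of_five = prob(lambda eff, risk, ends: sum(e >= 0.10 for e in ends) >= 2)

print("P(safe and effective)            =", p_safe_effective)
print("P(>= 10% improvement on >= 2 of 5) =", p_two_of_five)
```

Because the probabilities are computed draw by draw, any correlation between endpoints in the joint posterior is carried through automatically.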

Sequential Trials

Extending a promising study without penalty. With the frequentist approach, α has already been spent, so adding m new observations effectively adds fewer than m observations after the new α penalty.
In a sequential study with multiple data looks, frequentist methods are overly conservative especially at early looks. To be able to spend most of the α at the end to maximize power, they spend very little α at earlier looks. To preserve α the frequentist approach must penalize for looks yet to be made as well as for looks already made that are now inconsequential. Since the Bayesian paradigm uses only forward-in-time probabilities that consider what did happen instead of what might have happened, there are no multiplicities no matter how many data looks are made. This gives maximum flexibility.
Handling unscheduled looks and looks that did not happen. With frequentist sequential trials, one penalizes for planned looks. In one case of a device trial, a study was “negative” because at study end it did not meet the predefined critical value, which had been adjusted for an early look that never took place because patient recruitment was too fast.
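
To illustrate only the mechanics (not the operating characteristics) of repeated looks, here is a sketch assuming a single-arm binary outcome with a Beta(1, 1) prior. The outcome stream, look times, and function name are hypothetical. The calculation at each look is identical and uses only the data observed so far.

```python
import random

random.seed(4)

def prob_rate_exceeds(successes, failures, threshold=0.5, a=1.0, b=1.0, ndraws=20000):
    """Monte Carlo estimate of P(response rate > threshold | data).

    Uses a Beta(a, b) prior with a binomial likelihood, so the posterior
    is Beta(a + successes, b + failures).
    """
    hits = sum(
        random.betavariate(a + successes, b + failures) > threshold
        for _ in range(ndraws)
    )
    return hits / ndraws

# Hypothetical stream of binary outcomes (1 = response), simulated with rate 0.65.
outcomes = [1 if random.random() < 0.65 else 0 for _ in range(100)]

# Looks can be scheduled or unscheduled; none requires a record of other looks.
for look in (10, 25, 37, 60, 100):
    s = sum(outcomes[:look])
    print(look, s, round(prob_rate_exceeds(s, look - s), 3))
```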

Adaptive Trials

In general, adaptive trials, including response adaptive randomization, present major problems in constructing a sample space that would allow computation of p-values or α. With Bayes, posterior distributions for adaptive trials are computed exactly the same as if the design was trivial.

Inference Even With Penalization

When overfitting is a problem so that penalized maximum likelihood estimation is used, one loses the traditional inferential tools. In Bayesian models, shrinkage (very skeptical) priors can be used to minimize overfitting, and posterior inference is just as simple as when there is no penalization.

Avoiding Approximations

Bayesian posterior distributions do not use large sample approximations but are exact. Large sample theory/asymptotics are not needed for Bayes.
Posterior probabilities about parameters are exact. For most non-Gaussian statistical models p-values and confidence limits are approximations, and in very non-Gaussian likelihoods such as for binary logistic models or with random effects the approximations may not be very good.
Posterior inference for models with random effects is just as easy as for models with only fixed effects, and such inference avoids approximations needed in the frequentist domain.
When one desires to obtain uncertainty intervals on derived quantities, the frequentist approach requires one-off solutions such as the delta method. For example, one may use a semiparametric model such as the proportional odds model to analyze a continuous response variable Y, and desire to summarize the results using more than an odds ratio. When sampling from the posterior distribution of the model’s intercepts and slopes, one merely computes for each sample the derived parameter such as the mean Y at a given covariate setting. The posterior distribution of this nonlinearly-derived quantity will easily provide an exact (and properly asymmetric) posterior interval for the mean.
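
The derived-quantity recipe can be sketched with a simpler stand-in than the proportional odds example: a predicted probability from a logistic model. The posterior draws of the intercept and slope are simulated placeholders, and the covariate value is arbitrary. The derived quantity is computed once per draw and its interval is read off the empirical quantiles, exact up to Monte Carlo error.

```python
import random
from math import exp

random.seed(5)

def expit(x):
    return 1 / (1 + exp(-x))

# Hypothetical posterior draws of (intercept, slope) from a logistic model.
draws = [(random.gauss(-1.0, 0.3), random.gauss(0.8, 0.25)) for _ in range(20000)]

# Derived quantity: predicted probability at covariate value x = 2.
x = 2.0
derived = sorted(expit(a + b * x) for a, b in draws)

def quantile(sorted_vals, q):
    """Simple empirical quantile by index (no interpolation)."""
    return sorted_vals[int(q * (len(sorted_vals) - 1))]

lower, median, upper = (quantile(derived, q) for q in (0.025, 0.5, 0.975))
print(f"median {median:.3f}, 0.95 interval ({lower:.3f}, {upper:.3f})")
```

No delta method is involved, and the interval is asymmetric around the median because the transformation is nonlinear.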

Solving Particular Modeling Problems

Variance component estimation: When fitting varying intercept models, groups with fewer than 2-3 observations per group are often estimated to have zero or negative variances (depending on the method used). This is not the case with a Bayesian analysis because the variance components are not integrated out of the likelihood but estimated the same way all other parameters are estimated. (@Dilsher_Dhillon)
Hierarchical models: Uncertainty in the variance of random effects is fully accounted for, making posterior intervals properly wider. This is in contrast to empirical Bayes methods, such as those typically used in meta-analysis, which act as if variances of random effects are known constants.