R^3 just pointed out this thread to me. There’s a lot I could write but I think the most crucial points have been made, so I’ll just list a few items as I like to think of them:
- The random-sampling and randomization models are isomorphic (each can be translated into the other), as can be seen by considering finite-population sampling (which is all we ever really do): random sampling is random allocation to be in or out of the sample; randomization is random sampling from the total selected experimental group to determine which of its members will receive treatment. Sampling methods thus translate immediately into allocation methods and vice versa, although some methods may look odd or infeasible after translation (which may explain why the isomorphism is often overlooked).
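A toy sketch of the isomorphism (all names and numbers here are mine, purely illustrative): drawing a random k-subset of N units is a single chance mechanism that can be read either as sampling (who enters the study) or as allocation (who, among those in the study, gets treatment).

```python
import random

def random_subset(units, k, seed):
    """Draw a random k-subset of units. This one operation can be read
    either as sampling (choose who enters the study) or as allocation
    (choose who, among those in the study, receives treatment)."""
    rng = random.Random(seed)
    return set(rng.sample(units, k))

population = list(range(20))

# Reading 1: random sampling -- select 10 of 20 units into the sample.
sampled = random_subset(population, 10, seed=42)

# Reading 2: randomization -- among 20 experimental units, assign 10 to treatment.
treated = random_subset(population, 10, seed=42)

# The two readings invoke the identical chance mechanism:
assert sampled == treated
```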
- The evolution of my views on the meaning of P-values and CI when there is no identifiable randomizer may be seen in going from
  Randomization, Statistics, and Causal Inference on JSTOR
  to
  https://www.tandfonline.com/doi/full/10.1080/00031305.2018.1529625
- Despite the isomorphism, in practice sampling (selection into the study) and allocation (selection for treatment) are two distinct operations, each of which may or may not involve randomization. When they both do, some interesting divergences between Fisherian and Neyman-Pearson (NP) testing can arise; see, e.g.,
  On the Logical Justification of Conditional Tests for Two-By-Two Contingency Tables on JSTOR
  which brings us to…
- Permutation tests, exact tests, resampling, etc.: these are general methods for getting P-values and CI when we are worried about the usual simple asymptotic approximations breaking down in practice (which occurs more often than is noticed). For these methods, covariate-adjusted P-values and CI can be obtained by resampling residuals from fitted adjustment models.
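One way to sketch the residual-resampling idea (a Freedman-Lane-style scheme on simulated data; the numbers and variable names are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): the outcome depends on a covariate x
# plus a treatment effect for the treated half of the units.
n = 60
x = rng.normal(size=n)
treat = np.repeat([0, 1], n // 2)
y = 2.0 * x + 1.5 * treat + rng.normal(size=n)

# Step 1: fit the covariate-adjustment model under the null (outcome on
# covariate only) and keep its residuals.
X0 = np.column_stack([np.ones(n), x])
beta0, *_ = np.linalg.lstsq(X0, y, rcond=None)
resid = y - X0 @ beta0

# Step 2: the test statistic is the treated-vs-control difference in
# mean residuals.
def stat(r):
    return r[treat == 1].mean() - r[treat == 0].mean()

obs = stat(resid)

# Step 3: permuting the residuals generates the reference (null)
# distribution, yielding a covariate-adjusted permutation P-value.
perm = np.array([stat(rng.permutation(resid)) for _ in range(4999)])
p_value = (np.sum(np.abs(perm) >= abs(obs)) + 1) / (4999 + 1)
```

The "+1" in numerator and denominator is the usual finite-permutation correction, which keeps the P-value strictly above zero.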
- Nonetheless, in my experience switching from the usual Wald (Z-score) P and CI to likelihood-ratio or, better still, bias-adjusted score P and CI (as in the Firth adjustment) has always been as accurate as could be obtained without going on to use well-informed priors. Those priors translate into a penalized likelihood function, and the P-values and CI from that function are approximate tail and interval areas in the resulting marginal posterior distributions. This use of Bayes will be more frequency-accurate than ordinary frequentist P and CI (including permutation and “exact” P and CI) when the information in the prior is valid in a specific sense. Which brings up…
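To make the penalized-likelihood connection concrete, here is a minimal numpy sketch (mine, not from any of the works cited) of the Firth/Jeffreys penalty in the simplest possible case: an intercept-only logistic model with complete separation, where the penalized fit can be checked against the known add-1/2-to-each-cell rule.

```python
import numpy as np

# Toy data with "separation": all 10 subjects have the event, so the
# ordinary ML estimate of the log-odds intercept is +infinity.
n, events = 10, 10

def penalized_loglik(b):
    """Binomial log-likelihood plus the Firth penalty
    0.5 * log(Fisher information), i.e. the log posterior under the
    Jeffreys invariant prior."""
    p = 1.0 / (1.0 + np.exp(-b))
    loglik = events * b - n * np.log1p(np.exp(b))
    info = n * p * (1.0 - p)  # Fisher information for the intercept
    return loglik + 0.5 * np.log(info)

# Crude but transparent: maximize on a fine grid of log-odds values.
grid = np.linspace(-10.0, 10.0, 200001)
b_firth = grid[np.argmax(penalized_loglik(grid))]

# In this intercept-only case the Firth fit equals adding 1/2 to each
# cell of the events/non-events table: log((10 + 0.5)/(0 + 0.5)) = log(21).
```

Whereas the unpenalized likelihood here increases without bound in b, the penalty pulls the maximizer back to a finite, second-order bias-corrected estimate.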
- Valid objections to Bayes seem to come down to the fact that invalid (misinformed) priors can ruin accuracy, and a fear (which I share) that “expert” priors are often invalid (typically prejudicially biased). In typical practice, the response of using reference priors comes down to using 2nd-order corrected frequentist P and CI, as in the Firth adjustment (which in logistic regression reduces to the Bayes posterior from a Jeffreys invariant prior).
- Finally, an important technical point which seems to have been overlooked in most published discussions (including mine): the Karl Pearson/Fisher observed tail-area P-value (their “value of P”) is not always equal to the realization of the random variable that is the minimum alpha at which rejection would occur (the P-value as defined from Neyman-Egon Pearson testing). This is so even for the simplest normal-mean interval-testing problem; it happens when frequentist criteria are imposed that sacrifice single-sample coherence for hypothetical long-run optimization, notably uniformly most powerful unbiasedness (UMPU). Failure to notice this divergence has led to logically incorrect claims that compatibility interpretations of Fisherian P-values are incoherent, when the incoherence applies only to NP P-values. Such claims thus flag a failure to actually read and understand Fisher and later discussions of P-values, which reject decision-theoretic foundations (and their criteria, such as UMPU) in favor of information-theoretic and causal foundations. The conflict goes unnoticed in part because the P and CI from the Fisherian and NP approaches don’t diverge numerically in most everyday applications.