The questions below are very basic and have probably been asked thousands of times by statistically naive MDs; apologies in advance. Unfortunately, I keep getting hung up on them and can’t find clear answers in my reading. I’ll understand if there’s no way to explain these concepts in layman’s terms.
1. What exactly does a p-value mean in the context of an RCT, given that “random sampling” from an underlying population of interest has not occurred?
This is the definition of a p-value provided in section 5.4 of the BBR text (my emphasis in bold):
“A P-value is something that can be computed without speaking of errors. It is the probability of observing a statistic as or more extreme than the observed one if H0 is true, i.e., if the population from which the sample was randomly chosen had the characteristics posited in the null hypothesis. The P-value to have the usual meaning assumes that the data model is correct.”
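One way to read the quoted definition is as a simulation: repeatedly draw random samples from a population in which H0 holds, and count how often the statistic is as or more extreme than the one observed. The sketch below is only an illustration under assumptions that are not in BBR: a two-group comparison, a Normal(0, 1) population for both groups under H0, the difference in means as the statistic, and a hypothetical function name and input values.

```python
import random
import statistics


def sampling_p_value(observed_diff, n_per_group, n_sims=10_000, seed=0):
    """Monte Carlo two-sided p-value under a random-sampling model.

    Assumption (for illustration only): under H0 both groups are random
    samples from the same Normal(0, 1) population.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_sims):
        # Each simulated "repetition" draws two fresh random samples.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(a) - statistics.fmean(b)
        # Count repetitions at least as extreme as the observed statistic.
        if abs(diff) >= abs(observed_diff):
            extreme += 1
    return extreme / n_sims


# Hypothetical inputs: an observed mean difference of 0.5 with 20 per group.
p = sampling_p_value(observed_diff=0.5, n_per_group=20)
print(f"simulated p-value: {p:.3f}")
```

Note that every repetition here relies on a fresh random sample from the population, which is exactly the step that does not happen in an RCT.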
The following article discusses inferential problems that arise when random sampling has not occurred:
“When p-values or confidence intervals are displayed, a plausible argument should be given that the studied sample meets the underlying probabilistic assumptions, i.e. that it is or can be treated as a random sample. Otherwise, there are no grounds for using these inferential tools and they become essentially uninterpretable…”
I’m confused. We never actually use a “random sample” of patients when we conduct an RCT, yet p-values appear throughout trial reports. For example, if we want to study the effect of a new chemotherapy drug in patients with colon cancer, we interview patients with colon cancer as they happen to present for medical care (of their own volition). Only if they meet the inclusion criteria for the trial do we offer them the chance to be randomized. In turn, only those who agree to participate in the trial are randomized, either to the new therapy (whose intrinsic efficacy is being tested) or to placebo.
As discussed in this blog, “random sampling” does not occur in the conduct of human experiments. Rather, what occurs is “randomization,” a very different process.
Using the clinical example above, random sampling would require that doctors randomly “pluck” a sample of patients with colon cancer from a master list of ALL patients with colon cancer and then randomly allocate them to one treatment or another. But this isn’t remotely how clinical research works, for the following reasons:
there is no “master list” of all patients in the country (or world) with colon cancer;
even if there were such a master list, we wouldn’t be able to just pluck patients randomly from the list and force them to enter a clinical trial of a new therapy;
many patients with colon cancer live in parts of the world where clinical trials are not conducted.
Some non-clinicians and non-statisticians seem to be under the mistaken impression that random sampling occurs in the design and conduct of RCTs. Could this misunderstanding be rooted in the fact that 1) p-values are used widely in the interpretation of RCT results, and 2) the concept of “random sampling” seems to be built into the definition of a p-value?
Given that “random sampling” never actually occurs in the design and conduct of an RCT, how should we define and interpret p-values in the RCT context?
2. On a somewhat related note, how exactly is the concept of repetition defined in frequentist statistics?
The following sentence is an excerpt from this piece: Statistical Thinking - My Journey from Frequentist to Bayesian Statistics
“I came to not believe in the possibility of infinitely many repetitions of identical experiments, as required to be envisioned in the frequentist paradigm.”
Given the importance of distinguishing random sampling from random allocation, how should the concept of “multiple hypothetical repetitions” in the frequentist paradigm be viewed? Should we view hypothetical repetitions as involving:
repeated draws of random samples from an underlying population, followed by random allocation of the subjects in each sample to one treatment or another (i.e., multiple experiments conducted on multiple random samples of subjects); OR
a single non-random recruitment of a group of subjects, followed by repeated randomization of these same subjects to one treatment or another (i.e., multiple experiments conducted on the same group of subjects); OR
repeated non-random recruitment of different groups of subjects, followed by random allocation of each group of subjects to one treatment or another (i.e., multiple experiments conducted on multiple non-random samples of subjects)?
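To make the second option concrete, here is a minimal sketch of what “repeated randomization of the same subjects” could look like as a calculation. It is an illustration only, not a claim about which option is the correct reading. It assumes that under the null hypothesis each subject’s outcome would have been the same under either treatment, so the observed outcomes are held fixed and only the treatment labels are reshuffled. The function name and the outcome values are hypothetical.

```python
import random
import statistics


def rerandomization_p_value(outcomes, treated, n_rerandomizations=10_000, seed=0):
    """Two-sided Monte Carlo p-value from re-randomizing a fixed group.

    outcomes: observed outcome for each enrolled subject (held fixed).
    treated:  list of booleans, True if the subject received treatment.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    n_treated = sum(treated)

    def mean_diff(assign):
        t = [y for y, a in zip(outcomes, assign) if a]
        c = [y for y, a in zip(outcomes, assign) if not a]
        return statistics.fmean(t) - statistics.fmean(c)

    observed = mean_diff(treated)
    extreme = 0
    for _ in range(n_rerandomizations):
        # Same subjects, same outcomes; only the allocation is redrawn.
        chosen = set(rng.sample(range(n), n_treated))
        assign = [i in chosen for i in range(n)]
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(assign)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_rerandomizations


# Hypothetical data: 10 enrolled subjects, 5 allocated to each arm.
outcomes = [5.1, 4.8, 6.0, 5.5, 6.2, 4.1, 3.9, 4.5, 4.0, 4.7]
treated = [True] * 5 + [False] * 5
observed, p = rerandomization_p_value(outcomes, treated)
print(f"observed difference: {observed:.2f}, re-randomization p-value: {p:.4f}")
```

In this sketch no population and no random sample appear anywhere; the only source of randomness is the allocation itself.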