I really agree. But I want to note that you have framed your example because of the way you were taught probability, and that will perpetuate some of the problem IMHO. You were taught that sensitivity and specificity were properties of the test and are necessary to get P(disease|test). Neither is true. Sens and spec are properties of tests and patients, and vary strongly with patient characteristics. And teaching them gets physicians in a “backwards information flow” frame of mind that they then learn Bayes just to be able to correct (and don’t see the general power of Bayes when you really need it). Your idea of instead dwelling on P(disease|all you currently know about the patient) is the key, I think. We need to teach how to condition on all you currently know (or as much of it as past data have captured in a risk model) and how to interpret the result. And I think we need to teach how to assess diagnostic test information yield using the approach in the Diagnosis chapter in BBR. There I show how to look at the distribution of likely P(disease|current information) where “current information” is captured with and without utilization of the test results (there, the “test” is cholesterol).
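To make the "backwards information flow" point concrete, here is a minimal sketch of Bayes' rule for a diagnostic test. The sensitivity, specificity, and prevalence numbers are purely illustrative assumptions, not from any real test; the point is only that the same test gives very different P(disease | test+) as the pre-test probability changes:

```python
# Illustrative sketch only: sensitivity/specificity/prevalence values are
# assumed, not taken from any real diagnostic test.
def p_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | test+)."""
    p_pos = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_pos

# The same sens/spec yields wildly different post-test probabilities
# depending on what you already know about the patient (the pre-test
# probability).
for prev in (0.01, 0.10, 0.50):
    print(prev, round(p_disease_given_positive(prev, 0.90, 0.90), 3))
```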
Not only is probability not noise, probability is nearly everything (in a sense). Understanding probability and cause and effect are essential to a successful life. This is because understanding and capitalizing on probability are essential to good decision making. This classic paper by Spiegelhalter is worth everyone’s read.
Thanks, I have read your post on this a few times and agree. The issue I have is that in the papers published on these classifiers, they are unabashedly called “predictive systems” or “predictive models.” This is misleading in a dangerous way for the people who use them at the point of care.
@f2harrell Thank you again for discussing. My sense from BBR and this discussion is that you recommend going straight to regression. So if someone asks where 0.6 is from, show them the nomogram.
If that’s the case, this is very nice. You’re right, I was teaching Bayes rule – what was the point! I thought I was teaching conditional probability, but I was actually teaching Bayes.
I agree, Frank, that probability is essential to good medical decision making and the Spiegelhalter paper lays that out perfectly clearly (at least in my memory; it has been a while and I should give it a re-read). I also like your way of putting classification as a premature decision. In my view that is exactly the core of the problem: a probability is not a decision but information essential for a decision, while a classification is a decision.
I think classification can be seen as a prediction. But it isn’t a risk (or probability) prediction and therefore I see little use for it in medicine, where the consequences of a wrong decision are generally too high to leave it to an algorithm; for some types of commercial/marketing use, like Netflix, such classification predictions are generally OK.
Thank you. Apologies for diverting the discussion to decision theory. But definitely worth thinking about how to teach probability such that this kind of thing doesn’t happen.
I know this thread has been quiet for a bit, but I discovered another resource for learning probability and statistics that is quite nice. It is from MIT OCW, which is an amazing resource for math and science courses. It also appears Bayesian-slanted, because it teaches Bayes as the standard and then adds on the frequentist school; the reverse of how it's usually taught.
I have no expertise on teaching probability, but I do find animations helpful for understanding probability myself. For example: http://setosa.io/ev/conditional-probability/
There was a very good website with similar animations on more topics that I saw a while ago on Twitter, but I can't recall the address.
Edit: Found it! https://students.brown.edu/seeing-theory/
I think this would be a nice site to direct students to for revising concepts after a lecture.
Perhaps a broad effort in getting people to think probabilistically in all scenarios in medicine is not going to get far. Particularly I think it is an uphill battle in diagnosis.
There is real potential in prognosis, however. There are obvious ethical and legal requirements for informed consent before, for example, a surgical procedure. This and other scenarios where there is time to sit down, punch in numbers, look at predicted risks, communicate them, document them… this is a more structured process than the fuzzy land of diagnosis. Maybe focusing efforts on teaching probability around such scenarios will do best.
(This is not to say there are no applications in diagnosis: the Wells score dominates decisions around investigating for PE)
This is an interesting viewpoint. Perhaps prognosis is the way to make headway among the current batch of physicians. The ACS NSQIP calculator, for instance, has had some success by giving surgeons a way to gauge how “risky” an operation is, and might be a great “jumping off” point for a course (at least to that specific group of docs).
I’m not quite ready to give up hope for probabilistic diagnosis. Perhaps one setting to pursue is the decision to stay with a noninvasive test result when the next step is a painful biopsy.
That’s a good one. Andrew Vickers has done that well with prostate bx
It’s funny you should mention the Wells score. I have always found that to be the most ridiculous clinical prediction rule. It is a point-based rule that uses only dichotomised inputs, and the most heavily weighted predictor is “Pulmonary embolism is the most likely diagnosis OR as likely as others”. That criterion alone places the patient into the “moderate risk” group.
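A quick sketch of why a single dichotomised item can dominate such a rule. The point values and risk cutoffs below follow the commonly published three-tier version of the Wells score, but treat them as assumptions to be checked against the original publication rather than a clinical reference:

```python
# Hedged sketch of a point-based rule in the style of the three-tier Wells
# score for PE. Point values and cutoffs are the commonly quoted ones and
# should be verified before any real use.
WELLS_ITEMS = {
    "clinical_signs_of_dvt": 3.0,
    "pe_most_likely_diagnosis": 3.0,   # the criticised subjective item
    "heart_rate_over_100": 1.5,
    "immobilization_or_recent_surgery": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}

def wells_score(findings):
    """Sum the points for every finding marked True."""
    return sum(pts for item, pts in WELLS_ITEMS.items() if findings.get(item))

def risk_group(score):
    if score < 2:
        return "low"
    if score <= 6:
        return "moderate"
    return "high"

# The single subjective item alone lands the patient in "moderate" risk:
print(risk_group(wells_score({"pe_most_likely_diagnosis": True})))  # moderate
```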
I lean to your assessment Pavel, in observing how these discussions go.
Apologies for missing this thread when it was first posted…
How does one engage a student who already thinks they understand a topic, or isn't particularly interested? Med students aren't like stat students; they don't want to learn any more stats/math than is needed. The approach requires some more finesse.
Like other suggestions in this thread, I usually try to assess my audience with clinical questions involving probability (I prefer polls or interactive lectures). A lot of med education is “case based learning”, so I provide clinical cases with probability/numbers. For example, in a recent ASCVD lecture I asked multiple questions on how to interpret probability of dz, use of absolute risk instead of thresholds, etc. This gives me a chance to assess the learners' knowledge level and for the learners to appreciate there may be something useful to learn.
Fundamentally I wish all med students had more pre-med stats and math. One of my best friends (a lawyer) was rejected from med schools for being a math major, which was not considered a “science” for prerequisite purposes. The bias is real.
In the case of a “probability prediction” the random variable we are guessing (related CV question) is well-defined: we have Y which takes values in {0,1}. What’s the random variable we are guessing when making a “classification prediction?” Isn’t a classification a “decision,” not a prediction?
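One way to see the "classification is a decision" point is that a classifier is just a probability prediction pushed through a threshold, and the threshold is a cost/utility choice hidden from the end user. A minimal sketch (the probability and thresholds are made-up illustration values):

```python
# A "classification" = probability prediction + hidden decision threshold.
# The threshold encodes a utility judgment that should belong to the
# decision maker, not the algorithm.
def classify(prob, threshold=0.5):
    return int(prob >= threshold)

p = 0.45  # hypothetical predicted P(Y=1 | covariates)
print(classify(p))        # 0 at the default threshold
print(classify(p, 0.3))   # 1 if false negatives are deemed costlier
```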
I like section 3.8 in your BBR. Great that you have included some YouTube videos as well. Some thoughts for further improvement (I'm hesitant, but you asked explicitly for them):
- Perhaps it could be improved by swapping some parts of the text: start with the verbal explanation, only then formalize it, and finally give your assessment/conclusion. One of these conclusions could be that you prefer Bayesian thinking over frequentist because…
- The piece on forward and backward probabilities might be unclear for first-time students. Maybe make it clearer that a Bayesian would calculate the probability of an event given the prior and the data, whereas a frequentist calculates the probability of the data given a number of assumptions, followed by a modus tollens reasoning. I see that students are open to the example of a die: “You have thrown a series of 20 sixes. If the die were fair, the probability of this is (1/6)^20 = 0.000000000000000273511123. Do you think this die is fair? Explain.”
- “For many problems we still do not know how to accurately compute a frequentist p-value.” (Aren’t p-values always frequentist?) One example that I use is taking samples: suppose you send out 2,000 surveys, you want at least 300 returned (based on your power calculation), you wait for 2 weeks (as planned), and you have collected 315 surveys. Good news! But how do you calculate a p-value? Is that based on countless repetitions with varying degrees of response?
- The answer to the “female senator problem” can be calculated more easily: we have 11 female senators versus (0.5 × 326M = 163M) female Americans, giving 11/163M. Maybe better an example that cannot be calculated without Bayes’ formula?
- “Degree of belief” tends to induce wrong conclusions, as if Bayesian analysis were about personal opinions. Could be me, though. Maybe something like “degree of knowledge”?
- Maybe something about aleatoric and epistemic uncertainty? “Probability is in the mind”, and so on?
- Many students have problems grasping the difference between P(X | data) and P(data | X). Some examples may help! P(person dies | shark has bitten his head off) is nearly 1.0, whereas P(shark has bitten his head off | person is dead) is nearly 0.
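The die example in the list above is easy to check numerically; a small sketch using exact rational arithmetic from Python's standard library:

```python
from fractions import Fraction

# Exact probability of throwing 20 sixes in a row with a fair die.
p_20_sixes = Fraction(1, 6) ** 20
print(p_20_sixes.denominator)   # 6**20 = 3656158440062976
print(float(p_20_sixes))        # ~2.735e-16, the value quoted above
```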
Hesitatingly, because the course is good and who am I…
corrected for spelling
Super suggestions. Thanks! I only take issue with the point on “degree of belief”, a term I don’t actually have much of a problem with. I hope I have time to make the changes.
I’ve read the course some more. Great stuff!
Question: the course describes different approaches in statistics and is critical of frequentist methods and more positive about Bayesian methods. Nevertheless, nearly all of the techniques described in the course are (or seem) frequentist: for instance, sample size calculations, comparing group means, and so on. How about introducing the BEST package instead of the frequentist t-test? Here’s the blurb:
“The BEST package provides a Bayesian alternative to a t test, providing … complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. … The decision rule can accept the null value (unlike traditional t tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The package also provides methods to estimate statistical power for various research goals.”
Just a thought.