How to teach probability


#1

Understanding probability is key to understanding uncertainty, quantifying evidence for effects, and making decisions in the face of uncertainty. In teaching many clinicians over many years I’ve found that a lack of sufficient understanding of probability results in misunderstanding of probabilistic diagnosis, sensitivity, specificity, prevalence, probabilistic prognostication, p-values, and number needed to treat. The problem is made more acute because patients often have even more difficulty with probability than their physicians do. Some patients and physicians even believe that probabilities do not apply to individuals, when in fact “playing the odds” is how we make most of our important decisions. Poker players and sports gamblers can master probability, so why can’t everyone else? Some persons misinterpret a probability of disease of 0.2 as saying “I’m the one in five who will be lucky,” even though the same person acts appropriately when wagering on a football game.

There seem to be three general areas that users of probability need to understand:

  1. Given that a probability has been correctly stated, what does that probability mean?
  2. What is conditional probability, what do you condition on, and how do you compute it? [I’ve seen more physician confusion on conditioning than on any other aspect of probability, with many physicians not even clear on when they are conditioning on unknowable conditions, or on the future. Even more common is the use of incomplete conditioning, e.g. computing the risk of disease given a patient is > 60 years old when you already know the patient is 61 years old.]
  3. How to operate on probabilities using the laws of probability to reverse the conditioning, compute probabilities of unions and intersections of conditions, and understand independence of events.
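The three areas above can be made concrete with a toy example. The Python sketch below assumes two fair six-sided dice (an illustration chosen here, not taken from the BBR section) and, by enumerating all 36 equally likely outcomes, checks the addition rule for a union, reverses a conditional probability with Bayes' rule, and tests two pairs of events for independence.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
OMEGA = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favorable outcomes / all outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def cond_prob(event, given):
    """P(event | given) = P(event AND given) / P(given)."""
    return prob(lambda w: event(w) and given(w)) / prob(given)


def first_is_six(w):     # event A
    return w[0] == 6


def sum_at_least_9(w):   # event B
    return w[0] + w[1] >= 9


def second_is_even(w):   # event C
    return w[1] % 2 == 0


p_a, p_b = prob(first_is_six), prob(sum_at_least_9)
p_a_and_b = prob(lambda w: first_is_six(w) and sum_at_least_9(w))
p_a_or_b = prob(lambda w: first_is_six(w) or sum_at_least_9(w))

# Union (addition rule): P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b

# Reversing the conditioning (Bayes' rule): P(A | B) = P(B | A) P(A) / P(B)
p_a_given_b = cond_prob(first_is_six, sum_at_least_9)
p_b_given_a = cond_prob(sum_at_least_9, first_is_six)
assert p_a_given_b == p_b_given_a * p_a / p_b

# Independence means P(X and Y) = P(X) P(Y).
p_a_and_c = prob(lambda w: first_is_six(w) and second_is_even(w))
print("P(A|B) =", p_a_given_b, " P(B|A) =", p_b_given_a)
print("A, C independent:", p_a_and_c == p_a * prob(second_is_even))
print("A, B independent:", p_a_and_b == p_a * p_b)
```

Here P(A | B) = 2/5 while P(B | A) = 2/3, a small reminder that reversing the conditioning generally changes the answer.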

An attempt at a crash course in probability is in Section 3.8 of BBR. Suggestions for improving that section are most welcome. The short section introduces the fundamentally different, but all valid, meanings of probability (limiting relative frequencies, subjective, etc.).

What suggestions do you have for resources and approaches for teaching probability to physicians and patients?


#2

Complicated topic. For example, I would say it’s definitely the case that there is no unique probability of a given event that applies to an individual, since it must be conditioned on one or more of an infinite number of reference classes to which each person belongs. This could be interpreted as saying “probabilities don’t apply to individuals,” but the two statements are not actually equivalent. Subtle topic, in my opinion.

“Your understanding is thoroughly bankrupt” is a tough starting place for the learner, whether it’s true or not. There is perhaps a different way to approach the gap between where people are and where they need to be to be effective, lest it seem like the old joke: “Oh, you want to go to London? If I were you, I wouldn’t start from here.”

Step 1 in my opinion: never ridicule, only build new capacities.

Step 2 IMO is to keep in mind that if the learners already have ideas, and you want to replace them because they’re not the right ideas, it’s different from an ordinary instance of teaching. It’s more akin to converting someone than it is like teaching physics to someone who has no beliefs about physics.

Step 3 IMO: keep the jargon to a bare minimum, and assume no one knows the nomenclature you would usually use to express probabilities.

Step 4: Let’s see what others say… Great topic.


#3

Thanks. If you have particular notation suggestions for the BBR section that would help. Regarding the philosophical point you raised I don’t want to get too much into that but you’ve hit the nail on the head. Probability is always subjective because it depends on what is in the mind of the beholder, e.g., what information is available to her. It’s all relative. I want to deal with the situation where we have enough information to condition on to make it interesting, and I never assume we know all the conditions that are relevant. Some conditioning is better than no conditioning, and conditioning on 3 variables is better than on 1 variable.


#4

Thanks. BBR probabilities section reads well, explains the nomenclature nicely. No recommendations for changing it. Could definitely be used to teach those ideas successfully.


#5

Excellent topic. Let me start by pointing out a MOOC by Joseph Blitzstein about probability: Introduction to Probability. It is free to follow unless you need a certificate. From what I have seen of it, it is super clear, and I really like the animated videos.

Conditional probabilities, and how they can be useful (and when they might not be), are definitely something I regularly talk about in my teaching. Conditional dependence of test outcomes, and its impact on post-test probabilities, is another topic that comes up when talking about “independent” diagnostic tests.
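A small sketch of why conditional dependence matters when combining two positive tests. All numbers are hypothetical, chosen only for illustration: a prior of 0.20 and two tests that each have sensitivity 0.80 and specificity 0.90. Multiplying the per-test probabilities is valid only if the tests are conditionally independent given disease status; the "dependent" joint probabilities below are made up, but are consistent with each test's sensitivity and specificity.

```python
def post_test_prob(prior, p_result_if_disease, p_result_if_no_disease):
    """Bayes' rule: P(disease | result) from the prior and the two likelihoods."""
    numerator = prior * p_result_if_disease
    return numerator / (numerator + (1 - prior) * p_result_if_no_disease)


prior = 0.20             # hypothetical pre-test probability
sens, spec = 0.80, 0.90  # hypothetical, assumed equal for both tests

# Two positive results, ASSUMING conditional independence given disease status:
p_indep = post_test_prob(prior, sens * sens, (1 - spec) * (1 - spec))

# Two positive results from conditionally DEPENDENT tests (hypothetical joint
# probabilities: the tests tend to agree, e.g. because they measure the same thing).
p_both_pos_if_disease = 0.72      # must not exceed sens = 0.80
p_both_pos_if_no_disease = 0.05   # must not exceed 1 - spec = 0.10
p_dep = post_test_prob(prior, p_both_pos_if_disease, p_both_pos_if_no_disease)

print(f"one positive test:          {post_test_prob(prior, sens, 1 - spec):.3f}")
print(f"two positives, independent: {p_indep:.3f}")
print(f"two positives, dependent:   {p_dep:.3f}")
```

With these illustrative inputs the independence assumption gives roughly 0.94, versus roughly 0.78 for the correlated tests: treating dependent tests as “independent” overstates the evidence.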

I think “prediction” is one area where it is becoming increasingly relevant to talk about probability (surprisingly). The trend toward machine learning predictive models that make black-box classifications without probability predictions suggests an under-appreciation of probability (that this type of prediction seems to be accepted is mind-boggling). That is another reason to think about how to improve our teaching of probability so that its value is better appreciated.


#6

I, for one, like to use a model for bowling when teaching probability (to graduate students). You can see / hear the videos I do at
https://youtu.be/-FeGc4c0Am8 (about 9:30 in) with corresponding markdown at
https://courseworks2.columbia.edu/courses/54170/files/folder/LectureMaterial/Week01

In short, bowling is something that is simple enough for almost everyone to understand but minimally complicated enough to illustrate almost all of the concepts in discrete probability. I like the example because how many pins get knocked down is plausibly independent across frames, but it should be fairly obvious to students that two rolls in the same frame of bowling are not independent; i.e. the probability of knocking down x_2 pins on the second roll depends explicitly on number of pins still standing, which is ten minus the number of pins knocked down on the first roll. So, right from the start you have to think about — and build up — a bivariate probability mass function for the two rolls in a frame of bowling. I like this better than starting with coins / dice / cards because when we actually use probability for modeling, it is always with PMFs and PDFs that have non-trivial functional forms.

Fortunately, David Neal and his master’s student Jennifer Hohn have worked out a simple PMF for bowling that I use. Basically, if you take the first 11 Fibonacci numbers (starting from the zeroth, which is 0) and divide each by the thirteenth Fibonacci number less 1 (144 - 1 = 143, which is exactly their sum), you have a PMF for the first roll of a frame of bowling. Now you are ready to talk about the conditional PMF for the second roll, given what happens on the first roll. In this example, you can use the same Fibonacci scheme, but instead of normalizing the first 11 Fibonacci numbers, you normalize the first n + 1 of them, where n is the number of pins still standing.
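A minimal Python sketch of the Fibonacci PMF as described above. The function names are hypothetical (this is not the course's own code), and the handling of a strike (no pins left, so the second roll is recorded as 0 with probability 1) is an assumption added here, because the formula would otherwise divide 0 by 0.

```python
from fractions import Fraction


def fib(k):
    """Return the k-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def roll_pmf(x, n=10):
    """P(knock down x pins | n pins standing) under the Fibonacci scheme.

    Normalizes F(0), ..., F(n) by their sum, which equals F(n + 2) - 1.
    With n = 10 this is the first-roll PMF, F(x) / (F(12) - 1) = F(x) / 143.
    """
    if not 0 <= x <= n:
        return Fraction(0)
    if n == 0:  # assumption: after a strike, the second roll is 0 with probability 1
        return Fraction(1)
    return Fraction(fib(x), fib(n + 2) - 1)


first_roll = [roll_pmf(x) for x in range(11)]
# Conditional PMF for the second roll when 3 pins are still standing (first roll = 7).
second_given_7 = [roll_pmf(x, 3) for x in range(4)]
print("P(strike) =", first_roll[10])
print("second roll | 3 standing:", [str(p) for p in second_given_7])
```

Under this scheme a gutter ball has probability 0 whenever pins are standing, since F(0) = 0.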

From there, you can introduce the multiplication rule to get the bivariate PMF for the probability of knocking down x_1 pins on the first roll of a frame and x_2 pins on the second roll of that frame. This bivariate PMF can be represented as a familiar 11 × 11 table, and you can talk about the two marginal distributions using the addition rule.
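Continuing in the same hypothetical style (the PMF helpers are repeated so the block runs on its own, with the same assumed strike convention), the multiplication rule fills in the 11 × 11 joint table, and summing across rows or columns gives the two marginals.

```python
from fractions import Fraction


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def roll_pmf(x, n=10):
    # Fibonacci PMF for knocking down x of n standing pins (strike case assumed).
    if not 0 <= x <= n:
        return Fraction(0)
    if n == 0:
        return Fraction(1)
    return Fraction(fib(x), fib(n + 2) - 1)


# Multiplication rule: P(x1, x2) = P(x1) * P(x2 | x1), as an 11 x 11 table.
joint = [[roll_pmf(x1) * roll_pmf(x2, 10 - x1) for x2 in range(11)]
         for x1 in range(11)]

# Addition rule: marginals are row sums (first roll) and column sums (second roll).
marginal_x1 = [sum(row) for row in joint]
marginal_x2 = [sum(joint[x1][x2] for x1 in range(11)) for x2 in range(11)]

print([round(float(p), 3) for p in marginal_x2])
```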

If you go fast, then by the end of the first lecture you can get to Bayes Rule for the probability of knocking down x_1 pins on the first roll of a frame given that x_2 pins were knocked down on the second roll of that frame, which is a bit of a contrived example but I like it better than the false positive rate of a medical test. I also do some Monte Carlo simulation that approximates the analytical solution. This usage of Bayes Rule isn’t really Bayesian, but it sets up the rest of a Bayesian course (I’ll reply to myself with a poker example that I use for homework that gets into the difference between classical, frequentist, and Bayesian perspectives on probability).
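A hypothetical sketch of that Bayes Rule calculation, P(X_1 = x_1 | X_2 = x_2), together with a Monte Carlo check. The helper names, the seed, the choice x_2 = 3, and the number of simulated frames are arbitrary choices made here; the strike convention is the same assumption as before.

```python
import random
from fractions import Fraction


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def roll_pmf(x, n=10):
    if not 0 <= x <= n:
        return Fraction(0)
    if n == 0:  # assumed: a strike leaves nothing, so the second roll is 0
        return Fraction(1)
    return Fraction(fib(x), fib(n + 2) - 1)


def posterior_first_roll(x2):
    """Exact P(X1 = x1 | X2 = x2) for x1 = 0..10 via Bayes' rule."""
    joint = [roll_pmf(x1) * roll_pmf(x2, 10 - x1) for x1 in range(11)]
    p_x2 = sum(joint)  # P(X2 = x2), by the addition rule
    return [p / p_x2 for p in joint]


def mc_posterior_first_roll(x2, n_frames=100_000, seed=1):
    """Monte Carlo estimate: simulate frames and keep those with X2 = x2."""
    rng = random.Random(seed)
    weights = {n: [float(roll_pmf(x, n)) for x in range(n + 1)] for n in range(11)}
    counts = [0] * 11
    for _ in range(n_frames):
        x1 = rng.choices(range(11), weights=weights[10])[0]
        second = rng.choices(range(10 - x1 + 1), weights=weights[10 - x1])[0]
        if second == x2:
            counts[x1] += 1
    kept = sum(counts)
    return [c / kept for c in counts]


exact = posterior_first_roll(3)
approx = mc_posterior_first_roll(3)
for x1 in range(11):
    print(x1, round(float(exact[x1]), 3), round(approx[x1], 3))
```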

If you like this approach so far, you can see how I did the next lecture (Discourse won’t let me post the links but they are just the following ones on YouTube and Canvas) where we use the same bowling setup to get into expectations of functions of discrete random variables.
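As a hedged illustration of expectations of functions of discrete random variables (not necessarily how the follow-up lecture does it), E[g(X_1, X_2)] is a probability-weighted sum over the joint table. The functions used below (pins in a frame ignoring bonuses, and a strike indicator) are examples chosen here.

```python
from fractions import Fraction


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def roll_pmf(x, n=10):
    # Fibonacci PMF for knocking down x of n standing pins (strike case assumed).
    if not 0 <= x <= n:
        return Fraction(0)
    if n == 0:
        return Fraction(1)
    return Fraction(fib(x), fib(n + 2) - 1)


JOINT = {(x1, x2): roll_pmf(x1) * roll_pmf(x2, 10 - x1)
         for x1 in range(11) for x2 in range(11)}


def expect(g):
    """E[g(X1, X2)]: sum of g(x1, x2) * P(x1, x2) over the joint PMF."""
    return sum(g(x1, x2) * p for (x1, x2), p in JOINT.items())


e_x1 = expect(lambda x1, x2: x1)
e_x2 = expect(lambda x1, x2: x2)
e_frame = expect(lambda x1, x2: x1 + x2)         # pins in a frame, no bonuses
p_strike = expect(lambda x1, x2: int(x1 == 10))  # E[indicator] is a probability

print(e_x1, float(e_frame), p_strike)
```

Linearity of expectation shows up directly: E[X_1 + X_2] equals E[X_1] + E[X_2] even though the two rolls are dependent.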


#7

I love poker for teaching probability, and I use it in a homework problem. It is easy to find videos of interesting poker hands on YouTube and then ask probability questions about what happens or could have happened. You can ask fun questions about card combinatorics, but I especially like it for distinguishing between perspectives on probability:

  1. Classical: Each card that is not already visible to a player has an equal chance of being turned up next
  2. Bayesian: Every act in poker is a decision problem and it is easy to see that poker players are making decisions in order to maximize their expected utility, although the utility may be for winning prize money rather than winning plastic poker chips in a particular hand. And you update beliefs several times in a hand as cards get turned over and bets get made.
  3. Frequentist: More so than in almost any situation in the social / medical sciences, poker actually has repeated sampling of cards from a randomly shuffled deck. To be a professional poker player, you sort of do need to choose a rule (strategy) for how you will play in a given situation and adhere to it in order to obtain the expectation of the strategy over many hands where that situation arises. Of course, it isn’t a perfect frequentist setup, because one hand is not independent of the others (with the same players), the sizes of the chip stacks are always changing in important ways, and so are the players’ positions relative to the dealer.
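To make the classical perspective concrete, suppose (a hypothetical spot, not from the homework) that a player holds four cards to a flush after the flop, so 47 cards are unseen by that player and 9 of them complete the flush. Treating every unseen card as equally likely, the chance of hitting by the river follows from the complement rule, and a simulation of dealing from the unseen cards should agree.

```python
import random
from fractions import Fraction

UNSEEN = 47  # 52 cards minus the player's 2 hole cards and the 3-card flop
OUTS = 9     # hypothetical: 13 cards of the suit minus the 4 the player can see


def p_flush_by_river_exact():
    """Classical probability via the complement: miss the turn AND miss the river."""
    p_miss_turn = Fraction(UNSEEN - OUTS, UNSEEN)
    p_miss_river_given_miss_turn = Fraction(UNSEEN - OUTS - 1, UNSEEN - 1)
    return 1 - p_miss_turn * p_miss_river_given_miss_turn


def p_flush_by_river_mc(n_sims=200_000, seed=2024):
    """Monte Carlo: deal two distinct unseen cards; labels 0..8 are the outs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        turn, river = rng.sample(range(UNSEEN), 2)
        if turn < OUTS or river < OUTS:
            hits += 1
    return hits / n_sims


exact = p_flush_by_river_exact()
print(exact, round(float(exact), 3), round(p_flush_by_river_mc(), 3))
```

The opponents' hidden cards are simply part of the "unseen" pool here, which is exactly the classical view from the player's seat.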

#8

Nice. I think you can extend this to teach backwards probabilities and why we usually want to avoid them. What is the probability that 7 pins were knocked down on the first roll when the ultimate result was a spare?
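Under the Fibonacci PMF (reimplemented here with hypothetical helper names and the same assumed strike convention), the backward probability P(X_1 = 7 | spare) can be computed with Bayes' rule and set beside the forward probability P(spare | X_1 = 7), which is much easier to interpret.

```python
from fractions import Fraction


def fib(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def roll_pmf(x, n=10):
    if not 0 <= x <= n:
        return Fraction(0)
    if n == 0:  # assumed strike convention: second roll is 0
        return Fraction(1)
    return Fraction(fib(x), fib(n + 2) - 1)


def p_first_roll_given_spare():
    """Backward probability P(X1 = x1 | spare) for x1 = 0..9 via Bayes' rule.

    A spare means X1 < 10 and the second roll knocks down all 10 - X1 remaining pins.
    """
    joint = [roll_pmf(x1) * roll_pmf(10 - x1, 10 - x1) for x1 in range(10)]
    p_spare = sum(joint)
    return [p / p_spare for p in joint]


def p_spare_given_first_roll(x1):
    """Forward probability P(spare | X1 = x1)."""
    return roll_pmf(10 - x1, 10 - x1) if x1 < 10 else Fraction(0)


backward = p_first_roll_given_spare()
print("P(X1 = 7 | spare) =", round(float(backward[7]), 3))
print("P(spare | X1 = 7) =", p_spare_given_first_roll(7))
```

Under this model the backward probability comes out near 0.11, while the forward probability is exactly 1/2.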

I think that the Fibonacci sequence may cloud the issues.


#9

I think a general barrier is that MDs sometimes see probability as removed from what they do. The more technical the explanation of probability, the more removed it seems. Some of this is shell shock from jumping through years of (expensive) hoops that require absorption of highly technical material never seen again. For example, the premedical curriculum contains organic chemistry and calculus, which clinicians almost never use. The graduate curriculum contains, eg, memorization of all of the enzymes in the Krebs cycle, which appear on esoteric board questions but are often forgotten soon after. (Many students really enjoy these subjects, but one can understand how they might at times appear orthogonal to clinical medicine.) These experiences lead to shell-shocked MDs who are highly suspicious of technical concepts that seem unrelated to patient care. For this reason, many MDs shut down when they see the symbols and calculations involved in probability. (They are highly capable of learning this material, but at face value it might be difficult to connect the symbols to what they do every day.) This is compounded by initial exposures that involve something other than patient care. (It is perhaps also compounded by the difficulty of manually estimating conditional probabilities within the standard clinical workflow, although with EHRs interest is already starting to increase.) One nearly failsafe way to bring probability closer to patient care is to use clinical vignettes. Here is my attempt:

A patient presents for a routine checkup. He is eligible for a statin but resistant because his coworker suffered from statin myopathy. You advise the patient that statins are known to decrease the 10-year risk of heart attack, but he still refuses. You would like to convince him by showing how much the statin will reduce his risk.

This can be done by estimating the conditional probabilities $\text{P(heart attack | statin)}$ (read as “the probability of a heart attack given that the patient is on a statin”) and $\text{P(heart attack | no statin)}$.

$$\text{P(heart attack | statin)} = \frac{\text{P(heart attack AND statin)}}{\text{P(statin)}}$$

How is the right-hand side obtained?

One way is to take a sample of 10,000 patients, randomly assign each to a statin or no statin, and follow them for ten years.

$\text{P(statin)}$ is the proportion of the 10,000 who were put on a statin. Say 5,000. Hence

$$\text{P(statin)} = \dfrac{5{,}000}{10{,}000}$$

$\text{P(heart attack AND statin)}$ can be obtained at the 10-year follow-up. Say that 300 of the patients on a statin had a heart attack.

$$\text{P(heart attack AND statin)} = \dfrac{300}{10{,}000}$$

Hence

$$\text{P(heart attack | statin)} = \frac{\text{P(heart attack AND statin)}}{\text{P(statin)}} = \dfrac{300/10{,}000}{5{,}000/10{,}000} = \dfrac{300}{5{,}000} = 0.06$$

Similarly, suppose that 600 of the 5,000 patients not on a statin had a heart attack.

$$\text{P(heart attack | no statin)} = \dfrac{600}{5{,}000} = 0.12$$

After showing the patient that a statin would halve his risk, he agrees to try the statin.
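The vignette's arithmetic can be checked in a few lines. The counts are the hypothetical trial numbers from the vignette; the absolute risk reduction and number needed to treat are standard derived quantities added here for illustration.

```python
from fractions import Fraction

n_total = 10_000
n_statin = 5_000
n_no_statin = n_total - n_statin
ha_statin = 300      # heart attacks among patients on a statin
ha_no_statin = 600   # heart attacks among patients not on a statin

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_statin = Fraction(n_statin, n_total)
p_ha_and_statin = Fraction(ha_statin, n_total)
p_ha_given_statin = p_ha_and_statin / p_statin
p_ha_given_no_statin = Fraction(ha_no_statin, n_no_statin)

risk_ratio = p_ha_given_statin / p_ha_given_no_statin
abs_risk_reduction = p_ha_given_no_statin - p_ha_given_statin
nnt = 1 / abs_risk_reduction  # patients treated for 10 years per heart attack avoided

print(float(p_ha_given_statin), float(p_ha_given_no_statin),
      float(risk_ratio), round(float(nnt), 1))
```

The risk ratio of 0.5 is the “halving” in the vignette; the same 0.06 absolute reduction corresponds to treating about 17 patients for ten years to prevent one heart attack.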


#10

Great information. And you mastered LaTeX! One question is whether we need to envision large groups of patients to be able to pull this off. The 10,000 is a bit arbitrary, and probabilities apply to individuals.


#11

Big fan of LaTeX. Would you be willing to show how the conversation would look without using the large groups of patients?


#12

A good challenge. I’d start with P(statin) = 1/2 and show a thermometer in which 1/2 of the area of the rectangle is shaded. Then I would try the same with P(heart attack AND statin), although here is where the language gets trickier. Instead of having the simple concept of the 300 patients to hang our hat on, we’d have to say something like “Suppose that the chance that a patient is put on a statin and ultimately has a heart attack is 0.03,” and then show the thermometer next to it. In some cases, Venn diagrams help.

But from that wording you can see that the problem is set up more to illustrate Bayes’ rule than to be “linear”, because you could just say “Suppose that of patients put on statin the probability of a heart attack is 0.06.” Why not just start with that?
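With the thermometer numbers, the same definition of conditional probability gives the answer directly, with no reference to a group of 10,000 patients:

$$\text{P(heart attack | statin)} = \frac{\text{P(heart attack AND statin)}}{\text{P(statin)}} = \frac{0.03}{1/2} = 0.06$$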


#13

With all due respect, I think if you want to teach probability to physicians you need to move away from the following perspectives:

MDs sometimes see probability as removed from what they do.

Some of this is shell shock from jumping through years of (expensive) hoops that require absorption of highly technical material never seen again.

shell-shocked MDs who are highly suspicious of technical concepts that seem unrelated to patient care

many MDs shut down when they see the symbols and calculations involved in probability

You are making assumptions and generalizations about a large audience of learners that just aren’t true. The problem with that is that you then end up speaking down to them instead of just treating them as another audience that needs to learn the material. My view, as a physician, is that this just isn’t stressed well in training… at all. It has nothing to do with scared, uninterested, shell-shocked, or “shut-down” physicians. Heck, stats and prob aren’t even required for med schools (most don’t require calculus either). Then in training you aren’t taught probability, you are taught “Epi,” because, as you say, board exams require you to memorize “cohort,” “case-control,” “RCT,” etc. so you can get a high score.

My colleagues aren’t scared or averse, but when you have to learn a mountain of information a day you need to justify why you need the physician’s time. For instance, my specialty is surgery, and you need to prove to my colleagues why they should dedicate time to this instead of getting better at surgical technique, reading for the cases they want to perform, making sure they can answer consults, and still studying for boards. They are not suspicious, but you have to show why learning Bayes’ rule is more valuable than something they will need every day. Again, training has taught them that they don’t need training in probability, so you need to overcome that.

For educators of probability, using examples that the student can relate to is “always” better than not. The problem with learning probability, though, is that it requires some dedicated time, not a half-hour or hour lecture, and all time once you enter med school is valuable. A bit of coercion will be necessary to get all physicians on board to dedicate time to learning this material. It certainly would help to illustrate why their intuition is wrong and how to improve it. But in truth, to make this training work you need to start with curriculum changes or entrance requirement changes.


#14

I think Maarten is spot on here. I didn’t take probability in undergrad, but this course and Joe’s book are EXCELLENT for self-learning probability. Also, if you feel spendy, there is a Harvard Extension School course you can take too!


#15

Here is another exceptionally good resource for learning probability that I learned of over the weekend: http://wpressutexas.net/coursewiki/index.php/OpinionatedLessons.org/


#16

That is fantastic Brian, thanks for sharing!


#17

Sure thing! He’s got a wonderful teaching style.


#18

His discussion in video 2 of sometimes being unable to define exhaustive & mutually exclusive hypotheses is helpful, too, in understanding the limits of Bayesian analysis.


#19

For the record, I did not mean to say anyone was scared and averse. It was meant as a compliment: medical students shut out what might appear irrelevant (though probability is certainly not), which just means they are learning how to extract signal from noise. It’s what makes for good students and later good doctors. The onus might unfortunately fall on the instructor to convince them that probability is not noise. (This is not ideal, and to many here it’s clear that probability is highly important and growing in importance, but it may be a reality. My personal experience in medical school was that I was usually overwhelmed by the amount of material I was expected to retain for, eg, board exams, and did not have time to learn new subjects that I was not convinced would relate to patient care or a board exam, so the “fact importance filter” was a necessity. I cannot speak for those who have graduated medical school, however.)

More requirements would help, but effecting that kind of change is not feasible for someone in my shoes. Also, to some degree, probability is often already taught, but in a research context. For example, it’s introduced to show how hypothesis testing works, how statistics can describe a sample, and how population-level screening tests are evaluated. Most MDs spend more of their time on patient care than research, however. Relating probability to patient care is therefore, IMHO, a stronger motivator. Also, teaching research and probability at once amounts, it seems, to teaching probability and mathematical statistics at once, which is too much. Finally, probability takes years to settle in, often beyond any required course. For example (n=1), only long after I learned about testing did I finally go back and use Bayes’ rule to derive PPV from sensitivity, specificity, and prevalence. It wasn’t some external medical course or a board exam that motivated me; it was my own sense that it was important for patient care (perhaps this is much less important for patient care than I thought…). Teaching students how probability relates to patient care, which they engage in every day, will give them motivation to take it beyond the required course and into their careers, where it will be solidified.

The crucial application of probability to clinical reasoning is often all but ignored. In my opinion, this is the biggest missed opportunity. Probabilistic reasoning should be emphasized not only in years 1 and 2, but also heavily throughout clinical training. Physicians reason probabilistically constantly, formulating priors and updating them with new information. You could practically teach probability just by telling them what they do all day. However, I think the key here is to de-emphasize indirect quantities like post-test probability derived from tests and to emphasize direct conditional probability P(dz | data). Also, in addition to jarring examples such as a very low PPV despite high sensitivity and specificity, it is important to teach more realistic applications: most patients actually have much higher priors, conditional on their presentation. In this sense, I would actually argue that physician intuition is mostly right, and that studying probability allows physicians to justify, refine, and include patients in medical decisions.
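The point about priors can be sketched with Bayes' rule for PPV. The test characteristics (sensitivity = specificity = 0.90) and the two priors, labeled “screening” and “symptomatic,” are hypothetical values chosen only to show the contrast.

```python
def ppv(sensitivity, specificity, prior):
    """P(disease | positive test) by Bayes' rule."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical test with sensitivity = specificity = 0.90.
for label, prior in [("screening, low prior", 0.01),
                     ("symptomatic, higher prior", 0.30)]:
    print(f"{label:26s} prior={prior:.2f}  PPV={ppv(0.90, 0.90, prior):.3f}")
```

With these inputs the same test gives a PPV of about 0.08 in the low-prior setting and about 0.79 when the prior reflects a suggestive presentation.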


#20

I can’t agree with this. Understanding probability is essential in a field where almost everything that goes on in a sick human body is uncertain. This isn’t changed by the fact that your probability instructor might not be engaging. To say that as a medical student you are already good at separating relevant material from irrelevant material (‘noise’) is naive, in my opinion. You are certainly going to need to understand probability to be able to do research well. My prediction is that the importance of probability will also increase as more and more algorithms inform medical decision making. But there is a chance I am wrong about that.