How to teach probability

f2harrell · August 11, 2018, 11:39am

Understanding probability is key to understanding uncertainty, quantifying evidence for effects, and making decisions in the face of uncertainty. In teaching many clinicians over many years I’ve found that an insufficient understanding of probability results in misunderstanding of probabilistic diagnosis, sensitivity, specificity, prevalence, probabilistic prognostication, p-values, and number needed to treat. The problem is made more acute when patients have even more difficulties with probability than their physicians. Some patients and physicians even believe that probabilities do not apply to individuals, when in fact “playing the odds” is how we make most of our important decisions. Poker players and sports gamblers can master probability, so why can’t everyone else? Some people misinterpret a probability of disease of 0.2 as saying “I’m the one in five who will be lucky” even though they act appropriately when wagering on a football game.

There seem to be three general areas that users of probability need to understand:

Given that a probability has been correctly stated, what does that probability mean?
What is conditional probability, what do you condition on, and how do you compute it? [I’ve seen more physician confusion on conditioning than on any other aspect of probability, with many physicians not even clear on when they are conditioning on unknowable conditions, or on the future. Even more common is the use of incomplete conditioning, e.g. computing the risk of disease given a patient is > 60 years old when you already know the patient is 61 years old.]
How to operate on probabilities using the laws of probability to reverse the conditioning, compute probabilities of unions and intersections of conditions, and understand independence of events.
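
As a small illustration of the third area (reversing the conditioning), here is a minimal Python sketch of Bayes' rule for a diagnostic test. The function name `post_test_probability` and the sensitivity and specificity values are hypothetical, chosen only for illustration; the prevalence of 0.2 echoes the disease probability mentioned in the opening post.

```python
from fractions import Fraction


def post_test_probability(prevalence, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' rule."""
    true_pos = sensitivity * prevalence               # P(positive AND disease)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(positive AND no disease)
    return true_pos / (true_pos + false_pos)          # divide by P(positive)


# Hypothetical inputs, not taken from any study.
p = post_test_probability(
    prevalence=Fraction(1, 5),
    sensitivity=Fraction(9, 10),
    specificity=Fraction(4, 5),
)
print(p, float(p))  # 9/17, about 0.53
```

The point of the sketch is that P(positive | disease) = 0.9 and P(disease | positive) = 9/17 are different quantities, and moving from one to the other requires the prevalence.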

An attempt at a crash course in probability is in Section 3.8 of BBR. Suggestions for improving that section are most welcome. The short section introduces the fundamentally different but all valid meanings of probability (limiting relative frequencies, subjective, etc.).

What suggestions do you have for resources and approaches for teaching probability to physicians and patients?

byrdjb · August 11, 2018, 12:57pm

Complicated topic. For example, I would say it’s definitely the case that there is no unique probability of a given event that applies to an individual since it must be conditioned on one or some of an infinite number of reference classes to which each person belongs. This could be interpreted as saying “probabilities don’t apply to individuals,” but they’re not actually equivalent statements. Subtle topic, in my opinion.

“Your understanding is thoroughly bankrupt” is a tough starting place for the learner, whether it’s true or not. There is perhaps a different way to approach the gap between where people are & where they need to be to be effective, lest it seem like the old joke: “Oh, you want to go to London? If I were you, I wouldn’t start from here.”

Step 1 in my opinion: never ridicule, only build new capacities.

Step 2 IMO is to keep in mind that if the learners already have ideas, and you want to replace them because they’re not the right ideas, it’s different from an ordinary instance of teaching. It’s more akin to converting someone than it is like teaching physics to someone who has no beliefs about physics.

Step 3 IMO: keep the jargon to a bare minimum, and assume no one knows the nomenclature you would usually use to express probabilities.

Step 4: Let’s see what others say… Great topic.

f2harrell · August 11, 2018, 2:38pm

Thanks. If you have particular notation suggestions for the BBR section that would help. Regarding the philosophical point you raised I don’t want to get too much into that but you’ve hit the nail on the head. Probability is always subjective because it depends on what is in the mind of the beholder, e.g., what information is available to her. It’s all relative. I want to deal with the situation where we have enough information to condition on to make it interesting, and I never assume we know all the conditions that are relevant. Some conditioning is better than no conditioning, and conditioning on 3 variables is better than on 1 variable.

byrdjb · August 11, 2018, 3:17pm

Thanks. BBR probabilities section reads well, explains the nomenclature nicely. No recommendations for changing it. Could definitely be used to teach those ideas successfully.

MaartenvSmeden · August 11, 2018, 8:00pm

Excellent topic. Let me start by pointing out a MOOC by Joseph Blitzstein about probability: Introduction to probability. Free to follow unless you need a certificate. From what I have seen of it, it is super clear, and I really like the animated videos.

Conditional probabilities and how they can be useful (and when they might not be) are definitely something I regularly talk about in my teaching. Conditional dependence of test outcomes and its impact on post-test probability is another topic that comes up when talking about “independent” diagnostic tests.

“Prediction” is one area where I think it is becoming increasingly relevant to talk about probability (surprisingly). The trend of machine learning predictive models that make blackbox classifications without probability prediction suggests an under-appreciation for probability (that this type of prediction seems to be accepted is mind-boggling). Another reason to think about how to improve our teaching about probabilities such that their value is better appreciated.

bgoodri · August 11, 2018, 11:48pm

I, for one, like to use a model for bowling when teaching probability (to graduate students). You can see / hear the videos I do at
https://youtu.be/-FeGc4c0Am8 (about 9:30 in) with corresponding markdown at
https://courseworks2.columbia.edu/courses/54170/files/folder/LectureMaterial/Week01

In short, bowling is something that is simple enough for almost everyone to understand but minimally complicated enough to illustrate almost all of the concepts in discrete probability. I like the example because how many pins get knocked down is plausibly independent across frames, but it should be fairly obvious to students that two rolls in the same frame of bowling are not independent; i.e. the probability of knocking down x_2 pins on the second roll depends explicitly on number of pins still standing, which is ten minus the number of pins knocked down on the first roll. So, right from the start you have to think about — and build up — a bivariate probability mass function for the two rolls in a frame of bowling. I like this better than starting with coins / dice / cards because when we actually use probability for modeling, it is always with PMFs and PDFs that have non-trivial functional forms.

Fortunately, David Neal and his master’s student Jennifer Hohn have worked out a simple PMF for bowling that I use. Basically, if you take the first 11 Fibonacci numbers (starting from the zeroth) and divide by the thirteenth Fibonacci number less 1, you have a PMF for the first roll of a frame of bowling. Now you are ready to talk about the conditional PMF for the second roll, given what happens on the first roll. In this example, you can use the same Fibonacci scheme but instead of normalizing the first 11 Fibonacci numbers, you use Fibonacci numbers up to one plus the number of pins still standing.
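
The Fibonacci scheme just described can be sketched in Python as follows. This is only an illustration under one assumption about indexing: the sequence is taken to start 1, 1, 2, 3, ..., so the zeroth Fibonacci number is 1 (a gutter ball has positive probability) and the thirteenth number in that listing is 233. The helper names `fib` and `pins_pmf` are made up for this sketch.

```python
from fractions import Fraction


def fib(k):
    """k-th Fibonacci number, assuming fib(0) = fib(1) = 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pins_pmf(n=10):
    """PMF over knocking down 0..n pins when n pins are standing."""
    weights = [fib(x) for x in range(n + 1)]
    total = fib(n + 2) - 1  # identity: fib(0) + ... + fib(n) = fib(n + 2) - 1
    return [Fraction(w, total) for w in weights]


first_roll = pins_pmf(10)          # PMF for the first roll of a frame
second_roll_given_7 = pins_pmf(3)  # conditional PMF when 3 pins remain

print(first_roll[10])          # probability of a strike: 89/232
print(second_roll_given_7[3])  # probability of picking up the spare: 3/7
```

The same function serves both rolls because the second roll is just the first-roll scheme applied to however many pins are still standing.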

From there, you can introduce the multiplication rule to get the bivariate PMF for the probability of knocking down x_1 pins on the first roll of a frame and x_2 pins on the second roll of that frame. This bivariate PMF can be represented as a familiar table that is 11 × 11 and you can talk about the two marginal distributions using the addition rule.
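
A minimal sketch of the multiplication and addition rules for the bowling frame, in Python. It assumes the Fibonacci weights start 1, 1, 2, 3, ... (so zero pins has positive probability); `fib`, `pins_pmf`, and the variable names are hypothetical helpers for this illustration only.

```python
from fractions import Fraction


def fib(k):
    """k-th Fibonacci number, assuming fib(0) = fib(1) = 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pins_pmf(n):
    """PMF over knocking down 0..n pins when n pins are standing."""
    weights = [fib(x) for x in range(n + 1)]
    return [Fraction(w, sum(weights)) for w in weights]


first = pins_pmf(10)

# Multiplication rule: P(x1, x2) = P(x1) * P(x2 | x1).
# Cells with x2 > 10 - x1 are impossible and stay at zero.
joint = [[Fraction(0)] * 11 for _ in range(11)]
for x1 in range(11):
    for x2, p in enumerate(pins_pmf(10 - x1)):
        joint[x1][x2] = first[x1] * p

# Addition rule: sum the table across rows or down columns.
marginal_x1 = [sum(row) for row in joint]
marginal_x2 = [sum(joint[x1][x2] for x1 in range(11)) for x2 in range(11)]
```

Summing each row recovers the first-roll PMF, which is a useful check that the table was built correctly.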

If you go fast, then by the end of the first lecture you can get to Bayes Rule for the probability of knocking down x_1 pins on the first roll of a frame given that x_2 pins were knocked down on the second roll of that frame, which is a bit of a contrived example but I like it better than the false positive rate of a medical test. I also do some Monte Carlo simulation that approximates the analytical solution. This usage of Bayes Rule isn’t really Bayesian, but it sets up the rest of a Bayesian course (I’ll reply to myself with a poker example that I use for homework that gets into the difference between classical, frequentist, and Bayesian perspectives on probability).
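
The Bayes Rule calculation and its Monte Carlo check might look like the following Python sketch. It is not the code from the lecture; it assumes Fibonacci weights starting 1, 1, 2, 3, ..., and the function names, seed, and number of simulated frames are arbitrary choices for illustration.

```python
import random
from fractions import Fraction


def fib(k):
    """k-th Fibonacci number, assuming fib(0) = fib(1) = 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


# WEIGHTS[n] holds the unnormalized weights for 0..n pins when n are standing.
WEIGHTS = {n: [fib(x) for x in range(n + 1)] for n in range(11)}


def pins_pmf(n):
    """Normalized PMF over 0..n pins when n pins are standing."""
    return [Fraction(w, sum(WEIGHTS[n])) for w in WEIGHTS[n]]


def posterior_first_roll(x2):
    """Exact P(X1 = x1 | X2 = x2) for x1 = 0..10 via Bayes' rule."""
    prior = pins_pmf(10)
    joint = []
    for x1 in range(11):
        standing = 10 - x1
        likelihood = pins_pmf(standing)[x2] if x2 <= standing else Fraction(0)
        joint.append(prior[x1] * likelihood)
    evidence = sum(joint)  # marginal P(X2 = x2)
    return [j / evidence for j in joint]


def simulate_posterior(x2, x1, n_frames=100_000, seed=1):
    """Estimate P(X1 = x1 | X2 = x2) by simulating frames and filtering."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(n_frames):
        r1 = rng.choices(range(11), weights=WEIGHTS[10])[0]
        r2 = rng.choices(range(11 - r1), weights=WEIGHTS[10 - r1])[0]
        if r2 == x2:  # keep only frames matching the observed second roll
            kept += 1
            hits += r1 == x1
    return hits / kept


exact = posterior_first_roll(3)
print(float(exact[7]), simulate_posterior(x2=3, x1=7))
```

The simulated proportion should land close to the exact value, and the gap shrinks as `n_frames` grows.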

If you like this approach so far, you can see how I did the next lecture (Discourse won’t let me post the links but they are just the following ones on YouTube and Canvas) where we use the same bowling setup to get into expectations of functions of discrete random variables.

bgoodri · August 12, 2018, 12:06am

I love poker for teaching probability, and I use it in a homework problem. It is easy to find videos of interesting poker hands on YouTube and then ask probability questions about what happens or could have happened. You can ask fun questions about card combinatorics, but I especially like it for distinguishing between perspectives on probability:

Classical: Each card that is not already visible to a player has an equal chance of being turned up next
Bayesian: Every act in poker is a decision problem and it is easy to see that poker players are making decisions in order to maximize their expected utility, although the utility may be for winning prize money rather than winning plastic poker chips in a particular hand. And you update beliefs several times in a hand as cards get turned over and bets get made.
Frequentist: More so than in almost all situations in the social / medical sciences, poker actually has repeated sampling of cards from a randomly shuffled deck. To be a professional poker player, you sort of do need to choose a rule (strategy) for how you will play in a given situation and adhere to it in order to obtain the expectation of the strategy over many hands where that situation arises. Of course, it isn’t a perfect frequentist setup because one hand is not independent of the others (with the same players), the sizes of the chip stacks are always changing in important ways, as well as who is (to the left / right of) the dealer.
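
To make the classical perspective concrete, here is a small Python sketch that counts equally likely unseen cards. The hand is hypothetical (a Texas hold'em player holding two hearts with two more hearts on the board after the turn) and is not taken from the homework problem mentioned above.

```python
from fractions import Fraction
from itertools import product

RANKS = "23456789TJQKA"
SUITS = "cdhs"  # clubs, diamonds, hearts, spades
deck = {rank + suit for rank, suit in product(RANKS, SUITS)}

# Hypothetical situation: cards visible to one player after the turn.
hole = {"Ah", "Kh"}
board = {"2h", "7h", "9c", "Js"}

# Classical view: every card this player cannot see is equally likely to be
# turned up next, including cards held by opponents.
unseen = deck - hole - board
outs = {card for card in unseen if card[1] == "h"}  # hearts complete the flush

p_flush_on_river = Fraction(len(outs), len(unseen))
print(p_flush_on_river)  # 9/46
```

The Bayesian and frequentist items in the list then build on this count: bets reveal information that can shift these probabilities, and a fixed strategy earns its expectation only over many such hands.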

f2harrell · August 12, 2018, 12:17pm

Nice. I think you can extend this to teach backwards probabilities and why we usually want to avoid them. What is the probability that 7 pins were knocked down on the first roll when the ultimate result was a spare?

I think that the Fibonacci sequence may cloud the issues.
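
The spare question can be worked out exactly if one is willing to adopt the Fibonacci model from the earlier post. The Python sketch below assumes weights starting 1, 1, 2, 3, ...; `fib` and `pins_pmf` are hypothetical helper names. It conditions on the later outcome (a spare) to get the "backwards" probability of the earlier event.

```python
from fractions import Fraction


def fib(k):
    """k-th Fibonacci number, assuming fib(0) = fib(1) = 1."""
    a, b = 1, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def pins_pmf(n):
    """PMF over knocking down 0..n pins when n pins are standing."""
    weights = [fib(x) for x in range(n + 1)]
    return [Fraction(w, sum(weights)) for w in weights]


first = pins_pmf(10)

# A spare means x1 < 10 on the first roll and x2 = 10 - x1 on the second.
joint_spare = {
    x1: first[x1] * pins_pmf(10 - x1)[10 - x1]  # P(x1) * P(clear the rest | x1)
    for x1 in range(10)
}
p_spare = sum(joint_spare.values())

# Backwards probability: P(X1 = x1 | spare) = P(x1 AND spare) / P(spare).
posterior = {x1: p / p_spare for x1, p in joint_spare.items()}
print(float(posterior[7]))
```

Any other model for the two rolls could be substituted for `pins_pmf` without changing the last three steps, which is one way to keep the Fibonacci details from clouding the Bayes' rule logic.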

f2harrell · August 12, 2018, 2:23pm

Great information. And you mastered \LaTeX! One question is whether we need to envision large groups of patients to be able to pull this off. The 10,000 is a bit arbitrary, and probabilities apply to individuals.

samw235711 · August 12, 2018, 8:12pm

Big fan of \mathbb{\LaTeX}. Would you be willing to show how the conversation would look without using the large groups of patients?

f2harrell · August 13, 2018, 12:30pm

A good challenge. I’d start with P(statin) = 1/2 and show a thermometer in which 1/2 of the area of the rectangle is shaded. Then I would try the same with P(heart attack AND statin) although here is where the language gets trickier. Instead of having the simple concept of the 300 patients to hang our hat on, we’d have to just say something like “Suppose that the chance that a patient is put on a statin and ultimately has a heart attack is 0.03.” then have the thermometer next to it. In some cases, Venn diagrams help.

But from that wording you can see that the problem is set up more to illustrate Bayes’ rule than to be “linear”, because you could just say “Suppose that of patients put on statin the probability of a heart attack is 0.06.” Why not just start with that?
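
The arithmetic linking the two wordings is a single division, sketched here in Python. The probabilities 1/2 and 0.03 are the ones quoted above; the cohort of 10,000 patients is the group size mentioned earlier in the thread, used only to show that counts and probabilities give the same answer.

```python
from fractions import Fraction

p_statin = Fraction(1, 2)           # P(statin)
p_ha_and_statin = Fraction(3, 100)  # P(heart attack AND statin)

# Definition of conditional probability:
# P(heart attack | statin) = P(heart attack AND statin) / P(statin)
p_ha_given_statin = p_ha_and_statin / p_statin
print(float(p_ha_given_statin))  # 0.06

# Same calculation phrased with a group of 10,000 patients.
n_patients = 10_000
n_statin = p_statin * n_patients                # 5,000 on a statin
n_ha_and_statin = p_ha_and_statin * n_patients  # 300 on a statin with a heart attack
assert n_ha_and_statin / n_statin == p_ha_given_statin
```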

TFeend · August 13, 2018, 5:31pm

With all due respect, I think if you want to teach probability to physicians you need to move away from the following perspectives:

MDs sometimes see probability as removed from what they do.

Some of this is shell shock from jumping through years of (expensive) hoops that require absorption of highly technical material never seen again.

shell-shocked MDs who are highly suspicious of technical concepts that seem unrelated to patient care

many MDs shut down when they see the symbols and calculations involved in probability

You are making assumptions and generalizations about a large audience of learners that just aren’t true. The problem with that is you then end up speaking down to them instead of just treating them as another audience that needs to learn the material. My view, as a physician, is that this just isn’t stressed well in training…at all. It has nothing to do with scared, uninterested, shell-shocked, or “shut-down” physicians. Heck, stats and prob aren’t even required for med schools (most don’t require calculus either). Then in training you aren’t taught probability, you are taught “Epi,” because as you say, board exams require you to memorize “cohort,” “case-control,” “RCT,” etc. so you can get a high score.

My colleagues aren’t scared or averse, but when you have to learn a mountain of information a day you need to justify why you need the physician’s time. For instance, my specialty is surgery, and you need to prove to my colleagues why they should dedicate time to this instead of getting better with surgical technique, reading for the cases they want to perform, making sure they can answer consults, and still studying for boards. Not suspicious, but you have to show why learning Bayes rule is more valuable than something that they will need every day. Again, training has taught them that they don’t need training in probability, so you need to overcome that.

For educators of probability, using examples that the student can relate to is “always” better than not. The problem with learning probability though is that it requires some dedicated time, not a half hour or hour lecture–and all time once you enter med school is valuable. There is a bit of coercion that will be necessary to get all physicians on board to dedicate time to learn this material. It certainly would help if you illustrate why their intuition is wrong, and how to improve it. But in truth, to make this training work you need to start with curriculum changes or entrance requirement changes.

TFeend · August 13, 2018, 5:32pm

I think Maarten is spot on here. I didn’t take probability in undergrad, but this course and Joe’s book are EXCELLENT for self learning probability. Also, if you feel spendy there is a Harvard Ext School course you can take too!

byrdjb · August 13, 2018, 6:26pm

Here is another exceptionally good resource for learning probability that I learned of over the weekend: http://wpressutexas.net/coursewiki/index.php/OpinionatedLessons.org/

TFeend · August 13, 2018, 6:39pm

That is fantastic Brian, thanks for sharing!

byrdjb · August 13, 2018, 6:47pm

Sure thing! He’s got a wonderful teaching style.

byrdjb · August 13, 2018, 6:57pm

His discussion in video 2 of sometimes being unable to define exhaustive & mutually exclusive hypotheses is helpful, too, in understanding the limits of Bayesian analysis.

MaartenvSmeden · August 13, 2018, 10:30pm

I can’t agree with this. Understanding probability is essential in a field where almost everything that goes on in a sick human body is uncertain. The fact that your probability instructor might not be engaging doesn’t change this. To say that as a medical student you are already good at separating relevant material from the irrelevant (‘noise’) is naive, in my opinion. You are certainly going to need to understand probability to be able to do research well. My prediction for the future is that the importance of probability will increase as more and more algorithms inform medical decision making. But there is a chance I am wrong about that.

samw235711 · August 14, 2018, 1:10am

@MaartenvSmeden I could not agree with you more (see edits).

samw235711 · August 14, 2018, 11:53am

@MaartenvSmeden referring to your first post: I have a question that is slightly unrelated to teaching probability. I was wondering whether a classification is a prediction or something else entirely. A prediction is always a probability because it involves uncertainty. Would you say that a prediction must be a probability?

@f2harrell

I agree that this is more direct, but I am still having some trouble understanding. In this case, where would the 0.06 come from? E.g., would it be the true population parameter instead of an estimate?