Comparing predicted outcomes to measured outcomes: study design

We want to look at a particular subset of patients, and our primary interest is the accuracy of healthcare practitioners' predictions of an outcome compared with the actual 6-month outcome. The outcome is binary (e.g., will this patient die: yes or no). Each patient will have several practitioners predicting an outcome, so there will be several predictions for each true outcome.

Suppose we estimate that, on average, 20% of practitioners will predict a worse outcome than is actually observed. How would I calculate the power needed to demonstrate this? Assume the standard power of 0.8 and p < 0.05 for significance. For the record, in this subset of patients the proportion with the measured outcome of interest will be 0.48.
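For what it's worth, one naive way to frame this is a one-sample test of the proportion of pessimistic predictions against a null value. The sketch below assumes a placeholder null of p0 = 0.10 (not specified above; the real null must come from the study question) and ignores the clustering of several predictions per patient, which would increase the required sample size:

```r
# A naive sketch only: one-sample test of the proportion of "pessimistic"
# predictions (expected 0.20) against a PLACEHOLDER null p0 = 0.10.
# Clustering of multiple predictions per patient is ignored here.
library(pwr)

h <- ES.h(p1 = 0.20, p2 = 0.10)                    # Cohen's h for proportions
pwr.p.test(h = h, sig.level = 0.05, power = 0.80)  # solves for n
```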

Also, what would the best statistical test be to compare predicted outcomes to actual outcomes?

Thank you so much for your help. I’m a much better clinician than a statistician so I am grateful for your input.


I don’t think your setup is symmetric. To compare accuracy you’ll need another method that competes with the clinicians. Then, if time is not important and you can treat the outcome as binary, you can compute the Brier score for the clinicians and for the competing method, then use a method for testing the difference between two correlated Brier scores.
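For concreteness, here is a rough sketch of that comparison on simulated data. The names `p_clin`, `p_other`, and `y` are hypothetical, and the paired t-test on per-patient squared errors is just one simple, approximate way to handle the correlation between the two Brier scores:

```r
set.seed(1)
n       <- 200
lp      <- rnorm(n)                        # latent prognosis (simulated)
y       <- rbinom(n, 1, plogis(lp))        # observed 0/1 outcomes
p_clin  <- plogis(lp + rnorm(n, sd = 0.4)) # simulated clinician probabilities
p_other <- plogis(lp + rnorm(n, sd = 0.6)) # simulated competing method

brier_clin  <- mean((p_clin  - y)^2)  # Brier score, clinicians
brier_other <- mean((p_other - y)^2)  # Brier score, competing method

# Squared errors are paired within patient, so a paired t-test is one
# simple (approximate) test of the difference in correlated Brier scores:
t.test((p_clin - y)^2, (p_other - y)^2, paired = TRUE)
```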

Thank you for your reply. I apologize; what do you mean when you say it isn’t symmetric? Also, I am sorry if my wording was not clear. It is not so much that we want to compare the accuracy of two different groups; rather, we want to determine how accurate clinician predictions are in comparison with the actual outcome.

Since we don’t know the actual probabilities of outcomes, we have three choices:

  1. Acquire a huge number of clinician estimates so you can estimate a smooth calibration curve against the Y=0,1 outcomes
  2. Assess the overall accuracy using a proper scoring rule such as the Brier score (though its absolute value is hard to judge)
  3. Test for calibration accuracy of the clinicians using the Spiegelhalter test

All of these are provided by the val.prob function in the R rms package.
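A minimal sketch of that call, on simulated data (the names `pred` and `y` are hypothetical; the prevalence of roughly 0.48 is taken from the original post):

```r
library(rms)

set.seed(1)
n    <- 300
lp   <- rnorm(n)                               # latent prognosis (simulated)
p_tr <- plogis(lp - 0.08)                      # true probabilities, prevalence near 0.48
y    <- rbinom(n, 1, p_tr)                     # observed 0/1 six-month outcomes
pred <- plogis(lp - 0.08 + rnorm(n, sd = 0.3)) # noisy clinician estimates

v <- val.prob(pred, y)  # calibration curve plot plus summary statistics
v["Brier"]              # overall accuracy (proper scoring rule)
v[c("S:z", "S:p")]      # Spiegelhalter calibration test statistic and p-value
```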


Thank you again for the input and clarification. Option 1 is certainly not feasible for this project. Of options 2 and 3, which do you believe would be more practical in this case?

I’d do both. Make sure you don’t prompt the clinicians in a way that produces a lot of ties in the prognostic assessments.
