Thank you for your input. This has been an interesting discussion. I wrote a simulation to see whether presenting an uncertainty interval for p versus a posterior probability would lead to different decisions:
Using an intercept-only logistic regression model for simplicity, assume the posterior of the log-odds is \theta \mid \mathbf{y} \sim \text{Normal}(\mu = -1, \sigma = 1):
set.seed(10)
theta <- rnorm(100000, -1, 1)  # posterior draws of theta (the log-odds)
hist(theta)
Represent the posterior distribution of p with samples \{p^{(i)}\} = \{\text{expit}(\theta^{(i)})\}:
p <- plogis(theta)  # plogis() is expit: p = exp(theta) / (1 + exp(theta))
hist(p)
quantile(p, probs = c(0.025, 0.975))  # 95% posterior interval for p
Or consider the posterior predictive probability \text{Pr}(Y = 1 \mid \mathbf{y}), estimated from posterior predictive draws:
set.seed(10)
y_rep <- rbinom(100000, 1, p)  # one predictive draw of Y per posterior draw of p
hist(y_rep)
mean(y_rep)
p_pred <- mean(y_rep)  # estimated posterior predictive Pr(Y = 1)
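A quick consistency check (my addition, not part of the original simulation): by the law of total expectation, \text{Pr}(Y_{\text{rep}} = 1) = \mathbb{E}[\text{Pr}(Y = 1 \mid p)] = \mathbb{E}[p], so averaging the posterior draws of p directly should agree with the simulated y_rep up to Monte Carlo error:

```r
set.seed(10)
theta <- rnorm(100000, -1, 1)   # posterior draws of the log-odds
p     <- plogis(theta)          # posterior draws of p = expit(theta)
y_rep <- rbinom(100000, 1, p)   # posterior predictive draws of Y

mean(p)      # direct estimate of Pr(Y = 1), no extra Bernoulli noise
mean(y_rep)  # simulation-based estimate; should agree to ~0.003
```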
Are the results contradictory? To investigate, let’s see how the competing interpretations might affect our decision-making.
With utility function U(y = 1) = \$500 and U(y = 0) = -\$1000, use Monte Carlo simulation to represent the distribution of utility.
Interval Method
util <- rep(0, 100000)
for (i in 1:100000) {
  y <- rbinom(1, 1, p[i])            # draw Y using the i-th posterior draw of p
  util[i] <- 500 * y - 1000 * (1 - y)
}
hist(util)
mean(util)
sd(util)
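As a lower-variance cross-check on the interval method’s mean (my addition): since \mathbb{E}[U \mid p] = 500p - 1000(1 - p), we can average the conditional expected utility over the posterior draws of p instead of simulating the Bernoulli outcomes:

```r
set.seed(10)
theta <- rnorm(100000, -1, 1)  # posterior draws of the log-odds
p <- plogis(theta)             # posterior draws of p

# Rao-Blackwellized estimate: average E[U | p] = 500*p - 1000*(1 - p)
# over the posterior draws, skipping the extra Bernoulli noise.
mean(500 * p - 1000 * (1 - p))
```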
Posterior Probability Method
util <- rep(0, 100000)
for (i in 1:100000) {
  y <- rbinom(1, 1, p_pred)          # draw Y using the single predictive probability
  util[i] <- 500 * y - 1000 * (1 - y)
}
hist(util)
mean(util)
sd(util)
Both methods give the same results (to Monte Carlo error). I think this would hold in general, which, if true, would entirely resolve my confusion.
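One way to see why the agreement should be exact in distribution, not just approximate: in the interval method, each simulated outcome is drawn Y \sim \text{Bernoulli}(p^{(i)}) with p^{(i)} itself drawn from the posterior, so marginally

\text{Pr}(Y = 1) = \mathbb{E}_p\left[\text{Pr}(Y = 1 \mid p)\right] = \mathbb{E}[p] = p_{\text{pred}}

by the law of total expectation. Both methods therefore produce util = \$500 with probability p_{\text{pred}} and util = -\$1000 otherwise, so the entire distribution of utility matches, and in particular

\mathbb{E}[U] = 500\, p_{\text{pred}} - 1000\,(1 - p_{\text{pred}}), \qquad \text{sd}(U) = 1500\sqrt{p_{\text{pred}}(1 - p_{\text{pred}})}.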