BBR Session 12: Correlation

f2harrell · January 29, 2020, 3:23pm

This is a topic for questions, answers, and discussions about session 12 of the Biostatistics for Biomedical Research web course airing on 2020-01-31. Session topics are listed here. The session covers correlation coefficients, precision and sample size calculations for them, and the use of the bootstrap and Monte Carlo simulation to study and compensate for the effects of fishing expeditions based on examining many possible correlations.

Stanley_Kay · June 27, 2020, 10:54pm

This question is of a general nature.

I am planning to report the correlation between a predictor and overall survival (OS). My approach is to fit a Cox model with this variable as the sole predictor. My next step will be to use the validate() function in the rms package to report Somers' Dxy or the C-statistic (computed as Dxy/2 + 0.5). Alternatively, I can derive a correlation measure from R^2 by taking its square root, i.e. sqrt(R^2).

Is this a good approach, and is there anything I am missing about how validate() works under the hood?

Thanks!

f2harrell · June 28, 2020, 1:45am

I think it’s a good approach. The only tricky part is that validate() needs to be given all the candidate variables that were used in any supervised learning steps (e.g., variable selection), not just the ones that ended up in the final model.
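A minimal sketch of the workflow described in the question, for readers who want to try it. The data are simulated, and the names d, x, time, status and the choice B = 200 are illustrative assumptions, not taken from the thread. There is a single pre-specified predictor, so no supervised learning steps need to be repeated inside the bootstrap.

```r
library(survival)  # Surv()
library(rms)       # cph(), validate()

set.seed(1)
n <- 200

# Hypothetical simulated data: one predictor, exponential event and censoring times
x       <- rnorm(n)
t.event <- rexp(n, rate = exp(0.5 * x))
t.cens  <- rexp(n, rate = 0.3)
d <- data.frame(x      = x,
                time   = pmin(t.event, t.cens),
                status = as.integer(t.event <= t.cens))

# x = TRUE, y = TRUE store the design matrix and response, which validate() needs
f <- cph(Surv(time, status) ~ x, data = d, x = TRUE, y = TRUE)

# Bootstrap validation; returns a matrix of apparent and optimism-corrected indexes
v <- validate(f, B = 200)

# Optimism-corrected Somers' Dxy, converted to the C-statistic
dxy     <- v["Dxy", "index.corrected"]
c.index <- dxy / 2 + 0.5
```

Printing v shows the apparent index, the estimated optimism, and the corrected index side by side, which makes it easy to see how much the bootstrap adjusted the apparent value.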

James_Smith · April 14, 2021, 8:37pm

In section 8.4 there is a simulation of the correlation between x and y:
rho <- 0.7; n <- 50
var.eps <- rho^-2 - 1
x <- rnorm(n, 5, 1)
y <- x + rnorm(n, 0, sqrt(var.eps))
cor(x, y)

What is var.eps here? And why does taking the root of that and using it as the sd give the desired correlation between x and y?

f2harrell · April 14, 2021, 10:35pm

Let Y = X + U where X \sim n(5, 1) and U \sim n(0, \sigma^2) where X and U are independent. We want \sigma^2 such that cor(X, Y) = cor(X, X+U) = r. V(Y) = V(X) + V(U) = 1 + \sigma^2 since X and U are independent. cov(X, Y) = cov(X, X+U) = E(X(X+U)) - E(X)E(X+U) = E(X^2) - E(X)^2 since E(U)=0 and X and U are independent. So cov(X, Y) = V(X) = 1.

The correlation between X and Y is \frac{cov(X, Y)}{\sqrt{V(X)V(Y)}} = \frac{1}{\sqrt{1 + \sigma^2}}. Setting this equal to r and solving for \sigma^2 gives \sigma^2 = \frac{1}{r^2} - 1.
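Written out step by step, the algebra can be sketched as follows. It uses only the assumptions already stated (V(X) = 1, E(U) = 0, X and U independent). The numeric line plugs in r = 0.7 from the simulation in the question.

$$
\begin{aligned}
V(Y) &= V(X + U) = V(X) + V(U) + 2\,cov(X, U) = 1 + \sigma^2 + 0 \\
\sqrt{V(X)\,V(Y)} &= \sqrt{1 \cdot (1 + \sigma^2)} = \sqrt{1 + \sigma^2} \\
cor(X, Y) &= \frac{cov(X, Y)}{\sqrt{V(X)\,V(Y)}} = \frac{1}{\sqrt{1 + \sigma^2}} = r \\
1 + \sigma^2 &= \frac{1}{r^2} \quad\Longrightarrow\quad \sigma^2 = \frac{1}{r^2} - 1 \\
r = 0.7 &: \quad \sigma^2 = \frac{1}{0.49} - 1 \approx 1.041, \qquad \sigma \approx 1.020
\end{aligned}
$$

The covariance term vanishes because X and U are independent, and the product V(X)V(Y) reduces to 1 + \sigma^2 only because V(X) = 1 here.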

So sampling U with that \sigma^2 will induce correlation r between X and Y.
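A quick way to check this numerically is to rerun the simulation from the question with a much larger n, so that sampling error is small. The seed and n = 1e6 are arbitrary choices made for this sketch. The second half is an extension not covered in the thread: if V(X) is not 1, the same algebra gives \sigma^2 = V(X)(\frac{1}{r^2} - 1), and sd.x = 3 is a hypothetical value used to illustrate it.

```r
set.seed(42)
rho <- 0.7
n   <- 1e6  # large n; the example in the question used n = 50

# Case from the thread: V(X) = 1
var.eps <- rho^-2 - 1
x <- rnorm(n, 5, 1)
y <- x + rnorm(n, 0, sqrt(var.eps))
r.hat <- cor(x, y)
print(r.hat)  # should be close to rho

# Extension (assumption, not in the thread): general V(X) = sd.x^2
sd.x     <- 3
var.eps2 <- sd.x^2 * (rho^-2 - 1)
x2 <- rnorm(n, 5, sd.x)
y2 <- x2 + rnorm(n, 0, sqrt(var.eps2))
r.hat2 <- cor(x2, y2)
print(r.hat2)  # should also be close to rho
```

With n = 50, as in the original snippet, the sample correlation will vary noticeably from run to run around 0.7, which is the point of the precision discussion in the session.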

James_Smith · April 22, 2021, 1:29pm

Thanks a lot for this response.

I follow the second part, i.e. that cov(X,Y) = 1 (though I think there might be a typo at the beginning of the explanation, and that it should read X \sim n(5,1) rather than n(1,5)?)

I’m struggling to follow these steps: cor(X, X+U) = r . V(Y) = V(X) + V(U)

And also, why can we substitute 1 + \sigma^2 for V(X)V(Y) in the denominator of the correlation between X and Y?