Canonical Correlation Analysis Question

venkmurthy · July 31, 2020, 4:41pm

Hi Everyone!

I have been interested in using sparse canonical correlation analysis as implemented in the R PMA package. I’m finding that the correlations output by the package are slightly inconsistent with the ones you get by taking the correlations between the sparse CCA scores. The PMA::CCA function also prints some output during its run that looks like debugging output.

Here is a reproducible example:

# CCA with penalization function imported from PMA
library(PMA)

# Use CCA only for sample dataset nutrimouse
library(CCA)

# Grab sample data
data("nutrimouse")

# Set up X and Y matrices
X=as.matrix(nutrimouse$gene)
Y=as.matrix(nutrimouse$lipid)

# Run penalized CCA - also has some strange ?debugging? output
pma <- PMA::CCA(X,Y,K=3)

# Calculate scores
score.x <- X %*% pma$u
score.y <- Y %*% pma$v

# Calculate correlations between scores
sapply(1:3,FUN=function(x) cor(score.x[,x],score.y[,x],method="pearson"))

# Scores output by module - shouldn't this be identical to output from line above?
pma$cors

Which gives the following output:

> 
> # Grab sample data
> data("nutrimouse")
> 
> # Set up X and Y matrices
> X=as.matrix(nutrimouse$gene)
> Y=as.matrix(nutrimouse$lipid)
> 
> # Run penalized CCA - also has some strange ?debugging? output
> pma <- PMA::CCA(X,Y,K=3)
12
123456789
12345
> 
> # Calculate scores
> score.x <- X %*% pma$u
> score.y <- Y %*% pma$v
> 
> # Calculate correlations between scores
> sapply(1:3,FUN=function(x) cor(score.x[,x],score.y[,x],method="pearson"))
[1] 0.8438400 0.8200629 0.8200418
> 
> # Scores output by module - shouldn't this be identical to output from line above?
> pma$cors
[1] 0.8802205 0.8460601 0.8477227

Shouldn’t the last two lines of output be the same? Any hints?

Thanks in advance,

Venk

edited to add output

f2harrell · July 31, 2020, 7:13pm

Venk, if you don’t get any answers here, I’d suggest stackoverflow.com with the tag r.

venkmurthy · August 1, 2020, 1:58pm

Thanks! Cross-posted to stackoverflow here: https://stackoverflow.com/questions/63199824/penalized-canonical-correlation-in-r-with-pma-module

venkmurthy · August 28, 2020, 8:56pm

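Solution identified and posted at the Stack Overflow link above. Rookie error: I forgot to restandardize the data prior to calling CCA. PMA::CCA standardizes the columns of X and Y by default (standardize=TRUE), so pma$cors is computed on standardized data, while my scores were computed on the raw matrices. On the actual data I was working with, the data was only ever so slightly out of N(0,1), so the discrepancy was small.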

The other strange output is apparently a progress meter, though I do not understand the scale.
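
For anyone who lands here later, the two points above can be sketched as follows. This is a minimal sketch rather than a definitive fix. It assumes the PMA and CCA packages are installed, that CCA is left at its default standardize=TRUE, and that passing trace=FALSE is enough to silence the progress digits. The names X.std, Y.std and cors.manual are my own, for illustration only.

```r
# Sketch only: assumes the PMA and CCA packages are installed.
library(PMA)

# Load the sample data from the CCA package without attaching that package
data("nutrimouse", package = "CCA")

X <- as.matrix(nutrimouse$gene)
Y <- as.matrix(nutrimouse$lipid)

# trace = FALSE should suppress the progress digits printed during fitting;
# standardize is left at its default (TRUE)
pma <- PMA::CCA(X, Y, K = 3, trace = FALSE)

# Centre and scale the columns before applying the sparse weights,
# mirroring the standardization assumed to happen inside CCA
X.std <- scale(X)
Y.std <- scale(Y)
score.x <- X.std %*% pma$u
score.y <- Y.std %*% pma$v

# Correlation between the paired score columns
cors.manual <- sapply(1:3, function(k) cor(score.x[, k], score.y[, k]))

# Compare with the correlations reported by the package
all.equal(cors.manual, pma$cors)
```

If the assumption about internal standardization holds, the final comparison should report that the two vectors agree. Standardizing X and Y once up front and passing the scaled matrices to CCA would be an equivalent way to get consistent scores.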