Test/Teaching Data Suitable for Canonical Correlation Analysis

Does anyone know of any large, freely available datasets suitable for canonical correlation analysis (CCA) teaching/testing examples?

I know of nutrimouse from #RStats, but I would prefer one where the number of observations exceeds the number of variables in each of the domains.

Thanks in advance!


Here is a strange idea for the type of data you might seek (if they exist). NHANES has wonderful data on various body size measurements, HbA1c, and many other lab parameters. In the chapter on ordinal analysis of continuous Y in RMS I use body size variables and others to predict HbA1c. If another lab value (e.g., triglycerides or glucose) is available from the same NHANES panel, you could jointly predict two or more lab values from all the body size measurements. It would be interesting to show, for example, that predicting glucose adds nothing once you are already predicting HbA1c.
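
To make the joint-prediction idea concrete, here is a minimal sketch of CCA between a "body size" block and a "lab value" block. The data are synthetic stand-ins, not NHANES: the variable names, loadings, and noise levels are all assumptions chosen for illustration, and `canonical_correlations` is a hypothetical helper written with plain NumPy (QR followed by SVD), not a function from any package.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two blocks (rows = observations).

    Assumes each block has full column rank and more rows than columns.
    """
    # Center each block so the result reflects correlation, not means.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations (descending).
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Synthetic stand-in data (assumption: NOT real NHANES values).
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                               # latent "body size" factor
body = z[:, None] + 0.5 * rng.normal(size=(n, 4))    # four noisy size measures
hba1c = 0.8 * z + 0.6 * rng.normal(size=n)           # lab value driven by z
glucose = hba1c + 0.3 * rng.normal(size=n)           # mostly redundant with hba1c
labs = np.column_stack([hba1c, glucose])

rho = canonical_correlations(body, labs)
print(rho)  # one canonical correlation per lab variable here
```

In this made-up setup the second canonical correlation should come out close to zero, because glucose carries no information about body size beyond what HbA1c already carries. Whether the real NHANES panel behaves this way is exactly what would need checking.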


Great suggestion. Thanks!