Principal Component Analysis - any useful hints and references

SueMallett · March 1, 2021, 3:17pm

With radiologist colleagues, I have been writing a short article on a few do’s and don’ts in prediction modelling for radiologists.

Can anyone recommend articles with good guidance on when and how to use Principal Component Analysis in prediction models? Articles discussing radiomics would be particularly useful. Reviewers have asked for some more detail on this aspect.

Thank you for your suggestions and thoughts.

BW Sue Mallett

mshapiro123 · March 5, 2021, 6:54pm

I’m sure you’ve seen this old review: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4040248/. I think a lot of people who start out in machine learning do an exercise with the Wisconsin breast cancer dataset, where PCA is presented as a way to speed up the ML classification algorithm. It can also be very useful for visualising the classes. Here is a great vignette: https://www.datacamp.com/community/tutorials/principal-component-analysis-in-python. There is also a review on this topic: "Detection of Breast Cancer Through Clinical Data Using Supervised and Unsupervised Feature Selection Techniques" (IEEE Journals & Magazine).

If you’re talking about a large number of observed or measured features from an image, it can be helpful to reduce the dimensionality when many of the features are correlated. What I’m not sure about is how PCA is used in the context of pixel- or voxel-level image analysis. For those applications, the field has moved to pre-processing with more advanced dimensionality reduction techniques. In the last two years, UMAP has been supplanting t-SNE, but both have properties that are better suited to understanding distant and non-linear relationships (see "Comparison of Dimension Reduction Techniques" in the umap 0.5 documentation).

The main advantage of PCA over almost any other technique is speed when dealing with very large, high-dimensional data, but even there competing techniques are close (see "Performance Comparison of Dimension Reduction Implementations" in the umap 0.5 documentation). PCA may also have the advantage that it is easier to explain than probabilistic or manifold techniques.
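
For anyone who wants to try the breast cancer exercise mentioned above, here is a minimal sketch using scikit-learn's built-in copy of the Wisconsin dataset. The 95% variance threshold, the logistic regression classifier, and the 5-fold cross-validation are my own illustrative choices, not something taken from the linked articles. The one point I would stress for a prediction-modelling audience is that the scaling and the PCA sit inside the pipeline, so they are re-estimated within each training fold rather than on the full data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Wisconsin diagnostic breast cancer data: 569 samples, 30 correlated features.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: keep enough components to explain 95% of the variance.
# PCA is scale-sensitive, so the features are standardised first.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    LogisticRegression(max_iter=1000),
)

# Scaler and PCA are refitted on each training fold, which avoids
# leaking information from the held-out fold into the components.
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.3f}")

# Fit once on all data only to inspect how many components were retained.
reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
).fit(X)
n_kept = reducer[-1].n_components_
explained = reducer[-1].explained_variance_ratio_.sum()
print(f"Components kept: {n_kept} of {X.shape[1]}")
print(f"Variance explained: {explained:.3f}")
```

Accuracy is used here only to keep the sketch short; for a real prediction model you would want to look at discrimination and calibration as well.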