Observations on Big Data, Precision Health, and Machine Learning

Pertinent to this discussion thread, I wish to draw attention to a special issue of Perspectives in Biology and Medicine (Volume 61, Number 4, Autumn 2018) titled The Precision Medicine Bubble. The issue takes up the topic of precision medicine in some depth and has a refreshingly cynical take on the contributions of precision medicine to date.

https://muse.jhu.edu/issue/39661

Particular attention is drawn to the paper by Sui Huang.

Huang S. The Tension Between Big Data and Theory in the “Omics” Era of Biomedical Research. Perspect Biol Med. 2018;61(4):472-488. doi: 10.1353/pbm.2018.0058. PMID: 30613031.

Contrasting the value of “big data” in commerce and biomedicine, Huang notes that:

The problem is that the spectacular success of the internet-based applications of Big Data has tempted biomedical researchers to think that using clever algorithms to mine and comb the vast amount of data produced by the omics revolution will recapitulate the success of Google, Amazon, or Netflix.

Going further, Huang says:

The fundamental differences between the natural sciences, on the one hand, which seek new understanding of organisms, and “data sciences” on the other hand, which serve consumer applications, give rise to formidable challenges for a quick adoption of the Big Data approach to biology and medicine. The human body and its (mal)functions are more complex than recognizing cats in photos or predicting client habits from purchase history. The latter tasks can, without denigration, be considered “superficial”: here data directly maps to utility without the need of a theory that formalizes our understanding of the mechanism of how data translate into useful knowledge. But such heuristics does not lead far in basic sciences, notably the life sciences.

In the same issue of this journal, epidemiologists Nigel Paneth and Sten Vermund highlight major advances in public health made over the last century. None used big data, precision medicine, or machine learning.

Considering precision medicine, Paneth and Vermund go so far as to posit that:

Precision medicine built on a foundation of host genetics can benefit some patients, but it has no realistic chance of linking human genetics to population-level health improvement. There are too few diseases where human genetic variation will make a substantial difference in approaches to screening, diagnosis, or therapy to justify the disproportionate investments into this approach as a principal priority for the NIH and for the private sector.

Paneth N, Vermund SH. Human Molecular Genetics Has Not Yet Contributed to Measurable Public Health Advances. Perspect Biol Med. 2018;61(4):537-549. doi: 10.1353/pbm.2018.0063. PMID: 30613036.

Also in this issue, epidemiologist Richard S. Cooper reviews the history of the “cardiovascular prevention movement” over the last 40 years, highlighting its spectacular success in reducing mortality due to cardiovascular disease. He mentions two pivotal observational studies that might be called “little data”: the Framingham Study, with 5,209 men and women, and the Seven Countries Study, with about 12,000 men. Despite their modest size, these studies identified the modifiable risk factors for cardiovascular disease (serum cholesterol, hypertension, cigarette smoking) that are the cornerstone of the interventions accounting for a substantial proportion of the decline in cardiovascular disease mortality.

Cooper notes that:

It will not escape the notice of the reader of this issue of the journal that genomics and “precision medicine” have, to date, made no contribution whatsoever to control of CVD as a mass disease.

Cooper RS. Control of Cardiovascular Disease in the 20th Century: Meeting the Challenge of Chronic Degenerative Disease. Perspect Biol Med. 2018;61(4):550-559. doi: 10.1353/pbm.2018.0064. PMID: 30613037.

3 Likes

Useful articles, but some of the claims are overstated. Big data and traditional machine learning are in many ways the opposite of, or at least orthogonal to, precision medicine, which bases its inferences on mechanistic considerations regarding the biology of the disease and the other covariates of each patient seen in clinic. A fair number of my patients should have been dead from their cancer based on population-level guidelines and recommendations, but they are alive thanks to patient-specific interventions derived from biological knowledge. One such recent example is described here (from 1:23:00 onwards).

A good summary, from a data science perspective, of the distinction and trade-offs between patient relevance (a focus of precision medicine) and population-level robustness can be found here. Some inferential approaches will typically strike a better balance between relevance and robustness than others. Different stakeholders will nevertheless have different opinions on the optimal trade-off between the two, and that is OK.
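To make that trade-off concrete, here is a minimal toy sketch (my own illustration, not taken from the linked post; all numbers are invented) in which a shrinkage estimator interpolates between a fully subgroup-specific estimate (maximally relevant but noisy) and the population mean (maximally robust but possibly irrelevant to the individual). The mixing weight plays the role of the stakeholder-dependent trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each "patient subgroup" has its own true treatment effect
# drawn around a population mean, and we observe a noisy estimate per group.
n_groups = 20
pop_mean, between_sd, within_sd = 1.0, 0.5, 1.0
true_effects = rng.normal(pop_mean, between_sd, n_groups)
observed = rng.normal(true_effects, within_sd)

# Shrinkage weight: how much to trust the subgroup-specific estimate
# versus the pooled mean. w = 1 is "pure relevance" (no pooling),
# w = 0 is "pure robustness" (complete pooling).
w = between_sd**2 / (between_sd**2 + within_sd**2)

pooled = observed.mean()                      # complete pooling
partial = w * observed + (1 - w) * pooled     # partial pooling (shrinkage)

for label, est in [("no pooling", observed),
                   ("partial pooling", partial),
                   ("complete pooling", np.full(n_groups, pooled))]:
    rmse = np.sqrt(np.mean((est - true_effects) ** 2))
    print(f"{label:17s} RMSE vs true effects: {rmse:.3f}")
```

With these toy numbers the partially pooled estimates usually have lower error than either extreme, which is one way of seeing why some approaches strike a better balance, even though the preferred weight remains a judgment call.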

3 Likes

I really like the Huang article cited above. It focuses primarily on the unrealistic expectations of “Big Data” proponents and doesn’t really discuss precision medicine per se.

I didn’t understand much of your talk in the link you provided (not being an oncologist). But I think your general point is that there are situations in medicine where obtaining very specific knowledge about an individual patient can meaningfully alter therapeutic decisions and prognosis. This is particularly true in oncology, where, for example, a deep understanding of the genetics/biology of a given patient’s tumour can (sometimes) suggest more or less rational treatment choices. If we know that the tumour in a given patient is not being driven by a certain biologic pathway, then giving the patient a drug that inhibits that pathway will be futile.

Your assertion that the “Big Data” hype is in many ways “orthogonal” to precision medicine seems on point. It feels like the (relatively) small number of success stories in precision medicine (e.g., trastuzumab) might have been misconstrued by those with careers focused on data analysis (rather than biology/medicine) as evidence of the inherent value of simply gathering more and more biologic data in a hypothesis-free manner. Far from understanding the intentionality (painstaking triangulation of data sources) that likely underlies the development of effective targeted therapies, Big Data proponents seem to be under the illusion that such therapies arose simply through “brute force” computerized analysis of reams of biologic data.

History is a great teacher: in science, we don’t spend enough time taking stock of how we got to where we are. Someone should write a paper, using examples of precision medicine “success” stories, highlighting the key role of intentionality in the development of targeted therapies and contrasting these stories with the hypothesis-free data-dredging exercises being proposed by many Big Data proponents.

5 Likes

Could not have said it better. The bigger the data, the more important it becomes that they are anchored in contextual knowledge. Otherwise, big data serve as an excellent way to fool ourselves in ways that can harm patient care.

4 Likes

It’s interesting that the recent call from the FDA inviting companies to suggest uses of RWD refers to “new study designs” and innovative designs: https://www.raps.org/news-and-articles/news-articles/2022/10/fda-starts-pdufa-vii-programs-for-real-world-evide. This shows that statisticians in industry have an opportunity to advance methods and that this work is not done entirely in academia, although we will have to wait some time to see what new study designs they come up with. I hope people share in this thread any technical papers that begin to appear.

4 Likes

I just noted a paper on LinkedIn: Paul Brown on LinkedIn: Principles of Experimental Design for Big Data Analysis

In “Principles of Experimental Design for Big Data Analysis”, they describe a sequential design approach, i.e., an algorithm to subset the data using experimental design methodology following Savage: select the design that maximises the expected utility. “Our objective is to avoid the analysis of the big data of size N by selecting a subset of the data of size n using the principles of optimal experimental design where the goal of the analysis is predefined.”
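For a feel of the mechanics, here is a minimal sketch (my own, loosely inspired by the idea of design-based subsetting rather than reproducing the paper’s method) that greedily selects n of N rows to maximise a design utility; I use the D-optimality criterion for a linear model as a stand-in for the predefined analysis goal.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Big data": N rows of a design matrix (intercept + 2 covariates).
N = 100_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])

def select_subset(X, n, ridge=1e-6):
    """Greedy sequential selection of n rows maximising the
    D-optimality utility log det(X_s' X_s) of the chosen subset."""
    d = X.shape[1]
    M = ridge * np.eye(d)            # information matrix of current subset
    chosen = []
    available = np.ones(len(X), dtype=bool)
    for _ in range(n):
        Minv = np.linalg.inv(M)
        # Adding row x increases log det by log(1 + x' M^{-1} x)
        # (matrix determinant lemma), so score all candidates at once.
        gain = np.einsum("ij,jk,ik->i", X, Minv, X)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        M = M + np.outer(X[best], X[best])
    return np.array(chosen)

idx = select_subset(X, n=50)
print("selected", len(idx), "of", N, "rows")
print("log det of subset information:",
      np.linalg.slogdet(X[idx].T @ X[idx])[1])
```

The paper’s proposal is more general (the utility is tied to a predefined analysis goal), but the greedy information-gain loop above captures the basic flavour of analysing an informative subset of size n rather than all N rows.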

I find it interesting and appealing: a pseudo-Bayesian approach that discards the prior.

1 Like

Open access: Contribution of Real-World Evidence in European Medicines Agency’s Regulatory Decision Making

Quite a list of inherent and seemingly intractable problems: “The main issues discussed with respect to RWE were around methodological weaknesses, including missing data, lack of population representativeness, small sample size, lack of an adequate or prespecified analysis plan, and the risk of several types of confounding and bias (mostly selection bias), which was in line with previous studies.”
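As a toy illustration of why one of those issues, confounding (here, confounding by indication), is so troublesome for RWE (my own simulation, not from the paper; all parameters invented): if sicker patients are preferentially given the new treatment, a naive comparison of outcomes can reverse the sign of a genuinely beneficial effect, while randomization breaks the link.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 50_000
severity = rng.normal(size=n)                 # unmeasured confounder
# Sicker patients are more likely to receive the new treatment.
p_treat = 1 / (1 + np.exp(-2 * severity))
treated = rng.binomial(1, p_treat)
# True treatment effect is beneficial (+1), but severity harms the outcome.
outcome = 1.0 * treated - 2.0 * severity + rng.normal(size=n)

naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()
print(f"true effect: +1.00, naive real-world estimate: {naive:+.2f}")

# A randomized design removes the association between severity and treatment.
treated_rct = rng.binomial(1, 0.5, size=n)
outcome_rct = 1.0 * treated_rct - 2.0 * severity + rng.normal(size=n)
rct = outcome_rct[treated_rct == 1].mean() - outcome_rct[treated_rct == 0].mean()
print(f"randomized estimate: {rct:+.2f}")
```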

2 Likes