Jan 4, 2019
Ivo Abraham, PhD, RN, is a professor of Pharmacy and Medicine at the University of Arizona (Tucson, AZ); he is also affiliated with the Center for Health Outcomes & PharmacoEconomic Research and the Arizona Cancer Center. He is a member of the Big Data Working Group.
The promises of Big Data are intuitively appealing: (virtually) unlimited data that will enable us to answer (virtually) any questions that we may have. Unfortunately, in and of themselves, Big Data are rather useless. They require Deep Analytics: inquiring people equipped with engines of analysis to explore, discover, and invent.
What should these inquiring people focus on? In The Emperor of All Maladies, Siddhartha Mukherjee identifies three new directions for cancer medicine: therapeutics, prevention, and explaining the (genetic) behavior of cancer. With Big Data, we can cover these three fronts simultaneously: molecules to models of care; patients to populations; and empirics to evidence.
What are the engines of analysis in Deep Analytics? Conventional biostatistics will continue to be useful but only to generate more of the same: more description, though with greater precision; more comparisons between groups, just more and larger groups; more Kaplan-Meier curves, but still dipping down against time; and more regressions predicting one variable from other variables, but with greater accuracy. We need to bridge over to disciplines outside healthcare and integrate their analytical methods.
To give some examples, complexity reduction analytics help us find embedded structures, patterns, and trends in patients, diseases, treatments, and outcomes, both in time and over time. Signals of interest may be crowded out by other signals; discrimination analytics assist us in distinguishing between signals and extracting the signals of interest. Aggregation methods help us find patients, symptoms, diseases, treatments, and outcomes that are similar or dissimilar, and that cluster together or differentiate themselves. We may be able to identify profiles of patients at risk of poor treatment outcomes, or most likely to benefit from a given treatment. We can shift from identifying patient risk factors to anticipating, identifying, and managing patients at risk. We can detect patterns of variables and processes that explain why some patients respond to treatments, why others do not, and why most do to some extent.
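To make the idea of aggregation concrete, here is a minimal sketch of one such method, k-means clustering, written in plain Python. Everything about the data is an assumption made for illustration: the six "patients" are invented, as are the two features (age and a hypothetical biomarker level). It is not a method prescribed here, only one simple way that similar patients can be grouped without telling the algorithm in advance which groups exist.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct observed points as initial centroids.
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Invented data: (age in years, hypothetical biomarker level).
patients = [(45, 1.0), (50, 1.2), (48, 0.9), (70, 3.1), (72, 2.8), (68, 3.3)]

labels, centers = kmeans(patients, k=2)
for patient, label in zip(patients, labels):
    print(patient, "-> cluster", label)
```

In real data the features would need to be put on comparable scales first, and the number of clusters is itself a judgment call; both choices are glossed over in this sketch.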
In this, we should use analytics that let data speak for themselves rather than have them say what we want them to say. We can test “causal” models that help us understand the interplay of various factors in treatment outcomes. We should let data sketch out patterns of cause and consequence, of predisposition and exception, of treatment and outcomes. We may let data draw themselves out into flow charts that help us understand what happens as patients are treated, or into decision trees that assist us in deciding which patients would benefit most from an array of treatment options. To better plan treatment, we can develop complex and targeted simulations of treatments and treatment outcomes based on patient and disease characteristics.
Lastly, we should combine “old” engines with the more recent generation of artificially intelligent engines. As much of Big Data is unstructured, natural language processing engines can extract data out of text or speech. Machine-learning engines work from data presented to them to construct prediction and decision models and algorithms.
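As a small illustration of a model constructed from the data presented to it, the sketch below fits a "decision stump": a decision tree with a single split. The biomarker values and response labels are invented for the example, and the function name `fit_stump` is hypothetical. A real decision tree would consider many variables and many splits; this only shows the basic step of letting the data choose the cut-off.

```python
def fit_stump(values, responded):
    """Find the threshold on one numeric feature that best separates responders.

    Returns (number correct, threshold, whether values above it predict response).
    """
    best = None
    candidates = sorted(set(values))
    # Try the midpoint between each pair of adjacent observed values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        # Try both directions: high values predict response, or low values do.
        for above_means_response in (True, False):
            predictions = [(v > threshold) == above_means_response for v in values]
            correct = sum(p == r for p, r in zip(predictions, responded))
            if best is None or correct > best[0]:
                best = (correct, threshold, above_means_response)
    return best


# Invented data: hypothetical biomarker level and whether the patient responded.
biomarker = [0.8, 1.1, 1.4, 2.6, 2.9, 3.2]
responded = [False, False, False, True, True, True]

correct, threshold, above_means_response = fit_stump(biomarker, responded)
direction = "above" if above_means_response else "below"
print(f"Predict response when biomarker is {direction} {threshold:.2f} "
      f"({correct} of {len(biomarker)} patients classified correctly)")
```

The cut-off is learned from the observations rather than imposed beforehand, which is the sense in which the data are allowed to speak; whether such a rule holds up in new patients is a separate question that this sketch does not address.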