Big Data, Deep Analytics, Better Outcomes

By Ivo Abraham, PhD, RN

January 4, 2019

The promises of Big Data are intuitively appealing: (virtually) unlimited data that will enable us to answer (virtually) any question we may have. Unfortunately, in and of themselves, Big Data are rather useless. They require Deep Analytics: inquiring people equipped with engines of analysis to explore, discover, and invent.

What should these inquiring people focus on? In The Emperor of All Maladies, Siddhartha Mukherjee identifies three new directions for cancer medicine: therapeutics, prevention, and explaining the (genetic) behavior of cancer. With Big Data, we can cover all three fronts simultaneously: from molecules to models of care, from patients to populations, and from empirics to evidence.

What are the engines of analysis in Deep Analytics? Conventional biostatistics will continue to be useful, but only to generate more of the same: more description, though with greater precision; more comparisons between groups, only of more and larger groups; more Kaplan-Meier curves, still dipping down against time; and more regressions predicting one variable from others, though with greater accuracy. We need to bridge to disciplines outside healthcare and integrate their analytical methods.
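To make the contrast concrete, here is a minimal sketch of that conventional toolkit: a hand-rolled Kaplan-Meier estimator run on a small synthetic cohort. All numbers are illustrative, and in practice one would reach for an established package such as Python's lifelines or R's survival.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each patient
    events:    1 if the event (e.g., death) was observed, 0 if censored
    Returns the distinct event times and the estimated survival curve S(t).
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                 # n_i: patients still at risk at t
        died = np.sum((durations == t) & (events == 1))  # d_i: events occurring at t
        s *= 1.0 - died / at_risk                        # S(t) = prod over t_i <= t of (1 - d_i/n_i)
        survival.append(s)
    return event_times, np.array(survival)

# Synthetic example: months of follow-up and event indicators (illustrative only)
times, surv = kaplan_meier([5, 8, 8, 12, 15, 20], [1, 1, 0, 1, 0, 1])
for t, s in zip(times, surv):
    print(f"t = {t:4.0f}  S(t) = {s:.2f}")
```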

To give some examples: complexity reduction analytics help us find embedded structures, patterns, and trends in patients, diseases, treatments, and outcomes, both in time and over time. Signals of interest may be crowded out by other signals; discrimination analytics help us distinguish among signals and extract the ones of interest. Aggregation methods help us find patients, symptoms, diseases, treatments, and outcomes that are similar or dissimilar, and that cluster together or differentiate themselves. We may be able to identify profiles of patients at risk of poor treatment outcomes, or most likely to benefit from a given treatment. We can shift from identifying patient risk factors to anticipating, identifying, and managing patients at risk. We can detect patterns of variables and processes that explain why some patients respond to treatment, why others do not, and why most respond to some extent.
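As an illustration of what such engines might look like in code, the sketch below pairs a complexity reduction step (principal component analysis) with an aggregation step (k-means clustering) using scikit-learn. The patient data are synthetic, and the choices of three components and four clusters are arbitrary assumptions for the example, not recommendations.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a patient-level dataset: 200 patients, 30
# correlated clinical variables (labs, vitals, treatment exposures).
latent = rng.normal(size=(200, 3))          # hidden low-dimensional structure
loadings = rng.normal(size=(3, 30))
X = latent @ loadings + 0.1 * rng.normal(size=(200, 30))

# Complexity reduction: project the 30 variables onto a few components
# that capture most of the variation.
pca = PCA(n_components=3)
scores = pca.fit_transform(X)
print("variance explained:", pca.explained_variance_ratio_.round(2))

# Aggregation: cluster patients in the reduced space to surface groups
# that may differ in risk or in likely treatment benefit.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print("patients per cluster:", np.bincount(clusters))
```

Reducing first and clustering in the reduced space is one common design choice: it keeps the clusters from being dominated by noise in the raw variables.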

In all of this, we should use analytics that let the data speak for themselves, rather than have them say what we want them to say. We can test “causal” models that help us understand the interplay of factors in treatment outcomes. We should let the data sketch out patterns of cause and consequence, of predisposition and exception, of treatment and outcome. We may let the data draw themselves out into flow charts that help us understand what happens as patients are treated, or into decision trees that help us decide which patients would benefit most from an array of treatment options. To plan treatment better, we can develop complex and targeted simulations of treatments and treatment outcomes based on patient and disease characteristics.

Lastly, we should combine “old” engines with the more recent generation of artificially intelligent engines. Because much of Big Data is unstructured, natural language processing engines can extract data from text or speech. Machine-learning engines learn from the data presented to them to construct prediction and decision models and algorithms.
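As one small example of data drawing themselves out into a decision tree, the sketch below fits a shallow tree to a fabricated cohort and prints it as readable rules. The features (age, tumor burden, biomarker level) and the hidden response rule are invented purely for illustration; a real analysis would start from actual clinical data and validated endpoints.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(seed=1)

# Synthetic cohort: age, baseline tumor burden, and a biomarker level.
# The "responded" label follows a simple hidden rule plus noise,
# fabricated for this example.
n = 500
age = rng.uniform(30, 85, n)
burden = rng.uniform(0, 10, n)
biomarker = rng.uniform(0, 1, n)
responded = ((biomarker > 0.5) & (burden < 6)) | (rng.random(n) < 0.05)

X = np.column_stack([age, burden, biomarker])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, responded)

# The fitted tree reads out as human-interpretable decision rules.
print(export_text(tree, feature_names=["age", "burden", "biomarker"]))
```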