The promises of Big Data are intuitively appealing: (virtually) unlimited data that will enable us to answer (virtually) any question we may have. Unfortunately, in and of themselves, Big Data are rather useless. They require Deep Analytics: inquiring people equipped with engines of analysis to explore, discover, and invent.
What should these inquiring people focus on? In The Emperor of All Maladies, Siddhartha Mukherjee identifies three new directions for cancer medicine: therapeutics, prevention, and explaining the (genetic) behavior of cancer. With Big Data, we can cover these three fronts simultaneously: from molecules to models of care, from patients to populations, and from empirics to evidence.
What are the engines of analysis in Deep Analytics? Conventional biostatistics will continue to be useful, but only to generate more of the same: more description, though with greater precision; more comparisons between groups, only with more and larger groups; more Kaplan-Meier curves, still dipping down over time; and more regressions predicting one variable from other variables, though with greater accuracy. We need to bridge over to disciplines outside healthcare and integrate their analytical methods.
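As a reminder of what one of these conventional engines actually computes, a Kaplan-Meier curve comes from the standard product-limit estimator: at each time where events occur, the survival estimate is multiplied by (1 - d/n), where d is the number of events at that time and n the number of patients still at risk. The minimal Python sketch below assumes each patient is recorded as a follow-up time plus a flag marking whether the event was observed (1) or the patient was censored (0); the function name kaplan_meier and the sample data are hypothetical, not taken from any particular library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return (time, survival) steps of the Kaplan-Meier product-limit estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)  # d at each time
    exits = Counter(times)          # everyone leaving the risk set at each time
    at_risk = len(times)            # n: patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Patients censored at time t still count as at risk at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

For five hypothetical patients with times [2, 3, 3, 5, 8] and event flags [1, 1, 0, 1, 0], the curve steps down to roughly 0.8, 0.6, and 0.3 at times 2, 3, and 5: the familiar curve "dipping down against time".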
To give some examples, complexity reduction analytics help us find embedded structures, patterns, and trends in patients, diseases, treatments, and outcomes, both at a point in time and over time. Signals of interest may be crowded out by other signals; discrimination analytics help us distinguish between signals and extract the signals of interest. Aggregation methods help us find patients, symptoms, diseases, treatments, and outcomes that are similar or dissimilar, that cluster together or set themselves apart. We may be able to identify profiles of patients at risk of poor treatment outcomes, or most likely to benefit from a given treatment. We can shift from identifying patient risk factors to anticipating, identifying, and managing patients at risk. We can detect patterns of variables and processes that explain why some patients respond to treatments, why others do not, and why most respond only to some extent.
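Clustering is one family of the aggregation methods described above. As a concrete, swapped-in illustration (not a method the article prescribes), here is a minimal k-means (Lloyd's algorithm) sketch in plain Python that groups patients described by numeric features. The function name kmeans, the sample feature values, and the choice of the first k points as starting centroids are illustrative assumptions; real clinical features would need scaling first so that no single unit of measurement dominates the distance.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster numeric feature tuples into k groups with Lloyd's algorithm.

    Hypothetical helper: starts from the first k points as centroids so the
    result is deterministic; production code would use smarter seeding.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

The returned labels say which patients "cluster together"; inspecting each centroid is one way to describe what sets a group apart.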
In doing so, we should use analytics that let data speak for themselves, rather than making them say what we want them to say. We can test “causal” models that help us understand the interplay of various factors in treatment outcomes. We should let data sketch out patterns of cause and consequence, of predisposition and exception, of treatment and outcome. We may let data draw themselves out into flow charts that help us understand what happens as patients are treated, or into decision trees that help us decide which patients would benefit most from an array of treatment options. To better plan treatment, we can develop complex, targeted simulations of treatments and treatment outcomes based on patient and disease characteristics. Lastly, we should combine “old” engines with the more recent generation of artificially intelligent engines. Because much of Big Data is unstructured, natural language processing engines can extract data from text or speech. Machine-learning engines learn from the data presented to them to construct prediction and decision models and algorithms.
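Decision trees of the kind mentioned above are typically grown by repeatedly choosing the feature and threshold that best separate the outcomes in the data. The sketch below shows a single such step (a one-level tree, or decision stump) scored with Gini impurity, which is one common splitting criterion among several. The patient features (for example age and a hypothetical biomarker level), the binary "responded to treatment" outcome, and the function names gini and best_split are assumptions for illustration only.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, outcomes):
    """Find the single feature threshold that best separates the outcomes.

    rows     -- one tuple of numeric features per patient
    outcomes -- 1 if the patient responded to treatment, else 0
    Returns (feature_index, threshold, weighted_gini), or None if no split exists.
    """
    if not rows or len(rows) != len(outcomes):
        raise ValueError("need one outcome per non-empty row")
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, outcomes) if r[f] <= threshold]
            right = [y for r, y in zip(rows, outcomes) if r[f] > threshold]
            if not left or not right:
                continue  # a split must send patients both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best
```

A weighted Gini of 0.0 means the chosen threshold separates responders from non-responders perfectly on the data given; a full tree would apply the same search recursively to each side of the split.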