Big Data Can Save Health Care—But at What Cost to Privacy?

Medical research would benefit greatly from massive, publicly shared sets of patient information.


Last month, 30 experts from various backgrounds convened by the Kauffman Foundation issued a report, "Valuing Health Care," that offered some familiar and some not-so-familiar recommendations for improving health-care outcomes and cost-effectiveness. Several of the lesser-known ones focused on how to best collect, aggregate, and share more and better patient health data.

The data at issue includes: patients' medical records, which are now typically held by multiple doctors and hospitals; information that only patients know (e.g., behaviors and life and job histories); genetic information that is only just becoming cheaply available; and health data generated by researchers, pharmaceutical companies, and insurers.

Although health data are highly sensitive and thus require protection, they are also a public good. The more data that researchers are able to analyze, the better chances they have for detecting patterns that can lead to fewer wasteful (and often painful) procedures and tests, and for finding new causes, treatments, and even cures for diseases.

Today, however, the best data analyzers tend to work for financial firms, Google, Facebook, and other high-tech companies, because that's where both the money and the data are. An imperative for both cutting costs and improving treatments in health care is to find creative ways to encourage individuals to divulge data about themselves on a depersonalized basis to databases that can be easily accessed by health care researchers. Once the data are available, they will attract the analytical talent much like honey attracts bees.

Ironically, patient medical records are of only limited value to researchers. Apart from lab and test results, which can be useful, physicians' notes are collected episodically and prepared largely with an eye toward getting paid. Making these records "electronic" won't change their fundamental nature. What researchers could most profit from are databases that link genetic information, medical histories, areas where patients have lived and worked, and how patients behave.

The best way to collect such information is simply to ask people for it. That is exactly what Dr. Susan Love is doing through her "Army of Women," which has so far compiled data on more than 365,000 women concerned about breast cancer. Importantly, just 30 percent of those who have signed up actually have had the disease -- so by following the histories of the recruits to "Love's Army," researchers can compare individuals who do and do not have breast cancer, making it easier to isolate the causal factors.

More philanthropies and other organizations battling disease should follow Dr. Love's lead and begin to ask patients for data, too. Employers, for their part, should encourage employees to sign up with organizations of their choice. Further, the organizations initially collecting the data should ask patients who sign up to let the organizations share their data with other research groups. Eventually there may be multiple databases, some whose members have consented to share their data with other organizations and others whose members have not. But collectively the organizations should have millions of volunteers whose histories can be tracked by researchers who look for clues to disease origins and possible cures.

Still, at an Atlantic-sponsored event held on April 19 at which the Kauffman report was unveiled, some audience members raised legitimate concerns about the security of these data and whether they could be truly depersonalized. John Wilbanks, a member of the Kauffman Task Force on Cost-Effective Health Care Innovation, which prepared the report, and one of the nation's leading thinkers about data-mining and its risks and benefits, said that there are plenty of gifted hackers who, with the limited public information currently available, can identify each of us with a high degree of likelihood already. He implied that there is a large generational divide between younger people who essentially know or suspect this and just live with it -- look at what's exchanged on Facebook every day -- and older people who are deeply concerned about their privacy and fearful of putting their sensitive health data in any database, however seemingly trustworthy.

My view is that depersonalization should at least be tried, and even if it doesn't always work, many people who want faster cures will be willing to provide their data anyhow.

Would-be or actual patients are not the only ones who should be asked to share their data. A more aggressive mandate to share should exist for researchers themselves, especially when they are funded by taxpayers. The National Institutes of Health (NIH) currently requires research grantees receiving at least $500,000 in federal support to file plans for making their data available to other researchers. Not only should this minimalist requirement be enforced more effectively, but NIH should also adopt policies requiring grantees to actually share their data.

Of course, there are many other fixes required to make the health care market look like the markets for other goods and services in our economy -- like transparent pricing and better information about quality of services. But enabling researchers to have access to data about patients -- much as Google, Facebook, Amazon, and many other retailers have data about our buying habits -- is a critically important, but so far overlooked, early step. Lives and money are at stake.