Big Data Analytics Will Help Us to Explore the Universe

Harnessing data from 500,000 antennas, scientists are building a system to measure radio signals from unexplored parts of the universe.
Ton Engbersen, PhD, IBM Research


It's not often that you get to work on one of the most ambitious science projects ever.

But I'm lucky: that's how I spend my day at the office. My job is to help design the technology roadmap for the Square Kilometre Array, or SKA. Backed by 10 nations, thousands of scientists, and hundreds of companies, the SKA will be the largest, most powerful radio telescope ever built, a technological undertaking that will help usher in a new era of cognitive computing.

It's not just that this massive telescope -- made up of 3,000 dishes sprinkled across thousands of miles in southern Africa and western Australia -- will tackle the most basic questions about the Big Bang and the evolution of the universe, peering back to a time before the stars first lit up and investigating unexplored parts of the universe.

It's that building the SKA's systems, much of whose technology doesn't exist today, is the ultimate big data challenge. Each day, 500,000 antennas will pull in a deluge of radio signals from outer space -- an unprecedented 14 exabytes of data. That's double the amount of data the Internet produces daily.

How do you pull together that much data from so many different antennas? How do you store it and sift through it? And how do you do all of that cost-effectively, without consuming the energy required to run a small city? Answering those questions will require huge breakthroughs in system design, storage, analytics and machine learning. The process will lead to the creation of cognitive computing systems that learn as they process data, becoming ever smarter about what we need to know and which connections we need to be alerted to.

SKA is leading the way in cognitive computing because it has to. Even after the radio signals are pared down to an expected petabyte of data a day, that's still a lot of data. There's simply no way to handle it without systems that think and learn on their own over time how to distinguish the important from the less important.

One step researchers are working on is teaching a system to recognize the patterns of data that astronomers are interested in, so that it automatically sifts that information out. The next logical step is a machine that notices that every time it pinpoints a certain kind of data and shows it to a researcher, the researcher keeps it. The system then uses that feedback to learn automatically which data to flag.
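The feedback loop described above can be sketched in a few lines. This is an illustrative toy, not SKA software: the feature names, the linear scoring rule, and the perceptron-style update are all assumptions chosen to show the idea that keep/discard decisions by a researcher gradually reshape what the system flags.

```python
# Toy relevance-feedback learner (illustrative assumptions, not an SKA system).
class RelevanceLearner:
    def __init__(self, feature_names, learning_rate=0.1):
        # One weight per signal feature, all starting neutral.
        self.weights = {name: 0.0 for name in feature_names}
        self.lr = learning_rate

    def score(self, features):
        # Higher score -> more likely worth showing to the researcher.
        return sum(self.weights[k] * v for k, v in features.items())

    def should_flag(self, features, threshold=0.0):
        return self.score(features) > threshold

    def feedback(self, features, kept):
        # Researcher kept the flagged item: reinforce these features.
        # Researcher discarded it: push the weights the other way.
        direction = 1.0 if kept else -1.0
        for k, v in features.items():
            self.weights[k] += self.lr * direction * v


learner = RelevanceLearner(["bandwidth", "pulse_rate"])

# Simulated sessions: wide-bandwidth signals get kept, narrow ones discarded.
for _ in range(20):
    learner.feedback({"bandwidth": 1.0, "pulse_rate": 0.2}, kept=True)
    learner.feedback({"bandwidth": 0.1, "pulse_rate": 0.9}, kept=False)

print(learner.should_flag({"bandwidth": 0.9, "pulse_rate": 0.1}))   # True
print(learner.should_flag({"bandwidth": 0.05, "pulse_rate": 1.0}))  # False
```

After a few rounds of feedback, the learner flags the kind of signal the researcher has been keeping and suppresses the kind being discarded, without anyone writing an explicit rule.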

Or consider another challenge. Think about that petabyte of usable data per day. In three years, that adds up to an exabyte of data; in 10 years, more than three exabytes. How do you store that much data? If you make it all instantly accessible on hard disks, you quickly run up an impressive power bill. Meanwhile, computers are so fast that most people don't realize their performance is no longer scaling as quickly as it used to. As we collect more data, we'll need smart ways around these roadblocks of performance and cost.
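The arithmetic behind those figures is easy to check, assuming a steady one petabyte of usable data per day and decimal units (1 exabyte = 1,000 petabytes):

```python
# Back-of-the-envelope storage growth at 1 PB of usable data per day.
PB_PER_DAY = 1
DAYS_PER_YEAR = 365
PB_PER_EB = 1000  # decimal units

def exabytes_after(years):
    return PB_PER_DAY * DAYS_PER_YEAR * years / PB_PER_EB

print(exabytes_after(3))   # 1.095 -- about an exabyte in three years
print(exabytes_after(10))  # 3.65  -- more than three exabytes in ten
```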

Cognitive computing has a big role to play here. We can create machines that learn which data they should store where, whether it's on instantly accessible hard disks, backup magnetic tapes, or next-generation storage class memory, and how to dynamically and cost-effectively manage data use.
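A tiering decision like the one described above can be sketched as a simple placement rule. The tiers come from the paragraph; the thresholds and the access-frequency heuristic are invented for illustration and are not an actual SKA storage policy. (A learning system would tune such thresholds from observed access patterns rather than hard-coding them.)

```python
# Illustrative storage-tier placement rule (thresholds are assumptions).
def choose_tier(accesses_per_day, days_since_last_access):
    if accesses_per_day > 100:
        return "storage-class memory"   # hottest data, fastest medium
    if days_since_last_access < 30:
        return "hard disk"              # warm data, instantly accessible
    return "magnetic tape"              # cold data, cheapest per byte

print(choose_tier(500, 0))    # storage-class memory
print(choose_tier(5, 3))      # hard disk
print(choose_tier(0, 400))    # magnetic tape
```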

About the Author of This Post

Ton Engbersen, PhD, IBM Research
Ton Engbersen is the Scientific Director of the ASTRON & IBM Center for Exascale Technology in conjunction with the DOME project.

