This isn't your grandfather's stargazing: The amount of data we have on our universe is doubling every year thanks to big telescopes and better light detectors.
Think of all the data humans have collected over the long history of astronomy, from the cuneiform tablets of ancient Babylon to images---like the one above---taken by the Hubble Space Telescope. If we could express all of that data as a number of bits, our fundamental unit of information, that number would be, well, astronomical. But that's not all: in the next year that number is going to double, and the year after that it will double again, and so on and so on.
There are two reasons that astronomy is experiencing this accelerating explosion of data. First, we are getting very good at building telescopes that can image enormous portions of the sky. Second, the sensitivity of our detectors is subject to the exponential force of Moore's Law. That means that these enormous images are increasingly dense with pixels, and they're growing fast---the Large Synoptic Survey Telescope, scheduled to become operational in 2015, has a three-billion-pixel digital camera. So far, our data storage capabilities have kept pace with the massive output of these electronic stargazers. The real struggle has been figuring out how to search and synthesize that output.
Alberto Conti is the Innovation Scientist for the James Webb Space Telescope, the successor to the Hubble Space Telescope that is due to launch in 2018. Before transitioning to the Webb, Conti was the Archive Scientist at the Space Telescope Science Institute (STScI), the organization that operates the Hubble. For almost ten years, he has been trying to make telescope data accessible to astronomers and to the public at large. What follows is my conversation with Conti about the future of, and the relationship between, big telescopes and big data.
Last year I was researching the Hubble Deep Field (pictured below) and I interviewed
Bob Williams, the former head of STScI who originally conceived of and
executed the deep field image. He told me that the deep field, in
addition to its extraordinary scientific value, had changed the way that
data is distributed in astronomy. Can you explain how?
Conti: It's interesting; one of the very first papers I wrote as a graduate
student in astronomy was on the Hubble Deep Field. I was a graduate
student in 1995 when it came out, and of course there was this "wow"
factor---the fact that this was one of the deepest images ever taken,
the fact that you have thousands of galaxies in this tiny patch of
sky---you would take out your calculator and try to calculate how many
galaxies there are in the universe and you would come up with a hundred
billion, and it was mind-boggling. It still is.
But it also changed the data regime. Before the Hubble Deep Field, data
(raw images) would be deposited in some archive and you would just tell
astronomers to "go get the images." Astronomers would then have to
download the images and run software on them in order to find all of the
objects using certain parameters, and then they'd have to assess the
quality of the data, for instance whether an object that was thought to
be a star was actually a star. So you had to do a lot of analysis before
you could really get into your research.
Bob and his team decided that this data was so overwhelmingly powerful, in terms of what
it was telling us about the universe, that it was worth it for the
community to be able to get their hands on the data immediately. And so
the original deep field team processed the data, found the objects in
it, and then catalogued each of them, so that every object in the deep
field had a description in terms of size, distance, color, brightness
and so forth. And that catalogue was available to researchers from the
very start---it started a whole new model, where the archive does all
of the initial processing for you. I can tell you firsthand how
incredible it was at the time, because as a graduate student studying
quasars, I was able to identify all of the quasars within the data in
just a few minutes. What Bob did, which I thought was brilliant, was
enable us to do the science much quicker. If you take a look at what's
happening with these massive archives now, it's being done in the exact
same way; people realized that you aren't going to be able to download
and process a terabyte of images yourself. It's a huge waste of time.
The other thing Bob did was he released the data to the world almost
immediately; I remember it took forever to download, not because the
data set was especially large, but because there were so many people
accessing the archive at the same time. That was one of astronomy's
first open source exercises, in the sense that we use that term today.
Has data always been an issue for astronomy? Did Galileo ever run out of log books? I remember reading about William Herschel's sister Caroline, an accomplished astronomer in her own right, spending these long, cold nights underneath their wooden telescope, listening for her brother, who would scream these numbers for her to write down in a notebook. How have data challenges changed since then?
Conti: That's a good question. Astronomy has changed quite a bit since Galileo and Herschel. Galileo, for instance, had plenty of paper on which to record his observations, but he was limited in his capacity for observation and so was Herschel to some extent. Today we don't have those same observational limits.
There are two issues driving the current data challenges facing astronomy. First, we are in a vastly different data regime in astronomy than we were even ten or fifteen years ago. Over the past 25 to 30 years, we have been able to build telescopes that are 30 times larger than what we used to be able to build, and at the same time our detectors are 3,000 times more powerful in terms of pixels. The explosion in sensitivity you see in these detectors is a product of Moore's Law---they can collect up to a hundred times more data than was possible even just a few years ago. This exponential increase means that the collective data of astronomy doubles every year or so, and that can be very tough to capture and analyze.
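Conti's doubling claim compounds quickly, which is easy to see with a little arithmetic. A minimal sketch, assuming a purely hypothetical 100-terabyte starting archive:

```python
# If astronomy's collective data doubles every year, the total after
# n years is 2**n times today's archive.
initial_tb = 100.0  # hypothetical starting archive size, in terabytes
totals = {n: initial_tb * 2 ** n for n in (1, 5, 10)}
for years, size in totals.items():
    print(f"after {years} years: {size:.0f} TB")
# A decade of annual doubling multiplies the archive a thousandfold.
```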
You spent part of your career working with GALEX, the Galaxy Evolution Explorer. How did that experience change the way you saw data and astronomy?
Conti: GALEX was a big deal because it was one of the first whole sky ultraviolet missions. I want to stress "whole sky" here, because taking measurements of ultraviolet sources all over the sky is a lot more data-intensive than zooming in on a single source. Whole sky ultraviolet measurements had been done before, but never at the depth and resolution made possible by GALEX. This had tremendous implications for data archives at the time. When I started working on GALEX nine years ago, the amount of data it produced was gigantic compared with anything that we had in-house at the Space Telescope Science Institute, and that includes the Hubble Space Telescope, which of course doesn't take whole sky images.
What we were able to do was create a catalog of objects that were detected in these whole sky images, and the number was quite large---GALEX had detected something close to three hundred million ultraviolet sources in the sky. That forced the archive to completely revisit the way it allowed users to access these very large catalogs. There were large databases in astronomy ten years ago, but databases that would allow you to search large collections of objects were not common. GALEX helped to pave the way with this new searchable archive. I can remember when we first introduced the data, we had people all over the world trying to download all of the data, because they thought that was the only way they could access it. They were thinking that to use the data you had to have it locally, which was the old way of thinking. The big leap was that we created an interface that allowed you to get to your data, to a level where you're one step away from analysis, and we were able to do that without you having to download it. We did it by creating interfaces that allowed you to mine all three hundred million sources of ultraviolet light in just a few seconds. You could ask the interface to show you all of the objects that had a particular color, or all of the sources from a certain position in the sky, and then you could download only what you needed. That was a big shift in how astronomers do research.
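The shift Conti describes, from downloading raw data to querying a server-side catalog, can be sketched with a toy example. Everything here is invented for illustration (the table layout, column names, and magnitudes are hypothetical); the real GALEX catalog lives on the archive's servers and is searched through its own interfaces, but the idea is the same: send the question to the data, get back only the matching rows.

```python
import sqlite3

# Build a toy source catalog in memory (the real GALEX catalog
# holds roughly 300 million rows on the archive's servers).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE sources (
    id INTEGER PRIMARY KEY,
    ra REAL,       -- right ascension, degrees
    dec REAL,      -- declination, degrees
    fuv_mag REAL,  -- far-UV brightness (magnitudes)
    nuv_mag REAL   -- near-UV brightness (magnitudes)
)""")
conn.executemany("INSERT INTO sources VALUES (?, ?, ?, ?, ?)", [
    (1, 150.1, 2.2, 18.4, 17.9),
    (2, 150.3, 2.3, 21.0, 19.5),
    (3, 210.7, -5.1, 17.2, 17.0),
])

# Instead of downloading everything, ask the archive a question:
# "all sources in this patch of sky with blue UV colors."
matches = conn.execute(
    "SELECT id, ra, dec FROM sources "
    "WHERE ra BETWEEN 150.0 AND 151.0 "
    "AND dec BETWEEN 2.0 AND 3.0 "
    "AND fuv_mag - nuv_mag < 1.0"
).fetchall()
print(matches)  # only the matching rows come back over the wire
```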
How much data are we talking about?
Conti: Well, GALEX as a whole produced 20 terabytes of data, and that's actually not that large today---in fact it's tiny compared to the instruments that are coming, which are going to make these interfaces even more important. We have telescopes coming that are going to produce petabytes (a thousand terabytes) of data. Already, it's difficult to download a terabyte; a petabyte would be, not impossible, but certainly an enormous waste of bandwidth and time. It's like me telling you to download part of the Internet and search it yourself, instead of just using Google.
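Conti's bandwidth point is easy to put in numbers. A rough back-of-the-envelope sketch, assuming a sustained 100-megabit-per-second connection (an optimistic figure for many institutions):

```python
petabyte_bits = 1e15 * 8   # one petabyte, expressed in bits
link_bps = 100e6           # assumed 100 Mb/s sustained download speed
seconds = petabyte_bits / link_bps
days = seconds / 86400     # 86,400 seconds per day
print(f"{days:.0f} days")  # roughly 926 days: over two and a half years
```

At that rate a single petabyte ties up the link for years, which is why server-side search beats local copies.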
Would something like the exoplanet-hunting Kepler Space Telescope have been possible with the data mining and data storage capacities of twenty years ago?
Conti: Well, Kepler is an extraordinary mission for many reasons. Technologically, it would not have been possible even just a few years ago. Kepler measures the light of 170,000 stars very precisely at regular intervals looking for these dips in light that indicate a planet is present. The area that they sample is not very large---it's a small patch of sky---but they're sampling all of those stars every thirty minutes. So that's already a huge breakthrough, and it creates a lot of data, but it's still not as much as a whole sky mission like GALEX.
What's different about Kepler, from a data perspective, is that it's opening up the time domain. With a mission like GALEX, we collect data and store it in the database, but it's relatively static. It sits there and it doesn't really change, unless we get a new dump of data that helps us refine it, and that may only happen once a year. With Kepler you have these very short collection intervals, with new images every thirty minutes. That really opens up the time domain. We're working hard to figure out how to efficiently analyze time domain data. And of course the results are spectacular: a few years ago we had fewer than twenty exoplanets, and now we have thousands.
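The transit technique Conti describes, watching a star's brightness for brief dips, can be illustrated with a toy light curve. This is only a cartoon with invented numbers; real Kepler analysis fits physical transit models and has to separate true transits from noise and stellar variability.

```python
# Toy light curve: relative flux sampled every thirty minutes.
# A transiting planet blocks a small fraction of the starlight,
# producing brief dips below the baseline of 1.0.
flux = [1.00, 1.00, 0.99, 1.00, 0.92, 0.91, 0.92, 1.00,
        1.00, 1.01, 0.99, 0.92, 0.91, 0.93, 1.00, 1.00]

threshold = 0.97  # flag samples noticeably below baseline
dips = [i for i, f in enumerate(flux) if f < threshold]

# Group consecutive flagged samples into distinct transit events.
events = []
for i in dips:
    if events and i == events[-1][-1] + 1:
        events[-1].append(i)
    else:
        events.append([i])
print(len(events), "transit events at samples", events)
```

Two separated groups of dips suggest a repeating transit, and the spacing between them hints at the planet's orbital period.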
Is there a new generation of telescopes coming that will make use of these time domain techniques?
Conti: Oh yes. With Kepler we've developed this ability to make close observations of objects in the sky over time, but if you add millions or even billions of objects, then you get into the new regime of telescopes like the Large Synoptic Survey Telescope (LSST), which we expect to come online at the end of this decade. These telescopes are going to take images of the whole sky every three days or so; with that kind of data you can actually make movies of the whole sky. You can point to a place in the sky and say "there was nothing there the other day, but today there's a supernova." You couple that kind of big data, whole sky data, with the time domain and you're talking about collecting terabytes every night. And we don't have to wait that long; ALMA, the Atacama Large Millimeter Array, is going to have its first data release very soon, and its raw data is something like forty terabytes a day. Then in 2025, we're going to have the Square Kilometre Array (SKA), the most sensitive radio instrument ever built, and we expect it will produce more data than we have on the entire Internet now---and that's in a single year. This is all being driven by the effect that Moore's Law has on these detectors; these systematic advances let us keep packing in more and more pixels.
In my view, we've reached the point where storage is no longer the issue. You can buy disk, you can buy storage, and I think that at some point we may even have a cloud for astronomy that can host a lot of this data. The problem is how long it's going to take me to get a search answer out of these massive data sets. How long will I have to wait for it?
Has citizen science played a meaningful role in helping astronomy tackle all of this data?
Conti: I think so. I'm part of a group that has done a lot of work on citizen science, especially with the folks over at Galaxy Zoo and CosmoQuest on an in-house project called Hubble Zoo. The original Galaxy Zoo was a galaxy classification project, where volunteers could log on to the server and help to classify galaxies by shape. Galaxy shapes give you a lot of information about their formation history; for instance, round galaxies are much more likely to have cannibalized other galaxies in a merger, and on average they're a little older. Spiral galaxies are structures that need time to evolve; generally, they're a little younger than round galaxies. And so when you have thousands of ordinary, non-scientists classifying these galaxies you can get some great statistics in a short period of time. You can get the percentage of round galaxies, elliptical galaxies, spiral galaxies, irregular galaxies and so forth; you can get some really interesting information back. What's great about citizen science is that you can show citizens images that have so far only been fed through machines---no human eyes have ever looked at them.
There's another citizen science project that I'm trying to get started in order to make use of all the old GALEX data. With GALEX we took these whole sky images in ultraviolet, and we did it at certain intervals, so there is a time domain at work, even if it's not as rapid as Kepler's. But as I said before, we have over three hundred million sources of UV light in these images. There was a professor who had a graduate student looking at this data at different intervals with the naked eye, and they were able to find four hundred stars that seemed to be pulsating over time. When I saw the data, I said "this is interesting, but it should be an algorithm." So we made an algorithm to detect these pulsating stars, and we ran it inside the entire database of 300 million sources, and we found 2.1 million pulsating star candidates. And of course, this is just the first pass at this; who knows how many of those candidates will convert. But it's an illustrative case---the idea is to feed these kinds of projects to the next generation of citizen scientists, and to have them do what that graduate student did, and then in some cases they'll be able to find something remarkable, something that otherwise might never have been found.
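The algorithmic pass Conti describes might look, in miniature, like the sketch below. The amplitude test, source names, and magnitudes are all invented for illustration; real variability searches use period-finding statistics such as Lomb-Scargle rather than a simple brightness swing.

```python
def looks_variable(mags, min_amplitude=0.3):
    """Flag a source whose brightness swings by more than
    min_amplitude magnitudes across its observed epochs."""
    return max(mags) - min(mags) > min_amplitude

# Toy catalog: source id -> UV magnitudes at successive epochs.
catalog = {
    "src_a": [18.1, 18.1, 18.2, 18.1],  # steady
    "src_b": [17.5, 18.3, 17.4, 18.2],  # pulsating candidate
    "src_c": [20.0, 20.1, 20.0, 19.9],  # steady
}

candidates = [s for s, mags in catalog.items() if looks_variable(mags)]
print(candidates)
```

Run over all 300 million sources, a crude cut like this produces the large candidate list; humans (or better statistics) then decide which candidates are real.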
Can we talk about image-processing? What percentage of Hubble images
are given the kind of treatment that you see with really iconic shots
like the Sombrero Galaxy or the Pillars of Creation?
Conti: It depends. There's an image coming out for the 22nd anniversary (of
the Hubble) here in a few days, and as you'll be able to see, it's a
very beautiful image. I'm a little biased in the sense that I tend to
think that every image from the Hubble is iconic, but they aren't all
treated equally. There's a group of people here in the office of public
outreach at STScI that think a lot about how images are released. But if
you go back to the Hubble Deep Field, or even earlier, you can see that
the imaging team really does put a lot of care into every Hubble image.
And that's not because each one of those images is iconic; rather it's
because we have this instrument that is so unbelievable and each piece
of data it produces is precious, and so a lot of work goes into every
one of them. And now, with the Hubble
Legacy Archive, people can produce their own Hubble images, with new
colors, and they can do it on the fly.
Like Instagram filters?
Conti: Kind of, yeah. As you know, all data in astronomy is monochrome
data---it's black and white---and then the processing team combines it
into layers of red, green and blue, and so forth. Zolt Levay, the head
of the imaging team, takes these colored layers and combines them and
tries to make them as accurate as possible in terms of how they would
look to the human eye, or to a slightly more sensitive eye. This program
lets you take three monochrome images, which you can then make any
color you like, and it lets you make them into a single beautiful
image. There's actually a contest being held by the office of public outreach to see who can upload the most beautiful new image.