Although the exact URL of a page accessed through HTTPS is hidden from the provider, the provider can still see the domain the URL is on: For example, your ISP can’t tell exactly what story you’re reading right now, but it can tell that you’re somewhere on theatlantic.com. That may not reveal much other than your (excellent) taste in news sources, but a user who visited a page on plannedparenthood.com and then a page on dcabortionfund.com may have revealed much more sensitive information.
That’s an example from a 2016 report prepared by Upturn, a think tank that focuses on civil rights and technology. The Upturn report also sets out some of the sneaky ways that user activity can be decoded based only on the unencrypted metadata that accompanies encrypted web traffic, also known as “side channel” information. (These methods probably aren’t in wide use right now, but they could be deployed if ISPs decided it was worthwhile to try to learn more about encrypted traffic.)
Website fingerprinting, for example, relies on the unique characteristics of a particular web page to reveal when it’s being accessed. When a user visits a page, his or her browser pulls data from various servers in a particular order. Based on that pattern, a network provider might be able to tell what page the user is visiting, even without having access to any of the actual data streams it’s transporting. (For this to work, the network operator would have to have already analyzed the loading pattern associated with the particular website the user is visiting.)
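To make the idea concrete, here is a minimal sketch of fingerprint matching in Python. It is an illustration, not the technique described in the Upturn report: the server labels, the `KNOWN_PAGES` index, and the 0.8 threshold are all invented for this example, and the comparison uses a generic sequence-similarity score from the standard library where a real attack would more likely rely on packet sizes, timing, and a trained classifier.

```python
from difflib import SequenceMatcher

# Hypothetical index built ahead of time: for each page, the order in which
# a browser contacts servers while loading it. The labels are made up.
KNOWN_PAGES = {
    "news-article": ["cdn-a", "fonts", "ads", "cdn-a", "analytics"],
    "clinic-locator": ["cdn-b", "maps", "maps", "cdn-b"],
}


def best_match(observed, known=KNOWN_PAGES, threshold=0.8):
    """Return the known page whose load pattern most resembles `observed`.

    Returns None when nothing in the index is similar enough, which is
    the case for any page the observer has not analyzed in advance.
    """
    scored = [
        (SequenceMatcher(None, observed, pattern).ratio(), page)
        for page, pattern in known.items()
    ]
    score, page = max(scored)
    return page if score >= threshold else None


# An observer sees only the sequence of servers contacted, not the content.
guess = best_match(["cdn-b", "maps", "maps", "cdn-b"])
```

The threshold is what keeps the sketch honest about the limitation noted above: a load pattern that resembles nothing in the index yields no guess at all.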
In November, a group of researchers from Israel’s Ben-Gurion and Ariel Universities demonstrated a way to extend the idea behind website fingerprinting to videos watched on YouTube. By matching the encrypted data patterns created by a user viewing a particular video to an index they’d created previously, they could tell which video, out of a limited set, the user was watching, with a startling 98 percent accuracy.
Ran Dubin, a Ph.D. candidate at Ben-Gurion and the research paper’s primary author, told me that the discovery came out of work he’d been doing to optimize video streaming. He wanted to know if he could figure out the quality at which users were watching YouTube videos, so he analyzed the way devices received data as they streamed.
He quickly realized he’d stumbled into something bigger. “The network patterns that belong to each video title have very, very strong meaning,” Dubin said. “I found out that I could actually recognize each stream.”
The giveaway, he found, was embedded in the way devices choose a bitrate—an indicator of video quality—at which to stream the video. At the beginning of a stream, the player receives quick spurts of data, which begin to space apart after the video has been playing for a while and the player has settled on a bitrate. The pattern of these spikes helps identify each individual video.
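A rough sketch of how such a pattern might be pulled out of a traffic trace follows. This is not the researchers' code: the trace is made-up data, and `group_bursts` and its half-second `max_gap` are assumptions chosen only to show how spurts of data can be separated and measured without decrypting anything.

```python
def group_bursts(samples, max_gap=0.5):
    """Group (timestamp_seconds, byte_count) samples into bursts.

    A new burst begins whenever the quiet period since the previous
    sample is longer than `max_gap` seconds (an assumed cutoff).
    Returns a list of (burst_start_time, total_bytes) tuples.
    """
    bursts = []
    last_time = None
    for timestamp, size in samples:
        if last_time is None or timestamp - last_time > max_gap:
            bursts.append([timestamp, 0])
        bursts[-1][1] += size
        last_time = timestamp
    return [(start, total) for start, total in bursts]


# Invented trace: arrival time and size of encrypted chunks for one stream.
trace = [
    (0.0, 900), (0.1, 950),
    (1.0, 800), (1.1, 820),
    (4.0, 1500), (4.2, 1400),
    (9.0, 1450),
]

bursts = group_bursts(trace)
# Time between the start of one burst and the start of the next.
gaps = [round(b[0] - a[0], 1) for a, b in zip(bursts, bursts[1:])]
```

In this invented trace the gaps between bursts widen as playback settles, and the sequence of burst sizes is the kind of signature that could then be compared against a previously built index.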