Temperature data from hundreds of years ago is no easier to process if it’s been obtained at sea. In 2004, for example, a team of researchers analyzing data from 18th- and 19th-century ships’ logs had to correct for the fact that in those days, sea-surface temperatures were measured with a thermometer in a bucket of seawater hauled up on deck. Today, water temperatures are measured at the intake pipes through which seawater is drawn aboard a ship, a practice that’s been in place since the 1940s. (A 2008 study showed that this switch in methodology exaggerated the global temperature dip seen in the 1950s.)
Data inconsistencies aren’t just a pain for the people measuring—they also make it harder to analyze the ways in which the planet is changing over time. Temperatures have been officially recorded in almost all regions of the world since the early 20th century. By the 1930s, records from individual temperature stations around the globe numbered in the millions. But using these records to unearth any long-term global trends involves pooling several different data sets, collected with very different methodologies across wide expanses of time and space. Unavoidably, there are gaps. In some cases, changes in measurement technology have made it impossible to directly compare readings taken just a few decades apart, because old and new instruments don’t record temperature in quite the same way. And temperature records from many places are scattered and fragmented: Historical events often disrupt data collection (during World War II, for example, recordings from Pacific Island thermometers dropped sharply), and some areas have sparser coverage than others.
And then there are the thermometers themselves. Thermometer enclosures, which shield the temperature sensor from direct sunlight and other sources of radiation, can be wooden or plastic; the variation in materials can, in turn, introduce discrepancies in the results (which some stations in the U.S. discovered firsthand in the 1980s when they switched from traditional wooden shelters to electronic sensors housed in plastic). The instruments are also sensitive to their surroundings: A thermometer struck by direct sunlight will read higher on a sunny day than on an equally warm cloudy one.
As a result, it can be hard for climate researchers to get the historical data they need. The regular fluctuations in global temperatures mean that just a few years’ worth of information isn’t enough; things like volcanic eruptions, solar activity, and El Niños can all throw short-term measurements out of whack, while pollution haze has been known in the past to cause a temporary cooling effect. To make sure the patterns they observe aren’t just side effects of these other phenomena, weather statisticians infer global trends from temperatures averaged over 10-year periods rather than from any single year’s readings.
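To see why decade-long averages help, consider the rough sketch below. It uses a synthetic series of annual temperature anomalies rather than any real data set, with an assumed warming trend of 0.01 °C per year buried under random year-to-year noise; averaging within each decade smooths out the short-term swings and leaves the slower trend visible.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a century of annual global temperature anomalies (°C):
# a slow warming trend plus random year-to-year noise of the sort that
# eruptions, solar cycles, and El Niños produce.
rng = np.random.default_rng(0)
years = np.arange(1920, 2020)
annual = pd.Series(0.01 * (years - 1920) + rng.normal(0, 0.15, years.size),
                   index=years, name="anomaly")

# Averaging within each decade cancels out most of the short-term noise,
# leaving one value per decade that tracks the underlying trend.
decadal = annual.groupby((years // 10) * 10).mean()
print(decadal)
```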
Nowadays, climate scientists have a few different tools for correcting these sorts of artificial discrepancies as the information is collected. Some have developed algorithms that separate genuine climate fluctuations from shifts attributable to some other cause, like a change in equipment or in a station’s surroundings. The National Oceanic and Atmospheric Administration’s Pairwise Homogenization Algorithm, for example, takes monthly temperatures from a network of stations and compares the data from each one to that of its neighbors. To identify abnormalities in the temperature data, the algorithm looks for abrupt shifts at one station that are absent from those surrounding it. NOAA also runs its daily meteorological observations through quality-control checks to eliminate duplicate data, outliers, and other inconsistencies. Once all of these numbers are corrected, the globe can be divided into a grid of boxes, and researchers can fill in the blanks based on satellite readings and temperature measurements from surrounding areas.
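NOAA’s actual Pairwise Homogenization Algorithm is considerably more sophisticated, but the core logic described above (flag an abrupt shift that shows up at one station and not at its neighbors) can be sketched in a few lines of Python. Everything in the sketch is an illustrative assumption rather than NOAA’s implementation: the tiny synthetic station network, the 60-month comparison windows, the 0.5 °C jump threshold, and the majority-of-neighbors rule.

```python
import numpy as np

def largest_jump(series, window=60):
    """Largest difference between the mean of the `window` months before
    and after each candidate breakpoint in a monthly difference series."""
    best = 0.0
    for i in range(window, len(series) - window):
        jump = abs(series[i:i + window].mean() - series[i - window:i].mean())
        best = max(best, jump)
    return best

def flag_suspect_stations(stations, neighbors, threshold=0.5):
    """Flag a station when its difference series against a majority of its
    neighbors contain a jump larger than `threshold` degrees C -- that is,
    a shift the surrounding stations do not share."""
    flagged = []
    for name, temps in stations.items():
        jumps = [largest_jump(temps - stations[n]) for n in neighbors[name]]
        if sum(j > threshold for j in jumps) > len(jumps) / 2:
            flagged.append(name)
    return flagged

# Tiny synthetic network: four stations share one regional climate signal,
# but station "B" picks up an artificial +1 degree C step halfway through
# (say, a new instrument) that its neighbors do not share.
rng = np.random.default_rng(1)
months = 240
regional = np.cumsum(rng.normal(0, 0.05, months))
stations = {name: regional + rng.normal(0, 0.2, months) for name in "ABCD"}
stations["B"][months // 2:] += 1.0
neighbors = {s: [t for t in stations if t != s] for s in stations}

print(flag_suspect_stations(stations, neighbors))  # prints ['B']
```

The majority-of-neighbors rule is what distinguishes a local artifact from a real regional change in this sketch: a genuine shift in climate would show up in every nearby station’s record, and so would cancel out of the pairwise differences.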