Why didn't I know this before? (Math dept: Benford's law)

One reason math is so satisfying is that it allows you to see order in what is otherwise the randomness of life. Consider the famous Fibonacci sequence, which shows up in countless natural patterns like this:


Math is also satisfying when it helps you understand what parts of life truly are random or "chaotic," rather than adhering to patterns you haven't yet figured out. The most obvious example is the minute-by-minute movement of weather systems. The world's vast weather-forecasting computers can assess the layers and eddies of heat and moisture in the air and tell you where "convective activity" -- thunderstorms -- is more and less likely to occur. (An example from NOAA here. I spent hours looking at such stuff in my pre-China piloting days.) But a day before landfall, they can't really be sure whether a hurricane will hit New Orleans or someplace in the next state.

So I was grateful to discover, via Michael Ham's Later On blog, another mathematical tool with surprising usefulness in daily life -- and one that, to my chagrin, I had never heard of before. It is called Benford's law, and it has to do with the distribution of numbers we use to count many naturally-occurring phenomena.

It turns out that if you list the population of cities, the length of rivers, the area of states or counties, the sales figures for stores, the items on your credit card statement, the figures you find in an issue of the Atlantic, the voting results from local precincts, etc., nearly one third of all the numbers will start with 1, and nearly half will start with either 1 or 2. (To be specific, about 30.1% will start with 1, and about 17.6% with 2.) Not even one twentieth of the numbers will begin with 9; the figure is about 4.6%.
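Those percentages come from a simple formula, which is the standard statement of Benford's law: the expected share of numbers whose first digit is d equals log10(1 + 1/d). A minimal sketch in Python follows; the function name `benford_probability` is my own label for illustration, not anything from the sources mentioned here.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")

# Combined share of numbers starting with 1 or 2 ("nearly half").
share_1_or_2 = benford_probability(1) + benford_probability(2)
print(f"1 or 2: {share_1_or_2:.1%}")
```

The nine shares add up to exactly 1, since the sum telescopes to log10(10).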

This doesn't apply to numbers that are chosen to fit a specific range -- sales prices, for instance, which might be $49.99 or $99.95 -- nor to numbers specifically designed to be random in their origin, like winning lottery or Powerball figures or computer-generated random sums. But it applies to so many other sets of data that it turns out to be a useful test for whether reported data is legitimate or faked.
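The contrast is easy to see on a computer. The sketch below is only an illustration under my own assumptions: it uses uniformly random integers between 1 and 999,999 as a stand-in for "designed to be random" figures, and the powers of 2 as a stand-in for a quantity that grows by multiplication. The helper names `leading_digit` and `digit_shares` are hypothetical.

```python
import random
from collections import Counter


def leading_digit(n: int) -> int:
    """First digit of a positive integer."""
    return int(str(n)[0])


def digit_shares(numbers):
    """Share of the numbers that start with each digit 1 through 9."""
    counts = Counter(leading_digit(n) for n in numbers)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Assumption: uniform random integers in a fixed range model lottery-style data.
rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_draws = [rng.randint(1, 999_999) for _ in range(20_000)]

# Assumption: powers of 2 model a quantity that keeps doubling.
powers_of_two = [2 ** k for k in range(1, 1001)]

uniform_shares = digit_shares(uniform_draws)
growth_shares = digit_shares(powers_of_two)

for d in range(1, 10):
    print(f"{d}: uniform {uniform_shares[d]:.1%}   doubling {growth_shares[d]:.1%}")
```

In the uniform draws every first digit turns up about one time in nine, while the doubling sequence leans heavily toward 1 in the way Benford's law describes.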

Items in a real expense account, over time, will conform to the Benford pattern. They will look like this chart, from the Journal of Accountancy, showing the populations of US counties in the 1990 census:


But if there are lots of items starting with 5 or 7, someone is making things up. Below, from T.P. Hill, one of the modern masters of Benford law-ism, a comparison of real with faked data:


To me, all of this is very interesting in its own right, in a "can it possibly be true?" sense. (Note to self: no more fake expense items beginning with "8.") It's not exactly news, in that the NYT ran a story about it ten years ago, but I submit that it is far from common knowledge. There is very extensive online commentary and demonstration that it is in fact true.

It also has a very practical use, worth remembering as the hair's-breadth recount in the Minnesota Senate race drags on. When all those re-tabulated figures from the precinct boxes come in? Half of the vote totals had better begin with 1 or 2, or else...

Surely the Minnesota officials are above such hanky panky. But imagine if the teams covering the Florida recount in 2000 had heard of Benford's law.
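For anyone who wants to try this kind of check on a list of totals, here is a rough sketch in Python. It is one possible approach, not the method used by any of the sources above: it compares observed first-digit counts with the Benford shares using a plain chi-square statistic, and flags anything above 15.51, the usual 5% cutoff for 8 degrees of freedom. The two test lists are made up: the first 500 Fibonacci numbers stand in for Benford-like data, and the integers 100 to 999 stand in for data where every first digit is equally common. All names are hypothetical.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CRITICAL_5_PERCENT = 15.51


def chi_square_vs_benford(totals):
    """Chi-square distance between observed first digits and Benford's law."""
    digits = [int(str(abs(n))[0]) for n in totals if n != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def fibonacci(count):
    """First `count` Fibonacci numbers, starting 1, 1, 2, 3, ..."""
    a, b = 1, 1
    result = []
    for _ in range(count):
        result.append(a)
        a, b = b, a + b
    return result


benford_like = fibonacci(500)            # made-up stand-in for honest data
flat_digits = list(range(100, 1000))     # made-up data, every first digit equally common

for label, data in [("Fibonacci", benford_like), ("100-999", flat_digits)]:
    stat = chi_square_vs_benford(data)
    verdict = "worth a closer look" if stat > CRITICAL_5_PERCENT else "consistent with Benford"
    print(f"{label}: chi-square = {stat:.1f} ({verdict})")
```

A high score is a reason to look more closely at the figures, not proof that anyone cooked them.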