Software Runs the World: How Scared Should We Be That So Much of It Is So Bad?

When software works, you can buy an airline ticket and sell a stock. When it fails, you can miss a flight and a bank can lose a billion dollars. Do we respect the power of software as much as we should?

615 trading floor software.jpg

What do most people think of when they think of software? A decade ago, probably Microsoft Word and Excel. Today, it's more likely to be Gmail, Twitter, or Angry Birds. But the software that does the heavy lifting for the global economy isn't the apps on your smartphone. It's the huge, creaky applications that run Walmart's supply chain or United's reservation system or a Toyota production line.

And perhaps the most mission-critical of all mission-critical applications are the ones that underpin the securities markets where a large share of the world's wealth is locked up. Those systems have been in the news a lot recently, and not for good reasons. In March, BATS, an electronic exchange, pulled its IPO because of problems with its own trading systems. During the Facebook IPO in May, NASDAQ was unable to confirm orders for hours. The giant Swiss bank UBS lost more than $350 million that day when its systems kept re-sending buy orders, eventually adding up to 40 million shares that it would later sell at a loss. Then last week Knight Capital -- which handled 11 percent of all U. S. stock trading this year -- lost $440 million when its systems accidentally bought too much stock that it had to unload at a loss.* (Earlier this year, a bad risk management model was also fingered in JP Morgan's $N billion trading loss, where N = an ever-escalating digit.)

The underlying problem here is that most software is not very good. Writing good software is hard. There are thousands of opportunities to make mistakes. More importantly, it's difficult if not impossible to anticipate all the situations that a software program will be faced with, especially when--as was the case for both UBS and Knight--it is interacting with other software programs that are not under your control. It's difficult to test software properly if you don't know all the use cases that it's going to have to support.

There are solutions to these problems, but they are neither easy nor cheap. You need to start with very good, very motivated developers. You need to have development processes that are oriented toward quality, not some arbitrary measure of output. You need to have a culture where people can review each other's work often and honestly. You need to have comprehensive testing processes -- with a large dose of automation -- to make sure that the thousands of pieces of code that make up a complex application are all working properly, all the time, on all the hardware you need to support. You need to have management that understands that it's better to ship a good product late than to ship a bad product on time. Few software companies do this well, and even fewer of the large companies that write much of their software.

This is why there is so much bad software out there. In most cases we learn to live with it. Remember the blue screen of death? Ever stood at an airline counter waiting interminably for the agent to make what should be a simple switch from one flight to another? Ever been on the phone with a customer service representative who says his computer is slow or not working? That's what living with bad software looks like.

But in our increasingly complex and interconnected financial system, it's not clear we can live with it. When software programs make mistakes, they don't do it in predictable, linear ways. To take a simplistic example, if a program is adding a zero at the end of every number, it could just as easily be adding three zeros. Today, when the stock market has become a battleground for sophisticated trading algorithms (see this astonishing chart by Nanex, courtesy of Felix Salmon), programming errors can quickly blow up into the hundreds of millions or billions of dollars.

The immediate problem is that as computer programs become more important to the financial system and hence the economy, there is insufficient incentive for trading firms to make sure their software works properly. Sure, everyone would prefer to have programs that don't break over broken programs. The question is how much you're willing to sacrifice in the name of quality.

Software failures are low-probability but can be catastrophic. Stress-testing software to prevent catastrophes is expensive. So risk-seeking individuals in a cost-conscious organization are more likely to accept the risk. Financial institutions are run either by testosterone-driven traders or back-slapping client managers, neither of whom know the first thing about technology. The incentive is always to get a trading edge and roll it out quickly to beat the competition and maximize profits. The same short-term, take-the-upside-and-offload-the-downside attitude that helped make the financial crisis possible means that trading firms will systematically underinvest in software quality. The fact that they don't bear the costs of systemic fragility only makes things worse. UBS and Knight are just the most obvious proof.

This is one problem that regulation probably can't solve directly. How can you write a rule saying that companies have to write good software? The only real solution is to acknowledge that computer programs are going to fail and try to minimize the damage they can cause in advance. That could include a small trading tax to discourage high-frequency trading, or higher capital requirements to increase the odds that too-big-to-fail banks won't blow themselves up.

Because what if this had happened at JP Morgan instead of at Knight Capital?


*Of course, Knight called the SEC and asked to be let off the hook for those trades but, to her credit, Mary Schapiro said no. What would Knight have done if it had mistakenly bought stock that later went up in value?