Based on our rough calculations, less than $1 out of every $100 of government spending is backed by even the most basic evidence that the money is being spent wisely. As former officials in the administrations of Barack Obama (Peter Orszag) and George W. Bush (John Bridgeland), we were flabbergasted by how blindly the federal government spends. In other types of American enterprise, spending decisions are usually quite sophisticated, and are rapidly becoming more so: baseball’s transformation into “moneyball” is one example. But the federal government—where spending decisions are largely based on good intentions, inertia, hunches, partisan politics, and personal relationships—has missed this wave.
Allow us to share some behind-the-scenes illustrations of what our crazy system of budgeting looks like—and to propose how the lessons of moneyball could make our government better.
When one of us (Peter) began his tenure as the director of the Congressional Budget Office in 2007, he took a Willie Sutton approach to the nation’s huge and growing fiscal mess: he went after health care, which makes up roughly a quarter of the federal government’s spending, because that’s where the money is.
The moneyball formula in baseball—replacing scouts’ traditional beliefs and biases about players with data-intensive studies of what skills actually contribute most to winning—is just as applicable to the battle against out-of-control health-care costs. According to the Institute of Medicine, more than half of treatments provided to patients lack clear evidence that they’re effective. If we could stop ineffective treatments, and swap out expensive treatments for ones that are less expensive but just as effective, we would achieve better outcomes for patients and save money.
Both parties should find much to like in such an approach. It would offer Republicans a way to constrain the growth of government spending and take pressure off private businesses weighed down with health expenses. And it would offer Democrats a means of preserving the integrity of Medicare and Medicaid and thereby restoring faith in a core government function.
And yet getting funding for the research needed to assess and compare medical treatments has been like pulling teeth. As a rule, legislators seem to lack a natural affinity for economists and budget analysts (alas, they are hardly alone). But Peter made himself exceptionally unpopular with some Democrats and many Republicans by insisting on such funding in the 2009 stimulus bill, and then working to expand it in the 2010 “Obamacare” legislation. Despite these modest successes, less than $1 out of every $1,000 that the government spends on health care this year will go toward evaluating whether the other $999-plus actually works.
Getting the right information is less than half the battle. Acting on it, once it’s in hand, is harder still. As one small example, some evidence suggests that moving toward “bundled” payments for all services needed by a patient during a course of medical treatment could produce better value than paying piecemeal for each service and procedure, because the piecemeal approach creates an incentive for more care rather than better care. During one meeting with members of Congress in 2008 to discuss how to expand bundling and include a performance incentive in kidney dialysis, Shelley Berkley, a Democratic congresswoman from Nevada, accused Peter, as he remembers it, of trying to destroy the dialysis industry. “You and your staff may have your Ph.D.s, but you have no clue,” he recalls her saying. “We don’t need any of your fancy analysis.” (Berkley says she does not remember the meeting, or those comments.) Berkley had received campaign contributions from several dialysis companies and organizations, and her husband owned a dialysis business. Whether these factors may have influenced her thinking is a question we will leave for the reader.
It is indisputable, however, that a move toward payments based on performance would harm some business interests. If most of your profits come from, say, a medical device or procedure that is covered by Medicare but doesn’t work all that well, you’re likely to resist anyone sorting through what works and what doesn’t, never mind changing payment accordingly. Health-care interests are wise to invest millions of dollars in campaign contributions and lobbying to protect billions of dollars in profits.
Your other author (John) received his own lessons on why moneyball doesn’t play in Washington a few years earlier, when he joined the administration of George W. Bush. Bush, of course, was not only a former baseball-team owner but also the first president to hold an M.B.A. In every domestic-policy briefing John led in the first term of the Bush administration, the president would ask some version of the following questions: How do we know this program will achieve the results as advertised? Who will run this program, and is that person an effective manager? How will that person and the program be held accountable for producing results?
Year after year, with the help of the Program Assessment Rating Tool (PART) introduced by Bush’s Office of Management and Budget in 2002, the administration identified federally funded and administered programs that were not working as advertised, and tried to get them to improve or be discontinued. And yet these efforts rarely gained traction on Capitol Hill with either party.
The Bush administration initially had high hopes for this project. The goal was to build on a Clinton-era law called the Government Performance and Results Act, which aimed “to provide for the establishment of strategic planning and performance measurement in the Federal Government,” but did not attempt to tie performance directly to continued financing.
By the end of Bush’s second term, PART had assessed about 1,000 programs. Of them, 19 percent were rated “effective,” 32 percent “moderately effective,” 29 percent “adequate,” 3 percent “ineffective,” and 17 percent “results not demonstrated” (meaning that the programs couldn’t be assessed, because of insufficient data). This information was used to develop the president’s budget proposals to Congress, but PART was not developed in cooperation with Congress, and Congress gave its assessments little heed.
The White House Task Force for Disadvantaged Youth, which John co-chaired, highlights the extreme disconnect between effectiveness ratings and appropriation decisions. For the first time ever, in 2003, the task force tallied and studied all 339 federally funded programs addressing disadvantaged youth—a confusing and costly tangle—to find ways to improve the system. With help from 10 federal departments and from experts inside and outside of government, the task force found that the federal government was spending $223.5 billion every year on programs with aims ranging from promoting health and nutrition to preventing teen pregnancy, high-school dropouts, and youth violence. Despite this wide range, overlap among the programs was common. The task force’s final report documented 67 different youth programs that promoted “character education,” 89 that purported to build “self-sufficiency skills,” and 97 that sought to “prevent substance abuse”—with little or no coordination or knowledge-sharing among programs with similar goals.
More troubling, the vast majority of the programs could not provide any meaningful information indicating how well they served young people. Some reported basic operational data, but few had undergone rigorous evaluations looking at how the programs affected participants. At the time of the White House Task Force report, fewer than 10 percent of the programs had been assessed by PART, and more than half had not been evaluated at all in the previous five years.
With so little performance data, it’s impossible to say how many of the programs were effective. But you don’t have to be a Tea Party organizer to harbor skepticism. Since 1990, the federal government has put 11 large social programs, collectively costing taxpayers more than $10 billion a year, through randomized controlled trials, the gold standard of evaluation. Ten out of the 11—including Upward Bound and Job Corps—showed “weak or no positive effects” on their participants. This is not to say that all 10 programs deserve to be eliminated. But at a minimum, collecting rigorous evidence could help spur programs to improve over time.
One of the programs studied by the Task Force for Disadvantaged Youth that did collect meaningful performance data and was rated by a PART assessment was the Even Start Family Literacy Program, a Department of Education project aimed at improving the literacy of low-income parents and their children. Unfortunately, the data showed that “children and parents … did not gain more than children and parents in the control group.” PART rated the program “ineffective.”
In 2003, John and officials at the Office of Management and Budget began trying to redirect funding from Even Start to better-performing programs. But Even Start was founded in 1989 by Bill Goodling, a well-liked Republican congressman who had been the chairman of the House Education and the Workforce Committee, and had previously served as a teacher, principal, and school superintendent in Pennsylvania. So Congress continued to fund this ineffective, if well-meaning, program to the tune of more than $1 billion over the life of the Bush administration.
Even Start is no different from most federal programs. Evidence of success is barely considered when legislation is proposed and discussed in committee and on the floor of Congress. There is no systematic way in which members of Congress or other key decision makers are informed about that evidence, or lack thereof. They instead tend to rely on ad hoc assessments provided by lobbyists and interest groups. And once legislation is passed and a program is up and running, there is no mechanism for automatically tracking its effectiveness, beyond counting the number of people served by a program, no matter the impact it has on their lives.