Do CEOs Matter? Absolutely.

Harris Collingwood has an article in the current edition of The Atlantic that raises the question "Do CEOs Matter?"  His answer to this question, as far as I can tell, is some mix of "maybe", "not much" and "sometimes."  I have a simpler answer to his question: "Yes."

Collingwood begins with a long anecdote about Steve Jobs, the gist of which is that Apple's stockholders care a lot about Steve Jobs' health.  This seems pretty understandable to me.  I've been a CEO, and been a member of Boards that have hired and fired CEOs; the hiring and firing of the CEO is generally seen to be the single most important duty of the Board of any company.  As we nationalize various financial institutions and automobile companies, the entire political system seems to think that picking the right CEO for these institutions is a pretty important decision.  Very successful private equity funds typically install new CEOs as part of investments predicated on improving corporate performance.  So are most institutional shareholders, Boards, the federal government and private equity funds suckers?

To be sure, none of these entities believe that any CEO ever completely determines firm performance.  The rational standard, it seems to me, that ought to be applied to determine whether "the cult of the CEO has gone too far" is whether the formal and informal compensation provided to CEOs is justified by their contribution to value.  More concretely: would firms be worth more money if they paid CEOs less, reduced the prestige of the position or otherwise took CEOs down a peg?  Shareholders, Boards and private equity funds, when voting with their own money, say no. Conventional wisdom is sometimes wrong, however, and maybe it is in this case.  What evidence does Collingwood present to challenge the very widely-held belief that CEOs matter?

Speaking of the CEO's ability to drive company performance, he starts with this:

But how strong is this power--or any executive power? In their groundbreaking "Leadership and Organizational Performance: A Study of Large Corporations," first published in 1972 in American Sociological Review, Stanley Lieberson and James O'Connor argued that it's weak indeed. Perhaps reflecting the anti-authoritarian spirit of the times, the authors asserted that the CEO's influence was seldom decisive in a company's performance. They had the numbers to back up this view. Working with a database of 167 companies, they teased out the effects that various factors had on corporate profitability, from the competitive state of the industry to the size and structure of an individual company to the CEO's managerial decisions. "Industry effects," such as the amount of available capital and the stability of the market, accounted for almost 30 percent of the variance in corporate profits. "Company effects," such as the firm's historical place in the corporate pecking order, explained about 23 percent. "CEO effects" explained just 14.5 percent. And even this impact should be viewed skeptically: it unavoidably bundles CEO actions that were genuinely smart and skillful with those that were merely lucky.

Other scholars have attempted to replicate and extend Lieberson and O'Connor's findings, and many have likewise concluded that external forces influence corporate performance far more than CEOs do. Indeed, more-recent studies have tended to find a smaller CEO effect than Lieberson and O'Connor did--ranging from 4.5 percent to 12.8 percent of profit variance. (The scholar Alison Mackey, at Ohio State University, is a prominent dissenter. In a recent paper, she criticizes the number-crunching methods of Lieberson and O'Connor and, using a different methodology, concludes that CEOs have a dominant influence on performance that may well justify their high pay.)

Let's start with the observation that even if we assume that the choice of CEO drives on the order of 10% of the variation in corporate performance (as per Collingwood's interpretation of these studies), that is a very big number in absolute dollars.  If we apply the rational standard of whether owners would be better or worse off by paying CEOs less and treating them less well, this creates a pretty big umbrella for CEO comp and pomp.  Collingwood doesn't present the basic numbers that would be required to evaluate this question, especially how big the "variance in performance" actually is, so that we could take a tenth of it, assign it to the CEO, and decide what the person is worth economically.

Here's some simple illustrative math.  I picked the median company on the most recent Fortune 500 (i.e., number 250), Smith International.  It has about $11 billion in sales and $1.6 billion in operating income.  A 1% swing in $1.6 billion is $16 million.  As context, the median Fortune 500 CEO recently had total annual comp of about $6 million.  So as a shareholder of Smith International going into the market to hire a CEO, the question I would ask myself if presented with the choice of paying $6 million per year or, say, doubling this to $12 million per year, is not "Will the CEO I get for $12 million fundamentally transform my business?" or whatever; instead, I'd rationally ask myself, "Can the $12 million CEO drive about 0.4% more operating profit than the person I would hire at $6 million?".
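For anyone who wants to check the arithmetic, here is a minimal sketch of the break-even calculation.  The dollar figures are the ones quoted above; the function name is just an illustrative choice, and the calculation ignores taxes, multi-year effects and everything else a real Board would consider.

```python
# Figures quoted in the post for Smith International (illustrative only).
operating_income = 1.6e9   # annual operating income, in dollars
base_pay = 6e6             # median Fortune 500 CEO total annual comp
doubled_pay = 12e6         # the hypothetical "expensive" CEO


def breakeven_uplift(extra_pay, income):
    """Fraction of operating income the pricier CEO must add to cover the extra pay."""
    return extra_pay / income


one_percent_swing = 0.01 * operating_income
uplift = breakeven_uplift(doubled_pay - base_pay, operating_income)

print(f"A 1% swing in operating income: ${one_percent_swing / 1e6:.0f} million")
print(f"Break-even uplift for the extra $6 million: {uplift:.3%}")
```

The extra $6 million pays for itself if the more expensive CEO adds a bit under four-tenths of one percent to operating income.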

Even more fundamentally, Collingwood's interpretation of the quantitative analysis of what impact CEOs have on performance is extremely naïve.  Start with his lead analysis, Lieberson and O'Connor.  Notice what happened, according to Collingwood, when other scholars attempted to replicate Lieberson and O'Connor's findings: "many have likewise concluded that external forces influence corporate performance far more than CEOs do."  Collingwood does not address what was probably the most influential attempt to replicate their findings, by Weiner and Mahoney, which showed that these results are highly dependent on unverifiable assumptions.

To understand why this is so, we need to consider, at least in rough terms, how you do this analysis.  Imagine by analogy that you have a list of 500 election precincts from the 2008 U.S. presidential election, and you know what percentages of votes were cast in each precinct for Obama, McCain and Other.  In addition, you have a list of 100 facts that describe each precinct, e.g., average income, average age, population density, and so on.  You want to measure the "impact" of average income on likelihood of voting for Obama.  You might start by asking which descriptor is most correlated with voting for Obama.  Let's say you discover that it's population density.  You could then observe the mathematical relationship that "for every additional hundred people per square mile in a precinct, Obama's vote percentage is about 0.1% higher".  You could then use this relationship to "adjust" the result for each of your 500 precincts to get a population density-adjusted vote percentage for Obama.  You might then observe that average age is correlated with the population density-adjusted Obama vote percentage, and in a similar fashion now further adjust the Obama percentage by precinct to get a population density and age-adjusted Obama vote percentage, and so on.  Once you reach some stopping condition, you could then measure the correlation between income and the adjusted-for-all-other-relevant-variables Obama vote percentage, and call this the estimated impact of income on likelihood to vote for Obama after adjusting for all relevant factors.

The problem should be obvious.  If income is correlated with population density, age and the other factors for which you adjusted, then how do you know whether you should "adjust" for any of these factors first?  That is, how do you know that it wasn't the income difference that was driving the difference in voting behavior, and therefore when you "adjusted" by these other factors, you were really under-estimating the causal impact of income on voting?  What order you choose to put the factors into the model can have a huge impact on the estimate for the impact of each factor.  This is most easily seen in a limit case.  Suppose across my 500 precincts age and income are perfectly correlated (e.g., the precinct with average age 18 has an average income of $18,000, the precinct with an average age of 19 has an average income of $19,000, all the way up to the precinct with an average age of 60 that has an average income of $60,000). Further suppose that age (and by extension, for this example, income) is correlated with voting behavior and no other descriptors are correlated at all with voting behavior.  In this case if I "adjust" for age first, I will estimate that income has no effect on voting.  If I don't, I will estimate that income has an enormous effect on voting behavior.  Which is correct?  There is no way to determine the answer with this data set.
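The limit case is easy to simulate.  The sketch below is a toy under stated assumptions, not a reconstruction of anyone's published method: the vote model (a straight line in age plus noise), the noise level, the random seed and all function names are invented for illustration.  It "adjusts" for one factor at a time with a one-variable least-squares fit and credits each factor with the share of vote variance it removes at its turn.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)


def residuals(x, y):
    """What is left of y after a one-variable least-squares fit on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def sequential_shares(y, ordered_factors):
    """Credit each factor, in the order given, with the share of y's variance it removes."""
    total = variance(y)
    shares, current = {}, list(y)
    for name, x in ordered_factors:
        adjusted = residuals(x, current)
        shares[name] = (variance(current) - variance(adjusted)) / total
        current = adjusted
    return shares


random.seed(0)  # arbitrary seed, only for repeatability
ages = [random.randint(18, 60) for _ in range(500)]       # 500 hypothetical precincts
incomes = [1000 * a for a in ages]                         # perfectly correlated with age
votes = [70 - 0.5 * a + random.gauss(0, 3) for a in ages]  # made-up vote model

age_first = sequential_shares(votes, [("age", ages), ("income", incomes)])
income_first = sequential_shares(votes, [("income", incomes), ("age", ages)])

print("age entered first:   ", age_first)
print("income entered first:", income_first)
```

Whichever factor goes in first collects all of the explained variance, and the other is credited with essentially zero.  The data are identical in both runs; only the ordering assumption changed.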

What Weiner and Mahoney showed in their replication attempt was that this is exactly the problem with Lieberson and O'Connor's analysis.  They showed, for a sample of about 200 companies, that if you put CEO in as a factor after other correlates of performance (as had Lieberson and O'Connor), you estimate that the CEO drives about 9% - 19% of the variance in corporate performance, which is very much in line with Lieberson and O'Connor's estimate of 14.5%.  But if you put CEO in first, then other factors afterwards, you estimate that the CEO drives 78% - 96% of the variation in corporate performance.  In other words, the output of the analysis is merely a complex restatement of your assumption about the relative importance of CEO versus other factors, as embodied by your operational decision about the order in which you enter variables in the model.

Later in the article, Collingwood points to some more quantitative evidence:

Three Harvard professors--Noam Wasserman, Bharat Anand, and Nitin Nohria--say in a recent paper that the right question is, When does leadership matter? Using advanced statistical techniques that go by a wonderfully CSI-style name, "variance decomposition analysis," the authors examine 531 companies in 42 industries and isolate leadership effects from other determinants of corporate performance. They conclude that leadership matters sometimes. It doesn't make much difference at electrical-utility companies, which are so constrained by government regulations and the cost of fuel that there's very little room for the CEO to exercise any discretion. The professors used the term "Titular Figureheads" for such CEOs. In addition to utilities, you'll find them in stable, old-line industries--paper mills, meat wholesalers--where the pace of change is slow.

First, the Lieberson and O'Connor and Weiner and Mahoney papers also did it CSI-style by using variance decomposition analysis.  The difference is that, presumably in response to the exact criticisms leveled by scholars like Weiner and Mahoney, these later authors replaced "sequential variance decomposition" (i.e., order of variables matters) with "simultaneous variance decomposition" (i.e., order of variables doesn't matter).  Why didn't they think of that in 1972?  Not because they were dumb, but because there is a big problem with simultaneous variance decomposition, namely that you have to include interaction variables (e.g., does the CEO matter more in a specific industry than others?).  If you don't get the interaction estimates right, you can't get a reliable estimate for the causal impact of the CEO.  You can see that as: (i) the phenomenon becomes more complex, so that the number of potential interactions rises, (ii) the variables themselves are inter-correlated in complicated ways, and (iii) the number of data points is small, this becomes extremely difficult to do. All of these conditions obtain in spades for the problem of teasing out the effect of a CEO on the performance of a large corporation.
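To get a feel for how quickly the number of potential interactions rises, here is a rough count.  The 531 companies come from the passage quoted above; the figure of 50 candidate factors is purely hypothetical, picked only to show the scale, and the helper name is made up.

```python
from math import comb

n_companies = 531   # sample size reported for Wasserman, Anand and Nohria
n_factors = 50      # hypothetical number of candidate explanatory factors


def interaction_count(factors, max_order):
    """Number of distinct interaction terms among `factors` variables, orders 2..max_order."""
    return sum(comb(factors, k) for k in range(2, max_order + 1))


pairwise = interaction_count(n_factors, 2)
up_to_three_way = interaction_count(n_factors, 3)

print(f"pairwise interactions:        {pairwise}")
print(f"pairwise plus three-way:      {up_to_three_way}")
print(f"companies available to fit:   {n_companies}")
```

Even before three-way terms, the pairwise interactions alone outnumber the companies in the sample by more than two to one under this assumption.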

Wasserman, Anand and Nohria had 531 companies in their database.  Even when using tricks like considering different years and business segments within companies as data points, there is no conceivable way to specify all the plausible interactions and estimate them correctly.  The number of plausible interactions is easily in the thousands.  Further, as per the voting precincts thought experiment, we will simply lack many real compare-contrast opportunities to segment effects, no matter how many data points we have.  Suppose, hypothetically, we had a measure of "cunning" for each CEO, and observed that cunning CEOs tended to run more successful companies after adjusting (assuming we could reliably adjust) for other effects.  Would this mean that cunning enabled these CEOs to help their companies perform better, or that it helped them make sure they ended up in charge of companies that would be doing well anyway?  The list of such issues is virtually endless, not just on a nit-picking level, but at the practical level of the issues that all Boards, in my experience, consider when evaluating potential CEOs.

Alison Mackey, cited as a "prominent dissenter" by Collingwood, has an analysis in which she takes a step toward what you would really have to do to isolate causal impacts in this situation, when she basically tries to look at CEO changes.  Even here, however, there is the huge selection bias problem created by the fact that CEO changes aren't random, but are somewhat and sometimes influenced by performance.  She is acutely aware of this problem, and uses the Heckman procedure, which is often cast as a cure for selection bias, to try to adjust for this.  Her unusually clear writing style provides an excellent example of why, while this procedure is extremely helpful, it simply moves the ball of assumptions, without really solving the problem.  She writes:

The Heckman 2-step procedure (1979) to correct for sample selection bias is used to ascertain whether the results presented in this paper are an artifact of sample selection bias. The first step of this procedure predicts what variables impact whether or not a firm has any turnover during the years it is in the sample. Identifying appropriate instruments to predict CEO turnover is based on Finkelstein and Hambrick's (1996) theoretical work on the determinants of executive turnover. Turnover is thus modeled as a function of firm performance (corporate ROA relative to the segment's median competitor), firm size (corporate sales), firm structure (diversification status), and environmental conditions (number of firms in a segment's industry as well as the percent of a segment's competitors in an industry that have had executive turnover during their time in the sample).

But of course this means that our correction for sample bias depends on the accuracy of our predictive model for CEO turnover.  How would we verify the accuracy of this model (which is, for all practical purposes, just another way to "decompose variance", this time for the outcome called CEO turnover)?  We're back to pretty much the same problem we have with verifying our model for CEO impact on performance.

There's just no way out of the problem that what makes companies do well or badly is very, very complicated, and therefore isolating the impact of any one variable by lining up some descriptors for a few hundred companies and looking for patterns is like trying to grab liquid mercury.  The only way to really isolate the causal impact of CEOs on performance would be to randomly change a subset of them while holding others out as controls, which of course will never happen.  Even if we could do this, we would still have an almost insurmountable generalization problem:  How would this vary by industry?  Are some CEOs only good at turnarounds, independent of industry?  Are CEOs secularly more valuable in some time periods than others?  And so on.

Once the cited analytical paragraphs are removed, what evidence does Collingwood have?  Some stories, some psychology professors and some interested parties.  No sale.

This shouldn't be a surprise.  There's a reason that we have this incredibly messy, expensive thing called a market.  Its job is to process disparate information in order to set prices.  Otherwise, we'd just hire a couple of professors to build us a model.