How to Write Better Code

Researchers take an empirical approach to the human processes of collaboration and creation, searching for ways to improve


Software engineers make mistakes, and those mistakes can prove costly. In a feature story in American Scientist, Greg Wilson and Jorge Aranda take a look at the science of improving coding. One interesting finding:

In 1967, only partly as a wry joke, Melvin Conway coined his eponymous law: "Any organization that designs a system ... will produce a design whose structure is a copy of the organization's communications structure." In other words, if the people writing a program are divided into four teams, the program they create will have four major parts.

Nachi Nagappan, Christian Bird and others at Microsoft Research evaluated the validity of Conway's law by examining data collected during the construction of Windows Vista. Vista consists of thousands of interrelated libraries and programs called binaries. When an error occurs, the breakdown can usually be traced to a fault in a single binary or to a breakdown in the interaction between binaries. Nagappan, Bird, and their team used data mining to explore which aspects of software construction correlated with faults. They found that when work occurred in alignment with Conway's law -- that is, when the structure of the team and the structure of the code mirrored each other -- code contained fewer bugs, whereas work that crossed team boundaries increased failure-proneness.

Nagappan and his collaborators then used their data to predict failure-proneness by locating code produced by multiple groups or at the interface of multiple groups. Contrary to digital folklore, they found that geographic separation between team members didn't have a strong impact on the quality of their work. What did matter was organizational separation: The farther apart team members were in the company organization chart, the greater the number of faults in the software they produced.
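To make "farther apart in the organization chart" concrete, here is a minimal sketch of one way such a distance could be computed. It is an illustration only, not the metric the Microsoft researchers used: the org chart, the names in it, and the helpers `chain_to_root` and `org_distance` are all hypothetical. The sketch counts how many reporting links separate two people via their closest shared manager.

```python
# Hypothetical org chart: each person maps to their direct manager.
# The person at the top has no manager (None).
manager_of = {
    "ceo": None,
    "vp_kernel": "ceo",
    "vp_ui": "ceo",
    "alice": "vp_kernel",
    "bob": "vp_kernel",
    "carol": "vp_ui",
}


def chain_to_root(person, manager_of):
    """Return [person, their manager, that manager's manager, ...]."""
    chain = [person]
    while manager_of.get(chain[-1]) is not None:
        chain.append(manager_of[chain[-1]])
    return chain


def org_distance(a, b, manager_of):
    """Count reporting links between a and b via their closest shared manager."""
    # Map everyone on b's management chain to how many steps above b they sit.
    steps_above_b = {p: i for i, p in enumerate(chain_to_root(b, manager_of))}
    # Walk up from a until we reach someone who is also on b's chain.
    for steps_above_a, person in enumerate(chain_to_root(a, manager_of)):
        if person in steps_above_b:
            return steps_above_a + steps_above_b[person]
    raise ValueError("no common manager found")


# Same team: alice and bob both report to vp_kernel.
print(org_distance("alice", "bob", manager_of))
# Different divisions: the closest shared manager is the ceo.
print(org_distance("alice", "carol", manager_of))
```

In this toy chart, two people on the same team are closer than two people whose only shared manager is the chief executive. A study like the one described would pair a measure of this kind with fault counts for the code those people worked on.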

The article gives a solid overview of efforts to study, empirically, why and where engineers make mistakes. The authors ask, "Can the quality of code be measured? Can data mining predict the location of software bugs?" Data about coding, bugs, how many people work on a project, and so on are now available in archives such as the University of Nebraska's Software Artifact Infrastructure Repository, NASA's Software Engineering Laboratory, and an online database called CeBASE, so researchers can look for patterns of failure and success.

In some fields, computer programming among them, even the messy, human processes of collaboration and creation can become data. Once that data exists, it can be crunched, and patterns can be found. What we find may merely confirm our intuition -- many would probably suspect that errors crop up when people work across team boundaries -- but backing up intuitions with data can give them greater power.