Grace was a heroin addict who had been clean for about six months; I was a 34-year-old therapist in training. When we started psychotherapy, in 2006, Grace had a lot going against her. She was an unemployed single mother who had been in a string of relationships with violent men and was addicted to drugs. Yet despite these challenges, she was struggling bravely to put her life back together and retain custody of her young son. (I’ve changed my patients’ names and some details about them to protect their privacy.)

Our therapy focused on supporting Grace’s attendance at Narcotics Anonymous meetings and reducing the anxiety she said had driven her to drugs. The first few months seemed to go well. Every week, she told me about her successes: She attended the NA meetings, got a job, and found a boyfriend who respected her.

We both knew the stakes—custody of her son, and perhaps her life—and we refused to consider failure. Frequently, I asked Grace for feedback about our work together. She always assured me that the therapy was proving productive. However, her enthusiasm had a desperate, hard edge; she often spoke quickly, with a tight, forced smile.

I received weekly supervision from a psychologist at my community-counseling training site. She was smart and perceptive, with decades of experience helping addicts; I was lucky to have her guidance. Three months into treatment, I told my supervisor Grace was doing so well that we had agreed to cut our sessions from weekly to biweekly. “It’s remarkable how quickly she’s improving,” I said. But my supervisor was cautious. “Getting clean is hard,” she told me, “but staying clean is harder.”

She was right. Soon thereafter, Grace no-showed for three straight therapy appointments. When she finally reappeared, she had relapsed on heroin. Over the next several months, everything she had built fell apart. She lost her job and her boyfriend, and kept going back to drugs. Yet she came faithfully to therapy, so I had a front-row seat to her painful unraveling. I tried every therapeutic technique I could find, but nothing stuck. Through it all, she insisted she could do it. “I’ve just got to stay positive,” she said.

A few months after relapsing, Grace died of a drug overdose, and her son was sent to foster care. I was devastated. The episode sparked a crisis in me: What could I have done differently? How could I become a more effective therapist?

Casting about for solutions, I recalled an idea that one of my professors had discussed in class a year earlier. He had read the book Moneyball, which described the Oakland Athletics’ revolutionary use of performance metrics, and he was curious whether psychotherapy could also benefit from more data and analytics. He showed us promising preliminary research, but also noted that many therapists were skeptical.

I’d had little interest in this topic when my professor first mentioned it. The very idea seemed too hypothetical, too academic, and almost insulting to the profession. Psychotherapy is unlike any other field, I’d thought, with the arrogance that comes from being untested. We work in a human relationship. What we do can’t be measured. However, after Grace died, I found myself more open to different approaches—to anything that might help me fix my blind spots and weaknesses.

A small mountain of clinical research shows that therapists—that is, anyone who provides talk therapy, from psychologists to social workers—vary widely in effectiveness. One study, led by John Okiishi of Brigham Young University, compared clinical outcomes from 91 therapists and found that the highest-performing among them helped clients improve 10 times faster than the overall average. On the other end of the spectrum, a study led by the psychologist David R. Kraus found that clients of the lowest-performing therapists were significantly worse off in the areas of violence and substance abuse at the end of treatment.

My introduction to the field came from my own therapist, who’d helped me greatly during my troubled teens. “Psychotherapy,” he once told me, “is a relational art. You can’t quantify personal growth.” I hadn’t really understood what he’d meant at the time, but meeting with him over a period of years had helped me considerably when I was depressed, angry, and anxious; whatever he did, it worked.

A decade and a half later—after many adventures and odd jobs in my 20s and early 30s—I entered graduate school with this same perspective on psychotherapy: that it was an art too nuanced and complex to be measured. Still, I couldn’t help but notice that, at my first training site, many of my clients remained stuck in neutral despite our best efforts together. A quarter or more of my clients dropped out without explanation a few weeks or months into treatment. And at least 10 percent were deteriorating. Because many of them had started treatment feeling suicidal or on the edge of needing hospitalization, they couldn’t afford to get worse. Unnervingly, I couldn’t predict which clients would stall, drop out, or deteriorate.

Psychotherapy, on the whole, can be very effective. This bears emphasis, because many people are still skeptical that it is a bona fide treatment. There is no shortage of empirical evidence demonstrating that psychotherapy helps patients with a wide range of problems, from the relatively simple (fear of flying, for example) to knotty and treatment-resistant conditions such as borderline personality disorder. It may not help everyone, but neither does a whole host of medicines for physical ailments. The point is, it does help a lot of people.

That said, as in any profession, there is still considerable room for improvement. My training experience was typical of broader trends: Across the field, dropout rates are estimated to be about 25 percent or more, and, most disheartening of all, 5 to 10 percent of clients deteriorate during treatment. These problems have been acknowledged since the birth of psychotherapy, when Freud himself wrote about “analysis terminable and interminable.”

In recognition of this challenge, psychotherapists have been working hard to boost outcomes. During the past three decades, much of this effort has focused on studying and debating which models of therapy are most effective. However, the results of these initiatives have been largely disappointing. Plenty of models—such as interpersonal therapy, emotion-focused therapy, and cognitive behavioral therapy—have performed well in studies. But larger meta-analyses suggest that most models are not consistently more successful than any other. This research was summarized in a 2012 statement by the American Psychological Association, which declared that “most valid and structured psychotherapies are roughly equivalent in effectiveness.”

Certainly, some models may be better or worse for individual clients. But encouraging therapists to generally favor one model over others hasn’t improved client outcomes. For example, a recent study in Britain examined the results of a major effort to train psychotherapists in cognitive behavioral therapy. Despite a massive investment of time and money, client outcomes did not improve.

If promoting one model over others doesn’t improve client outcomes, what does? As the APA put it, “Patient and therapist characteristics, which are not usually captured by a patient’s diagnosis or by the therapist’s use of a specific psychotherapy, affect the results.” In other words, more important than the model being used is the skill of the therapist: Can therapists engender trust and openness? Can they encourage patients to face their deepest fears? Can they treat clients with warmth and compassion while, when necessary, challenging them?

Doctors rely on a wide range of instruments—stethoscopes, lab tests, scalpels. Therapists, by contrast, are the main instruments of psychotherapy. But this merely brings us back to the central question I faced after Grace died: How can those instruments—the therapists themselves—be improved?

Most fields have experienced dramatic advancements over the past century. The story of how they moved forward often involved two closely related phenomena, both of which were brought about by technology.

The first of these is performance feedback, which gives individuals a heightened awareness of how well or poorly they’re doing their job. Consider the recent impact of slow-motion video technology on professional dance. In 2015, Wired argued that “for dancers, it’s become an incredibly useful tool for honing their craft. The newfound affordability of slow motion has enabled them to improve their technique, spruce up their audition reel, and isolate aspects of their performance that were once intangible.”

Unfortunately, perhaps no field faces higher barriers to incorporating performance feedback than psychotherapy. Because of the personal, sensitive nature of our work—which is protected by laws, regulations, and the general norms of the profession—therapists function largely in private, sheltered from objective feedback. Try to imagine a surgeon, a dancer, or any type of athlete learning without someone observing their work, but instead by simply sharing with their boss reflections on their recent performance. That’s the predicament many therapists are in.

Sure, we can ask our clients for feedback about what’s helping and what isn’t; most therapists do. However, asking only helps if clients are forthcoming with their answers. And many clients withhold critical feedback, especially when therapy is unhelpful. In a recent survey, Columbia University’s Matt Blanchard and Barry Farber asked 547 clients about their honesty in therapy. Seventy percent reported whitewashing feedback to their therapists, commonly by “pretending to find therapy effective” and “not admitting to wanting to end therapy.”* And if patients aren’t telling us the truth, how can we know whether they are likely to deteriorate, as Grace did before my eyes?

Which leads to the other 20th-century development that spurred many professions forward, while largely bypassing psychotherapy: the use of metrics to forecast likely outcomes. The most famous application of metrics is the “moneyball” concept that inspired my professor in graduate school: In the 1970s, a baseball fan named Bill James collected reams of performance data that had previously been ignored (or at least underappreciated) by professional teams, such as slugging percentage and on-base percentage. From this, he developed statistical tools for predicting the performance of baseball players. Ultimately, those tools transformed how baseball teams are managed. Could a similar approach—looking for statistical patterns among a vast array of psychotherapy outcomes—help therapists better predict our patients’ trajectories?

Over the past few decades, Michael Lambert, a researcher at Brigham Young University, has developed a system in which therapy clients take a 45-question survey before each appointment, and a computer tabulates their responses. The results are then displayed as a graph that quantifies the trajectory of each client’s symptoms, allowing his or her therapist to track the progress being made.

Lambert and his team have also been at the forefront of developing psychotherapy metrics. Drawing on historical data from thousands of cases, they created algorithms predicting when clients are at risk of deterioration. If, based on their answers to survey questions, clients appear to be at risk, their therapists are sent alerts that are color-coded for different concerns: red for risk of dropout or deterioration, yellow for less-than-expected progress. In an initial test, the algorithms were able to predict—with 85 percent accuracy and after only three therapy sessions—which clients would deteriorate.
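To make the color-coded alerts concrete, here is a minimal sketch in Python. It is not Lambert’s actual algorithm, which is built from thousands of historical cases; the function name `alert_color`, the expected rate of improvement, and the worsening margin are all hypothetical values chosen for illustration. It assumes a higher survey score means more distress.

```python
from typing import List


def alert_color(scores: List[float],
                expected_drop_per_session: float = 2.0,
                worsening_margin: float = 5.0) -> str:
    """Classify a client's trajectory as 'red', 'yellow', or 'green'.

    scores: one total survey score per session, oldest first
            (assumption: higher score = more distress).
    expected_drop_per_session and worsening_margin are made-up
    illustrative thresholds, not values from any real system.
    """
    if len(scores) < 2:
        # No trajectory can be computed from a single survey.
        return "green"

    sessions_elapsed = len(scores) - 1
    change = scores[-1] - scores[0]  # positive = symptoms got worse
    expected_change = -expected_drop_per_session * sessions_elapsed

    if change >= worsening_margin:
        # Clearly worse than at intake: risk of deterioration or dropout.
        return "red"
    if change > expected_change:
        # Improving less than expected (or flat).
        return "yellow"
    return "green"


print(alert_color([80, 82, 88]))  # worsening client
print(alert_color([80, 79, 78]))  # slower-than-expected progress
print(alert_color([80, 76, 70]))  # on track
```

A real system would replace the fixed thresholds with expected trajectories estimated from past cases, but the basic comparison between where a client is and where similar clients usually are by that session is the same.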

Today, these surveys and algorithms are known as feedback-informed treatment, or fit. The system aids therapy in two primary ways. First, it provides an element of blunt performance feedback that therapists too often lack. Many clients are more willing to report worsening symptoms to a computer—even if they know that their therapist will see the results—than disappoint their therapist face-to-face.

The second benefit comes from the metrics: Risk alerts allow therapists to adjust treatment, and can help them compensate for natural overconfidence and clinical blind spots. In one study, 48 therapists, seeing several hundred clients at a single clinic, were asked to predict which of their patients would “get worse.” Only one of the therapists accurately identified a client at risk. Notably, this therapist was a trainee. The licensed therapists in the study didn’t accurately predict a single deterioration. Only three clients were predicted to get worse, despite therapists being informed by the researchers that the clinic-wide deterioration rate hovered around 8 percent—and despite the fact that 40 clients, or about 7 percent of those in the study, ultimately did deteriorate.
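The numbers in that study can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic on the figures quoted above; the total number of clients is inferred from “40 clients, or about 7 percent,” so it is an approximation rather than a reported value, and the variable names are my own.

```python
# Figures quoted in the text
predicted_to_worsen = 3      # clients therapists flagged as likely to get worse
correct_predictions = 1      # flagged clients who actually deteriorated
actually_deteriorated = 40   # clients who ultimately deteriorated
observed_rate = 0.07         # "about 7 percent" (rounded in the source)

# Rough study size implied by 40 clients being about 7 percent (an estimate)
approx_clients = round(actually_deteriorated / observed_rate)

# Share of deteriorating clients the therapists caught in advance
share_caught = correct_predictions / actually_deteriorated

# Share of the therapists' warnings that turned out to be right
share_of_warnings_correct = correct_predictions / predicted_to_worsen

print(f"Approximate clients in study: {approx_clients}")
print(f"Deteriorations caught in advance: {share_caught:.1%}")
print(f"Warnings that were correct: {share_of_warnings_correct:.1%}")
```

Put another way, clinical judgment alone flagged 1 of the 40 clients who went on to deteriorate, which is the gap the risk alerts are meant to close.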

Some years after Grace’s death, I began working with a client named June. At that point, inspired by talks given by Scott D. Miller (who co-founded the International Center for Clinical Excellence and helped develop a fit system that uses algorithms built from 250,000 completed cases around the world), I was using fit as part of my approach to therapy.

June, who had recently dropped out of a local community college, was seeking help with anxiety, depression, and social isolation. She told me that she had been experiencing these symptoms her whole life. Her parents, with whom June still lived, were religious fundamentalists and very controlling.

Our therapy sessions seemed to start well. June was shy and quiet, and never made eye contact with me. But she seemed genuinely interested in learning skills to reduce her anxiety and reported practicing the skills between sessions. When I asked June for feedback at the end of each session, she told me the therapy was helpful. “The skills you’re teaching me are good,” she replied in her soft, careful voice.

Before each session, June took a few minutes to complete the fit survey on an iPad in the waiting room, responding to statements like “I feel fearful” and “I enjoy my spare time” with preset answers ranging from “never” to “almost always.” Though I had access to her clinical graph every session, I didn’t bother checking it at first, because she seemed to be progressing so well.

After a few sessions I finally checked the graph—more because I felt like I should than because I thought it would be helpful. I was shocked to see that June’s chart showed a red alert. Her symptoms had not improved since our first session. The algorithms reported that she was actually at a high risk of deterioration and suicide.

My gut reaction to the alert was skepticism—as it almost always is, to this day, when the program’s algorithms contradict my instincts. There must be a mistake in the software, I thought. June had repeatedly told me that therapy was helpful. At the beginning of our next session, I asked her how she was doing. Looking into the corner of the room, she replied that the skills I was teaching her were useful; but this time, I persisted: “I’m glad to hear the skills are helpful, but how are you doing?” June was silent for a while and shifted in her chair, clearly uncomfortable. I felt my own anxiety rise, and resisted the urge to change the subject. “Take your time,” I said. “There’s no rush.” After a period of silence, June looked me in the eye for perhaps the first time ever and said, “I’m sorry, but I think I’m worse. I just don’t want you to think it’s your fault; it’s mine. You’ve been really helpful.” June was deteriorating, but I never would have seen it without the program.

My experience mirrors that of therapists around the world. The success of Michael Lambert’s research sparked a surge in the creation of feedback systems: Close to 50 have been developed over the past two decades. As the systems have spread, they have accumulated ever larger banks of clinical data. Studies have shown that metrics significantly improve the effectiveness of psychotherapy, including reducing dropout rates and shortening the length of treatment. What’s not to like?

Unfortunately, in profession after profession, metrics have not been received with open arms. The history of the thermometer provides a classic example. In the mid-19th century, 250 years after the thermometer’s invention, Carl Wunderlich analyzed patient temperature data from more than 25,000 cases. He found that the average normal temperature of a healthy person ranged from 98.6 to 100.4 degrees. Going further, Wunderlich proposed the radical idea of tracking an illness by reading the patient’s temperature at regular intervals.

Many medical professionals were skeptical. Thermometers of that era were cumbersome—almost a foot long—and took 20 to 25 minutes to register a patient’s temperature. They had reliability problems, and doctors and nurses weren’t sure about the best ways to use them. Aside from the inconvenience, many physicians were affronted by the suggestion that they should use data from medical instruments to inform their diagnoses. Previously, physicians had diagnosed a fever by touching various parts of the patient’s body with their hands and making a determination from their blend of intuition and experience. Some worried that use of thermometers would lead to the “de-skilling” of physicians.

A century and a half later, psychotherapy metrics and feedback systems have met with much the same reaction. Dozens of studies attesting to the benefit of metrics and feedback have been published since the systems were first introduced. Yet therapists have been slow to adapt. One 2003 study led by Ann Garland of UC San Diego found that, among a sample of therapists in San Diego County who received client-outcome scores, 92 percent didn’t use them. And a 2013 paper by the University at Albany’s James Boswell and colleagues—citing research published in 2002, 2004, and 2008—noted, “Surveys spanning different countries indicate that few clinicians actually employ [fit] in their day-to-day work.”

Few, if any, more recent studies contain solid data on fit usage, but my anecdotal impression is that use of fit today remains disappointingly low among therapists. In my experience talking with peers, the most common reason for non-adoption is the belief that quantitative data—or worse, a computer—cannot possibly capture the nuances of psychotherapy; accordingly, many therapists feel that the whole idea of psychotherapy metrics should be rejected at face value.

The first part of this argument is correct: A single mental-health measure can’t identify the full range of psychological illnesses any more than a thermometer can detect cancer, diabetes, or heart disease. Moreover, the fit systems can give false positives and false negatives, thereby overstating or understating risks. But that isn’t a good reason to entirely ignore the data—just as the thermometer still provides valuable information even if it isn’t the final word on whether a patient is sick.

“It is probably true,” the historian A. J. Youngson wrote, “that one of the commonest features of new ideas—certainly of practical new ideas—is their imperfection.” Two hundred and fifty years elapsed between the invention of the thermometer and Wunderlich’s creation of a reliable protocol for clinical thermometry. Similarly, the refinement of fit will take time. For example, a recent meta-analysis suggested that the systems do not automatically improve therapy outcomes for all clients, only for clients at risk of deterioration (a limitation Michael Lambert had previously acknowledged). And, of course, the metrics are not helpful unless clinicians know how to use them to improve treatment. Collecting psychotherapy data is a key step in better understanding our patients. But it can’t cure mental illness any more than sticking a thermometer in a patient’s mouth can, by itself, treat the flu.

Robbie Babins-Wagner has experienced both the extraordinary potential and the severe growing pains associated with using metrics. She’s the CEO of the Calgary Counselling Centre, a large community mental-health organization in Western Canada with 24 staff therapists and 55 trainees. I first heard of the CCC when, a number of years ago, I asked Scott Miller for examples of clinics that were implementing fit. “You’ve got to talk with Robbie,” he said. “She’s at the leading edge, a decade ahead of everyone else.”

Babins-Wagner had 14 years of clinical experience when the CCC hired her as director of counseling in 1992. Looking for ways to improve the center, she read about the new metrics system created by Michael Lambert, and initiated a plan to implement psychotherapy metrics at the CCC—working collaboratively with the staff along the way. As Babins-Wagner put it in a paper she later co-authored, the hope was to use the fit data to help create a “climate for therapist improvement.”

At the conclusion of a four-year trial, Babins-Wagner aggregated and analyzed the data the CCC had collected. While the average outcomes were good, it turned out that only half of the therapists were using fit—even though everyone had been asked to. Because of the thick cloak of privacy that protects the therapy room, skeptics had been able to ignore the instructions they’d been given.

The most common complaints from therapists were “the data is wrong, we shouldn’t have to do it, and I know better,” Babins-Wagner says. “Meaning that my intuition tells me—my experience in the sessions tells me—that I know how my client is doing.”

Babins-Wagner listened to the therapists’ concerns and requested feedback on how to improve the metrics system. She also clarified that collecting outcome data was mandatory. Within a few months, 40 percent of the therapists resigned.

Yet Babins-Wagner was unyielding, and her perseverance has paid off. Simon Goldberg of the University of Wisconsin at Madison recently examined data from the CCC (I was one of eight co-authors on the study, but Goldberg did the vast majority of the work) and found a tiny but steady improvement in clinical effectiveness every year for seven years. As far as I can tell, this is only the second time year-over-year improvement in therapist effectiveness—measured by improved client outcomes—has been empirically demonstrated in psychotherapy research. (Other studies do show improvement in therapists’ “competence” in using models or “adherence” to those models—but a meta-analysis of 36 studies showed that “therapist adherence and competence play little role in determining symptom change.”)

Despite these impressive results, adjusting to the use of data remains difficult for many. Michelle Keough, a counselor at the CCC, told me she had been skeptical of the system when she’d started as an intern a few years back. “I had some apprehension in terms of how a graph and how stats could be used in a way to benefit clients,” she recalled. She also worried that it could cause tension and impair her relationship with patients. But over time, she said, she came to realize the system actually improves communication: “Now I can’t imagine not using it in my practice.” She told me many of the trainees she supervises go through a similar journey—from early apprehension to embracing the system.

The intuitive reluctance to use metrics is something I understand well. It’s never pleasant to have my blind spots pointed out. It’s humbling at best, and humiliating at worst. It requires a daily fight with my own brain, which persistently tells me to ignore or distrust any new data that don’t fit my assumptions and expectations.

But while I know how difficult it is for therapists to override their gut instincts in favor of cold data, I also know, firsthand, how difficult it is for a patient when a therapist simply cannot see his or her condition accurately. In my early 30s, before I became a therapist, the anxiety and depression I had confronted as a teenager returned, and I started using drugs to self-medicate. When I realized I was in trouble, I reentered therapy with the psychologist who had previously helped me so effectively. However, this time around, our sessions didn’t seem to help. As had happened with Grace and me, I sat squarely in the middle of my own therapist’s blind spot. He did not use metrics, and he simply never believed that I was deteriorating, even when I started coming to sessions high.

Luckily, I had friends who encouraged me to seek out more-effective therapy. I used to be angry at my former therapist. But now I’m more understanding: I’ve failed to anticipate plenty of deteriorations and dropouts among my own patients. We therapists need to always remain aware that there is much we can’t see in the fog—and be open to tools that might compensate for our limited vision.

In June’s case, metrics and performance feedback may have saved her life. Like a psychological homing beacon, the feedback program drew my attention to her deterioration. And being alerted to the problem opened the door to finding a solution. I got June’s permission to record one of our sessions, and showed the video to a consultant, Jon Frederickson. Originally trained as a classical musician, Frederickson switched careers in his 30s. In graduate school, he was surprised that psychotherapy training didn’t use some of the principles—such as frequent performance feedback—that form the foundation of musical training. Now, with a few decades of experience as a therapist, Frederickson specializes in helping other therapists improve their effectiveness.

We watched the video of June’s therapy session together, and Frederickson spotted a few problems. For one thing, he observed that June was holding her stomach—suggesting that her anxiety was making her nauseated. He also noticed that during the session, June diligently practiced the skills I taught her, but never talked about how she actually felt while doing so. “You’ve unintentionally gotten into a top-down relationship with her, where you are in the teacher role, and she is trying to be a good student by minimizing her symptoms,” he explained. “She isn’t telling you about her discomfort out of deference to you.”

When I asked how I could help her, he counseled me to get out of the authority role, approach June as an equal partner, and help her acknowledge her pain and anxiety rather than defer to me. When I saw June next, I told her what Frederickson had said, and asked what she thought. She was quiet for a moment, then I saw a faint glimmer of a smile on her face. “He may be right,” she admitted.

We agreed to approach our work together with more attentiveness to her anxiety and more equal collaboration. This was not easy for either of us. June felt a constant internal pull to adopt the submissive role of a good student and minimize her painful symptoms, and I frequently felt a pull to teach her more skills rather than listen to her more carefully. Throughout this process, the feedback program served as an indispensable guide, helping us see what we were both tempted to ignore. Every time the system gave me an alert that June’s symptoms were worsening and she was back at risk of deterioration, I videotaped a session and got a consultation to help fix my errors.

Over the following year, June’s anxiety gradually eased. Two years later, she graduated from college with honors. In our last session, I asked her what about our therapy she thought had helped her the most. “You saw me,” she said with a shy smile, “from so far away.” Then she reached out and shook my hand for the very first time.


* This article originally stated that 93 percent of clients in the survey reported whitewashing feedback. We regret the error.