Chatbots Sound Like They’re Posting on LinkedIn
Large language models make things up, but the worse problem may be in how they present those falsehoods.
If you spend any time on the internet, you’re likely now familiar with the gray-and-teal screenshots of AI-generated text. At first they were meant to illustrate ChatGPT’s surprising competence at generating human-sounding prose, and then to demonstrate the occasionally unsettling answers that emerged once the general public could bombard it with prompts. OpenAI, the organization that is developing the tool, describes one of its biggest problems this way: “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.” In layman’s terms, the chatbot makes stuff up. As similar services, such as Google’s Bard, have rushed their tools into public testing, their screenshots have demonstrated the same capacity for fabricating people, historical events, research citations, and more, and for rendering those falsehoods in the same confident, tidy prose.
This apparently systemic penchant for inaccuracy is especially worrisome, given tech companies’ intent to integrate these tools into search engines as soon as possible. But a bigger problem might lie in a different aspect of AI’s outputs—more specifically, in the polite, businesslike, serenely insipid way that the chatbots formulate their responses. This is the prose style of office work and email jobs, of by-the-book corporate publicists and LinkedIn influencers with private-school MBAs. The style sounds the same—pleasant, measured, authoritative—no matter whether the source (be it human or computer) is trying to be helpful or lying through their teeth or not saying anything coherent at all.
In the United States, this is the writing style of institutional authority, and AI chatbots are so far exquisitely capable of replicating its voice, while delivering information that is patently unreliable. On a practical level, this will pose challenges for people who must navigate a world with this kind of technology suddenly thrust into it. The mental shortcuts we use to evaluate communicative credibility on the fly have always been less than perfect, and the very nature of the internet already makes such judgment calls both more difficult and more necessary. AI could make them nearly impossible.
ChatGPT and its ilk are built using what are known as large language models, or LLMs. That means they hoover up very large quantities of written language online and then, very crudely speaking, analyze that data set to determine which words would likely be assembled in which order to create a successful response. They generate text that’s been optimized for plausibility, not for truthfulness. Being right isn’t the goal, at least not now; sounding right is. For any particular query, there are many more answers that sound right than answers that are true. LLMs aren’t intentionally lying—they are not alive, and cannot produce results meaningfully similar to human thought. And they haven’t been created to mislead their users. The chatbots do, after all, frequently generate answers that are both plausible and correct, even though any veracity is incidental. They are, in other words, masters of bullshit—persuasive speech whose essence “is just this lack of connection to a concern with truth—this indifference to how things really are,” the philosopher Harry Frankfurt wrote in his book-length essay on this sort of rhetoric.
What LLMs are currently capable of producing is industrially scaled, industrial-grade bullshit. That’s troublesome for many reasons, not least of which is that humans have enough trouble discerning the age-old artisanal variety. Every human is required to make a zillion tiny decisions every day about whether some notion they’re presented with should be believed, and rarely do they have the opportunity or desire to stop, gather all the relevant information, and reason those decisions from first principles. To do so would pretty much halt human interaction as we know it, and even trying would make you pretty annoying.
So people instead rely on cognitive heuristics, which are little shortcuts that, in this case, help tip us toward belief or disbelief in situations where the full facts are unknown or unknowable. When you take medical advice from your doctor, you’ve employed an authority heuristic, which assigns trust to sources you believe have specialized knowledge and expertise. When you decide that something is probably true because it’s become the consensus among your family and friends, that’s the bandwagon heuristic at work. Even the best heuristics aren’t perfect: Your doctor might disbelieve your reported symptoms and misdiagnose you, or your social circle might be riddled with people who think the Earth is flat. But according to Miriam Metzger, a professor at UC Santa Barbara who studies how people evaluate credibility online, many of these shortcuts are, on balance, largely sound and extremely useful. Most people in most situations, for example, would be well served to listen to their doctor instead of taking medical advice from their weird cousin.
The growth of the internet has posed all kinds of issues for the accurate use of credibility heuristics, Metzger told me. There are too many potential sources of information vying directly for your attention, and too few ways to evaluate those sources or their motives quickly. Now your weird cousin is posting things on Facebook—and so are all of his weird friends, and their friends too. “The digital environment gives us a vastness of information in which it’s just harder for consumers to know who and what to trust,” Metzger said. “It’s put more of the burden on individuals to make their own credibility assessment practically every time they are confronted with new information.”
In the United States, this informational fragmentation is usually seen through the lens of politics, but it has also seeped into more mundane parts of life. On the internet, everyone can theoretically access expertise on everything. This freedom has some huge upsides, especially for people trying to solve small, manageable problems: There are enough instructional YouTube videos and Reddit threads to make you into your own travel agent, mechanic, plumber, and physical therapist. In many other scenarios, though, making judgment calls based on the internet’s conglomeration of questionably sourced knowledge and maybe-faux expertise can have real consequences. We often don’t have anywhere near the information we’d need to evaluate a source’s credibility, and when that happens, we generally start rummaging through our bag of heuristics until we find one that works with whatever context we do have. What we end up with might just be the fluency heuristic—which is to say, the sense that certain patterns of communication are inherently credible.
In mainstream American culture, good grammar, accurate spelling, and a large and varied vocabulary free of expletives, slurs, or slang are all prerequisites for credibility, and a lack of them can be used to discredit challengers to existing authority and malign people with less education or different cultural backgrounds. This heuristic can also easily be used against the people who employ it: The more a phishing email looks and sounds like real communication from your bank, the more accounts scammers get to drain.
This is where the tidy, professional corporate-speak of well-trained LLMs has serious potential to cause informational chaos, Metzger said. Among other sources, the best AIs are trained on editorial content from major media organizations, archives of academic research, and troves of government and legal documents, according to a recent report by The Washington Post. These are just the types of sources that would employ a precise and highly educated communication style. ChatGPT and other chatbots like it are text-generation machines that make up facts and sever information from its source. They are also authority-simulation machines that discourage readers from ever doubting them in the first place.