Why Are We Letting the AI Crisis Just Happen?

Bad actors could seize on large language models to engineer falsehoods at unprecedented scale.

Updated at 12:40 p.m. ET on March 17, 2023

New AI systems such as ChatGPT, the overhauled Microsoft Bing search engine, and the reportedly soon-to-arrive GPT-4 have utterly captured the public imagination. ChatGPT is the fastest-growing online application, ever, and it’s no wonder why. Type in some text, and instead of getting back web links, you get well-formed, conversational responses on whatever topic you selected—an undeniably seductive vision.

But the public, and the tech giants, aren’t the only ones who have become enthralled with the Big Data–driven technology known as the large language model. Bad actors have taken note of the technology as well. At the extreme end, there’s Andrew Torba, the CEO of the far-right social network Gab, who said recently that his company is actively developing AI tools to “uphold a Christian worldview” and fight “the censorship tools of the Regime.” But even users who aren’t motivated by ideology will have their impact. Clarkesworld, a publisher of sci-fi short stories, temporarily stopped taking submissions last month, because it was being spammed by AI-generated stories—the result of influencers promoting ways to use the technology to “get rich quick,” the magazine’s editor told The Guardian.

This is a moment of immense peril: Tech companies are rushing ahead to roll out buzzy new AI products, even though the problems with those products have been well documented for years. I am a cognitive scientist focused on applying what I’ve learned about the human mind to the study of artificial intelligence. I’ve also founded a couple of AI companies myself, and I’m considering founding another. Way back in 2001, I wrote a book called The Algebraic Mind, in which I detailed how neural networks, a kind of vaguely brainlike technology undergirding some AI products, tended to overgeneralize, applying individual characteristics to larger groups. If I told an AI back then that my aunt Esther had won the lottery, it might have concluded that all aunts, or all Esthers, had also won the lottery.

Technology has advanced quite a bit since then, but the general problem persists. In fact, the mainstreaming of the technology, and the scale of the data it’s drawing on, has made it worse in many ways. Forget Aunt Esther: In November, Galactica, a large language model released by Meta—and quickly pulled offline—reportedly claimed that Elon Musk had died in a Tesla car crash in 2018. Once again, AI appears to have overgeneralized a concept that was true on an individual level (someone died in a Tesla car crash in 2018) and applied it erroneously to another individual who happens to share some personal attributes, such as gender, state of residence at the time, and a tie to the car manufacturer.

This kind of error, which has come to be known as a “hallucination,” is rampant. Whatever the reason that the AI made this particular error, it’s a clear demonstration of the capacity for these systems to write fluent prose that is clearly at odds with reality. You don’t have to imagine what happens when such flawed and problematic associations are drawn in real-world settings: NYU’s Meredith Broussard and UCLA’s Safiya Noble are among the researchers who have repeatedly shown how different types of AI replicate and reinforce racial biases in a range of real-world situations, including health care. Large language models like ChatGPT have been shown to exhibit similar biases in some cases.

Nevertheless, companies press on to develop and release new AI systems without much transparency, and in many cases without sufficient vetting. Researchers poking around at these newer models have discovered all kinds of disturbing things. Before Galactica was pulled, the journalist Tristan Greene discovered that it could be used to create detailed, scientific-style articles on topics such as the benefits of anti-Semitism and eating crushed glass, complete with references to fabricated studies. Others found that the program generated racist and inaccurate responses. (Yann LeCun, Meta’s chief AI scientist, has argued that Galactica wouldn’t make the online spread of misinformation easier than it already is; a Meta spokesperson told CNET in November, “Galactica is not a source of truth, it is a research experiment using [machine learning] systems to learn and summarize information.”)

More recently, the Wharton professor Ethan Mollick was able to get the new Bing to write five detailed and utterly untrue paragraphs on dinosaurs’ “advanced civilization,” filled with authoritative-sounding morsels including “For example, some researchers have claimed that the pyramids of Egypt, the Nazca lines of Peru, and the Easter Island statues of Chile were actually constructed by dinosaurs, or by their descendents or allies.” Just this weekend, Dileep George, an AI researcher at DeepMind, said he was able to get Bing to create a paragraph of bogus text stating that OpenAI and a nonexistent GPT-5 played a role in the Silicon Valley Bank collapse. Asked about these examples, a Microsoft spokesperson said, “In addition to the practices Bing has developed over several years to mitigate misinformation in the search context, we have developed a safety system including content filtering, operational monitoring, and abuse detection to provide a safe search experience for our users. We have also taken additional measures in the chat experience by providing the system with text from the top search results and instructions to ground its responses in search results. Users are also provided with explicit notice that they are interacting with an AI system and advised to check the links to materials to learn more.”

Some observers, like LeCun, say that these isolated examples are neither surprising nor concerning: Give a machine bad input and you will receive bad output. But the Elon Musk car crash example makes clear that these systems can create hallucinations that appear nowhere in the training data. Moreover, the potential scale of this problem is cause for worry. We can only begin to imagine what state-sponsored troll farms with large budgets and customized large language models of their own might accomplish. Bad actors could easily use these tools, or tools like them, to generate harmful misinformation at unprecedented scale. In 2020, Renée DiResta, the research manager of the Stanford Internet Observatory, warned that the “supply of misinformation will soon be infinite.” That moment has arrived.

Each day is bringing us a little bit closer to a kind of information-sphere disaster, in which bad actors weaponize large language models, distributing their ill-gotten gains through armies of ever more sophisticated bots. GPT-3 produces more plausible outputs than GPT-2, and GPT-4 will be more powerful than GPT-3. And none of the automated systems designed to discriminate human-generated text from machine-generated text has proved particularly effective.

We already face a problem with echo chambers that polarize our minds. The mass-scale automated production of misinformation will assist in the weaponization of those echo chambers and likely drive us even further into extremes. The goal of the Russian “Firehose of Falsehood” model is to create an atmosphere of mistrust, allowing authoritarians to step in; it is along these lines that the political strategist Steve Bannon aimed, during the Trump administration, to “flood the zone with shit.” It’s urgent that we figure out how democracy can be preserved in a world in which misinformation can be created so rapidly, and at such scale.

One suggestion, worth exploring but likely insufficient, is to “watermark” or otherwise track content that is produced by large language models. OpenAI might for example watermark anything generated by GPT-4, the next-generation version of the technology powering ChatGPT; the trouble is that bad actors could simply use alternative large language models to create whatever they want, without watermarks.

A second approach is to penalize misinformation when it is produced at large scale. Currently, most people are free to lie most of the time without consequence, unless they are, for example, speaking under oath. America’s Founders simply didn’t envision a world in which someone could set up a troll farm and put out a billion mistruths in a single day, disseminated with an army of bots, across the internet. We may need new laws to address such scenarios.

A third approach would be to build a new form of AI that can detect misinformation, rather than simply generate it. Large language models are not inherently well suited to this; they lose track of the sources of information that they use, and lack ways of directly validating what they say. Even in a system like Bing’s, where information is sourced from the web, mistruths can emerge once the data are fed through the machine. Validating the output of large language models will require developing new approaches to AI that center reasoning and knowledge, ideas that were once popular but are currently out of fashion.

It will be an uphill arms race of move and countermove from here; just as spammers change their tactics when anti-spammers change theirs, we can expect a constant battle between bad actors striving to use large language models to produce massive amounts of misinformation and the governments and private corporations trying to fight back. If we don’t start fighting now, democracy may well be overwhelmed by misinformation and consequent polarization, and perhaps quite soon. The 2024 elections could be unlike anything we have seen before.

This story has been updated to note the author’s previous involvement in AI companies.
