Noam Chomsky on Where Artificial Intelligence Went Wrong

A very different approach, which I think is the right approach, is to try to see if you can understand what the fundamental principles are that deal with the core properties, and recognize that in the actual usage, there's going to be a thousand other variables intervening -- kind of like what's happening outside the window, and you'll sort of tack those on later on if you want better approximations, that's a different approach. These are just two different concepts of science. The second one is what science has been since Galileo, that's modern science. The approximating unanalyzed data kind is sort of a new approach, not totally, there's things like it in the past. It's basically a new approach that has been accelerated by the existence of massive memories, very rapid processing, which enables you to do things like this that you couldn't have done by hand. But I think, myself, that it is leading subjects like computational cognitive science into a direction of maybe some practical applicability...

...engineering?

Chomsky: ...But away from understanding. Yeah, maybe some effective engineering. And it's kind of interesting to see what happened to engineering. So like when I got to MIT, it was the 1950s, this was an engineering school. There was a very good math department, physics department, but they were service departments. They were teaching the engineers tricks they could use. The electrical engineering department, you learned how to build a circuit. Well if you went to MIT in the 1960s, or now, it's completely different. No matter what engineering field you're in, you learn the same basic science and mathematics. And then maybe you learn a little bit about how to apply it. But that's a very different approach. And it resulted maybe from the fact that really for the first time in history, the basic sciences, like physics, had something really to tell engineers. And besides, technologies began to change very fast, so not very much point in learning the technologies of today if it's going to be different 10 years from now. So you have to learn the fundamental science that's going to be applicable to whatever comes along next. And the same thing pretty much happened in medicine. So in the past century, again for the first time, biology had something serious to tell the practice of medicine, so you had to understand biology if you want to be a doctor, and technologies again will change. Well, I think that's the kind of transition from something like an art, that you learn how to practice -- an analog would be trying to match some data that you don't understand, in some fashion, maybe building something that will work -- to science, what happened in the modern period, roughly Galilean science.

I see. Returning to the point about Bayesian statistics in models of language and cognition. You've argued famously that speaking of the probability of a sentence is unintelligible on its own...

Chomsky: ...Well, you can get a number if you want, but it doesn't mean anything.

It doesn't mean anything. But it seems like there's almost a trivial way to unify the probabilistic method with acknowledging that there are very rich internal mental representations, comprised of rules and other symbolic structures, and the goal of probability theory is just to link noisy sparse data in the world with these internal symbolic structures. And that doesn't commit you to saying anything about how these structures were acquired -- they could have been there all along, or there partially with some parameters being tuned, whatever your conception is. But probability theory just serves as a kind of glue between noisy data and very rich mental representations.

Chomsky: Well... there's nothing wrong with probability theory, there's nothing wrong with statistics.

But does it have a role?

Chomsky: If you can use it, fine. But the question is what are you using it for? First of all, first question is, is there any point in understanding noisy data? Is there some point to understanding what's going on outside the window?

Well, we are bombarded with it [noisy data], it's one of Marr's examples, we are faced with noisy data all the time, from our retina to...

Chomsky: That's true. But what he says is: Let's ask ourselves how the biological system is picking out of that noise things that are significant. The retina is not trying to duplicate the noise that comes in. It's saying I'm going to look for this, that and the other thing. And it's the same with, say, language acquisition. The newborn infant is confronted with massive noise, what William James called "a blooming, buzzing confusion," just a mess. If, say, an ape or a kitten or a bird or whatever is presented with that noise, that's where it ends. However, the human infant, somehow, instantaneously and reflexively, picks out of the noise some scattered subpart which is language-related. That's the first step. Well, how is it doing that? It's not doing it by statistical analysis, because the ape can do roughly the same probabilistic analysis. It's looking for particular things. So psycholinguists, neurolinguists, and others are trying to discover the particular parts of the computational system and of the neurophysiology that are somehow tuned to particular aspects of the environment. Well, it turns out that there actually are neural circuits which are reacting to particular kinds of rhythm, which happen to show up in language, like syllable length and so on. And there's some evidence that that's one of the first things that the infant brain is seeking -- rhythmic structures. And going back to Gallistel and Marr, it's got some computational system inside which is saying "okay, here's what I do with these things" and, say, by nine months, the typical infant has rejected -- eliminated from its repertoire -- the phonetic distinctions that aren't used in its own language. So initially of course, any infant is tuned to any language. But say, a Japanese kid at nine months won't react to the R-L distinction anymore, that's kind of weeded out.
So the system seems to sort out lots of possibilities and restrict it to just ones that are part of the language, and there's a narrow set of those. You can make up a non-language in which the infant could never do it, and then you're looking for other things. For example, to get into a more abstract kind of language, there's substantial evidence by now that such a simple thing as linear order, what precedes what, doesn't enter into the syntactic and semantic computational systems, they're just not designed to look for linear order. So you find overwhelmingly that more abstract notions of distance are computed and not linear distance, and you can find some neurophysiological evidence for this, too. Like if artificial languages are invented and taught to people, which use linear order, like you negate a sentence by doing something to the third word. People can solve the puzzle, but apparently the standard language areas of the brain are not activated -- other areas are activated, so they're treating it as a puzzle not as a language problem. You need more work, but...

You take that as convincing evidence that activation or lack of activation for the brain area ...

Chomsky: ...It's evidence, you'd want more of course. But this is the kind of evidence, both on the linguistics side you look at how languages work -- they don't use things like third word in sentence. Take a simple sentence like "Instinctively, eagles that fly swim": well, "instinctively" goes with swim, it doesn't go with fly, even though it doesn't make sense. And that's reflexive. "Instinctively", the adverb, isn't looking for the nearest verb, it's looking for the structurally most prominent one. That's a much harder computation. But that's the only computation which is ever used. Linear order is a very easy computation, but it's never used. There's a ton of evidence like this, and a little neurolinguistic evidence, but they point in the same direction. And as you go to more complex structures, that's where you find more and more of that.

That's, in my view at least, the way to try to discover how the system is actually working, just like in vision, in Marr's lab, people like Shimon Ullman discovered some pretty remarkable things like the rigidity principle. You're not going to find that by statistical analysis of data. But he did find it by carefully designed experiments. Then you look for the neurophysiology, and see if you can find something there that carries out these computations. I think it's the same in language, the same in studying our arithmetical capacity, planning, almost anything you look at. Just trying to deal with the unanalyzed chaotic data is unlikely to get you anywhere, just as it wouldn't have gotten Galileo anywhere. In fact, if you go back to this, in the 17th century, it wasn't easy for people like Galileo and other major scientists to convince the NSF [National Science Foundation] of the day -- namely, the aristocrats -- that any of this made any sense. I mean, why study balls rolling down frictionless planes, which don't exist? Why not study the growth of flowers? Well, if you tried to study the growth of flowers at that time, you would get maybe a statistical analysis of what things looked like.

It's worth remembering that with regard to cognitive science, we're kind of pre-Galilean, just beginning to open up the subject. And I think you can learn something from the way science worked [back then]. In fact, one of the founding experiments in the history of chemistry was around 1640 or so, when somebody proved to the satisfaction of the scientific world, all the way up to Newton, that water can be turned into living matter. The way they did it was -- of course, nobody knew anything about photosynthesis -- so what you do is you take a pile of earth, you heat it so all the water escapes. You weigh it, and put in it a branch of a willow tree, and pour water on it, and measure the amount of water you put in. When you're done and the willow tree has grown, you again take the earth and heat it so all the water is gone -- same weight as before. Therefore, you've shown that water can turn into an oak tree or something. It is an experiment, it's sort of right, but it's just that you don't know what things you ought to be looking for. And they weren't known until Priestley found that air is a component of the world, it's got nitrogen, and so on, and you learn about photosynthesis and so on. Then you can redo the experiment and find out what's going on. But you can easily be misled by experiments that seem to work because you don't know enough about what to look for. And you can be misled even more if you try to study the growth of trees by just taking a lot of data about how trees grow, feeding it into a massive computer, doing some statistics and getting an approximation of what happened.

In the domain of biology, would you consider the work of Mendel a successful case, where you take this noisy data -- essentially counts -- and you leap to postulate this theoretical object...

Chomsky: ...Well, throwing out a lot of the data that didn't work.

...But seeing the ratio that made sense, given the theory.

Chomsky: Yeah, he did the right thing. He let the theory guide the data. There was counter data which was more or less dismissed, you know you don't put it in your papers. And he was of course talking about things that nobody could find, like you couldn't find the units that he was postulating. But that's, sure, that's the way science works. Same with chemistry. Chemistry, until my childhood, not that long ago, was regarded as a calculating device, because you couldn't reduce it to physics. So it's just some way of calculating the result of experiments. The Bohr atom was treated that way. It's a way of calculating the results of experiments but it can't be real science, because you can't reduce it to physics, which incidentally turned out to be true, you couldn't reduce it to physics because physics was wrong. When quantum physics came along, you could unify it with virtually unchanged chemistry. So the project of reduction was just the wrong project. The right project was to see how these two ways of looking at the world could be unified. And it turned out to be a surprise -- they were unified by radically changing the underlying science. That could very well be the case with, say, psychology and neuroscience. I mean, neuroscience is nowhere near as advanced as physics was a century ago.

That would go against the reductionist approach of looking for molecules that are correlates of...

Chomsky: Yeah. In fact, the reductionist approach has often been shown to be wrong. The unification approach makes sense. But unification might not turn out to be reduction, because the core science might be misconceived as in the physics-chemistry case and I suspect very likely in the neuroscience-psychology case. If Gallistel is right, that would be a case in point that yeah, they can be unified, but with a different approach to the neurosciences.

Yarden Katz is a graduate student in the Department of Brain and Cognitive Sciences at MIT, where he studies the regulation of gene expression in the developing nervous system and in cancer.