AUGUST 1959

The Translating Machine
Even with increased staffs of translators, the United States is able to put into English less than half the year's grist of scientific material from other countries. But much has been accomplished toward creation of an electronic computer that will read and translate the printed page.

by David O. Woodbury

FOUR months before Sputnik I split the skies, the Soviet Academy of Sciences published a description of its plans for tracking that nation's first scientific satellite. The article appeared in an amateur radio magazine, putting all Russian "hams" on the alert. The inference was that the little missile would soon be fired.

Many American libraries subscribed to the publication and duly received it early in June, 1957. But there was only one translation known to have been made of the satellite article -- by the Air Force -- and that was kept secret. Details of the power and wave length of the satellite's broadcasting unit were therefore unavailable to American science. Nor were they known, apparently, to the Naval Research Laboratory, doggedly struggling with Project Vanguard, nor to its tracking experts of the Bureau of Standards and the Smithsonian Institution.

When Sputnik circled the globe on October 4 no one was more embarrassed than the Navy. Vanguard's elaborate "radio fence," stretching five thousand miles to southern Chile, could not listen in. It had been designed for a very different wave length. Over that first weekend of man's conquest of space, technicians from Washington to Santiago worked without sleep to convert their equipment in order to find out what Sputnik was doing. Meanwhile the first American contact was established by radio amateurs, by chance. Officially we had been caught napping because no one had thought it important to translate what the Russians had published.

The incident sharply illuminates the unhappy fact that the world has no organized means of surmounting its language barrier, a situation rapidly becoming more dangerous as science and technology leap ahead. Thousands of languages and dialects are spoken around the globe. At the very least, essential material in fifty tongues must be available to all if civilization is to share its progress. Nothing remotely approaching this has been accomplished. Even if there were ten or a hundred times as many human translators as there are, and funds to pay them, they could not deal with the flood of material that is pouring from the world's technical minds and is lost.

In a report issued in 1957, UNESCO said:

"At least 50% of scientific literature is in languages which more than half the world's scientists cannot read. Nearly two thirds of engineering literature appears in English, but more than two thirds of the world's professional engineers cannot read English and a still larger proportion of English-reading engineers cannot read scientific literature in other languages.... The greater part of what is published is inaccessible to most of those who could otherwise benefit from it."

The lack is serious not only in the scientific disciplines but in law, medicine, agriculture, architecture, economics, and the arts. When a specialist in any field publishes but cannot be read, others in the same field must rediscover his findings for themselves. Patents in one language, if unintelligible generally, cannot be licensed for the general good.

Since Sputnik I, a crash program by the National Science Foundation has been turning out an estimated 100 million words per year of technical Russian translation alone, but this is less than half of the text we should be handling. Little has been done with other important tongues. It is impossible to obtain enough translators, especially the trained people required for work in technical fields. Such people as there are cannot be spared from their chosen professions.

About ten years ago Dr. Warren Weaver of the Rockefeller Foundation, foreseeing the situation, suggested that the enormous routine job of translation might be done by machines. The idea was quickly taken up by linguists and electronics specialists in America and England. Today, the computer, or electronic brain, is well along toward picking up the burden of machine translation, known as "MT."

THE computer is a highly complex electronic device, often filling many rooms, based on the principle of the adding machine. It does not merely add, however. It can make comparisons and choices between numbers and deduce which of many intricate patterns will fulfill specified requirements. Since numerical quantities can symbolize entities which are not numbers, problems in logic are well within the machine's range. This is where machine translation will come in.

The modern computer's speed is so great that, even though the steps of computation may be difficult and numerous, it can handle investigations far too laborious for human beings. A large aircraft maker, for example, can now program an electronic brain with the design details of a score of airplane wings, "fly the ship" under all conceivable airborne conditions, and discover in a few days which is the best wing. Such thoroughness would require years of human analysis and has never been possible before. Electronic impulses travel at a speed of nearly 186,000 miles per second. Comparable human nerve stimuli, biologists have found, can achieve no more than 260 miles an hour. A single item of machine computation may be completed in a few millionths of a second and is instantly followed by the next.

Stuart Chase once pointed out whimsically that IBM's 701 computer was 100,000 times faster at arithmetic than he was. "After receiving 963 instructions," he wrote, "the 701 can calculate the path of a guided missile, performing eleven hundred thousand calculations, in two minutes. This sum I could toss off in a matter of fifteen years." The newer IBM-704 and the Univac are much faster. Computers are aiming at speeds a million times faster than human beings. It will tax even this speed to solve the problems of machine translation.

The heart of the computer is its "memory," which stores tens of thousands of bits of information comprising a statement of the problem to be solved and instructions for solving it. A difficult problem may take months of human effort to arrange in machine language and place in the memory. But once the button is pushed, the answers may be out in minutes. Thus, the computer is best suited to the solution of great numbers of problems using a single formula covered by one set of instructions. In addition to numerical solutions, the computer can search through incredible quantities of data and find a desired item, relating it to as many other items as may be demanded. Language translation falls in this class of operation.

In such use, the machine's memory becomes a dictionary or glossary of words in any two tongues desired. On the average, it takes about twenty-five separate bits of stored information to encode and hold a single word. This is not prohibitive, because of the speed with which the computer can assemble these bits and operate upon them. To translate, the machine will receive a word, like the German mein, and will hunt through its entire collection for it, locate the equivalent "my" in English, and deliver it to storage to await the rest of the translation. When all the words of the original have been run through, the converted text will be printed out or recorded on tape or cards in code form. The exact order of occurrence will be preserved.
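
The lookup described above can be sketched in a few lines of modern Python. This is only an illustration under stated assumptions: apart from mein and "my," the glossary entries are invented, and the function name and the bracket convention for unknown words are hypothetical rather than features of any actual machine.

```python
# Toy glossary. Only "mein" -> "my" comes from the article; the other
# entries are illustrative assumptions.
GLOSSARY = {
    "mein": "my",
    "bruder": "brother",
    "ist": "is",
    "hier": "here",
}


def translate_word_for_word(text, glossary):
    """Look up each word in turn, preserving the original word order."""
    output = []
    for word in text.split():
        key = word.lower()
        # Words missing from the glossary are passed through in brackets
        # so that a human posteditor can spot them.
        output.append(glossary.get(key, "[" + word + "]"))
    return " ".join(output)


print(translate_word_for_word("Mein Bruder ist hier", GLOSSARY))
```

Because the word order of the original is kept untouched, any difference in syntax between the two tongues is left for the posteditor to repair.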

Experimental word-for-word translations have been made in this way and constitute the first attempts to reach the "hardware" stage in machine translation. For the most part they have confined themselves to narrowly scientific fields such as brain surgery, genetics, and chemistry. A glossary of about five thousand words most often used in a particular discipline will give a fairly intelligible result in technical subjects.

Most of the earlier experiments involved no computer at all. In one of these, for example, the words of a few Russian sentences were numbered consecutively and each was written on a separate slip of paper. The slips were then shuffled to destroy their order and translated, all English meanings being entered on each slip. When they were reassembled in numerical order, a rough translation resulted, and this could be improved by a proper choice of meanings made by a person familiar with the subject matter but with no knowledge of Russian. The test showed that a computer could handle this elementary type of word-for-word translation.

A successful paper exploration of this sort was made by Dr. A. D. Booth of the University of London and Dr. R. H. Richens of the Commonwealth Bureau of Genetics. They arranged glossaries for the subject of genetics in twenty different languages, then turned out simulated machine translations of a few sentences in each. After suitable postediting, the results were claimed to be very good indeed.

Subsequently, an actual hardware experiment was conducted by the Institute of Languages and Linguistics of Georgetown University, under the direction of Professor Leon Dostert and his staff. International Business Machines Corporation contributed a 701 computer, and some two hundred and fifty words of a few Russian sentences were translated. It was a carefully groomed experiment termed a "pilot Russian-English conversion scheme" and couldn't very well fail. However, it did demonstrate a mechanical robot in action in the MT field, probably for the first time. Generous postediting made the translation a success.
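
The paper-slip procedure described earlier (number the words, shuffle, translate each slip in isolation, reassemble) can be imitated in Python. The sketch below is a hedged illustration only: the transliterated Russian words, their English glosses, and the function name are assumptions made for the example and are not drawn from the original experiments.

```python
import random

# Hypothetical glossary listing every English meaning for each source word.
# The transliterated Russian words and their glosses are illustrative only.
MEANINGS = {
    "my": ["we"],
    "izuchaem": ["study", "are studying"],
    "yazyk": ["language", "tongue"],
}


def slip_experiment(sentence, meanings, seed=0):
    # 1. Number the words consecutively, one word per slip.
    slips = list(enumerate(sentence.split()))
    # 2. Shuffle the slips to destroy their order.
    random.Random(seed).shuffle(slips)
    # 3. Translate each slip in isolation, entering all English meanings.
    translated = [(n, meanings.get(w, [w])) for n, w in slips]
    # 4. Reassemble the slips in numerical order.
    translated.sort(key=lambda slip: slip[0])
    return [choices for _, choices in translated]


rough = slip_experiment("my izuchaem yazyk", MEANINGS)
# Alternatives are shown side by side for the posteditor to choose from.
print(" ".join("/".join(choices) for choices in rough))
```

The shuffle guarantees that no slip is translated with help from its neighbors, which is the condition a word-for-word machine would work under. Choosing among the alternatives remains a human task.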

The need for this postediting is humorously illustrated by Mark Twain's essay in which he translates his story of the jumping frog back from its French rendering. "Eh bien!" goes his word-for-word version. "I no saw that that frog had nothing of better than each frog." Such a translation would be dangerous for a surgeon seeking technical background in a foreign tongue for some risky new operation.

Nevertheless, there is a strong tendency today to use word-for-word MT for the preliminary drudgery of classifying foreign writings. Of the millions of words turned out by the world's technicians in all fields, only a very small part is truly informative. Somebody must winnow out the genuine contributions by reading everything, good and bad alike. To avoid this, it is proposed that computers be set to work merely indexing foreign articles as to subject and general content. To do this the computer will need to be provided with a list of key words in a given subject. When a whole article is fed to it, the machine will locate and assemble translations of the key words with page references. By inspecting a finished index of many such articles, an expert can quickly tell which items will interest him.
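
A key-word index of the kind proposed above might look like the following Python sketch. It is a minimal illustration under assumptions: the German key words, the sample pages, and the function name are all hypothetical.

```python
import re

# Hypothetical key-word list for one subject: foreign term -> English term.
KEY_WORDS = {"satellit": "satellite", "welle": "wave", "sender": "transmitter"}


def index_article(pages, key_words):
    """Return {english_key_word: [page numbers]} for one article.

    `pages` is a list of page texts; page numbers start at 1.
    """
    index = {}
    for page_number, text in enumerate(pages, start=1):
        for token in re.findall(r"\w+", text.lower()):
            if token in key_words:
                refs = index.setdefault(key_words[token], [])
                # Record each page only once per key word.
                if page_number not in refs:
                    refs.append(page_number)
    return index


article = [
    "Der Satellit hat einen Sender.",
    "Die Welle ist kurz.",
    "Der Sender und der Satellit.",
]
print(index_article(article, KEY_WORDS))
```

Only the key words are translated; everything else in the article is ignored, which is what makes indexing so much cheaper than full translation.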

AN EXTENSION of this idea will be the abstracting of specific articles after indexing. Much more elaborate machine preparation will be necessary, because abstracting involves choice. The instructions will direct the computer to look for ideas. As it finds them it will assemble them in order and finally print them out. Any person familiar with the subject can then decide which articles will be of use to him and can have them translated in the usual way.

Abstracting, however, does not face up to the real problem of machine translation. The final goal is high-grade automatic communication between tongues that can be relied upon for accuracy and, indeed, will make a reasonable job of preserving style. The computer must come close to functioning as a human translator functions; it must do something perilously like thinking. The obvious difficulty is that a word-for-word rendering is not good enough. It fails to take into account differences in word order, syntax, and grammar. Thus, the mechanical dictionary in the advanced machine must contain not only all forms and meanings of all words but must be able to interpret the significance of varying word arrangements.

It is impossible to accomplish this without loading the memory with a great number of rules telling the robot what to do in an enormous variety of situations. Computer designers expect that a really adequate machine memory will need a capacity of a billion or more separate bits of information. But they are surprisingly comfortable about this staggering requirement and have said, simply, "Tell us exactly what your rules are and how they work, and we will build a machine that will handle them."

The trouble is that linguists and grammarians have not yet discovered the rules, as they apply between two different languages. In fact, researchers in the machine translation art have spent nearly all of their first decade in trying to find them. Their problem has broken down into two parts: first, the painstaking discovery of the patterns of dependence between the rules of one language and those of another, and second, the translation of these patterns into rigorously logical structures with which a computer can deal. So far, a method of establishing logical rules is all that has come to light. The method consists of comparative studies of each pair of languages which will yield, in the end, a complete description of the interplay between their structures.

An enormous amount of routine research is needed for this. The sense of a text in one language is obtained by words, word prefixes and suffixes, grammatical relationships, and syntax. In another the same elements are used quite differently, sometimes not at all, to produce the same sense. Any attempt to transliterate the rules would end in gibberish. Some sort of bridge must be discovered between them -- a bridge that will always remain firm. The search is vastly complicated by the large number of exceptions, all of which must be rounded up and reduced to logical rules of their own. The early workers in machine translation have been dismayed to find that no adequate knowledge has been accumulated in this vital area. Comprehensive studies have never been made.

These studies must be done statistically by collecting large numbers of examples of each rule in each language and codifying them as to authenticity and frequency of infraction. Rules must then be set up to care for the exceptions. The most skilled linguist would need to devote a lifetime to a small fraction of this work. Fortunately, a short cut is offered by the computer itself, and a great deal of preliminary exploring is in progress at a number of universities. Two typical institutions among the many hard at work in this new field are the computation laboratories of the Massachusetts Institute of Technology, led by Professor V. H. Yngve, and Harvard University, under the direction of Dr. A. G. Oettinger. MIT is developing a special system of instructions for the conventional computer so that it can "read" large quantities of common newspaper text, search out the occurrences of particular language structures, and list them. It will soon be possible, for instance, to feed a million words of German into the computer memory and then ask the machine to search for a particular structural phenomenon and indicate whether or not it is a standard form. The machine will readily do this once the text it is to work on has been pretranslated into the punch-card or taped form.

For example, demands like these will come up: What is the sort of context that determines a specific German word order? Pick out all cases where a subject follows a verb. Show how punctuation is related to sense, how and when a compound word is used, the significance of capital initial letters. Search for the exceptions to all these things and devise rules that will reduce them to logic.
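
One such demand, picking out all cases where a subject follows a verb, can be sketched in Python. The example assumes the text has already been coded with grammatical roles, which is a considerable simplification; the tag names, the three sample sentences, and the function names are hypothetical.

```python
# Each sentence is a list of (word, role) pairs. The hand-assigned role
# tags ("SUBJ", "VERB", "OTHER") are an assumption made for illustration.
CORPUS = [
    [("Heute", "OTHER"), ("kommt", "VERB"), ("er", "SUBJ")],
    [("Er", "SUBJ"), ("kommt", "VERB"), ("heute", "OTHER")],
    [("Dann", "OTHER"), ("ging", "VERB"), ("sie", "SUBJ")],
]


def subject_follows_verb(sentence):
    """True when the first subject appears after the first verb."""
    roles = [role for _, role in sentence]
    if "SUBJ" not in roles or "VERB" not in roles:
        return False
    return roles.index("SUBJ") > roles.index("VERB")


def find_inversions(corpus):
    """Return the matching sentences and their relative frequency."""
    hits = [s for s in corpus if subject_follows_verb(s)]
    frequency = len(hits) / len(corpus) if corpus else 0.0
    return hits, frequency


hits, frequency = find_inversions(CORPUS)
for sentence in hits:
    print(" ".join(word for word, _ in sentence))
```

Run over a million words instead of three sentences, a count of this kind indicates whether a structure is a standard form or a rare exception.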

To investigate all important languages, even by rapid machine methods, may take a long time. But a start has been made with a few language pairs so that the scheme can be verified. Harvard is working from Russian to English and MIT from German to English.

The blue-ribbon translating machine that will operate on the rules thus learned will work something like this: it will have a kind of "outer office," or reception room, into which individual sentences in a foreign language will be introduced. Here the text will be scrutinized, and all of its structural factors will be recognized and codified into computer-type data. The code will say, in effect, "for this group of words, Rule 948 applies," and so on. These code elements will then go on through the computer to a department of structural transfer. Here, equivalent rules in the terminal language will be looked up and codified. With this new codification the original word cargo will pass along to a third department, where the code will be broken down into a construction routine for the equivalent English sentence. At this point the glossary will come into play and the original words will be translated and meanings assigned to them compatible with their sentence structure. The final operation will be simply the printing out of the solution to the problem. Sentences will be used as the basic units because they are, in general, self-contained and complete as to sense. The translation should be near perfect.
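
The three departments described above can be imitated on a very small scale in Python. This is a sketch under stated assumptions, not a description of any machine actually built: the role tags, the transfer table, the lexicon, and all function names are hypothetical, and each role is assumed to occur only once per sentence.

```python
# All tags, rules, and glossary entries below are hypothetical.
LEXICON = {  # source word -> (role, English gloss)
    "heute": ("ADV", "today"),
    "kommt": ("VERB", "comes"),
    "er": ("SUBJ", "he"),
}

# Structural transfer table: source pattern -> target pattern.
TRANSFER = {
    ("ADV", "VERB", "SUBJ"): ("ADV", "SUBJ", "VERB"),
    ("SUBJ", "VERB", "ADV"): ("SUBJ", "VERB", "ADV"),
}


def analyze(sentence):
    """Reception room: recognize the structure and code it."""
    words = sentence.lower().split()
    pattern = tuple(LEXICON[w][0] for w in words)
    return words, pattern


def transfer(pattern):
    """Structural transfer: look up the equivalent target pattern."""
    return TRANSFER[pattern]


def construct(words, source_pattern, target_pattern):
    """Construction: reorder the word cargo, then apply the glossary."""
    # Assumes each role occurs once, so roles can serve as dictionary keys.
    by_role = dict(zip(source_pattern, words))
    return " ".join(LEXICON[by_role[role]][1] for role in target_pattern)


def translate(sentence):
    words, source_pattern = analyze(sentence)
    return construct(words, source_pattern, transfer(source_pattern))


print(translate("Heute kommt er"))
```

The glossary is consulted only in the last step, after the sentence structure has been settled, which mirrors the order of operations in the scheme above.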

IF THE laboratories and the many helping hands from other universities and companies succeed, and the linguistic research is thorough enough, it will then be up to computer designers to produce a machine built specially for this one service. The general-purpose engineering computer is not well adapted to the stiff MT requirements at present. Its main shortcoming is its memory. The human mind, which achieves a large part of its functioning by consulting stored experience, requires about 10 billion neurons, or storage cells. No computer so far has anything like such a capacity. Mechanical memories today are usually magnetic, storing their bits of information as tiny areas which are either magnetized or not. Each of these states represents information when it is located at a particular physical "address." The simplest magnetic memory is a whirling drum of steel a few feet long and a foot or two in diameter. Its polished surface has room for tens of thousands of activated magnetic spots. As it revolves, at nearly a mile a minute, the surface sweeps beneath tiny coils of wire which can read out the magnetic condition of any spot in terms of a small electric pulse. These pulses can be combined to signify numerical "words," or code sequences.

A newer type of memory involves thousands of magnetic rings of about pinhead size, strung on a grid of wires. Since nothing moves but the electric pulses, these storage units respond virtually at the speed of light. The arrangement is so much more compact and rapid in its response that its storage capacity is many times greater than that of the revolving drum. But even so, the physical space requirements are large and the cost very high. New steps are being taken in the metallurgy of magnetism; soon we shall see disks of tremendous capacity, whirling at tens of thousands of r.p.m., and beyond that photographic memories stepping up storage facilities still more. A new order of efficiency is expected, too, from a tiny transistorlike device known as the cryotron. This can do extraordinary feats of memory storage when chilled by a bath of liquid helium to some 450 degrees below zero Fahrenheit. It is the first practical electronic device to come out of the wonderland of near-absolute zero, where the behavior of the elements is entirely different from that in the ordinary world.

Even so, computer scientists will not be satisfied. They are already talking about using the individual atoms themselves as memory storage units. Atoms are tiny magnets, so small that 100 million of them, end to end, occupy only an inch. If computers can use them, translating machines no bigger than desks will be possible, and the billion-sized information load required for the perfect mechanical dictionary will be at hand.

Memories, of course, are essential, but the crux of the problem is to make the computer human in its abilities. The late Dr. John von Neumann, one of the world's great mathematicians, spent years studying the possibilities of computers as independent thinkers. He thought it conceivable to design a machine capable of reproducing itself. Short of that, he felt machines could be trained to exercise a limited kind of personal judgment.

The continuing attempt among communications specialists has already yielded machines that can play games and "think" their way out of unexpected difficulties. Several have played games of checkers with human opponents and have sometimes beaten them. The famous Mouse, designed by Dr. Claude Shannon, can penetrate a mechanical maze in search of "cheese" (an electric contact which rings a bell), and, once it has figured out the correct path, can unerringly reach the goal again. If the maze is changed, Mouse utilizes past experience in solving the new difficulty more rapidly. Dr. Shannon's machine and several other robots like it have proved that the act of learning is possible for electronic brains. The faculty can undoubtedly be extended and applied to the smooth translation of languages.

At present, a machine can accept only material expressed in special codes. Again, it is but a step to the computer that will read a printed page. The automatic reading of print is already a well-established science called "character recognition." Many banks use devices which sort checks by recognizing the numbers on them. An advanced machine goes through huge piles of travelers' checks, adds the digits of the numbers on them, and picks out any that may be fraudulent. Patents have already been obtained on electronic readers which actually view a page of type and print it out at a distance, without human intervention. As soon as such equipment can be applied to the translating computer, the time-consuming job of preparing texts in coded form will be eliminated. As for printing out translated texts, this is already old hat. In a big computer room one can see automatic printers racing along at the rate of six hundred lines per minute -- about one page every three seconds.
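
The check-reading machine mentioned above, which adds the digits of a number and picks out suspect items, can be illustrated with a short Python sketch. The article does not say what rule the machine applies, so the rule used here (the last digit must equal the sum of the preceding digits modulo 10) is purely an assumption, as are the function name and the sample serial numbers.

```python
def looks_suspect(serial):
    """Flag a serial number whose final digit fails the check rule.

    Hypothetical rule: the last digit must equal the sum of the
    preceding digits modulo 10. The real machine's rule is not known.
    """
    digits = [int(ch) for ch in serial if ch.isdigit()]
    if len(digits) < 2:
        # Too short to carry a check digit at all.
        return True
    return sum(digits[:-1]) % 10 != digits[-1]


for serial in ["123455", "123456"]:
    print(serial, "suspect" if looks_suspect(serial) else "ok")
```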

Professor William N. Locke, head of MIT's modern languages department and a prime mover in machine translation, is not going to be satisfied even with this kind of short cut. He would like to have a machine that will translate material that is merely spoken to it. This is not so fantastic as it sounds. Voice recognition techniques are far along in the laboratory already. The Bell Telephone Laboratories have a pet machine they call Audrey, which can listen to spoken numbers as well as sixteen of the principal phonetic sounds and repeat them. With the aid of a hiss-and-buzz generator, which produces the basic sound waves of speech, it can reconstruct simple conversation well enough to be understood. Its achievement does not seem so startling until one learns that it can transmit and make intelligible one hundred times as many telephone conversations as the average subscriber can. Audrey would make an admirable secretary for a translating machine, listening to a speech read aloud and then stuffing it into the computer for processing. Its only trouble is that it can't recognize a woman's voice.

At least half a dozen laboratories are hard at work in search of an electronic system for recognizing speech and coding it, no matter who talks, whether the speaker has a cold, is excited, or is underwater. When they succeed, we shall be able to talk to a typewriter which has no stenographer in front of it or yell a number at a telephone and have it do its own dialing.

The ultimate in translation machines is thus not too hard to imagine. The final word -- and no informed engineer would be surprised -- will be the machine to which one can talk in one language while it simultaneously intones the translation in another, perhaps several others at once, depending upon what buttons are pushed. And if someday this ultimate robot, made a little tipsy by its own cleverness, should hear the lines

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe,
we might find out what Lewis Carroll really meant.

Copyright © 1959 by David O. Woodbury. All rights reserved.
The Atlantic Monthly; August 1959; The Translating Machine; Volume 204, No. 2; pages 60 - 64.