Recognizable Characters

New software helps deliver on an elusive promise of the computer era: the paperless office

THE dawn of the computer era promised us a paperless world, in which everything we needed—memos, reports, even newspapers and books—would be available on a video terminal. Our file cabinets would contain only B.C. (before computer) documents, and we would once again be able to see the surface of our desks, on which would be only a terminal and a phone.

But, as anyone who works in an office knows, the haystacks of paper have proliferated. Studies have shown that paper waste has increased since printers made long documents so easy to pump out. Finally, though, two innovations, long promised and for the first time really practical, have brought us very close to the paperless world.

The kingdom of personal computers is divided into two great domains: “character-oriented” and “graphics-oriented” systems. This division lies beneath the more familiar ways of classifying computers—for instance, the Macintosh family versus that of IBM-compatible computers, or, within the IBM-compatible world, Windows versus DOS. Character-oriented systems (DOS is the best known) treat the screen as if it were an endless roll of typing paper and the computer itself as if it were a very sophisticated typewriter. In graphics-oriented systems (Mac and Windows) the screen is an animated display, like that of a video game, and the user can jump around to control different parts of the screen with a joystick or mouse.

Without graphics displays it would be difficult to use drawing programs or manipulate fancy fonts. But these advantages come at a tremendous cost in efficiency. On a standard character-based computer screen, like that for the original WordPerfect or Lotus 1-2-3, no more than 2,000 characters are visible at a time. It takes eight bits of computer data to specify each character; therefore, with 16,000 bits of information the computer can update the entire screen display. Any computer can do this almost instantaneously, which is why it was impossible to out-type the display even on the original IBM PCs, now thought of as lumbering dinosaurs. With IBM-compatible graphics displays the computer needs to specify each dot, or “pixel,” on the screen, with eight, sixteen, or even twenty-four bits of data per dot, in effect drawing each character anew rather than calling it up from a video memory bank. The high-resolution display I now use (an XGA-2) has about 800,000 pixels, each separately controlled with sixteen bits of data. Keeping this screen display current with the computer’s activity requires about 800 times as much computing power as the original PCs required. It is only the advent of high-speed processing chips and video accelerators that has made graphically oriented systems like Windows and OS/2 feasible.
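The arithmetic behind that factor of 800 can be checked in a few lines. What follows is only a rough sketch in Python, using the round figures quoted above; it treats the amount of data per screen as a stand-in for computing power, which is a simplification, and the function and variable names are invented for the illustration.

```python
# Round figures taken from the paragraph above.
CHARS_ON_TEXT_SCREEN = 2_000         # characters visible on a character-based screen
BITS_PER_CHAR = 8                    # bits needed to specify one character
PIXELS_ON_GRAPHICS_SCREEN = 800_000  # "about 800,000 pixels" on the XGA-2 display
BITS_PER_PIXEL = 16                  # bits used to control each pixel


def screen_bits(units: int, bits_per_unit: int) -> int:
    """Return the number of bits needed to describe one full screen."""
    return units * bits_per_unit


text_bits = screen_bits(CHARS_ON_TEXT_SCREEN, BITS_PER_CHAR)
graphics_bits = screen_bits(PIXELS_ON_GRAPHICS_SCREEN, BITS_PER_PIXEL)
ratio = graphics_bits // text_bits

print(f"character screen: {text_bits:,} bits")
print(f"graphics screen:  {graphics_bits:,} bits")
print(f"ratio:            {ratio}x")
```

The character screen comes to 16,000 bits and the graphics screen to 12.8 million, which is where the 800-fold difference comes from.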

Most of what people create on computers is character-based—that is, it consists of letters or numbers—and you usually have to print it to get it to other people. (The main exception is electronic mail.) Now I have found ways to use computers to cut down on two of the most annoying forms of paper in the modern office: sheaves of greasy-feeling paper curling out of the fax machine, and piles of clippings from newspapers and magazines. Fax-handling systems in the first case and “optical character recognition” (OCR) programs in the second have improved enough to be practical alternatives to normal paper storage.

MODEMS that can send and receive fax signals (in addition to handling normal data streams) have been on the market for about five years. The big drawback of early models was that they monopolized the computer’s attention; if the computer was transmitting a fax, it couldn’t do anything else. The release of Intel’s SatisFAXtion board, in 1990, represented the first significant step through this barrier. The board contained its own processing chip, which allowed it to send faxes, or automatically answer the phone and receive them, without disturbing whatever other program the computer was running. Intel has now released a full line of SatisFAXtion boards, ranging in price from $100 to $270, depending on transmission speed and other features. The Intel boards also include built-in modems, with speeds ranging from 2,400 baud for the less expensive models to 14,400 baud. (Three years ago I bought the 2,400-baud model for $199. If I were buying a new board now, I’d get the SatisFAXtion 400, which offers a 14,400-baud modem and a 14,400-baud fax, for $270.) Several other manufacturers offer comparable products.

Even after the SatisFAXtion board solved the hardware problem, a software problem remained. The FAXability software that came with Intel’s board let you view received faxes on your computer’s screen, printing them only if you chose to, but it could do very little else. In the past two years this gap has been filled by several excellent fax programs. I use FaxWorks OS/2, produced by SofNet, in Atlanta (800-329-9675), which also makes a Windows version, called FaxWorks PRO. Other leading programs are WinFax PRO, from Delrina (800-268-6082), and Eclipse Fax, from Phoenix Technologies (800-452-0120).

There are slight variations among these programs. I prefer FaxWorks OS/2, despite sometimes slow and surly responses from the technical-support staff, because it is exceptionally crash-proof and stable. Not once, in nearly two years of use, has the program failed or caused my computer to “hang.” But the other programs, especially WinFax PRO and FaxWorks PRO, offer more-sophisticated commands for viewing and editing faxes; they are also cheaper, at less than $100 versus about $140 for FaxWorks OS/2.

One great convenience of these programs is “virtual printing.” In effect, this allows you to send anything from your computer to be printed on someone else’s fax machine, wherever it might be. On most personal computers the printer is connected to the parallel port, usually called LPT1. These fax programs create an imaginary printer, which you assign to an imaginary printer port—usually LPT3. You tell your word-processing or database program to send a print job to LPT3, and then indicate, on a series of FaxWorks screens, the phone number to which the fax should be sent. The result can often be crisper-looking when it arrives as a fax than if you had actually printed the pages and sent them through a normal fax machine. This is especially true if your real printer is not a high-resolution laser printer. Most fax programs offer ways (sometimes for a slightly higher price) for the imaginary printer to emulate the fonts and resolution of laser printers when transmitting a fax.

Sending even a very long fax this way involves just a few seconds’ effort rather than the annoyance of printing out a long document and feeding pages into a fax machine. The transmission can be scheduled for times when you’re not there, or when the phone rates are low.

The other crucial feature of these programs is their ability to receive faxes without your having to print them out, thereby moving closer to the paperless-office ideal. Whether you actually use the programs this way will depend on how good a screen and display card you have. A bottom-of-the-line VGA screen offers screen resolution of 640 x 480 pixels; most faxes are hard to read and unpleasant to look at on such a screen. If you buy a new system, get one with a resolution of 1,024 x 768 pixels. At that resolution it is realistic to deal with most faxes in a paperless way—reading them, responding, printing out the few faxes you need to save, and erasing the rest from the disk. The lowest screen resolution at which you should think of using fax software this way is 800 x 600 pixels, that of the standard Super VGA screen.
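One way to see why screen resolution matters so much is to compare the width of each screen with the width of a fax image. The sketch below, in Python, rests on an assumption that does not come from this article: that a standard Group 3 fax scan line is 1,728 dots wide. The names are invented for the illustration, and the result is only a rough indication of how much a fax must be shrunk to fit across the screen.

```python
# Assumption (not from the article): a Group 3 fax scan line is 1,728 dots wide.
FAX_LINE_PIXELS = 1728

# Screen widths, in pixels, for the three resolutions mentioned above.
SCREEN_WIDTHS = {
    "VGA (640 x 480)": 640,
    "Super VGA (800 x 600)": 800,
    "1,024 x 768": 1024,
}


def fit_to_width_scale(screen_width: int, fax_width: int = FAX_LINE_PIXELS) -> float:
    """Return the fraction of the fax's horizontal dots the screen can show
    when the full page width is squeezed onto the display."""
    return screen_width / fax_width


for name, width in SCREEN_WIDTHS.items():
    print(f"{name}: about {fit_to_width_scale(width):.0%} of the fax's dots across")
```

Under that assumption a plain VGA screen can show little more than a third of the dots in each fax line when the whole page width is displayed, while a 1,024 x 768 screen shows well over half.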

Today’s fax programs also let you manipulate the faxes that come in—filling in boxes, typing annotations and replies, copying text or pictures from one document to another. FaxWorks PRO, WinFax PRO, and Eclipse Fax have elaborate, Macintosh-like drawing tools, but even the simplest of these tools raise a disturbing possibility. As soon as you have used them, you realize that you can never trust any faxed, photocopied, or laser-printed document again. Using these programs you can forge copies of canceled checks, by copying signatures or numerals from one check and inserting them on another. You can invent or alter receipts in ways no one could detect. You can remove the word “not” from one part of a letter or contract and put it somewhere else. You can make inconvenient words disappear from a quotation. I have done all these things, for sport. There is no turning back from technology, but these programs make you remember to be on guard.

The fax programs work, of course, only when your computer is turned on. To receive faxes, therefore, you must either leave the computer on all the time or switch to a regular fax machine when the computer is off. (I turn my computer off at night, and my monitor off when I’ll be away more than five or ten minutes. Many people think they should always leave the machines on, to avoid the start-up surge of power that can damage chips. The disadvantage of this approach is needless wear on the hard drive, which spins constantly while the computer is running and will eventually wear out. Realistically, most computers become obsolete before either the chips or the hard drive fails. Therefore you might as well save power by turning off the computer when you can. Monitors consume a lot of electricity, and it’s better to turn them off than to run a “screen saver” when you’re away.)

FAX programs may receive handwritten documents, pages from books, and signed contracts, but they are not designed to send anything not directly generated by your computer. There is a way around this limitation, but it involves an expensive piece of hardware: a scanner.

Scanners come in two varieties, hand-held and flatbed. Each captures an image from a page, much as a photocopy machine does, and stores it in your computer. The hand-held models are cheaper, ranging from $100 to $300 versus at least $700 for a good flatbed model. Their reliability has also improved in the past year, with new models that make it easier to scan in a straight line. Still, I regard all hand-held scanners as a waste of money. If you’re interested in scanning, wait until you’re ready to invest in a flatbed model—from Hewlett-Packard, Microtek, UMAX, or other manufacturers. I bought a Hewlett-Packard ScanJet IIp almost two years ago for $900. It was a big investment, but having taken the plunge, I find that I use the machine several times a day. A scanner can be used with a fax board to turn it into a full-service fax machine: you can scan a book page or any document into the computer, and then send it out as a fax. It can serve as a Rube Goldbergish replacement for a photocopying machine: you can scan the document and then print it. It lets you copy drawings or logos and include them on stationery. And it makes possible the miracle of OCR.

Until about two years ago optical character recognition was less a miracle than a hoax—or, more charitably, a dream. The premise of such programs was that they could reverse the character-to-graphics transformation that takes place when you print a document. You would put a printed page into your scanner and the OCR program would convert its image into a normal text file. From the thousands of dots that make up a newspaper page, you could distill words and sentences—which you could then edit, save, index, and store like any other file.

Through the 1980s this process worked about as well as programs for checking grammar or translating English into Japanese—that is, very poorly. Obstacles that human readers would surmount without noticing—blurry type, run-together letters, unusual fonts—brought the OCR programs to a halt. That is not true anymore.

The first significant breakthrough came a year ago, with a modest program called TextBridge, from Xerox Imaging Systems (800-248-6550). This is a bare-bones program, offering only a few rudimentary commands and costing less than $100, but it was noticeably more accurate—and unquestionably more affordable—than most commercial predecessors. Before TextBridge, most OCR programs attained accuracy rates of 90 to 95 percent. This sounds impressive until you realize that it means four to eight errors in an average line of type. In practice OCR is useful only when the accuracy rate approaches 99 percent, as it seems to with TextBridge.
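The gap between 95 percent and 99 percent is easier to appreciate when it is worked out per line. Here is a minimal sketch in Python, assuming a line of about 80 characters; that line length is my assumption for the illustration, not a figure from any of the OCR vendors, and the function name is invented.

```python
# Assumption (not stated in the article): an average line holds about 80 characters.
CHARS_PER_LINE = 80


def expected_errors_per_line(accuracy_percent: float,
                             chars_per_line: int = CHARS_PER_LINE) -> float:
    """Return the average number of misread characters in one line of type."""
    return chars_per_line * (100 - accuracy_percent) / 100


for accuracy in (90, 95, 99):
    errors = expected_errors_per_line(accuracy)
    print(f"{accuracy} percent accurate: about {errors:.1f} errors per line")
```

On that assumption, 90 to 95 percent accuracy works out to between eight and four errors in every line, while 99 percent leaves fewer than one.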

The two best-known OCR programs are WordScan Plus, from Calera Recognition Systems (800-224-0660), and OmniPage Professional, from Caere (pronounced “care”; 800-535-7226). Both programs offer more refined controls than TextBridge does. For instance, they can save text in a variety of popular word-processing formats, and they make it easier than TextBridge does to check characters the program may have misrecognized. They are also more expensive than TextBridge. The list prices for Calera and Caere products range from $295 to $695, depending on features; the “street prices,” from mail-order houses, are about a third less. Each program claims to be faster than the other, depending on the circumstances; in practice both are fast and accurate enough to make OCR worthwhile.

The other notable OCR program I have encountered is CharacterEyes, from a small Jerusalem-based company called Ligature Software (617-238-6734; 800-888-0060). The program was originally designed to perform OCR on Hebrew. It now supports the character sets of all Western languages, and when working on English it is as fast and accurate as the Calera and Caere products. All the OCR programs mentioned here are designed to work under Windows 3.1 but also run under OS/2 2.1, which I use.

When I first bought WordScan Plus, someone who worked at Calera informed me that text recognition is “an art, not a science.” This was not the kind of comment I was used to hearing about the 0-or-1 binary world of computing, but I now understand what he meant. You learn after a while that certain kinds of text lend themselves well to OCR, and other kinds do not. Articles from The Wall Street Journal, for instance, always come through with 100 percent accuracy (to speak only of typography). I no longer even bother to try articles from The Washington Post, because they’ll be riddled with (OCR) errors. For some magazines, especially those on shiny stock, humble TextBridge still does the best job. I’ve become acutely sensitive to the difference between run-of-the-mill faxes and those sent at “fine” or “high resolution.” High-res faxes work very well with OCR programs; standard faxes don’t. (The leading fax programs all offer OCR options—in some cases for an extra cost of $20 to $60. The option lets you convert incoming faxes into text—a process that is also much more successful with high-resolution transmissions.)

Having learned some of its art, I now have such faith in OCR’s powers that every few days I feed a pile of newspaper clippings into the scanner and convert them to text, which I can store in my computer and eventually fish out when I need to, using a powerful indexing program like Agenda or Magellan. I’ve convinced myself that the whole cycle of storing and finding information goes much faster this way than if I slapped the clippings into file folders to be thumbed through (or lost) later on. Now my office is free not only of fax paper but also of the previously omnipresent clippings. And I have more fun.