Have a friend who loves typography or emoji, but hates material gifts of any kind?
Now there’s a present for them.
Starting this week, the Unicode Consortium will let donors “adopt” any one of the more than 120,000 characters contained in Unicode, the international computer-text standard. The proceeds will support the consortium’s efforts to represent more languages on screen.
The letter A, the interrobang, the poop emoji: All can now be symbolically yours for as little as $100. Higher sponsorship levels are available at $1,000 and $5,000. For their trouble, donors will receive a certificate, their name on the sponsor webpage, and the satisfaction that comes with being eternally linked to the world’s most important computer-text standard.
“Since we’ve started, our goal has been to enable all the languages of the world on computers,” said Mark Davis, the co-founder and president of the Unicode Consortium. While Unicode long ago tackled the most prominent languages, this new campaign is meant to provide funding “to attack the less prominent languages and less prominent characters of the world,” he said.
The 24-year-old Unicode standard currently encompasses glyphs from about 130 different writing systems, or scripts. Around 30 of these scripts cover the most common languages, like English, Chinese, Arabic, and Greek. (Davis himself led the development of the bi-directional algorithm, which permits Arabic and Hebrew to appear correctly on screen.) Another 30 cover lesser-used languages, like Cherokee or Syriac. And 70 scripts are included for scholarly or historical purposes, including Linear B and ancient Egyptian hieroglyphs.
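For readers curious what a Unicode character looks like to a computer, here is a minimal sketch using Python's standard `unicodedata` module. It prints each character's code point, official name, and bidirectional class, which is the property the bi-directional algorithm consults when laying out text. The sample characters and the `describe` helper are illustrative choices, not anything supplied by the consortium.

```python
import unicodedata

# Illustrative samples: the letter A, the interrobang, the poop emoji,
# and one letter each from Arabic and Hebrew.
samples = ["A", "\u203D", "\U0001F4A9", "\u0627", "\u05D0"]

def describe(ch):
    """Return (code point, official name, bidirectional class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
    )

for ch in samples:
    print(describe(ch))
```

In this sketch the Latin letter reports class `L` (left-to-right), while the Hebrew and Arabic letters report `R` and `AL`, the right-to-left classes that tell software to reverse the direction of a run of text.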
Yet not every language has fully made it into Unicode. Only in the past few years has Unicode started the work of coding the Javanese script, used by up to 98 million people; Pahawh Hmong, used by perhaps 2.7 million; and the N’ko alphabet, which is important throughout West Africa. Davis estimated 100 scripts still have to be entered into the standard, an expensive process that requires travel, expert consultation, and days of programming work.
“In order to support these lower-use languages, we really needed to have a supplemental form of funding,” he said. “A couple years ago, we came up with the idea for something like an adopt-a-highway program, but it was adopt-a-character.” This is the Unicode Consortium’s first-ever fundraiser.
These language exclusions can be jarring for native speakers, and politically controversial. Last year, the New York-based developer Aditya Mukerjee wrote that it was still impossible for him to correctly write his name in Unicode’s encoding of Bengali, whose alphabet is the sixth most-used writing system in the world.
“My name is not only a common Indian name, but one of the top 1,000 names in the United States as well. But the final letter has still not been given its own Unicode character, so I have to use a substitute,” said Mukerjee.
When it adds a language to the standard, Unicode also adds “language data.” These are the metadata points that a computer needs to know to use a language in its operating system: They describe, for instance, what countries are called and how the language idiomatically represents times like “3 o’clock in the afternoon.” But Davis said this work was much less controversial than encoding scripts and alphabets.
“We’ve been running the language-data side for something like a decade, and we’ve never actually had to have a vote in that committee. We’ve been able to hold everything by consensus, which is unusual,” he told me. “On the character-encoding side, we probably have votes every meeting.”
Every character that’s already made it into the Unicode standard can be adopted an unlimited number of times, but there can only be one “gold” adoption (that is, at the $5,000 level) per glyph.