Emoji: A Big 🤬ing Mistake?

Unicode’s main aim is to offer a compatible encoding for every language in the world. According to their website, Unicode is meant to be:

  • universal (addressing the needs of world languages)
  • uniform (fixed-width codes for efficient access), and
  • unique (bit sequence has only one interpretation into character codes)

In essence, the goal is to define a codespace which allows for any and all written human language to be represented in a form understood by computers. Here I argue that emoji is a mistake, and not in line with these principles.

🔬 Out of scope

Unicode defines a character as “The smallest component of written language that has semantic value”. In other words, it is the smallest unit that humans write down that conveys some meaning. Humans do not write down emoji. And if a human does doodle in their writing, we would probably say they drew it, not wrote it.

In order for a proposal for a new codepoint assignment to be accepted there are criteria that first have to be met. It must be a single useful character that represents a single unit of human writing which can’t already be expressed in some other way.

Many emoji proposals include searches for the word associated with the proposed glyph, but no evidence of the proposed glyph itself being used in a published text or human language. This is pointless, as you could get millions of search results for literally any one word or short phrase. Some justify proposals with random requests made on social media along the lines of a screenshot of a tweet saying “wish there were a <name> emoji”, which doesn’t seem like a logical reason for inclusion. If this is all it takes, then we’ll have an emoji for every single noun and concept. These proposals often fail to offer any evidence that proposed emoji were ever included in running text.

I think that many of the proposals, which include images, are starting to blur the line between where written language ends and where visual art begins. We could create emoji for a great many visual concepts and allow people to reference famous paintings or imaginary creatures, but that just doesn’t seem like what character encoding should be for.

There have been instances where even established characters from written language have been rejected due to the proposals not accurately capturing their description. For example, the Chinese character for biangbiang noodles, 𰻞, had to go through multiple stages before it was approved. And this is just an old character presented in the confines of the Chinese writing system. How on earth are Unicode to standardise images to accurately represent visual concepts for things that can look wildly different to different people or cultures? For example, there is no standard “look” for an artist in real life, and yet Unicode have decided that this comprises a person with the paint palette emoji 👩‍🎨. Are sculptors not artists too? I know that this can be left up to font vendors to resolve, but the approach taken to define visual concepts strikes me as arrogant.
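To make the artist example concrete: that emoji is not one code point but a sequence of a woman, a zero width joiner, and an artist palette. A minimal Python sketch, using only the standard unicodedata module, lists the parts. The describe helper is a name made up for this illustration.

```python
import unicodedata

# The "woman artist" emoji spelled out with explicit escapes:
# WOMAN (U+1F469) + ZERO WIDTH JOINER (U+200D) + ARTIST PALETTE (U+1F3A8)
artist = "\U0001F469\u200D\U0001F3A8"


def describe(text):
    """Return (code point, Unicode name) pairs for each code point in text.

    Illustrative helper, not part of any library.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for code, name in describe(artist):
    print(code, name)
```

So "artist" is literally spelled as "woman joined to a palette", which is the design decision being questioned above.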

🚫 Control over Language

Unicode’s purpose is to ensure interoperability of existing language with computers. By starting to create and design new symbols for writing it is going far beyond this initial goal, and it is starting to create its own written language.

Given the level of distrust people have of technology companies, I would argue we should not be allowing them to decide the direction our written language takes. Unicode is an authoritative source on encoding and so has a lot of power, and I argue that by starting to create new symbols it is abusing the power and trust it carries.

Imagine if a dictionary publisher accepted proposals for words from the public, rather than researching those in active use. Should dictionary makers be allowed to just invent new words? I would argue that goes in the opposite direction to the one language standards bodies should be operating in. At a time when there are still unmapped languages that people do use, it strikes me as a bit of an odd thing to do.

I also wonder how long does it take a writing system to become standard enough to codify. Emoji are only a couple of decades old at most, the most recent additions less than a year old. It remains to be seen whether they will last long or simply fade out of use as ephemera. Many writing systems went through a lot of changes before they became the standard forms they exist in today. Because emoji are created only in computer-generated writing, by codifying how emoji should work so early, have we pre-empted the final form and blocked it from evolving further?

In addition to the conceptual issues with emoji, I also have concerns with the display of emoji fonts. Many operating systems will only display the vendor’s font stack by default and if users wish to use their own emoji font they have to jump through hoops, if it is even possible to achieve this. And forget about conveying the same meaning in an emoji from one device to another when the fonts can differ so much.

🤷‍♀️ Who even uses emoji?

According to a publication by the Unicode Emoji Subcommittee themselves, a small portion of emoji accounts for most of the overall use. Even the manufacturers of fonts are struggling to keep up with the demands of ever more emoji, to the point that the committee decided to:

  1. Craft a strategy to be more focused on what is useful
  2. Reduce the number of emoji we encode per year
  3. Slightly modify our process in support of these goals

It is also worth noting that in many of the places where people imagine emoji are used, they are not actually being used. Many websites I come across still replace emoji with image files rather than display the text, likely for compatibility reasons (even this post if you’re visiting the main WordPress page). But if we can get by with using images, why bother encoding the images into Unicode in the first place?

And even within mobile chat apps, the place where standard use of emoji is most likely to be well supported, it’s not essential. Device vendors have custom implementations of emojifiers which are totally non-standard and are transferred entirely as images, like Apple’s Memoji or Google’s Emoji Kitchen. Many mobile keyboards and IMEs offer quick access to embed reaction GIFs, and I don’t see any proposals to start encoding GIFs into Unicode.

A lot of social platforms like Slack, Twitch, Discord and Signal even let users create icon and sticker packs they can upload and use as easily as if they were emoji. Almost every online communication channel bar SMS supports embedding of image files, which means there is largely no compatibility issue to worry about anymore.

This does leave a wide open question. How many, and which, emoji are rarely used? As noted above, Unicode’s own publication suggests many emoji are used incredibly rarely compared to others. In preparing this post I found many rankings of the most popular emoji, but hardly any of the most rarely used. This would be worth researching. It might help indicate for future proposals what themes and attributes lead to a well used, and therefore worthwhile, character.

🌅 Could we do better?

I am not against the idea of including emotional images as part of text. I use emoji and plenty of other small icons all the time. I just don’t think standardised emoji are the best way to do it.

Aside from the amusing emoji keyboard, there isn’t really an easy way to write emoji through computers. There are some mobile IMEs starting to offer search by drawing, but then I question why we wouldn’t just draw on screen and send that as part of the message directly. If the point is to make communication more intimate, wouldn’t the ability to easily make our own symbols as part of handwriting be more helpful? Custom icons as supported by many social apps are much more personal.

My most common input method if I need an icon is to spell it out. Using a compose sequence allows easy access to icons such as . For me that’s the most natural way to access symbols when I type as an English writer. It’s also the most frequent way I locate emoji if I ever use them. Either I type and the IME suggests a symbol for me, or I explicitly request a symbol by using colons. In many apps :symbolname: is how I use symbols. On Twitch you literally just type the name of the icon and no more. So really, I’m just typing the word, not the symbol. The symbol isn’t part of the language, just how it gets displayed.
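As a rough illustration of the :symbolname: style of input, here is a minimal Python sketch. The shortcode table and the expand_shortcodes function are made up for this example; real apps ship their own, much larger lists and may match names differently.

```python
import re

# Hypothetical shortcode table for illustration only.
SHORTCODES = {
    "heart": "\u2764",    # HEAVY BLACK HEART
    "snowman": "\u2603",  # SNOWMAN
}

# A name between two colons: lowercase letters, digits and underscores.
SHORTCODE_PATTERN = re.compile(r":([a-z0-9_]+):")


def expand_shortcodes(text, table=SHORTCODES):
    """Replace each known :name: with its symbol; leave unknown names as typed."""
    return SHORTCODE_PATTERN.sub(
        lambda match: table.get(match.group(1), match.group(0)),
        text,
    )


print(expand_shortcodes("I :heart: winter, especially a good :snowman:"))
```

The text that gets typed and stored is still the word. The symbol only appears at display time, which is the point being made above.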

Which raises an interesting thought experiment. If we do want to display arbitrary terms as images, why are we waiting for Unicode to define a symbol for every single word? As it becomes unworkable within Unicode, why not bypass it entirely? Unicode already recognises some issues with defining visual appearance in things like flags. With flags, Unicode broadly defines how they are built and offers a codespace of country identifiers that can be used instead. We could just do the same for any concept: a base blank emoji character indicating the author wants to display something visually, joined to a textual phrase that can be transformed into a visual on the fly by a system, if the user wants to, while still allowing fallback to text for visually impaired users. Why wait for Unicode to spend a year deciding whether they want to assign a precious code point to “virus” when an author could write <Emoji>+ZWJ+“Virus” and have it magically transform into a virus with spike proteins by pulling from an open art library? Look! Clip-art for the modern age!
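A minimal Python sketch of both halves of that idea. The flag part reflects how flags work today: a two-letter country code is spelled with a pair of regional indicator symbols. The second part is purely hypothetical. No base emoji character of this kind exists, so a private-use code point (U+E000) stands in for it, the art library is just a dictionary, and the rule that the phrase ends at the first non-word character is an assumption of this sketch.

```python
import re

REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
ZWJ = "\u200D"                  # ZERO WIDTH JOINER
EMOJI_BASE = "\uE000"           # hypothetical stand-in (private-use code point)


def flag_sequence(country_code):
    """Spell a two-letter country code such as "GB" as regional indicators."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= ch <= "Z" for ch in code):
        raise ValueError("expected two ASCII letters")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in code
    )


# Hypothetical encoding: base character + ZWJ + a run of word characters.
PHRASE_PATTERN = re.compile(re.escape(EMOJI_BASE + ZWJ) + r"(\w+)")


def render(text, art_library, images_enabled=True):
    """Swap each tagged phrase for an image reference, else fall back to text."""
    def substitute(match):
        phrase = match.group(1)
        image = art_library.get(phrase.lower())
        if images_enabled and image is not None:
            return f"[image: {image}]"
        return phrase  # plain-text fallback, e.g. for screen readers

    return PHRASE_PATTERN.sub(substitute, text)


message = "Stay in, there is a " + EMOJI_BASE + ZWJ + "Virus about"
print(render(message, {"virus": "virus.svg"}))
print(render(message, {"virus": "virus.svg"}, images_enabled=False))
```

The fallback path is the important design choice here: a system with no art library, or a reader who has turned images off, still gets the word the author typed.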

Ultimately, I argue that emoji, as they exist right now, are a mistake.

  • They don’t actually represent organic written language
  • They are an inefficient way of encoding information
  • They cannot possibly fully encode the visual concepts they represent

Beyond the icons which are frequently used, it is a waste of time and resources to be mapping these symbols onto Unicode. Emoji standardisation is ultimately unnecessary for embedding inline images into text. And we could be innovating and championing much more creative ways of embedding visuals into our written language.


Thanks for reading if you got this far. You might ask – Is this serious? Partly I’m just making fun of emoji because it’s fun. But in another way, I am serious. I hate wasted resources, projects that spiral out of scope scare me, and the effect of poorly designed technology on people bugs me immensely. I note that I am coming from a purely computing background, so there are perhaps some issues of language that I am missing entirely. Maybe the way emoji are built right now is totally fine. Please sound off in the comments to let me know what you think (emoji welcome). 💖

Join the Conversation

  1. @lonm "I also wonder how long does it take a writing system to become standard enough to codify."

    The Deseret alphabet was used between 1854 and 1869 and is encoded (U+10400) in Unicode. So there is precedent saying that 15 years is enough…

    1. This is really interesting, thanks. If I understand right, that is a constructed writing system. 15 years doesn’t seem to have given it much chance to grow, but for a time it was used in a formal structured way.
