Have you ever encountered squares (□) or tofu-like blocks (⯐) when working with Chinese text in your designs or documents? These mysterious boxes, often called “tofu blocks” in East Asian typography, appear when your computer can’t display certain Chinese characters. They not only disrupt the visual flow but can make content completely unreadable – imagine crucial words in your business document suddenly turning into squares!
These missing characters are a common frustration for designers and professionals working with Chinese text, especially when using fonts that weren’t specifically designed for Chinese language support. While it might be tempting to blame your input method or text editor, the real story is more complex.

Missing characters have many causes, but two main ones: (1) character encoding issues and (2) gaps in a font’s character coverage. Input methods relate mostly to the former, and encoding problems rarely occur nowadays. The latter is by far the most common cause of missing characters and is the focus of this article, which will show you how to identify and address it.
While these two problems might seem unrelated at first glance, they share the same underlying cause: character sets.
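The two causes fail in different ways. Below is a minimal Python sketch (our illustration, not part of the original article) of the encoding failure: text saved as Big5 bytes but read back as UTF-8 comes out as replacement characters (U+FFFD) or garbage, even though any installed font could draw the original characters. It uses only Python’s built-in `big5` and `utf-8` codecs; the sample text is arbitrary.

```python
# Sketch: an encoding mismatch (cause 1), which is unrelated to font coverage (cause 2).

def decode_as(data: bytes, codec: str) -> str:
    """Decode bytes with `codec`, substituting U+FFFD for byte sequences it rejects."""
    return data.decode(codec, errors="replace")

original = "你好"                  # arbitrary sample text
raw = original.encode("big5")      # the bytes a Big5-based program would store

right = decode_as(raw, "big5")     # matching codec: round-trips cleanly
wrong = decode_as(raw, "utf-8")    # mismatched codec: the bytes are misread

print(right)                       # 你好
print(repr(wrong))                 # contains U+FFFD replacement characters
```

A font-coverage gap is different: the text itself is intact, and the font simply has no glyph for some of its code points.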
What is a Character Set?
A character set is like a dictionary that tells users which characters exist. Just as humans need dictionaries to recognize characters, computers need character sets as a foundation to look up and identify text. Imagine giving a passage in Thai to someone who only has a Korean dictionary (and doesn’t know Thai) – they definitely wouldn’t be able to read it. The same applies to computers – if you ask a computer to display characters it can’t find in its reference set, missing characters will naturally appear.
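For legacy character sets, this dictionary lookup can be imitated in a few lines of Python. The sketch below (an illustration, not a tool from the article) treats Python’s built-in codecs `big5`, `gb2312`, and `shift_jis` as stand-ins for the Big5, GB, and JIS (Levels 1 and 2) character sets covered below: if a codec can’t encode a character, that character is not in the set. It checks character sets only, not any particular font.

```python
# Sketch: use legacy codecs as "dictionaries" and look characters up in them.

CODECS = ("big5", "gb2312", "shift_jis")   # Big5, GB2312, JIS X 0208 (Levels 1-2)

def in_charset(ch: str, codec: str) -> bool:
    """True if `ch` has a code in the character set behind `codec`."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def lookup(text: str) -> dict:
    """Map each distinct character in `text` to the codecs that contain it."""
    return {ch: [c for c in CODECS if in_charset(ch, c)] for ch in dict.fromkeys(text)}

for ch, found in lookup("你A😀").items():
    print(ch, found or "not in any of these sets")
```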
But if computers use Chinese character sets as their dictionaries, why do Chinese characters still go missing?
That’s because there are just SO MANY Chinese characters! Because of how characters were formed, writing conventions, historical relationships, and other factors, Chinese includes not only basic characters but also many variant forms. For example, “峰” can have “山” on the left or on top, “旭” can have “日” on the right or at the bottom, and so on. Once variants are included, the number of Chinese characters becomes nearly uncountable: the latest (2017) edition of the “Dictionary of Chinese Character Variants” from Taiwan’s Ministry of Education contains a staggering 106,330 characters!
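Many variants are encoded as separate characters (Unicode unifies some minor differences, but not all), which means a font needs a separate glyph for each one it supports. A quick check with Python’s standard `unicodedata` module, offered here as an illustration, shows that “峰” and its variant with “山” on top, “峯”, are two distinct code points.

```python
import unicodedata

def describe(ch: str) -> str:
    """Code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for variant in ("峰", "峯"):    # 山 on the left vs. 山 on top
    print(variant, describe(variant))
```

A font that covers only “峰” will show tofu for “峯”, even though readers treat them as the same character.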
With numbers this large, putting everything into one dictionary isn’t economically practical. In reality, we don’t need all 100,000-plus characters in daily life, so each dictionary sets its own standards for what to include. For instance, the “Mandarin Daily News Dictionary”, designed for schoolchildren, includes only 9,238 characters, while the Ministry of Education’s “Standard Form of National Characters” and “Standard Form of Less-Frequently Used National Characters” together contain 11,149 characters.
Of course, computers don’t use physical dictionaries but digital character sets. Like physical dictionaries, digital character sets must weigh requirements and decide which characters to include, and how many. Once variant characters and regional differences are taken into account, there are dozens of character sets for Chinese characters alone.
Given how time-consuming and labor-intensive it is just to compile character sets, expecting every font to achieve the “Dictionary of Chinese Character Variants” level of over 100,000 characters is unrealistic. Therefore, when designing fonts, creators reference these curated character sets to produce corresponding glyphs. If the font you’re using references a character set that doesn’t include the characters you need, missing characters become inevitable.
Common Digital Character Sets
Traditional Chinese
Big5 – The De Facto Industry Standard
Big5 is considered the industry standard for Traditional Chinese fonts. In the 1980s, private enterprises, facing urgent usage needs, collaborated with Taiwan’s Institute for Information Industry to create it somewhat hastily. Due to limited alternatives at the time and adoption by major software companies, it became the tacitly accepted industry standard for the Traditional Chinese market.
However, Big5 was primarily based on the Ministry of Education’s Standard and Less-Frequently Used Character Tables, thus lacking many variant characters, Hong Kong common-use characters, and local language characters. Beyond reference limitations, its hasty compilation and age make Big5 less than ideal.
Although the latest 2003 version includes 13,060 characters, which might seem substantial, it remains inadequate for contemporary needs. Therefore, many font foundries independently expand their character counts. For example, most justfont products extend beyond Big5’s baseline to include over 14,000 characters.
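To get a rough sense of Big5’s size on your own machine, the Python sketch below (our illustration) counts how many ideographs in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) the built-in `big5` codec can encode, with `gb2312` for comparison. It ignores symbols and the CJK extension blocks and depends on the mapping tables your Python version ships, so expect a figure near, but not necessarily equal to, the 13,060 cited above.

```python
# Sketch: estimate how many CJK ideographs a legacy character set covers
# by trying to encode every code point in the basic CJK Unified Ideographs block.

CJK_BASIC = range(0x4E00, 0xA000)  # U+4E00..U+9FFF; extension blocks are ignored

def count_encodable_han(codec: str) -> int:
    """Count basic-block ideographs that `codec` can encode."""
    count = 0
    for cp in CJK_BASIC:
        try:
            chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue               # not in this character set
        count += 1
    return count

for codec in ("big5", "gb2312"):
    print(codec, count_encodable_han(codec))
```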
CNS11643 (National Standard) – Comprehensive but Overly Extensive
Also originating in the 1980s, CNS11643 was led by the Executive Yuan. Although insufficient character coverage initially kept it from becoming the standard, it has been updated continuously over the decades and now includes nearly 109,000 characters (including symbols). Beyond the character set itself, CNS11643 also provides Ming, Song, and Kai style font files.
While CNS11643 rarely encounters missing character issues, its scale of over 100,000 characters is too massive. Except for fonts supporting household registration needs, few fonts can meet this standard, making it rare to find CNS11643-compliant options in consumer font choices.

HKSCS (Hong Kong Supplementary Character Set)
Although Big5 became the industry standard by circumstance, its character selection was based mainly on the Standard Character Tables of Taiwan’s Ministry of Education, so it left out many characters commonly used in Hong Kong. The Hong Kong government therefore supplemented Big5 with local common-use characters, including those used in street names and Cantonese, creating HKSCS. If you need Hong Kong common-use characters, check whether a font supports HKSCS before you buy it.
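Python happens to ship a `big5hkscs` codec alongside `big5`, which makes the “supplement” relationship easy to see. This sketch (an illustration, not an official HKSCS tool) lists characters that Big5-HKSCS can encode but plain Big5 cannot; the exact results depend on the HKSCS revision your Python’s codec tables follow.

```python
# Sketch: find characters that HKSCS adds on top of plain Big5,
# using Python's built-in "big5" and "big5hkscs" codecs.

def encodable(ch: str, codec: str) -> bool:
    """True if `ch` has a representation in `codec`."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True

def hkscs_only(text: str) -> list:
    """Distinct characters in `text` that Big5-HKSCS can encode but plain Big5 cannot."""
    return [ch for ch in dict.fromkeys(text)      # dedupe, keep order
            if encodable(ch, "big5hkscs") and not encodable(ch, "big5")]

# Scan the basic CJK block to see how many ideographs HKSCS contributes there.
basic_block = "".join(chr(cp) for cp in range(0x4E00, 0xA000))
extra = hkscs_only(basic_block)
print(len(extra), "".join(extra[:20]))
```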

Japanese
JIS (Japanese Industrial Standards) – Japan’s National Standard
Established by the Japanese Industrial Standards Committee, this is Japan’s national-level standard. JIS is divided into four levels: Level 1 includes the most commonly used kanji in Japanese daily life and names; Level 2 adds less common place names, personal names, and special-use kanji; Level 3 is compiled for professional purposes, including more specialized field characters and symbols, such as medical terms; Level 4 is even more specialized and rare than Level 3.
Some independent designers’ Japanese fonts reference JIS specifications. However, most free Japanese fonts only support JIS Levels 1 or 2, making them prone to missing characters in Traditional Chinese contexts. Please verify support coverage before downloading.
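If you want to know which JIS level a particular kanji sits in (and therefore whether a font limited to Levels 1 and 2 could possibly contain it), Python’s built-in codecs offer a rough answer. This sketch rests on our own assumptions about the standards’ layout, not on anything in the article: JIS X 0208 rows 16 to 47 hold Level 1 and rows 48 to 84 hold Level 2, while JIS X 0213 adds Level 3 in its plane 1 and Level 4 in plane 2. Treat the output as a hint.

```python
import unicodedata

# Assumed layout (not from the article):
#   JIS X 0208 rows 16-47 -> Level 1, rows 48-84 -> Level 2
#   JIS X 0213 plane 1 additions -> Level 3, plane 2 -> Level 4
IDEOGRAPH_NAMES = ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")

def jis_level(ch: str):
    """Return 1-4 for a kanji found in the JIS levels, else None."""
    if not unicodedata.name(ch, "").startswith(IDEOGRAPH_NAMES):
        return None                          # only classify ideographs
    try:
        euc = ch.encode("euc_jp")            # JIS X 0208 characters become 2 bytes
    except UnicodeEncodeError:
        euc = b""
    if len(euc) == 2:
        row = euc[0] - 0xA0                  # EUC lead byte = row number + 0xA0
        if 16 <= row <= 47:
            return 1
        if 48 <= row <= 84:
            return 2
        return None                          # a JIS X 0208 symbol row, not a kanji level
    try:
        euc2004 = ch.encode("euc_jis_2004")  # EUC form of JIS X 0213
    except UnicodeEncodeError:
        return None
    if len(euc2004) == 3 and euc2004[0] == 0x8F:
        return 4                             # plane 2 is prefixed with SS3 (0x8F)
    if len(euc2004) == 2 and euc2004[0] >= 0xA1:
        return 3                             # plane 1, outside JIS X 0208
    return None

for ch in "日你":
    print(ch, jis_level(ch))
```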

Adobe-Japan1 – Adopted by Most Japanese Fonts
Promoted by design-industry giant Adobe, this has become the standard in the Japanese font industry, and most Japanese fonts bundled with computers comply with it. Adobe-Japan1-0 (AJ0) was released in 1993 and has since been updated to Adobe-Japan1-7 (AJ7). Fonts on the market mostly support AJ3 through AJ6 (AJ7 differs only by the “Reiwa” era-name ligature). AJ3 supports JIS Levels 1 and 2, making it a very useful basic character set for Japanese. AJ4 was compiled for commercial printing and adds many name characters and business symbols, making it more professional than AJ3. AJ5 and AJ6 add support for JIS Levels 3 and 4.
However, AJ3 and below lack basic characters such as “你” (you), while “哪” (which), “呢” (a question particle), and “嗎” (a yes/no question particle) are included only in AJ5 and above, so missing characters are likely in Traditional Chinese text. These gaps hurt because they are among the most frequently used characters in Chinese: “你” is one of the most common pronouns, and “哪”, “呢”, and “嗎” appear constantly in everyday writing and conversation. When characters this common are missing, readability suffers badly, however large the font’s overall character count looks. You can use the Japanese Font Missing Character Check Tool by But from Zi-Hi to check for such issues and find solutions.

Simplified Chinese
GB (Guobiao) – China’s National Standard
China’s national Simplified Chinese character set is named GB after the pinyin initials of “Guobiao” (National Standard). The first version, GB2312, covers most everyday uses. However, usage habits changed over time, and GB2312 left out many rare characters, special characters, and characters used in names, so China expanded the character set into GB18030.
While GB includes some traditional characters, it is designed primarily for simplified characters, so missing characters in Traditional Chinese text are to be expected. Additionally, although GB doesn’t directly regulate how characters are written, fonts supporting GB typically follow China’s “Table of General Standard Chinese Characters”, so their stroke forms differ from Traditional Chinese conventions.

How to Check Character Set Support
As font users, we don’t need to know exactly how many or which characters a character set includes. However, knowing which character set standard a font follows before you buy or download it can save you from discovering, only after using it, that it doesn’t meet your needs.
There are many ways to tell, and the most straightforward is the font foundry’s official description: most foundries list the supported character sets in their product descriptions. Traditional Chinese foundries that list Big5 support usually add extra common-use characters beyond it.

Major Japanese font foundries often add a suffix to the product name indicating which character set it supports: AJ3 uses the suffix Std, AJ4 uses Pro, AJ5 uses Pro5, and so on. This is particularly helpful when you already have the font files (such as system fonts).
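When you already have the font files, the suffix can be read off the name mechanically. Here is a minimal Python sketch that maps only the three suffixes this article names; real naming schemes vary by foundry, and the font names in the example are hypothetical.

```python
import re

# Only the suffixes named in this article; other foundry conventions are not covered.
SUFFIX_TO_AJ = {"Std": "AJ3", "Pro": "AJ4", "Pro5": "AJ5"}

def guess_adobe_japan1(font_name: str):
    """Return e.g. 'AJ4' for 'ExampleMincho Pro W3', or None if no known suffix is found."""
    for token in re.split(r"[\s\-]+", font_name):   # split on spaces and hyphens
        if token in SUFFIX_TO_AJ:                   # exact match, so "Pro5" is not read as "Pro"
            return SUFFIX_TO_AJ[token]
    return None

for name in ("ExampleMincho Std W3", "ExampleGothic Pro5 Bold", "ExampleKaku Regular"):
    print(name, "->", guess_adobe_japan1(name))     # hypothetical font names
```

Matching whole tokens rather than substrings keeps “Pro5” from being misread as “Pro”; treat the result as a hint, not a guarantee.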

Of course, you can also type a test sentence into a font’s trial or preview area to quickly judge its support range; high-frequency characters such as “你”, “哪”, “呢”, and “嗎” make good probes.
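If you already have a font file, this kind of test can be automated. The Python sketch below assumes the third-party fontTools package (`pip install fonttools`), whose `TTFont.getBestCmap()` returns the font’s Unicode-to-glyph mapping; the probe string reuses the high-frequency characters discussed in the Adobe-Japan1 section, and the file path is a placeholder.

```python
# Sketch: list the characters of a test sentence that a font has no glyph for.
# fontTools is needed only by font_codepoints(); missing_chars() is pure Python.

def missing_chars(text: str, covered: set) -> list:
    """Distinct non-space characters of `text` whose code points are not in `covered`."""
    missing = []
    for ch in text:
        if not ch.isspace() and ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing

def font_codepoints(path: str, font_number: int = 0) -> set:
    """Code points in the font's best Unicode cmap; `font_number` picks a face in a .ttc."""
    from fontTools.ttLib import TTFont    # imported here so the rest runs without fontTools
    with TTFont(path, fontNumber=font_number) as font:
        return set(font.getBestCmap() or {})

# Usage with a real file (the path is a placeholder):
#   print(missing_chars("你哪呢嗎", font_codepoints("SomeFont.otf")))
print(missing_chars("你哪呢嗎", {ord("你"), ord("哪")}))   # ['呢', '嗎']
```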
Works by independent designers, and other products that don’t follow a common character set, may list the number of supported characters instead. After all, common character sets often contain tens of thousands of characters, which is challenging even for major font foundries, let alone independent designers.
If no character set standard is listed and you want a font for general use without missing characters, look for at least 7,000 characters, and check whether Zhuyin (Bopomofo) and alphanumeric characters are supported.
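This rule of thumb translates into a simple checklist. The Python sketch below takes any set of covered code points (for example, the keys of a font’s cmap as read with fontTools) and reports the total against the 7,000 guideline, plus whether the 37 standard Zhuyin letters (U+3105 to U+3129) with their four tone marks, and the ASCII letters and digits, are all present. The exact ranges checked are our assumption of what “supports Zhuyin and alphanumerics” means.

```python
import string

# Assumed definitions of "supports Zhuyin" and "supports alphanumerics":
ZHUYIN = set(range(0x3105, 0x312A)) | {0x02CA, 0x02C7, 0x02CB, 0x02D9}  # ㄅ..ㄩ plus ˊ ˇ ˋ ˙
ALNUM = {ord(c) for c in string.ascii_letters + string.digits}

def summarize(codepoints: set, minimum: int = 7000) -> dict:
    """Check a font's covered code points against the rule of thumb above."""
    return {
        "total": len(codepoints),                   # every mapped character counts
        "meets_minimum": len(codepoints) >= minimum,
        "zhuyin_complete": ZHUYIN <= codepoints,    # subset test
        "alnum_complete": ALNUM <= codepoints,
    }

# Stand-in coverage: 7,000 ideographs from U+4E00 onward plus ASCII alphanumerics.
demo = set(range(0x4E00, 0x4E00 + 7000)) | ALNUM
print(summarize(demo))
```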
We’ve introduced several common character sets and how to test a font’s support range. There are many other character set references as well, such as “Farewell to Tofu Characters”, which compiles characters for Taiwan’s local languages (including romanization) in the hope of eliminating tofu blocks wherever possible.
However, given the unique nature of Chinese characters, a font with no missing characters at all is almost impossible. As consumers, what we can do is understand our own character needs and each font’s support range, and pay fair prices to support designers in continuing to create fonts that meet our needs.
This article is translated from 下載字型前先看|為什麼會缺字? with the help of Claude.