Last March, before going to Beijing, I thought I would try to learn a few useful characters and sentences. As it happened, I did not have enough time to really learn anything useful before the trip, but I have been fascinated by the language ever since, and I have continued studying. I hope this is not a metaphor for the relationship between theory and practice in computing.

After having lived in the US for almost ten years, the way I pronounce interesting, pseudorandom, and other long words is still the butt of jokes, so I am under no illusion of ever speaking understandable Mandarin. I would like, however, to make some progress on reading and writing Chinese and on understanding Mandarin as spoken by a Beijinger or a Taiwanese.

(If you speak Chinese, either stop reading here or, by continuing, pledge not to make fun of my neophyte enthusiasm.)

An educated Chinese speaker knows at least 5,000 characters, and a basic level of literacy corresponds to about 2,000 characters. I hope to eventually learn the 1,067 characters in the main part of this book. There is a method to the madness of so many characters. There are about 200 basic components, called radicals, of which all characters are made. In the simplest cases, the radicals combine to give the meaning: for example, the character 好 (hao) is a combination of the radicals 女, “woman,” and 子, “child,” and it means “to love,” “to be good,” and also “good” as an adjective; or 安 (an) is a combination of the radicals for “roof” and for “woman,” and it means “peace.” (There is peace if there is a woman in the house.) In other cases, one combines a similarly pronounced character, which suggests the pronunciation, with a radical that suggests the meaning. For example, 客 (ke) means “guest” and contains the radicals for “roof,” “to follow” and “mouth.” The explanation is that it combines “roof,” which suggests the meaning, with the character 各 (ge), which suggests the pronunciation. Why 各 (ge), which means “each,” is made of “to follow” and “mouth,” I have no idea.

Knowing many characters is not, however, enough to have a good vocabulary. Many words, in fact, are composed of two (sometimes three) characters. Sometimes, the combination makes perfect sense. For example, 电 (dian) means “electricity,” 视 (shi) means “to look at” and 机 (ji) means “machine,” hence 电视机 (dianshiji) “television.” Or consider that 避 (bi) means “to avoid,” 孕 (yun) means “(to be) pregnant” and 套 (tao) means “case” (as in pillowcase), hence 避孕套 (biyuntao). Other combinations are strange, for example 太 (tai) means “too” (as in “excessively”), but 太太 (taitai) means “wife,” or 东 (dong) means “East,” 西 (xi) means “West” and 东西 (dongxi) means “something.”

Anyways, now that I have learnt a little bit of the language, I thought I would go back to some pictures of signs that I had taken in China and see if I could reconstruct what they meant.

So here is one sign:

I start by looking up the characters in a dictionary, but how do you look up a character in a dictionary? There is a shortcut if you know the pronunciation, but what about a character you know nothing about? We said each character is made of a set of radicals, and one radical is considered the “main” radical for the character. I don’t quite understand how you recognize it, but at worst one can do trial and error. Another fact is that by looking at a character it is typically possible to reconstruct how it is supposed to be drawn, and how many strokes it takes to draw it. With this information (main radical and total number of strokes) you go to the dictionary, which has an index of radicals, and then, for each radical, all characters that have it as a main radical, ordered by number of strokes, and you find your character. It is interesting that the way we look up a word in a dictionary for an alphabetic language is essentially binary search; here, instead, we have more of a hash function that maps a character to the pair (radical, strokes), and collisions are handled by linear search.
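To make the hash-table analogy concrete, here is a minimal sketch in Python. Everything in it is my own illustration and not how any real dictionary is organized: the names `ENTRIES`, `build_index` and `look_up` are made up, the tiny entry list is a toy sample, and I added 守 (“to guard”) only so that one bucket contains a collision. The radical and stroke assignments are to the best of my knowledge and should be checked against a real dictionary.

```python
from collections import defaultdict

# Toy sample (an assumption for illustration, not a real dictionary):
# each entry is (character, main radical, total strokes, gloss).
ENTRIES = [
    ("好", "女", 6, "good"),
    ("安", "宀", 6, "peace"),
    ("守", "宀", 6, "to guard"),  # added only to force a collision with 安
    ("客", "宀", 9, "guest"),
    ("雷", "雨", 13, "thunder"),
    ("机", "木", 6, "machine"),
]


def build_index(entries):
    """Group characters into buckets keyed by (main radical, total strokes)."""
    index = defaultdict(list)
    for char, radical, strokes, gloss in entries:
        index[(radical, strokes)].append((char, gloss))
    return index


def look_up(index, char, radical, strokes):
    """Jump to the (radical, strokes) bucket, then scan it linearly."""
    for candidate, gloss in index.get((radical, strokes), []):
        if candidate == char:
            return gloss
    return None  # wrong radical or stroke count: try another guess


INDEX = build_index(ENTRIES)
```

With this toy index, `look_up(INDEX, "守", "宀", 6)` lands in the bucket shared with 安 and finds the entry by scanning it, while a wrong stroke count (or a wrong guess at the main radical) returns `None`, which is the trial and error I mentioned above.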

Back to the picture. We have the characters

雷 (lei) 雨 (yu) 天 (tian) 气 (qi) 禁 (jin) 打 (da) 手 (shou) 机 (ji)

Where 雷 (lei) means “thunder” and 雨 (yu) means “rain,” so together they are “thunderstorm.” Then we have 天 (tian), which means “heaven” or “day,” and, in this case, “sky,” and 气 (qi), which means “breath,” “energy” or “soul.” Is it heavenly spirit? No, 天气 (tianqi) means “weather,” and it’s a two-character word. So the first part is sort of “thunderstorm weather.” Then 禁 (jin) means “to forbid.” 打 (da) means “to hit,” and sometimes it means “to play,” as in playing a musical instrument or, more generally, operating a machine, especially one that produces sound. 手 (shou) means “hand” and (remember the TV) 机 (ji) means “machine.” The “hand machine” 手机 (shouji) is a cell phone. So

It is forbidden to use cell phones during a thunderstorm

Indeed:

(If you can’t see the characters in this entry, and you are using Windows XP, go to start->control panel->regional options->regional options->languages and check the “Install support for East Asian Languages” box. It just takes a few seconds.)
