Hertz Pro Member United States Joined 4514 days ago 47 posts - 63 votes Speaks: English* Studies: German, Spanish, Mandarin Personal Language Map
| Message 9 of 17 12 March 2013 at 8:14pm | IP Logged |
As the OP hinted, an exhaustive analysis would account for the multiple uses of each character in
conjunction with others. Knowing 人 doesn't mean you know 工人, 土人, 女人, 大人, and so on. You'd need
stats to say: "15% of the time, 人 is used in a way I understand."
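A rough sketch of how such a number could be computed, assuming you already have a word-segmented text and a list of words you know. The function name `known_usage_share` and the toy word lists are made up for illustration; they are not real corpus statistics, and real text would have to be run through a segmenter first.

```python
def known_usage_share(segmented_words, known_words, char):
    """Fraction of occurrences of `char` that fall inside words the reader knows."""
    total = 0
    known = 0
    for word in segmented_words:
        n = word.count(char)      # occurrences of the character in this word
        total += n
        if word in known_words:   # count them only if the whole word is known
            known += n
    return known / total if total else 0.0

# Toy, hand-segmented sample (hypothetical data).
sample = ["人", "工人", "土人", "女人", "大人", "工人", "人"]
known = {"人", "女人"}

share = known_usage_share(sample, known, "人")
print(f"{share:.0%} of the time, 人 is used in a way I understand")
```

The point of counting per word rather than per character is that a reader who only knows 人 on its own gets no credit for 工人 or 大人.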
Edited by Hertz on 12 March 2013 at 8:17pm
1 person has voted this message useful
|
Mountolive Pro Member United States Joined 4460 days ago 10 posts - 29 votes Speaks: English* Studies: Spanish Personal Language Map
| Message 10 of 17 25 March 2013 at 4:26am | IP Logged |
I still have a copy of T.K. Ann's Cracking the Chinese Puzzles from an attempt to learn Mandarin some years ago which never really got off the ground.
In Chapter 6 of the first volume he discusses the results of a private survey of character usage in newspapers conducted by 20 students over a period of one year. A total of 1,411,088 characters were counted. The results he reports are as follows:
- There were a total of 4,687 different characters used.
- The 50 most common characters made up 27.5% of the 1.4 million characters counted.
- The first 500 most frequent characters accounted for 74.7% of the total.
- The first 2500 most frequent characters accounted for 98.8% of the total.
- Ann claims that a knowledge of 3650 characters will allow a reader to recognize 99% of the content of a Chinese newspaper.
Ann's statistics may be a bit dated (his books were published in 1982), but you might be able to use his results as one data point.
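For anyone who wants to produce the same kind of figures from their own reading material, here is a minimal sketch of the calculation: count every character, sort by frequency, and see what share of the running text the top N characters cover. The function name `coverage_at` and the toy string are assumptions for illustration only. Note that this version counts every non-whitespace character, punctuation included, which a real survey would probably filter out.

```python
from collections import Counter

def coverage_at(text, ranks):
    """Map each rank N to the share of the text covered by the N most frequent characters."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    # Frequencies sorted from most to least common.
    sorted_counts = [n for _, n in counts.most_common()]
    return {
        rank: (sum(sorted_counts[:rank]) / total if total else 0.0)
        for rank in ranks
    }

# Toy input (hypothetical); a real run would read in a large newspaper corpus.
text = "的的的的一一一是是不"
result = coverage_at(text, [1, 2, 4])
for rank, share in result.items():
    print(f"top {rank}: {share:.1%}")
```

With a real corpus you would pass ranks like `[50, 500, 2500]` and compare the output with the percentages Ann reports.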
4 persons have voted this message useful
|
OneEye Diglot Senior Member Japan Joined 6851 days ago 518 posts - 784 votes Speaks: English*, Mandarin Studies: Japanese, Taiwanese, German, French
| Message 11 of 17 25 March 2013 at 5:45am | IP Logged |
That sounds great and everything, but it still means that if you know 3650 characters, there will likely be a few in
every article that you don't know.
The list of characters learned by Taiwanese students linked to above is kind of weird. If you read up at the top,
the researchers analyzed reading material, dictionaries, and textbooks used by school students, and then split
the characters up into "levels" based on the frequency of use. Both characters of 蘑菇 show up in the ninth level,
which is weird to me because it's a pretty common word.
I have to say though, there's still an unsettling number of characters I don't know on that list.
The list of 4808 characters from the MOE can be found
here.
I think that if you're aiming for advanced proficiency in Chinese, somewhere in the 4000-6000 range would be
suitable, depending on what you like to read. The more "highbrow," the more characters you'll need. I also have a
feeling that mainland writers tend to use fewer characters than those from Hong Kong or Taiwan, so if you plan
on limiting yourself to simplified material, maybe you can get away with fewer.
Edited by OneEye on 25 March 2013 at 6:31am
1 person has voted this message useful
|
shk00design Triglot Senior Member Canada Joined 4445 days ago 747 posts - 1123 votes Speaks: Cantonese*, English, Mandarin Studies: French
| Message 12 of 17 25 March 2013 at 10:28pm | IP Logged |
This topic came up at least once already. Always interesting to explore.
Unlike in English and other languages that use an alphabet, a lot of what you know off the
top of your head comes from frequent use. The more a character comes up, the easier it is to recognize. A
lot of times you recognize a character when you see it but can't remember how to write
it on the spot.
Reading a newspaper, for instance, you will occasionally come across unfamiliar
characters. If you know the characters around one in a sentence, you can usually make out
what the unknown character is.
When it comes to writing an E-mail, it is much simpler. You have dictionaries on your
computer to locate the proper characters. All you need to know is the meaning or
pronunciation. Reading a news article online is much the same. You can just Cut & Paste
the character to a computer dictionary for quick look-up.
Edited by shk00design on 25 March 2013 at 10:31pm
1 person has voted this message useful
|
cacue23 Triglot Groupie Canada Joined 4300 days ago 89 posts - 122 votes Speaks: Shanghainese, Mandarin*, English Studies: Cantonese
| Message 13 of 17 14 April 2013 at 5:20am | IP Logged |
egill wrote:
Here is a list of characters that Taiwanese students supposedly learn (separated into
school grades 1-9). There's 5568 total but the first 8 sets (3526 total) would probably
suffice as a starting off point.
List
Oops, never thought traditional Chinese characters were that hard to recognize (I use the simplified version) until I saw that list...
1 person has voted this message useful
|
gaoyoude1 Diglot Newbie United Kingdom fluentinmandarin.com Joined 4214 days ago 6 posts - 16 votes Speaks: English*, Mandarin Studies: French, Spanish
| Message 14 of 17 14 May 2013 at 3:24am | IP Logged |
I have learned more than 3000 Chinese characters systematically, and when I read modern
literature or Chinese newspapers, I hardly come across any characters at all that I don't
know, and when I do, they are generally obscure names of trees or fish etc.
I would say that the most common 3000 characters are more than enough unless you want to
get into ancient texts.
2 persons have voted this message useful
|
lorinth Tetraglot Senior Member Belgium Joined 4275 days ago 443 posts - 581 votes Speaks: French*, English, Spanish, Latin Studies: Mandarin, Finnish
| Message 15 of 17 14 May 2013 at 9:28am | IP Logged |
Yet another source of interesting statistics on this subject is
Patrick Zein's site, based on Jun
Da's research on vast corpuses of literary and non-literary writings.
In a nutshell, with 3000 characters, you should recognize 99.2 % of characters used in
contemporary texts (which still leaves about 1 unrecognized character every 4 lines,
maybe 5 or 6 per page). In my experience, you can start reading contemporary prose
(though somewhat laboriously) with 2000 characters, which is supposed to amount to a
97.0 % understanding level. You will notice that the law of diminishing returns is in
full swing for the upper percentiles.
And, of course, the big caveat is: "recognizing characters" does NOT mean "recognizing
words" or "understanding sentences".
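The arithmetic behind the "every 4 lines" and "5 or 6 per page" figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the page layout (`CHARS_PER_LINE`, `CHARS_PER_PAGE`) is my own assumption, not something taken from the statistics, and the results are averages that treat unknown characters as evenly spread through the text.

```python
def unknown_per(chars, coverage):
    """Expected number of unrecognized characters in a stretch of `chars` characters."""
    return chars * (1.0 - coverage)

# Assumed layout of a printed page (hypothetical values).
CHARS_PER_LINE = 30
CHARS_PER_PAGE = 700

# (characters known, coverage) pairs quoted in the post above.
for known, coverage in [(2000, 0.970), (3000, 0.992)]:
    per_page = unknown_per(CHARS_PER_PAGE, coverage)
    chars_between = 1.0 / (1.0 - coverage)       # average gap between unknowns
    lines_between = chars_between / CHARS_PER_LINE
    print(f"{known} characters: about {per_page:.1f} unknown per page, "
          f"one every {lines_between:.1f} lines")
```

Going from 2000 to 3000 characters only adds 2.2 percentage points of coverage, but under these assumptions it cuts the unknown characters per page to roughly a quarter.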
Edited by lorinth on 14 May 2013 at 9:33am
2 persons have voted this message useful
|
lichtrausch Triglot Senior Member United States Joined 5961 days ago 525 posts - 1072 votes Speaks: English*, German, Japanese Studies: Korean, Mandarin
| Message 16 of 17 14 May 2013 at 5:14pm | IP Logged |
Unless I'm reading something with a pop-up dictionary, the most unknown characters I can
tolerate is ≈1 per page. It's just too much of a pain to be looking up characters on top
of looking up unknown words. So I probably won't be cracking open a novel until I'm
around 4000 characters. But at this point I'm making speedy progress so it probably isn't
far off. I actually have little idea how many characters I know since I don't use Anki or
Heisig or whatnot. If I was forced to guess I'd say somewhere between 3000 and 3500.
It gets very hard to quantify at this level.
1 person has voted this message useful
|