patrickwilken Senior Member Germany radiant-flux.net Joined 4536 days ago 1546 posts - 3200 votes Studies: German
| Message 1 of 14 21 November 2014 at 11:19pm | IP Logged |
I haven't found any good vocabulary size tests for German online, so I decided to waste a bit of time yesterday trying to estimate how many words I know. I took a somewhat random set of ten books I own and counted how many words I did not know in the first 1000 words of each book. Here is the resulting percentage of words I knew in each sample:
98.7% American Gods by Neil Gaiman
98.6% The Spy Who Came in from the Cold by John le Carré
98.3% Blicke windwärts by Iain M. Banks
98.0% Das fünfte Zeichen by Jo Nesbø
97.6% Die Vermessung der Welt by Daniel Kehlmann
97.5% Das Paradies ist anderswo by Mario Vargas Llosa
97.4% Bonita Avenue by Peter Buwalda
97.2% Just Kids by Patti Smith
97.1% Solar by Ian McEwan
97.1% Sputnik Sweetheart by Haruki Murakami
Average words known: 97.8%
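The arithmetic behind this list is simple enough to sketch. Below is a minimal Python example, assuming the per-book unknown-word counts can be back-calculated from the percentages (98.7% known in a 1000-word sample means 13 unknown words). The dictionary keys and the `coverage` helper are illustrative names, not part of the original method.

```python
# Sketch: turn "unknown words in the first 1000 words" into coverage figures.
# Counts are back-calculated from the percentages listed above
# (e.g. 98.7% known in a 1000-word sample -> 13 unknown words).
SAMPLE_SIZE = 1000

unknown_counts = {
    "Gaiman": 13,
    "Le Carré": 14,
    "Banks": 17,
    "Nesbø": 20,
    "Kehlmann": 24,
    "Vargas Llosa": 25,
    "Buwalda": 26,
    "Smith": 28,
    "McEwan": 29,
    "Murakami": 29,
}


def coverage(unknown: int, sample: int = SAMPLE_SIZE) -> float:
    """Percentage of running words known in a sample of `sample` words."""
    return 100.0 * (sample - unknown) / sample


for author, unknown in unknown_counts.items():
    print(f"{coverage(unknown):.1f}%  {author}")

# Every sample has the same size, so pooling the counts gives the same
# result as averaging the ten percentages.
total_words = SAMPLE_SIZE * len(unknown_counts)
total_unknown = sum(unknown_counts.values())
average = coverage(total_unknown, total_words)
print(f"Average words known: {average:.1f}%")
```

Pooling the raw counts rather than averaging rounded percentages avoids accumulating rounding error, and it stays correct if later samples use a different length.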
Assuming that German word frequencies are roughly the same as English word frequencies, Paul Nation's word frequency tables suggest that I am somewhere in the 7000-8000 word range.
I was surprised how consistent the numbers generated were across the books.
While I take any absolute figure for the number of words known with a grain of salt, I think this is a useful method for tracking relative improvement over time in whatever language you are learning. I'll come back to these books in a year or so and retest them to see where I am.
4 persons have voted this message useful
|
Jeffers Senior Member United Kingdom Joined 4912 days ago 2151 posts - 3960 votes Speaks: English* Studies: Hindi, Ancient Greek, French, Sanskrit, German
| Message 2 of 14 22 November 2014 at 2:01pm | IP Logged |
That is quite interesting. Do you find that you can easily "read with pleasure", and figure out most of the unknown words by context?
I know of one online German vocabulary test: http://www.itt-leipzig.de/static/startseiteeng.html. It only goes up to 5000 words. I'd be curious to see how you do on the test in comparison to your estimate from books. I have to say, a lot of people on HTLAL complained that they knew a word but just didn't understand the definitions given for it. The tests are monolingual, and some of the circumlocutions used are pretty awkward.
2 persons have voted this message useful
|
Ari Heptaglot Senior Member Norway Joined 6585 days ago 2314 posts - 5695 votes Speaks: Swedish*, English, French, Spanish, Portuguese, Mandarin, Cantonese Studies: Czech, Latin, German
| Message 3 of 14 22 November 2014 at 2:15pm | IP Logged |
If you have a paper dictionary, you can just do it by counting the number of known words on a random page. Multiply the proportion of known words with the number of words contained in the dictionary. Do more pages for better accuracy.
This is arguably the only thing paper dictionaries are good for. :)
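Ari's sampling method can be sketched in a few lines. This is a rough Python illustration, not a validated procedure: the function name, the (known, total) page counts and the 50,000-headword dictionary size in the usage example are hypothetical. The standard error assumes each headword is an independent draw, which alphabetically clustered pages only approximate, but it does show why sampling more pages helps: the error shrinks roughly with the square root of the number of headwords checked.

```python
import math
from typing import Sequence, Tuple


def estimate_vocabulary(
    page_counts: Sequence[Tuple[int, int]], dictionary_size: int
) -> Tuple[float, float]:
    """Estimate known headwords from (known, total) counts on sampled pages.

    Returns (estimate, standard_error), both in headwords. The error treats
    every sampled headword as an independent draw (a rough approximation).
    """
    known = sum(k for k, _ in page_counts)
    total = sum(t for _, t in page_counts)
    if total == 0:
        raise ValueError("need at least one sampled headword")
    p = known / total                         # proportion of headwords known
    se = math.sqrt(p * (1 - p) / total)       # binomial standard error of p
    return p * dictionary_size, se * dictionary_size


# Hypothetical example: three sampled pages from a 50,000-headword dictionary.
estimate, error = estimate_vocabulary([(30, 60), (25, 55), (33, 65)], 50_000)
print(f"~{estimate:,.0f} headwords known (+/- {error:,.0f})")
```

As Patrick notes below, the result depends on the dictionary, so it is most useful when the same dictionary is reused each time.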
Edited by Ari on 22 November 2014 at 2:16pm
4 persons have voted this message useful
|
patrickwilken Senior Member Germany radiant-flux.net Joined 4536 days ago 1546 posts - 3200 votes Studies: German
| Message 4 of 14 22 November 2014 at 2:38pm | IP Logged |
Jeffers wrote:
That is quite interesting. Do you find that you can easily "read with pleasure", and figure out most of the unknown words by context?
I could now read any of these books without a dictionary and get most of the text, which is a little surprising to me, as I hadn't realized my level had crept up this high. Having done this exercise, I feel I could pretty much read any standard novel in a bookstore.
However, some are still much easier than others. The Gaiman and the le Carré were a breeze to read. Nesbø (another crime novel) is also pretty easy. I generally find Murakami easy, so I was a little surprised by the lower rating, but I think that comes from one difficult paragraph (in a 1000-word sample, 1% is equivalent to 10 unknown words). Of course, grammar and style also make a big difference: crime novels are generally written in a much more straightforward style than more "literary" works. Murakami's style is very straightforward, which is perhaps why I find him so easy despite the lower vocabulary estimate.
Keep in mind that while 98.5% vs 97.0% doesn't seem like a big difference, it means there are twice as many unknown words in the "97%" book (30 vs 15 per 1000 words). I also think that even the "known" words are probably more difficult: there are, after all, words you really know and words you know but aren't so comfortable with.
Overall the numbers seemed to reasonably well track the difficulty of the books for me.
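The point about 98.5% vs 97.0% is easy to check numerically. A minimal Python sketch (the helper name is hypothetical) converts a coverage percentage into unknown words per 1000 running words and into the average gap between unknown words:

```python
def unknown_per_1000(percent_known: float) -> float:
    """Expected number of unknown words in 1000 running words."""
    return (100.0 - percent_known) * 10


easy = unknown_per_1000(98.5)  # 15 unknown words per 1000
hard = unknown_per_1000(97.0)  # 30 unknown words per 1000

print(f"{hard / easy:.1f}x as many unknown words in the harder book")
print(f"one unknown word every ~{1000 / easy:.0f} words "
      f"vs every ~{1000 / hard:.0f} words")
```

Framed as "one unknown word every 67 words vs every 33 words", the gap between the two books is much easier to feel than a 1.5-point difference in percentages.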
I tried the Leipzig test and didn't do so well. I am not sure why there was a discrepancy with my estimate from the books. Perhaps it is partly because my vocabulary has largely grown from reading novels, not other sources like newspapers.
But as I said, I don't really trust this as an accurate estimate of vocabulary size, though I do think it would be a helpful way to keep track of progress. It's the sort of thing I would like to do once a year.
Ari wrote:
If you have a paper dictionary, you can just do it by counting the number of known words on a random page. Multiply the proportion of known words with the number of words contained in the dictionary. Do more pages for better accuracy.
Good point. Though I thought Iversen said you get different estimates depending on the dictionary you use (i.e., dictionaries with more headwords tend to give you lower estimates). I am also not quite sure how to relate dictionary headwords to the word families that Nation uses.
But sure, if you use the same dictionary each time it would be a great way of keeping track of progress too.
I do like actually getting a real estimate of how much of a real book I can read though. There is something much more interesting for me about knowing I can read 95% or 98% or 99% of a novel than knowing I know 8000 words, but that's just a personal preference obviously.
Edited by patrickwilken on 22 November 2014 at 3:13pm
1 person has voted this message useful
|
fiolmattias Triglot Groupie Sweden geocities.com/fiolma Joined 6692 days ago 62 posts - 129 votes Speaks: Swedish*, English, Arabic (Written)
| Message 5 of 14 22 November 2014 at 2:46pm | IP Logged |
While I found this both interesting and fascinating, one thing is uncertain here: Paul Nation discusses books at a certain vocabulary level, and to compare, you also need to know how the vocabulary of these books compares to that of other novels in German. If I took 10 children's books in German I might understand some 98% as well, but that would not put my German vocabulary in the 7000-8000 word range :)
That's besides the problem you already mentioned about comparing vocabulary size in German and English books, of course.
Nevertheless a very interesting post!
2 persons have voted this message useful
|
patrickwilken Senior Member Germany radiant-flux.net Joined 4536 days ago 1546 posts - 3200 votes Studies: German
| Message 6 of 14 22 November 2014 at 3:07pm | IP Logged |
fiolmattias wrote:
While I found this both interesting and fascinating, one thing is uncertain here: Paul Nation discusses books at a certain vocabulary level, and to compare, you also need to know how the vocabulary of these books compares to that of other novels in German. If I took 10 children's books in German I might understand some 98% as well, but that would not put my German vocabulary in the 7000-8000 word range :)
The figures I was using were for adult novels. Nation's figures are pretty consistent across adult novels, but they are based on old Project Gutenberg books, so there are probably some problems relating them to the books I am using.
This sort of estimate is obviously also affected by the kinds of books you read. If you are a heavy sci-fi fan, you might find you understand more of that genre than of others.
I just tried to pick a reasonable range of adult books that I hadn't read to get an estimate. I was pretty surprised how consistent the estimate was, to be honest.
fiolmattias wrote:
Besides the problem you already mentioned about average vocabulary size in books between German and English, of course.
I vaguely remember seeing a graph years ago that compared English and German word frequencies and the distributions seemed very similar, but I can't find it now.
Overall I think this method is best for estimating relative, not absolute, vocabulary size. It is obviously bound by the books you sample, but that can be a plus. I can now say with some confidence that, for the books I am likely to want to read (any standard novel in a bookstore), I could understand somewhere in the range of 97%-98.5% of the vocabulary, which is quite useful for me.
Despite all the uncertainties around the vocabulary estimates, I think this indicates that my vocabulary is somewhere in the 7000-8000 range, because: (1) if it were much larger, I would be getting closer to 99% in standard novels; (2) I can now follow movies with a very high level of understanding, which in English takes around 6000-7000 words; and (3) if it were much smaller, I would be getting closer to 95%.
Edited by patrickwilken on 22 November 2014 at 3:15pm
2 persons have voted this message useful
|
Ari Heptaglot Senior Member Norway Joined 6585 days ago 2314 posts - 5695 votes Speaks: Swedish*, English, French, Spanish, Portuguese, Mandarin, Cantonese Studies: Czech, Latin, German
| Message 7 of 14 22 November 2014 at 4:42pm | IP Logged |
Sounds like you ought to pop that German up to Basic Fluency in your profile!
Basic Fluency - you understand at least 80% of a regular newspaper in your target language and can hold regular conversations about any topic, understanding what people say and getting your point across.
2 persons have voted this message useful
|
agta Diglot Groupie Poland Joined 5527 days ago 43 posts - 53 votes Speaks: Polish*, English Studies: German, Italian
| Message 8 of 14 22 November 2014 at 6:53pm | IP Logged |
This is a very interesting idea, and I think from now on I will check the percentage for every novel I read. Not for any particular purpose, just out of curiosity.
2 persons have voted this message useful
|