
How many words do you need to learn?

Language Learning Forum : Learning Techniques, Methods & Strategies
64 messages over 8 pages
luke
Diglot
Senior Member
United States
Joined 4583 days ago

3138 posts - 1257 votes 
Speaks: English*, Spanish
Studies: Esperanto, French

 
 Message 1 of 64
24 March 2005 at 5:05am
Quote:
A major New York newspaper established they used only 600 words on average in their newspaper daily.   
    
    
I've heard that, and other sources say you need only a relatively small number of words (depending on the source: 1500, 1000, 600, 500, even 100) to communicate. It doesn't ring true to me.

Michel's citation of a study saying the NY Times used only 600 words has to be wrong in some way. Maybe it's 6000. (I know the CD says 600.)

I did this little experiment. I went to usatoday.com (since no subscription is necessary), grabbed the first headline article, and counted the words in it. There were 857 distinct words in that one article.
    
Methodology:
Copy/paste article into a file.
Convert spaces to newlines.
Convert to lower case.
Remove punctuation, other than apostrophes.
Remove numbers.
Do a unique sort on the file.
Count the words.

For posterity, this was the headline article:
http://www.usatoday.com/news/health/2005-03-23-cover-compounding_x.htm
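
The counting steps above can be sketched in Python. This is only an illustration, not the exact commands used for the 857-word count: the function name and the sample sentence are made up, it assumes plain ASCII English text, and stripping stray quote marks from the ends of a word is an extra assumption on top of the listed steps.

```python
import re

def distinct_words(text: str) -> set[str]:
    """Return the set of distinct word forms in `text`."""
    words = set()
    # Lower-case, then split on whitespace (the "spaces to newlines" step).
    for token in text.lower().split():
        # Drop punctuation and digits, keeping letters and apostrophes.
        token = re.sub(r"[^a-z']", "", token)
        # Assumption: apostrophes at the edges are quote marks, not part of the word.
        token = token.strip("'")
        if token:
            words.add(token)  # a set plays the role of the unique sort
    return words

sample = "The dog's bowl is empty. The DOG is hungry, and 2 cats are hungry too!"
vocab = distinct_words(sample)
print(len(vocab), sorted(vocab))
```

Note that this counts word forms, so "dog" and "dog's" are two different words, and a hyphenated word collapses into one token once the hyphen is removed.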
    
Unless the 600 word study did something like exclude all proper nouns, conjugations of verbs, plurals, numbers, compound words, abbreviations, and prefix and suffix word variations (words starting with "in", "un", "dis", "anti", etc., or ending in "ly", "ally", "ation", etc.), it's impossible that the Sunday NY Times (a big fat newspaper) had only 600 unique words.
    
I checked "20,000 words in Spanish in 20 minutes" out of the library. It's one of those books that goes the other direction, saying you can have a huge vocabulary by understanding cognate suffix patterns. The truth about the 20,000 words book is that you may not know some of the cognates even in your native tongue, because they are highly scientific or obscure. That said, knowing the magic of cognate suffixes is helpful. When you understand prefixes and suffixes in English, your vocabulary goes up something like 10 times.
    
I guess it's important to define "what is a word?". If you know 15 moods/tenses for a verb for 6 types of people (I, you, we, y'all, polite and informal), is that 1 word, or 15 * 6 = 90 words? If you know masculine/feminine/plural of an adjective, is that 1 word or 4? There's no question that if you know a lot of variations of a word, you can express ideas more precisely than someone who doesn't know any variations of the same word.
    
Obviously a lot depends on which words you know. If you know the most commonly used words, they are more useful than knowing obscure words. What's obscure and what's not depends on where you use them. The word "hill" may be unusual in a medical book, but it's a very common word. "mandibula" may appear several times in a college nursing textbook, and "music" may never appear.    

The word "thorax" is known by a typical 4th grader. Ask someone in their 40s when they last heard the word thorax, and they may say, "in grade school". They know the word though.

Edited by luke on 24 July 2006 at 7:32pm

6 persons have voted this message useful



administrator
Hexaglot
Forum Admin
Switzerland
FXcuisine.com
Joined 4754 days ago

3094 posts - 508 votes 
Speaks: French*, EnglishC2, German, Italian, Spanish, Russian

 
 Message 2 of 64
24 March 2005 at 7:04am
I have a file of lexemes in Russian sorted by frequency. Lexemes are 'unique' words: for instance, 'is', 'was' and 'are' are all counted under 'to be' rather than as a different word each time.
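
To make the difference between word forms and lexemes concrete, here is a toy sketch in Python. It uses English rather than Russian, and the `LEMMAS` table is a hypothetical, hand-made mapping for illustration only; a real lemmatized list would come from a dictionary or a lemmatizer.

```python
# Hypothetical hand-made table mapping inflected forms to their lexeme.
LEMMAS = {
    "am": "be", "is": "be", "are": "be", "was": "be",
    "goes": "go", "went": "go",
}

def count_forms_and_lexemes(tokens):
    """Return (number of distinct word forms, number of distinct lexemes)."""
    forms = {t.lower() for t in tokens}
    # Forms missing from the table are treated as their own lexeme.
    lexemes = {LEMMAS.get(form, form) for form in forms}
    return len(forms), len(lexemes)

tokens = "I am here you are here she is here he was here and goes where I went".split()
print(count_forms_and_lexemes(tokens))
```

The same text gives a noticeably smaller count once inflected forms are folded together, which is why word counts from different sources are hard to compare unless you know which definition of "word" was used.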

With the files I was able to create a graph of frequency versus rank:

[Graph: frequency versus rank]



This is a very basic lexicographic analysis and only reproduces what you can find, I am sure, in many academic articles.

The result is that:
the    75 most common words make up 40% of occurrences
the   200 most common words make up 50% of occurrences
the   524 most common words make up 60% of occurrences
the  1257 most common words make up 70% of occurrences
the  2925 most common words make up 80% of occurrences
the  7444 most common words make up 90% of occurrences
the 13374 most common words make up 95% of occurrences
the 25508 most common words make up 99% of occurrences

This shows clearly that vocabulary frequency follows both the Pareto principle (roughly 80% of occurrences come from only 20% of the words) and the law of diminishing returns.
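
For anyone who wants to build the same kind of coverage table from their own text, here is a minimal sketch in Python. It is not the script behind the Russian figures above; the function name, the percentage targets and the toy token list are all assumptions made for illustration.

```python
from collections import Counter

def words_needed(tokens, targets=(50, 80, 90)):
    """Map each target coverage (in percent) to the number of
    most-frequent words needed to reach it."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    total = sum(counts)
    remaining = sorted(targets)
    result = {}
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the thresholds.
        while remaining and running * 100 >= remaining[0] * total:
            result[remaining.pop(0)] = rank
    return result

# Toy corpus: "a" occurs 5 times, "b" 3 times, "c" and "d" once each.
tokens = ["a"] * 5 + ["b"] * 3 + ["c"] + ["d"]
print(words_needed(tokens))
```

Feeding it lemmatized tokens rather than raw word forms would give counts in lexemes, which is the unit used in the table above.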

So yes, you can probably read any text with only 3000 or 5000 words, but you will always miss some key words. You can't really say that all you need is 3000 words, although this certainly gets you to a more or less autonomous stage in your learning, from which you can learn many words from their context.

I hope this helps!

Edited by administrator on 24 March 2005 at 8:31am

8 persons have voted this message useful



Eric
Senior Member
Australia
Joined 4606 days ago

102 posts - 6 votes
Speaks: English*
Studies: Spanish, French

 
 Message 3 of 64
24 March 2005 at 7:30am
administrator wrote:
The result is that the 75 most common words make up 40% of occurrences.


That's amazing, Francois, truly it is.

If a mere 75 words can account for such a high percentage of occurrences, then you can only imagine how, with 600, you could get by in most situations that aren't too specialized.

Luke, some interesting stuff there; unfortunately I am a language pleb and can't answer.

Edited by Eric on 24 March 2005 at 7:37am



ElComadreja
Senior Member
Philippines
bibletranslatio
Joined 4616 days ago

683 posts - 80 votes 
Speaks: English*
Studies: Spanish, Portuguese, Ancient Greek, Biblical Hebrew, Cebuano, French, Tagalog

 
 Message 4 of 64
24 March 2005 at 11:42am
Not exactly sure how the Russian grammar works, but those 75 words have got to be a lot of the grammatical words: and, the, but, an, etc.
There's a joke about learning Biblical Greek that goes something like "When you learn the word for 'and', you can read the majority of the Bible."

The fact that I have an (I guess) above average English vocabulary has been quite helpful. Like, I know that "odious" means hateful, so when I see something like the Spanish "odiar" it's not that big a deal. Oh, but dare I dive off into a language without a large influx of Latin?

Edited by ElComadreja on 24 March 2005 at 12:05pm



luke

 
 Message 5 of 64
24 March 2005 at 5:00pm
administrator wrote:
Lexemes are 'unique' words. I hope this helps!
       

You are awesome! I was unaware of the linguistic term lexemes.       
       
I searched the fine web and found a paper by Mark Davies of Brigham Young University http://www.lingref.com/cpp/hls/7/paper1091.pdf which is about this topic for Spanish in particular.
The paper does some comparisons to earlier studies on English and German too.   
       
Interestingly, the paper distinguishes between fiction, non-fiction, and oral vocabularies. Oral vocabularies are somewhat smaller than written. It suggests a vocabulary of the 4000 most popular word forms would cover 90% of Spanish speech, but you need the 8000 most popular word forms to cover 90% of written texts. He used a very broad source for his sample.

The paper also discusses frequency by word type, i.e. noun, verb, adjective, adverb, modifier, preposition, conjunction. The magic mix would be about 64% nouns, 24% adjectives, 6% verbs, 5% prepositions, conjunctions and modifiers, and 1% adverbs for spoken Spanish.

He also analyzes what percentage of all the nouns, verbs, adjectives, etc. are among the most frequent lexemes. One would need to know 30% of all the adverbs, but only 10% of all the verbs, to understand 90% of spoken Spanish.

Some words are very popular in speech but not popular in non-fiction (gustar). Others are popular in non-fiction and not speech (denominar).

Mark Davies will publish a book in the summer of 2005 called "The Routledge Frequency Dictionary of Spanish" with a thorough analysis and the top 6000 words. It looks like it will be a great contribution. The paper is quite interesting.

Edited by luke on 30 August 2006 at 8:30pm

3 persons have voted this message useful



ProfArguelles
Moderator
United States
foreignlanguageexper
Joined 4634 days ago

610 posts - 1520 votes 

 
 Message 6 of 64
24 March 2005 at 7:46pm
The maddening thing about these numbers and statistics is that they are impossible to pin down precisely and thus they vary from source to source. The rounded numbers that I use to explain this to my students I usually write in a bull's eye target on the whiteboard, but I don't have the computer skills to draw circles in this post, so I will just have to give a list:

250 words constitute the essential core of a language, those without which you cannot construct any sentence.
750 words constitute those that are used every single day by every person who speaks the language.
2500 words constitute those that should enable you to express everything you could possibly want to say, albeit often by awkward circumlocutions.
5000 words constitute the active vocabulary of native speakers without higher education.
10,000 words constitute the active vocabulary of native speakers with higher education.
20,000 words constitute what you need to recognize passively in order to read, understand, and enjoy a work of literature such as a novel by a notable author.
16 persons have voted this message useful



heartburn
Senior Member
United States
Joined 4585 days ago

355 posts - 22 votes
Speaks: English*
Studies: Spanish

 
 Message 7 of 64
24 March 2005 at 8:13pm
Does anyone have, or know where I can get, a lemmatized Spanish word frequency list? I don't want to wait 'til summer.

Edited by heartburn on 24 March 2005 at 8:14pm



heartburn

 
 Message 8 of 64
25 March 2005 at 12:30am
Ok. I spent too long looking, but I came up with some out-of-print titles that look very interesting. They all seem to be available at AbeBooks.

An English, French, German, Spanish Word Frequency Dictionary: A correlation of the first six thousand words in four single language semantic frequency lists
by Eaton, Helen S.

Spanish Key Words: The Basic Two Thousand Word Vocabulary Arranged by Frequency in a Hundred Units with Comprehensive Spanish and English Indexes
by Pedro Casal

Arabic Key Words: The Basic Two Thousand-Word Vocabulary Transliterated and Arranged by Frequency in a Hundred Units
by David Quitregard

French Key Words: The Basic Two Thousand Word Vocabulary Arranged by Frequency in a Hundred Units with Comprehensive French and English Indexes
by Xavier-Yves Escande

Italian Key Words: The Basic Two Thousand Word Vocabulary Arranged by Frequency in a Hundred Units with Comprehensive Italian and English
by Gianpaolo Intronati

Edited by heartburn on 25 March 2005 at 1:59am



2 persons have voted this message useful



Copyright 2017 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.