
Lists of high freq. words

Language Learning Forum : Questions About Your Target Languages


Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6706 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

 
 Message 9 of 19
17 October 2007 at 8:38am
I'm not too fond of frequency lists because they have problems at both ends of the scale.

For the very common words it is evident that you have to learn the words, but you will see them everywhere so you don't need a list to point them out. Furthermore those words are often irregular (pronouns) or they are essential 'glue' at the syntactical level (conjunctions), so that you need more than just a nodding acquaintance with these words to use them correctly. In fact it may be difficult to translate them in isolation because their use is tied to the use of other words, as for instance the prepositions used with certain verbs, and this makes the 'meaning' very diffuse.

For slightly less common words, i.e. words that tend to pop up here and there but not so often that you see them all the time, the frequency lists may be relevant. Besides, these words normally have a more well-defined meaning, which you can learn - but never without the risk that they are used in idiomatic expressions. The more common a word is, the more likely it is to have some ultra-idiomatic uses that you have to learn individually.

For words beyond, say, the first 1000 items on the list the frequencies are so low that it really isn't worth learning them from a list. If you have some special interest, as for instance history or music or zoology or exotic cuisine, you will in all likelihood meet the 'special' terms of your chosen interest much more often than item 1001 on a general frequency list. For instance the history buff will meet the words for different kinds of weapons, the gourmet will have to learn the names for different kinds of meat, and the birdwatcher will be confronted with the words for each and every part of a bird plus the names of typical habitats. It is however unlikely that such words will figure on any frequency list, and if they do, it will probably be due to a methodological flaw (a sample that is too small or skewed) or because the word is on the list thanks to a less specialized use.

Beyond those first 1000 words or so you will be better served by word lists that you have compiled yourself.


Edited by Iversen on 17 October 2007 at 8:59am




Linguamor
Decaglot
Senior Member
United States
Joined 6621 days ago

469 posts - 599 votes 
Speaks: English*, German, Italian, Spanish, Swedish, Danish, French, Norwegian, Portuguese, Dutch

 
 Message 10 of 19
17 October 2007 at 2:59pm
Iversen wrote:

Besides, these words normally have a more well-defined meaning, which you can learn - but never without the risk that they are used in idiomatic expressions.


Research from the field of corpus linguistics has shown that the meaning and usage of a word are highly dependent on the other words with which it is used.

Iversen wrote:

For words beyond, say, the first 1000 items on the list the frequencies are so low that it really isn't worth learning them from a list. If you have some special interest, as for instance history or music or zoology or exotic cuisine, you will in all likelihood meet the 'special' terms of your chosen interest much more often than item 1001 on a general frequency list.


Words ranked around 2000, 3000, 4000, 5000 and beyond are still frequent enough to occur far more often than specialist terms.

Corpus-based research using computers has allowed researchers to discover many things about word meaning, word use, and word frequency that were not previously known.

Corpus linguists have huge amounts of data available to them. The British National Corpus and the Cobuild corpus contain hundreds of millions of words of text and transcribed speech.

Similar corpora exist for other languages also.


Word frequency lists for English from the British National Corpus
www.kilgarriff.co.uk/bnc-readme.html

                  

Edited by Linguamor on 17 October 2007 at 3:06pm






Iversen

 
 Message 11 of 19
17 October 2007 at 7:56pm
I think you are underestimating the weirdness of at least some people's reading habits. For instance I'm reading a lot of Lonely Planet guides, so I meet the English words "lonely" and "Planet" quite often. They are respectively number 3778 and number 3143 on the list you refer to, which are not really impressive rankings in spite of their importance in my world. "Railway" and "museum" are apparently fairly common words (no. 1153 and 1473, hurray!), but terms like "airport" (2855) and - even worse - "terminal" (3933) and "timetable" (no. 4562) are quite far down the list. Common words (in my world) like "guidebook", "visa", "customs", "botanical", "nave", "circus", "zoo", "backpacker", "malaria" and "sunset" don't even make it onto the list.

Another of my interests is paleontology, a word that itself is not on the list, while "jaw" and "skull" barely make it at ranks 4145 and 4846 respectively, and the word "dinosaur" is totally absent. I'm also very interested in classical music, but the only orchestral instrument that is on the list is the "horn" (with a lot of help from the car industry). The word "astronomy" (another of my interests) is also totally absent, while "linguistic" at least is there (rank 3050), but not the name of the discipline, "linguistics". Neither "consonant" nor "vowel" is on the list, and "morphology" and "phonetics" have also missed the entrance requirements. But at least "grammar" is found at rank 2977 (probably because people hate it), and the sensation of the day: the word "language" is at rank 452, higher than even "football" (1529), which is certainly not one of my interests. I note with some glee that "gymnastics" is not even on the list. I'm not going to complain, I always hated that word.

So you see, there is a reason that I don't find my interests covered by ordinary frequency lists. If I had to learn English now - which is not among my current plans - I would be better served with a list that I had compiled myself based on my actual reading (plus some selective culling of dictionaries) than with the general frequency table.

In principle I could have taken an electronic text from the internet about one of the subjects I have mentioned above and stuffed it into one of the word counters that have recently been mentioned in this thread, and then I could have compared the rankings with those in the 'general' frequency list, and of course I would then have found a lot of common terms. I'm certainly not trying to avoid all words that are found on the general frequency lists, on the contrary! But I can just as well learn such words while doing my usual reading - and if I stumble over an interesting word I'm going to learn it whether it is on the hallowed list or not.
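
As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes plain English text and treats every run of letters as a word. The sample text is hypothetical, `rank_words` and `compare_with_general` are names made up for this example, and the general ranks are simply the ones quoted earlier in this message.

```python
import re
from collections import Counter


def rank_words(text):
    """Return {word: rank}, where rank 1 is the most frequent word in text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # most_common() sorts by descending count; ties keep first-seen order
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def compare_with_general(text, general_ranks):
    """Return (word, rank in text, rank in general list or None) tuples."""
    mine = rank_words(text)
    return [(word, rank, general_ranks.get(word)) for word, rank in mine.items()]


# Hypothetical sample text; a real run would use a full guidebook chapter
sample = ("The airport terminal has a timetable. The museum is near "
          "the railway. The airport has a zoo.")
# Ranks as quoted in this message from the general list
general = {"railway": 1153, "museum": 1473, "airport": 2855,
           "terminal": 3933, "timetable": 4562}

for word, mine, theirs in compare_with_general(sample, general):
    print(word, mine, theirs if theirs is not None else "not on the list")
```

A real word counter would also have to deal with inflected forms, proper names and multi-word expressions, which this sketch ignores.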



EDIT (18/10): my my, I'm getting old. I forgot to mention the most important argument against frequency lists: they don't give explanations, translations or examples. Any decent dictionary tells you what the words in it mean (and yes, I know that it can't tell you everything about a word). Frequency tables don't, so they can at most give you a hint about words you don't know yet, but they can't tell you anything about the meaning without becoming dictionaries themselves. Yesterday I still thought - for unfathomable and flawed reasons - that frequency lists might have some relevance for the language student. Today I have to categorize them as totally useless. Leave them to the scientists that Linguamor mentions and grab your good ol' dictionary.

What we could hope for is that frequency indications found their way into ordinary dictionaries, either in the form of a rank number or in some simplified, more graphic form.


Edited by Iversen on 18 October 2007 at 3:40am




William Camden
Hexaglot
Senior Member
United Kingdom
Joined 6275 days ago

1936 posts - 2333 votes 
Speaks: English*, German, Spanish, Russian, Turkish, French

 
 Message 12 of 19
18 October 2007 at 6:00am
The Russian Learners' Dictionary by Nicholas J. Brown is both a frequency list of the first 10,000 words in Russian, and a dictionary. It has some of the weaknesses of frequency lists but overall is, I think, an excellent way to build up Russian vocabulary.   





Iversen

 
 Message 13 of 19
18 October 2007 at 6:21am
A step in the right direction, but what I would like to see is a frequency indication at every lexeme in the dictionary, or maybe five stars for a word within the first 500 words, four stars up to no. 999, three stars at no. 1000-1999 and so on down to a grinning skull at rare and obsolete words. Plus maybe the complete frequency list right after the morphology tables that many dictionaries contain. I don't deny the relevance of frequency counts as such, but you have to make the information available where it is needed and in a form that makes it easy to use.
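
To make the star idea concrete, here is a small Python sketch of such a marker. The first three bands follow the proposal above. The two-star and one-star bands, and the cut-off for the skull, are not specified there and are pure assumptions for illustration, as is the function name `frequency_marker`.

```python
def frequency_marker(rank):
    """Map a frequency rank (1 = most common) to a dictionary-style marker.

    Pass None for a word that is missing from the frequency list.
    """
    if rank is not None and rank < 1:
        raise ValueError("rank must be 1 or greater")
    # (upper bound of band, number of stars)
    bands = [
        (500, 5),    # within the first 500 words
        (999, 4),    # up to no. 999
        (1999, 3),   # no. 1000-1999
        (3999, 2),   # ASSUMED band, not given in the proposal
        (9999, 1),   # ASSUMED band, not given in the proposal
    ]
    if rank is not None:
        for upper, stars in bands:
            if rank <= upper:
                return "*" * stars
    # ASSUMED: anything rarer, obsolete or unlisted gets the skull
    return "skull"


# Ranks quoted earlier in this thread
print("language", frequency_marker(452))
print("football", frequency_marker(1529))
print("dinosaur", frequency_marker(None))
```

With the ranks quoted earlier in the thread, "language" (452) would get five stars and "football" (1529) three, while a word missing from the list, such as "dinosaur", would get the skull under these assumed cut-offs.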



Edited by Iversen on 18 October 2007 at 6:23am




Captain Haddock
Diglot
Senior Member
Japan
kanjicabinet.tumblr.
Joined 6771 days ago

2282 posts - 2814 votes 
Speaks: English*, Japanese
Studies: French, Korean, Ancient Greek

 
 Message 14 of 19
18 October 2007 at 9:13am
Even then, what constitutes the top 500 or 1000 or 5000 words in a language varies greatly according to source and register — newspapers vs. colloquial speech, for example — as well as other factors (regional variation, etc.).

I don't see how you could create an authoritative list.



Linguamor

 
 Message 15 of 19
18 October 2007 at 10:50am
Iversen wrote:

What we could hope for is that frequency indications found their way into ordinary dictionaries, either in the form of a rank number or in some simplified, more graphic form.


Iversen wrote:
... what I would like to see is a frequency indication at every lexeme in the dictionary, or maybe five stars for a word within the first 500 words, four stars up to no. 999, three stars at no. 1000-1999 and so on down to a grinning skull at rare and obsolete words.


The Cambridge Advanced Learner's Dictionary, based on a corpus of 600 million words of written and spoken language from a huge variety of sources (the Cambridge International Corpus), provides frequency information for roughly the 12,000 most frequent words, in three bands:

E - (Essential) Words which everyone needs to know to communicate effectively (4900 words).

I - (Improver) Words that are also very common in native speaker English (3300 words).

A - (Advanced) Words that are sufficiently common that they should be known by advanced learners of English (3700 words).




Edited by Linguamor on 18 October 2007 at 10:54am




Linguamor

 
 Message 16 of 19
18 October 2007 at 11:38am
Iversen wrote:

In principle I could have taken an electronic text from the internet about one of the subjects I have mentioned above and stuffed it into one of the word counters that have recently been mentioned in this thread, and then I could have compared the rankings with those in the 'general' frequency list, and of course I would then have found a lot of common terms.


One of the things that make frequency lists less useful for language learning/teaching purposes is that the compilers try to determine the frequency of words in the language as a whole, i.e. they try to include in the corpus the widest possible range of sources so that the corpus is representative of everything that is written or spoken in the language. The language learner would be better served by frequency information derived from a corpus of texts and speech that are representative of the language the learner is likely to encounter in his or her own use of the language.
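
One rough way to check how well a general list fits a learner's own material is to measure what share of the running words in that material the list covers. The Python sketch below does this under simplifying assumptions: the "general list" and the learner text are tiny hypothetical stand-ins, every run of letters counts as a word, and `coverage` is a name invented for this example.

```python
import re
from collections import Counter


def coverage(text, listed_words):
    """Return (share of running words found in listed_words,
    unlisted words with their counts, most frequent first)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0, []
    counts = Counter(tokens)
    covered = sum(n for word, n in counts.items() if word in listed_words)
    missing = [(w, n) for w, n in counts.most_common() if w not in listed_words]
    return covered / len(tokens), missing


# Hypothetical miniature general list and learner text
general_list = {"the", "a", "is", "at", "and", "museum", "railway"}
my_reading = "The zoo is at the railway museum and the zoo has a visa desk"

share, missing = coverage(my_reading, general_list)
print(f"{share:.0%} of running words are on the list")
print("most frequent unlisted words:", missing[:3])
```

The unlisted words that occur most often in the learner's own texts are the ones a general list fails to supply, which is the gap a corpus built from the learner's likely reading and listening would fill.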

Iversen wrote:

EDIT I forgot to mention the most important argument against frequency lists: they don't give explanations, translations or examples.


A frequency list that is a simple list of words without any other information than the relative frequency of the words in the corpus is not of much use to the language learner.





