emk Diglot Moderator United States Joined 5531 days ago 2615 posts - 8806 votes Speaks: English*, FrenchB2 Studies: Spanish, Ancient Egyptian Personal Language Map
Message 17 of 36 | 29 January 2015 at 6:06pm
Frequency dictionaries are all based around a corpus, and depending on which specific corpus is chosen, this makes a huge difference. A written corpus will be different than a spoken corpus, and even with a written corpus, it matters how much is pulled from newspapers and how much is pulled from fiction.
So just as with eyðimörk and ammazzavampiri, I learned un maestro del aire ridiculously early in my Spanish studies, because I wanted to watch Avatar, and they say it constantly. And since Avatar involves a lot of ice and boats, it was only a short jump from there to read a kid's graphic novel about the Titanic. And Avatar also helps with Harry Potter, which helps in turn with other fiction. I much prefer focusing my vocabulary on actual, interesting tasks, and allowing things to broaden gradually.
This is also why I like parallel text and parallel text/audio in the very beginning: It allows me to use interesting materials without first needing to memorize tons of vocabulary. And interesting materials will quickly show me what words are important to me.
The downside is that I'll wind up making too many pop culture references. But even there, if I choose media that's wildly popular among native speakers, at least I'll know how to say, Si seulement vous connaissiez le pouvoir du côté obscur ! Which is not exactly a useful skill, but at least it's an amusing one. :-)
5 persons have voted this message useful
| smallwhite Pentaglot Senior Member Australia Joined 5307 days ago 537 posts - 1045 votes Speaks: Cantonese*, English, Mandarin, French, Spanish
Message 18 of 36 | 30 January 2015 at 3:34am
s_allard wrote:
The interesting thing about the poll so far is that 66% of the respondents don't worry about word frequency at all.
Since the question in post #1 was "Do you ever use a frequency dictionary? When do you stop?", I think people tick the "I don’t worry about word frequency" option when they don't study off a frequency list.
7 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5429 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
Message 19 of 36 | 01 February 2015 at 2:17pm
When we look at frequency lists and their utility, it is very important to understand how they are
constructed. First of all, we must keep in mind that these lists are based on aggregate data. In other
words, we sum all the words of sample sets such as movies, contemporary fiction, newspaper articles,
everyday speaking, etc. and we put the words in order of decreasing frequency. Obviously, I've glossed
over many of the methodological issues involved in this process.
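A minimal sketch of the aggregation step described above, assuming each sample is just a plain string split on whitespace. The function name `build_frequency_list` and the toy samples are made up for illustration; a real list would need proper tokenization and lemmatization, which are among the methodological issues glossed over here.

```python
from collections import Counter

def build_frequency_list(samples):
    """Sum word counts over all samples and rank by decreasing frequency."""
    totals = Counter()
    for text in samples:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        totals.update(text.lower().split())
    # Sort by count (descending), then alphabetically so ties are stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical samples standing in for movies, fiction, newspapers, speech.
samples = [
    "the cat sat on the mat",
    "the dog sat",
    "a cat and a dog",
]
ranked = build_frequency_list(samples)
print(ranked[:3])
```

Note that the ranking alone says nothing about how many of the samples each word came from.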
The key point here is that while we may have a large total set of different words, an individual sample
source only uses a portion, often a small portion, of all those words. For example, if I have 100
speakers in my sample, they may use less than 500 different words each but since those words can
differ from one speaker to the next, I may end up with 10,000 words in the entire set.
I won't get into my usual rant about how many words one needs to speak a language but what is
important is that when we look at frequency rankings, it is very important to look at the spread or
breadth of the words used, i.e. how they are spread across the users in the sample. What we are really
interested in are those words that are used by a large number of speakers.
You would think that high frequency also means wide spread. This is generally true but not always. I
see in a list for essential vocabulary in French based on 163 users that there are only 6 words used by
all 163 users. We see that word number 12, the pronoun on, is used by only 128 users. So 35 users, or
21.5% of the sample, never used the 12th most common word in the French language. By the
time you get to word number 100, the numbers for the spread are all over the place. For example,
word number 149, femme, is used by only 74 users, or less than half the sample. Word 1000 is
used by 17 speakers out of 163.
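The difference between raw frequency and spread can be sketched roughly like this, assuming the data is a mapping from speaker to the list of words that speaker used. The function `frequency_and_spread` and the three toy speakers are hypothetical; only the figures 163 and 128 come from the list discussed above.

```python
from collections import Counter

def frequency_and_spread(speakers):
    """Return {word: (total_count, number_of_speakers_using_it)}."""
    totals = Counter()
    spread = Counter()
    for words in speakers.values():
        totals.update(words)
        spread.update(set(words))  # each speaker counts once per word
    return {word: (totals[word], spread[word]) for word in totals}

# Hypothetical toy data: "on" is frequent overall but used by one speaker only.
speakers = {
    "s1": ["je", "suis", "on", "va", "on", "on"],
    "s2": ["je", "vais"],
    "s3": ["je", "suis"],
}
stats = frequency_and_spread(speakers)
print(stats["on"], stats["je"])

# Share of the 163 users who never used word number 12 (128 users did).
share_missing = (163 - 128) / 163
print(f"{share_missing:.1%}")
```

In the toy data "on" and "je" have the same total count, but "je" is used by all three speakers and "on" by one, which is exactly the case where frequency and spread disagree.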
These figures only confirm what we all know intuitively; word frequency is not uniformly spread among
users. It's pretty much the opposite. Only a very tiny number of words are shared by all users and then
it's rapidly downhill all the way from there.
The second point that I want to make, but which will have to wait until I get more time, is that what
users really share is the grammar, and this is the key to language performance, both active and passive. This
is somewhat reflected in frequency lists because the high-frequency words tend to be those function
or grammar words.
Edited by s_allard on 01 February 2015 at 2:50pm
1 person has voted this message useful
| Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6596 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
Message 20 of 36 | 01 February 2015 at 3:43pm
s_allard wrote:
The second point that I want to make, but which will have to wait until I get more time, is that what users really share is the grammar, and this is the key to language performance, both active and passive. This is somewhat reflected in frequency lists because the high-frequency words tend to be those function or grammar words.
Um but you've made the point already? Honestly, just accept that nobody mysteriously forgets that grammar exists when they discuss the vocabulary. It would be worth pointing out in a language learning discussion on a travel forum or some such, because casual learners can definitely get too excited about the vocab and have the misconceptions that you repeatedly attribute to all of us here. Seriously, on a specialized forum there's no need to bring this up every single time.
"Grammar words" are often excluded from the frequency lists too, since they're not meant to be your only source.
Also about the highly frequent words, it's worth noting that the first 500-1000 are so common that it's pointless to learn them from frequency lists. I don't really use this kind of resource anyway, but if I did, I think it would be mostly useful roughly between 500 and 5000 words. Of course this depends on the language, especially on how many words beyond the first 5000 you get for free (quite few in Finnish or Japanese, plenty in Spanish for anyone who speaks English).
6 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5429 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
Message 21 of 36 | 02 February 2015 at 8:34am
Typical word frequency lists are rather crude tools that lump all the words into the same categories
and look only at coarse frequency. But in fact there are major differences in the kinds of words. One
often hears of the distinction between meaning words and function or grammar words.
In fact the distinction is really between open-ended classes and closed classes. The open-ended
classes contain those words whose numbers are constantly changing because new forms are always
being invented and obsolete forms dying out. The most common class is nouns or words for
designating things. Then we have the classes of noun modifiers, verbs, verb modifiers.
We could thus have sublists for things like the most common verbs, noun modifiers or adjectives. One
of the big problems here in a language like English is the fact that nouns can function like noun
modifiers as in "country music", unlike French which uses a different structure. On the other hand,
French has a very simple mechanism for converting all adjectives into nouns simply by using the
definite article, as in "le nouveau."
The closed-end classes have a number of components that change very slowly because their functional
role is primarily to connect words in the right order. Things like prepositions, pronouns, demonstrative
pronouns and adjectives, possessive adjectives, determiners, conjunctions and interrogative words
remain quite constant. For example, the standard English pronoun system has remained nearly
unchanged for the last 200 years except for some recent changes to reflect non-sexist usage.
When we look at word frequency lists, these closed-class words, or grammar words, are at the top of
the list. I've never seen a frequency list that excludes these so-called grammar words but I don't claim
to know everything.
Here are the first 50 words of the Wiktionary frequency list for English-language Contemporary fiction.
the I to and a of was he you it in her she that my his me on with at as had for but him said be up out
look so have what not just like go they is this from all we were back do one about know if
There are no clear nouns and about a dozen verb forms, four of which are forms of to be. Nearly all
the other words are closed-class grammar words.
Here are the first 50 entries for the Wiktionary French word frequency list based on subtitles.
je de est pas le vous la tu que un il et à a ne les ce en on ça une ai pour des moi qui nous y mais me
dans du bien elle si tout plus non mon suis te au avec va oui toi fait ils as être
The pattern is the same as in the English list: the vast majority of the words are grammar words, no
nouns and a smattering of verbs.
In all the cases, as we work our way up the frequency list, the grammar words give way to the verbs
and nouns.
For me the most intriguing observation in the vocabulary text coverage studies by Paul Nation and
others is that the most common 100 words in English give around 50% general text coverage. This is
far too low for any kind of useful understanding of the text but what it does say is that half the words
of most texts are recognizable with just 100 words.
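Text coverage of this kind is easy to compute once you have a tokenized text. The following is only a rough sketch, assuming whitespace tokenization and counting word forms rather than word families (the published coverage studies make their own choices here); `coverage_of_top_n` is a made-up helper name.

```python
from collections import Counter

def coverage_of_top_n(tokens, n):
    """Fraction of running words accounted for by the n most frequent word forms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

# Toy text (assumption): 11 running words, "the" x4 and "and" x2.
tokens = "the cat and the dog and the bird saw the fox".split()
print(coverage_of_top_n(tokens, 2))  # 6 of 11 running words
```

Running the same function over a large corpus with n=100 is how one would check the roughly 50% figure for a given text type.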
As expected, most of those 100 most common words are grammar words that are used to bind the
meaning words like the nouns, verbs and their modifiers together.
To make matters more complicated, these meaning words are often subjected to morphological
transformations based on principles of inter-word agreement.
The conclusion here is that you really can't separate vocabulary from grammar when you look at word
frequency in use. To memorize the first 200 words of a language like French from a list might take a
couple of days. But this would be a useless exercise. To learn to use these words properly could easily
take years, and many people never get them right. I'm thinking particularly of the pronouns y and en
and pronominal verb forms.
1 person has voted this message useful
| patrickwilken Senior Member Germany radiant-flux.net Joined 4532 days ago 1546 posts - 3200 votes Studies: German
Message 22 of 36 | 02 February 2015 at 12:21pm
At first glance it would seem that frequency lists would be useful, as they provide you with a good foundation vocabulary. However, the corpus they are generated from invariably makes the list too broad for your own specific needs. So learning via a frequency list is going to be less efficient than learning words selected from the particular corpus you are studying.
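As a rough illustration of selecting words from your own material rather than from a general list, here is a minimal sketch. It assumes you have the target text as a string and a set of words you already know; `words_to_learn` and the German toy sentence are invented for the example, and the tokenization is deliberately naive.

```python
from collections import Counter

def words_to_learn(text, known_words, top=10):
    """Most frequent words in the text that are not yet in known_words."""
    counts = Counter(
        word for word in text.lower().split() if word not in known_words
    )
    return counts.most_common(top)

# Hypothetical target text and known vocabulary.
text = "der hund jagt den ball und der hund holt den ball und der hund schläft"
known = {"der", "den", "und"}
print(words_to_learn(text, known, top=2))
```

The ranking here reflects only the text you feed in, so the list is as narrow or as broad as the material you actually want to read.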
I also find using SRS to learn words in isolation, as opposed to words embedded within sentences, less efficient. Doing the latter allows me to learn many more words passively in the same amount of time, which is all I need as my goal is to access native materials as fast as possible. Also every sentence offers its own little grammar lesson, something you can't get from learning a list of words.
Edited by patrickwilken on 02 February 2015 at 12:24pm
3 persons have voted this message useful
| Cavesa Triglot Senior Member Czech Republic Joined 5008 days ago 3277 posts - 6779 votes Speaks: Czech*, FrenchC2, EnglishC1 Studies: Spanish, German, Italian
Message 23 of 36 | 02 February 2015 at 1:21pm
I may worry about frequency at times, but mostly on the common sense level. If I
encounter a word often, it is obviously worth learning.
However, I do not worry about frequency lists at all:
-their corpuses usually differ greatly from what I am likely to need.
-they end too soon. It is easy to learn the real top few hundred or a thousand words,
you actually cannot learn the language at all while avoiding them as they are
everywhere, no matter whether they are the grammar words, common verbs or the days of
the week. So such a frequency list carries no added value and no reason why I should
use it
-I would be interested in long lists of the most common 15000 or 20000 words. However,
those are rarer to find or at least to find more versions of.
I can understand why some people love the frequency lists. It is a good step towards
some goals, it is often a better selection than what you find in many beginner courses
and so on. However, I highly doubt any serious learner could naively believe the
frequency list (or vocabulary drilling in general) is the only thing you need in order
to learn a language so I see no point in hitting some old strawmen in this thread.
4 persons have voted this message useful
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6702 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
Message 24 of 36 | 02 February 2015 at 1:30pm
I have voted for "irrelevant above the first 1000 words", but as noted by s_allard above most of the really common words (or rather word forms) are 'grammar words', so in essence I'm just saying that you have to learn those - except maybe some extremely rare forms. And then you can look at a frequency table at some point to do a 'mopping up' operation to close a few holes. Above 1000 words your choice of sources is more relevant than the figures in any frequency table.
If you draw a cumulative curve for frequencies and coverage it will go steeply upwards for a short while and then flatten out. The point where the curve starts flattening might be a relevant marker for the relevance of frequency lists - everything to the right of that point is by definition rare, and everything to the left is common.
I made a set of curves based on the Kilgarriff corpus for the Novi Sad event, and here the turning point lies around the 60% coverage line - and that means that words with a frequency above approx. 0.03% are 'common' by this definition. That is somewhere around the first 270 word forms with this corpus and the attached frequency list. In English there are relatively few morphological forms of each word, but the prepositions, adverbs, articles and pronouns belong here. In languages with much morphology you would expect that inflected function words (like pronouns) become slightly rarer, but so do all other words, so my hunch is that the function words you would be expected to know would always be found within the first 1000 words regardless of the language - and probably within the first 500 or even fewer.
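A sketch of how the 'common' part of a list could be cut off at a relative-frequency threshold, and how much coverage that part gives. This is not the method used for the curves mentioned above, just one simple way to express the same idea; `common_words` is a hypothetical helper, and the default threshold 0.0003 is the 0.03% figure written as a fraction.

```python
from collections import Counter

def common_words(tokens, min_share=0.0003):
    """Word forms with relative frequency >= min_share, plus their combined coverage."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return [], 0.0
    # most_common() yields word forms in order of decreasing count.
    common = [(w, c) for w, c in counts.most_common() if c / total >= min_share]
    coverage = sum(c for _, c in common) / total
    return [w for w, _ in common], coverage

# Toy data (assumption): 10 running words, threshold raised to 20% so it bites.
tokens = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
words, coverage = common_words(tokens, min_share=0.2)
print(words, coverage)
```

On a real corpus you would keep the default threshold and look at `len(words)` and `coverage` to see where the cut falls for that particular list.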
Emk once published some statistics for French where the different word classes were separated, and the general picture was that the great majority of the rare words are substantives. The verbs and adjectives have their distribution skewed towards the less rare words, while adverbs, conjunctions and prepositions mostly belong to the common area.
And why is this important? Well, primarily because you should use different learning techniques for function words - with larger focus on their structural characteristics. The very rare words - which are the overwhelming majority of all words - might inspire you to use some kind of organized memorization, whereas the problem with words that pop up in almost all texts lies in the varied expressions in which they are used, not in learning each individual word as such.
Edited by Iversen on 02 February 2015 at 3:06pm
4 persons have voted this message useful