
Experimenting with French word frequency

 Language Learning Forum : Specific Languages
55 messages over 7 pages
Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6395 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, Finnish C1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 25 of 55
05 September 2014 at 2:06pm
emk wrote:
Oh, and for what it's worth: I believe you need both a solid vocabulary and high speaking fluency and correctness.

If we're talking about C1/C2 and job interviews, I agree. But in real life, grammatical accuracy is surprisingly useless, I'd say. Apart from certain specific "trouble points", you can really screw up a lot and still be understood if you know the needed vocabulary. (If you resort to circumlocutions with poor grammar, all hell breaks loose, true.) I personally work on my accuracy mostly because I don't want my own speech to make me cringe, and because many of my friends are language geeks and I want to have even more friends among them.

Maybe it's time for a new "what impresses you?" thread. I remember how the administrator started one, and it was very common to value a native-like pronunciation most of all. s_allard's pet topic of mastering small talk is certainly a legitimate criterion, but it's just one of many possibilities.

Edited by Serpent on 05 September 2014 at 2:11pm




rdearman
Senior Member
United Kingdom
rdearman.org
Joined 5034 days ago

881 posts - 1812 votes 
Speaks: English*
Studies: Italian, French, Mandarin

 
 Message 26 of 55
05 September 2014 at 3:26pm
Serpent wrote:
But in real life grammatical accuracy is surprisingly useless, I'd say. Apart from certain specific "trouble points", you can really screw up a lot and still be understood if you know the needed vocabulary.


I have to agree with this. I think a large vocabulary is more important than grammar if you are just trying to be understood. For example:

Me you help find where train station?

will get a better result for a beginning English speaker than:

Could you please tell me where the ..... can't remember the word.... the .... errrr....







s_allard
Triglot
Senior Member
Canada
Joined 5228 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 27 of 55
05 September 2014 at 3:45pm
Jeffers wrote:
s_allard wrote:

Right from the beginning, during all these pleasantries, the examiner is getting an idea of the candidate's level.
The way the person answers to "how are you today?" can tell the examiner how things are going to start off. Is
the person stuttering and stumbling or do they come across as articulate and fluent?

As Head and Nation point out in the article I mentioned above, high-proficiency speakers are able to provide
more nuanced and detailed answers in their discussions.

The examiner is not keeping track of the vocabulary being used but they notice mistakes. If the candidate can
talk about the weather in a detailed manner without making mistakes, then half the battle is won.


I think it's ironic, but you're really mixing up fluency and proficiency in your argument. In the first paragraph I quoted, you're clearly writing about fluency. But then you write about "high-proficiency speakers" who give "nuanced and detailed answers", and then about talking about the weather "in a detailed manner". I agree it doesn't require high levels of vocabulary to be fluent, but these latter two descriptions certainly involve increased vocabulary.

The difference between "talking about the weather" and "talking about the weather in a detailed manner" is
vocabulary. A "more nuanced and detailed answer" implies more vocabulary.

Saying that I mix up fluency and proficiency is fighting words! (Just kidding.) When I wrote "articulate and fluent", I used fluent exactly as I've always used it, i.e. the absence of undue hesitations in the speaking voice.

I don't know how many times I have to say that more vocabulary is better than less. Do you need more vocabulary to talk about the weather in a detailed manner than to just talk about the weather? Yes. The question is how much more.

But the essential argument here is that in this kind of interview situation the examiner is not counting the vocabulary you are using. The examiner sees the overall ability to manipulate the language, and particularly the ability to put the words in the right order, in the right form and smoothly.

Do you talk about the weather by listing all the technical words you know? If the interviewer asks you what the weather is like, do you answer "fog, low visibility, light rain, overcast, 16 degrees Celsius, 80% humidity" with the idea of impressing the person with your technical knowledge? Or maybe something like: "The weather right now is not great; it's a bit on the foggy side, typical for this time of the year. I haven't been out yet and I'm not really looking forward to it. I personally prefer more sun. What's the weather like where you are?" Which answer carries the day?





emk
Diglot
Moderator
United States
Joined 5330 days ago

2615 posts - 8806 votes 
Speaks: English*, French B2
Studies: Spanish, Ancient Egyptian

 
 Message 28 of 55
05 September 2014 at 4:19pm
Serpent wrote:
But in real life grammatical accuracy is surprisingly useless, I'd say. Apart from certain specific "trouble points", you can really screw up a lot and still be understood if you know the needed vocabulary.

There's another interesting wrinkle here: If you only know a grand total of 300 words, then perfect pronunciation and fluid speech can get you into a lot of trouble. Why? Well, if somebody walks into a shop and says, with an atrocious accent:

"I... wants?... <pointing> the baguette."

…then the person behind the counter will speak slowly and use small words. But if somebody walks in and says the following with perfect intonation and pronunciation:

"C'n I please h'v a baguette?"

…then the response will be at full native speed. If you don't understand words like coûter "cost" and celle-là "that one" (both of which could easily be left off a 300-word list), you probably don't want to be subjected to full-native-speed responses. When you actually do start producing excellent small talk, many natives will become a lot less patient. Somebody who speaks deliberately and with a heavy accent will generally be treated as a foreigner who's making a massive effort. Somebody who handles small talk perfectly but who gets temporarily stuck discussing the latest movies will occasionally be treated like an idiot.

Re-examining Milton's vocabulary size estimates

Anyway, all this is a bit of a diversion. :-) What I want to do next is use the Lexique coverage data to look at Milton's vocabulary estimates:



Milton's methodology was a bit odd, because he based it on the XLex tests. These don't actually estimate vocabulary size. Instead, they estimate how many of the most frequent 5,000 words you know. No matter how huge your vocabulary, you can never score 5,001 on these tests.
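To make the ceiling effect concrete, here is a minimal sketch of an XLex-style score, assuming a simplified test that samples items only from the top 5,000 frequency ranks and scales the share of known items up to 5,000. The function `xlex_style_score`, the every-50th-rank sampling scheme and the two learners are hypothetical; this ignores how the real test selects and scores its items.

```python
# Hypothetical XLex-style scoring: items come only from the 5,000 most frequent
# words, so the estimate is (share of items known) * 5,000 and cannot exceed 5,000.

TEST_BAND = 5000  # the test only samples frequency ranks 1..TEST_BAND


def xlex_style_score(known_ranks: set[int], sample: list[int]) -> int:
    """Scale the share of sampled items the learner knows up to TEST_BAND.

    known_ranks: frequency ranks (1 = most frequent) of the words the learner knows.
    sample: ranks of the test items, all drawn from 1..TEST_BAND.
    """
    hits = sum(1 for rank in sample if rank in known_ranks)
    return round(hits / len(sample) * TEST_BAND)


# Made-up sampling scheme: every 50th rank in the top 5,000 (100 items).
sample = list(range(50, TEST_BAND + 1, 50))

# Two hypothetical learners: one knows the top 5,000 words, the other the top 16,000.
learner_5k = set(range(1, 5001))
learner_16k = set(range(1, 16001))

print(xlex_style_score(learner_5k, sample))   # 5000
print(xlex_style_score(learner_16k, sample))  # 5000: the extra 11,000 words are invisible
```

Both learners get the same score, which is exactly why a test like this can't separate a strong C1 reader from a very strong C2 reader.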

Now let's take another look at the coverage chart I generated from Lexique:

emk wrote:


Code:
Words   Film     Book
250     76.16%   68.56%
500     82.79%   75.53%
1000    88.39%   82.03%
2000    93.00%   88.16%
4000    96.41%   93.42%
8000    98.55%   97.30%
16000   99.67%   99.73%
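The table only has rows at doublings of vocabulary size, while Milton's numbers fall in between. One way to read off an in-between value is sketched below, assuming coverage is roughly linear in log2(vocabulary size) between adjacent rows. That interpolation rule is an illustrative assumption, not something Lexique provides, and `estimate_coverage` is a hypothetical helper.

```python
import bisect
import math

# (film %, book %) coverage by vocabulary size, copied from the Lexique table above.
COVERAGE = {
    250: (76.16, 68.56),
    500: (82.79, 75.53),
    1000: (88.39, 82.03),
    2000: (93.00, 88.16),
    4000: (96.41, 93.42),
    8000: (98.55, 97.30),
    16000: (99.67, 99.73),
}
SIZES = sorted(COVERAGE)
FILM, BOOK = 0, 1


def estimate_coverage(words: float, column: int = BOOK) -> float:
    """Estimate % coverage for a vocabulary size between two table rows.

    Assumption: coverage is treated as linear in log2(vocabulary size) between
    adjacent rows. This is an illustrative choice, not part of the Lexique data.
    """
    if not SIZES[0] <= words <= SIZES[-1]:
        raise ValueError(f"{words} is outside the table range {SIZES[0]}-{SIZES[-1]}")
    # Index of the first row >= words, clamped so there is always a row below it.
    i = max(bisect.bisect_left(SIZES, words), 1)
    lo, hi = SIZES[i - 1], SIZES[i]
    t = math.log2(words / lo) / math.log2(hi / lo)
    return COVERAGE[lo][column] + t * (COVERAGE[hi][column] - COVERAGE[lo][column])


# Milton's B1 range (2,194 to 3,338 words), for books and for films.
for words in (2194, 3338):
    book = estimate_coverage(words, BOOK)
    film = estimate_coverage(words, FILM)
    print(f"{words} words: books {book:.1f}%, films {film:.1f}%")
```

Under this assumption, Milton's B1 range lands at about 89 to 92% for books and 93 to 95.5% for films, in line with the "roughly 90–95%" estimate for B1.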


Let's break this down by CEFR level, omitting A1.

A2. The students in Milton's study have average passive vocabularies of 1,700 to 2,237 words. This seems reasonable: It's enough to cover a wide range of basic conversations (including the responses), and it gives about 88% coverage of native text. There are enough words to cover all the common verbs, plenty of nouns, a nice selection of adjectives, and a full set of pronouns and connecting particles.

B1. Milton shows vocabularies of 2,194 to 3,338 words. This gets students to roughly 90–95% coverage of books and films. This still feels fairly plausible to me, and it's a nice step up from A2.

B2. Here's where I start to disagree with Milton, slightly. He gives 2,450 to 4,012 words. At the upper end of this range, we have only 93.42% coverage of text according to Lexique, or an average of 23 unknown words per 350-word page. If we compare this to the CEFR self-assessment checklist, we get:

Quote:
B2 Reading

- I can rapidly grasp the content and the significance of news, articles and reports on topics connected with my interests or my job, and decide if a closer reading is worthwhile.
- I can read and understand articles and reports on current problems in which the writers express specific attitudes and points of view.
- I can understand in detail texts within my field of interest or the area of my academic or professional speciality.
- I can understand specialised articles outside my own field if I can occasionally check with a dictionary.
- I can read reviews dealing with the content and criticism of cultural topics (films, theatre, books, concerts) and summarise the main points.
- I can read letters on topics within my areas of academic or professional speciality or interest and grasp the most important points.
- I can quickly look through a manual (for example for a computer program) and find and understand the relevant explanations and help for a specific problem.
- I can understand in a narrative or play the motives for the characters’ actions and their consequences for the development of the plot.

I think it would be challenging to perform at this level with an average of 23 unknown words per 350-word page. For comparison purposes, shortly after I passed the DELF B2 with a solid reading score (20+), I sampled a frequency dictionary and estimated my vocabulary to be very roughly 7,000 words, give or take at least a thousand.
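The "unknown words per page" figure is simple arithmetic on the coverage percentage. A minimal sketch, assuming a 350-word page as in the post; `unknown_per_page` is a hypothetical helper, and it counts running words, so an unknown word that appears three times on a page counts three times.

```python
def unknown_per_page(coverage_pct: float, page_words: int = 350) -> float:
    """Expected unknown running words on a page, given % text coverage.

    Counts tokens, so a repeated unknown word is counted every time it appears.
    """
    return page_words * (1 - coverage_pct / 100)


# 93.42% is the Lexique book coverage for 4,000 words; 98% is the coverage
# threshold for comfortable extensive reading discussed in this post.
for pct in (93.42, 98.0):
    print(f"{pct}% coverage -> {unknown_per_page(pct):.1f} unknown words per 350-word page")
```

Page sizes vary, so these are rough: the same 95% coverage means 17.5 unknown words on a 350-word page but 20 on a 400-word page.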

C1. And here's where I stop believing Milton's numbers completely. He gives 2,675 to 4,300 words for C1. This is scarcely larger than the 2,450 to 4,012 words he gives for B2. I think there are actually three things going on here:

- Milton's running into the 5,000 word ceiling on XLex.
- The Spanish-speaking French students, with their 2,000-to-3,300 word vocabularies, may be getting a Romance-language discount.
- There's some pretty bad CEFR inflation going on here.

C2. And now we've completely escaped any notion of plausibility. Milton measures a vocabulary size of 3,525 to 4,068 words for C2 students. First of all, the highest-scoring group of B2 students (Greek learners in Greece) knew a mean of 4,012 words, and the highest-scoring group of C2 students (English learners in Greece) knew a mean of 4,069 words, barely more. Either way, we're still looking at an average of about 23 unknown words per 350-word page (using French coverage data, so this is a fuzzy comparison). Let's compare this dismal performance with the actual C2 standard:

Quote:
C2 Reading

- I can recognise plays on words and appreciate texts whose real meaning is not explicit (for example irony, satire).
- I can understand texts written in a very colloquial style and containing many idiomatic expressions or slang.
- I can understand manuals, regulations and contracts even within unfamiliar fields.
- I can understand contemporary and classical literary texts of different genres (poetry, prose, drama).
- I can read texts such as literary columns or satirical glosses where much is said in an indirect and ambiguous way and which contain hidden value judgements.
- I can recognise different stylistic means (puns, metaphors, symbols, connotations, ambiguity) and appreciate and evaluate their function within the text.


So looking at the Lexique graph and table above, here are my personal guesstimates for passive vocabulary, focusing on the formal written register, and allowing minor weaknesses with very casual speech:

A2: About 2,000 words, as per Milton.
B1: About 3,000 words or a bit more, slightly higher than Milton.
B2: 4,000 to 6,000 words.
C1: At least 8,000 words, giving 98.5% coverage of arbitrary text.
C2: Roughly 16,000 words or more, giving >99.5% coverage of arbitrary text.

Of course, any of these numbers could be off by 25%, given the inherent vagaries of word counting.
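To see how steeply the unknown-word count falls between levels, here is a small sketch pairing the low end of each guesstimate with its row in the Lexique table and converting book coverage into unknown words per 350-word page. B1's 3,000 words sits between table rows, so it is simply left out rather than interpolated; `GUESSES` and `unknown_words` are names made up for this illustration.

```python
# Low end of each guesstimate that is also a row of the Lexique table,
# with that row's (film %, book %) coverage. B1 (3,000 words) falls between
# rows, so it is omitted rather than interpolated.
GUESSES = {
    "A2": (2000, 93.00, 88.16),
    "B2": (4000, 96.41, 93.42),
    "C1": (8000, 98.55, 97.30),
    "C2": (16000, 99.67, 99.73),
}
PAGE_WORDS = 350  # page size used in the post


def unknown_words(book_pct: float) -> float:
    """Unknown running words per page, computed from book coverage."""
    return PAGE_WORDS * (1 - book_pct / 100)


for level, (words, film, book) in GUESSES.items():
    print(f"{level}: {words:>6,} words | film {film:.2f}% | book {book:.2f}% | "
          f"~{unknown_words(book):.0f} unknown words per page")
```

The counts go from about 41 unknown words per page at A2 to about 23 at B2, 9 at C1 and fewer than 1 at C2, which is the gap the guesstimates above are meant to capture.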

And in no case do I wish to discourage people: it's easy to pick up a couple thousand words using Assimil and Anki (or your preferred alternatives), and you can creatively finesse extensive learning from there until you cross the 98% threshold. If you can find a way to enjoy semi-extensive reading with 90% vocabulary coverage, you can do most of the "work" while reading trashy fun novels and watching TV. Don't think of all this vocabulary as a horrible slog; think of it as an excuse to goof off shamelessly in a new language.



s_allard
Triglot
Senior Member
Canada
Joined 5228 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 29 of 55
05 September 2014 at 5:17pm
emk wrote:
..
The easiest way for you to convince me that 300 words will take somebody a long way is to actually fill out the
list. Claiming a small number of words is enough doesn't make sense unless you actually choose the words. Once
the words are picked out, it becomes possible to compare them against a wide variety of typical A2 conversations
and see whether they suffice. I'm perfectly willing to help with this project—I can provide lists of candidate words,
I can build tools to count the unique words in a corpus, and so on.

I agree that the noun list will be the most difficult. It's too subject-specific and the distribution is too flat. The
top 2000 nouns only get you 84% coverage of nouns in running text (I find this to be astonishingly low,
compared to 90% coverage with 260 verbs):
....


I don't think that I can convince emk that 300 words will take somebody a long way because we approach the
whole question of minimum vocabulary size from two radically different positions. His position, and I hope I'm
getting it right, is that we have to compare a suggested minimum number of words to a range of sample
conversations and see what kind of word coverage we get. So let's say we take a given 300-word vocabulary and
run it past a set of transcriptions of 10 typical A2 conversations and we get a 50% word coverage. Well, you can't
do anything with that. Or take the first 300 words of a French frequency list and run that past a database of
French movie subtitles. Again, you won't be able to do anything with 300 words.
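The comparison described here, running a word list against a transcript and measuring coverage, can be sketched in a few lines. This is a minimal illustration, assuming coverage means the share of running words (tokens) found in the list. It matches word forms, not lemmas, so a real tool would also need to map forms such as coûte back to coûter. The vocabulary and the shop exchange are invented for the example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercased runs of letters; apostrophes split "s'il" into "s" and "il"."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def coverage(vocabulary: set[str], transcript: str) -> float:
    """Share of running words (tokens) in the transcript found in the vocabulary."""
    tokens = tokenize(transcript)
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in vocabulary)
    return known / len(tokens)


# Toy example (invented): a learner's word list vs. one exchange in a bakery.
vocabulary = {"je", "voudrais", "une", "baguette", "s", "il", "vous", "plaît"}
transcript = "Je voudrais une baguette, s'il vous plaît. Ça coûte combien ?"
print(f"{coverage(vocabulary, transcript):.0%}")  # 73%
```

Here 8 of the 11 running words are known, and the three unknown ones are exactly the words the learner needs to understand the reply.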

Here is my approach. 300 words is a threshold for a learner to start developing meaningful statements. I'm not sure what emk means by "take somebody a long way"; I prefer the term threshold. Why 300 words, and which 300 words? 300 is of course an arbitrary figure that teachers of French use, but I personally base this figure on observations of real conversations, not on large frequency lists. In essence, I'm looking from the ground up.

I've given two examples of real conversations where the people use far less than 300 different words. In one
conversation there are 224 different words. If we run those words against some A2 conversation dataset, what
kind of coverage would we get? I have no idea. Let's pick an arbitrary figure, 60%. We conclude that this level of
coverage is insufficient for good understanding of A2 level French.
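Counting "different words" in a conversation means counting distinct types rather than running tokens. A minimal sketch, assuming word forms are counted as-is, so viens and venir would be two words. The post doesn't say which counting rule produced 224, and the snippet below is invented rather than taken from either conversation.

```python
import re
from collections import Counter


def type_token_counts(transcript: str) -> tuple[int, int, Counter]:
    """Return (distinct words, running words, per-word counts) for a transcript.

    Distinct word forms are counted, so "viens" and "venir" count as two words.
    """
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+", transcript)]
    counts = Counter(tokens)
    return len(counts), len(tokens), counts


# Invented snippet standing in for a real conversation transcript.
snippet = "Tu viens ce soir ? Oui, je viens. Tu viens avec qui ?"
types, tokens, counts = type_token_counts(snippet)
print(types, tokens, counts.most_common(1))  # 8 11 [('viens', 3)]
```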

Does this make sense? Statistically, yes. If you do the same comparison with a dataset of C2-level French, you
would probably get similar results because this particular conversation contains a fair number of quite specific
terms. Do we conclude that these people are not speaking C2-level French and could not pass a C2 spoken
French test?

To be more specific, let's look at a key component of speaking proficiency: verb usage. The two conversations I looked at used 33 and 34 verbs respectively, of which only 8 were shared. Considering that it seems you need around 1,000 verbs to get 97% word coverage of all the verbs used in a dataset of French movie subtitles, what can we conclude about these native speakers of French? Maybe they don't really understand each other? Are they A1-level speakers?

But we do see two French speakers having a great conversation with just 33 or 34 verbs. And if we combine the two conversations, 59 verbs will do.
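The 59 follows from inclusion-exclusion: 33 + 34 − 8 shared verbs. The sketch below rebuilds two sets with those sizes using placeholder names (only the counts come from the post; the verbs themselves are hypothetical) and also shows how small the shared core is relative to the combined list.

```python
# Placeholder verbs: only the set sizes (33, 34, 8 shared) come from the post.
shared = {f"shared_{i}" for i in range(8)}
conversation_1 = shared | {f"only_in_1_{i}" for i in range(33 - 8)}
conversation_2 = shared | {f"only_in_2_{i}" for i in range(34 - 8)}

assert len(conversation_1) == 33 and len(conversation_2) == 34
assert len(conversation_1 & conversation_2) == 8

combined = conversation_1 | conversation_2
# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B| = 33 + 34 - 8
print(len(combined))  # 59
print(f"{len(shared) / len(combined):.0%} of the combined verbs occur in both")  # 14%
```

Only about one verb in seven of the combined list shows up in both conversations, so each added conversation brings in mostly new verbs.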

Now the big question: how many verbs do you need to participate in a conversation in French? You have a choice. If you want to speak like all the actors in French movies, you need 1,000 verbs. I agree. If you want to have a two-minute conversation with a native French speaker at a given moment, I say you need around 34. You choose.

Which 34? Frankly, outside the 10 most common verbs in French, the actual verbs used will depend on the
subject of the conversation. The verbs used will vary from conversation to conversation.

If we look at pronouns, we see that only 10 pronouns were used in both conversations. French has more than 10 pronouns, of course.

For adverbs and adjectives, we see a similar story. Small numbers and some shared elements.

Nouns are where there are the biggest differences. This is not surprising, because this is where most of the subject content is expressed.

How does this all come together in the idea of a 300-word threshold? First of all, we look at the breakdown by category in real conversations. Then we throw in some numbers for padding. This is why I suggested 80 verbs. Considering that 33 and 34 were enough for two conversations, 80 should give us room to play with. It's the same reasoning with all the various categories.

The key point here is that there cannot be a standard 300-word list for everybody. Couldn't you just take the 300 most frequent words from a French frequency list? Why not just take the 80 most common French verbs? Why not use the 150 most common nouns from French movies? That won't work, because those words may not be right for your specific interests.

The big question is which words. This is where the learner has to do some legwork and look at their interests and their situation. If you are retired and studying French to spend time in France, your vocabulary needs will be different from those of a person with young children who has just moved to Montreal.

But in both cases, I maintain that somewhere in the 300-word region learners will have enough material to work
with and start speaking the language for real.





s_allard
Triglot
Senior Member
Canada
Joined 5228 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 30 of 55
05 September 2014 at 6:13pm
I have always been intrigued by James Milton's article on vocabulary size for C2 candidates because of the seemingly
low numbers for receptive vocabulary on the French tests. Milton is a well-known researcher in the field of
vocabulary size and presumably knows what he is doing.

I should point out the absolute dearth of vocabulary-size studies of actual C2 test results. We just do not know what productive vocabulary candidates use, and at best we can estimate what the receptive vocabulary is.

I feel that in the CEFR world, vocabulary size is not on the radar. There are absolutely no guidelines or numbers
related to recommended vocabulary size.

I take no position on recommended receptive vocabulary size for the various CEFR levels because I believe it is hard
to measure this kind of vocabulary. Furthermore, my own interests are more in productive vocabulary.



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6395 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, Finnish C1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 31 of 55
05 September 2014 at 6:28pm
Sure, everyone has different interests. Just please acknowledge that those 300 active words should be accompanied by many more known passively. Maybe most Canadians pick them up at school and in general due to living in a bilingual country. But by default it's not a given, so it sounds like you claim 300 words are enough for the passive vocabulary too.





emk
Diglot
Moderator
United States
Joined 5330 days ago

2615 posts - 8806 votes 
Speaks: English*, French B2
Studies: Spanish, Ancient Egyptian

 
 Message 32 of 55
05 September 2014 at 8:28pm
s_allard wrote:
His position, and I hope I'm getting it right, is that we have to compare a suggested minimum number of words to a range of sample conversations and see what kind of word coverage we get. So let's say we take a given 300-word vocabulary and run it past a set of transcriptions of 10 typical A2 conversations and we get a 50% word coverage.

This is basically correct, with a few caveats:

1. The word list doesn't have to be based on frequency dictionaries. It can be carefully tuned for A2 tasks, in much the same way Basic English was tuned.[1]

2. Nobody expects A2 students to perform B1 or C1 tasks. They need to make polite small talk, buy basic living supplies, order dinner, find the bathroom, give and receive directions, and successfully use subways and trains. But within this limited problem domain, they should satisfy all the A2 criteria.

3. Students may use pantomime and ask for new vocabulary words.

Personally, I think you could carefully construct a 1,000-word vocabulary that could be creatively used to handle most of these situations. But most students will need a total vocabulary of at least 1,500–2,000 words if they want to go out in public and function at an A2 level in most situations. Not all of this vocabulary would need to be active. I base these numbers on Milton's survey and on my own experience.

[1] Basic English assumes 850 general words, and 150 extra words to cover individual interests. And Basic English can be tricky even for native speakers. Vernor Vinge, a writer and math professor, spoke about a science fiction story he wrote: "The word-hacker in me was also intrigued by the Basic English vocabulary the aliens used. (It turned out to be surprisingly difficult to write in that vocabulary. Once I saw the Gettysburg Address redone in Basic English; it seemed about as eloquent as the original. I didn't realize until I was writing this story what a feat that was.)" You can find a version of the Gettysburg Address in Basic English online. Note the use of phrasal verbs and idiomatic compounds. Many of these turns of phrase would be unknown to actual beginners.

s_allard wrote:
To be more specific, let's look at a key component of speaking proficiency: verb usage. The two conversations I
looked at used 33 and 34 verbs respectively, of which only 8 were shared.

This is the basic limit of using a small conversational corpus: each new conversation you add requires a fair bit of new vocabulary. Here's what I'd argue:

What you can do with only 300 words
- Have a controlled conversation on a known topic, either with a tutor or in class.
- Switch to a new topic by introducing a few dozen words.
- Learn (and even internalize) a large amount of French grammar.
- Make brief small talk with bilingual neighbors.

What you can't do with only 300 words (or so I claim)
- Perform the full variety of survival tasks which are normally expected of an A2 student.
- Pass a well-designed A2 exam (without the gross academic dishonesty of being given the test in advance).

s_allard wrote:
I have always been intrigued by James Milton's article on vocabulary size for C2 candidates because of the seemingly
low numbers for receptive vocabulary on the French tests. Milton is a well-known researcher in the field of
vocabulary size and presumably knows what he is doing.

I'm convinced that there's something awfully suspicious about Milton's paper. If you take his numbers seriously, it means that you can be C2 with less than 95% text coverage, or about 20 unknown words per page in the typical paperback. This is incompatible with the kind of nuanced comprehension of complex materials demanded by the C2 criteria. If we go by this standard, my passive skills would be far above the C2 threshold. Either (a) Milton's use of a 5,000 word XLex test distorted his results drastically, or (b) there's a lot of people out there calling themselves C2 (with the enthusiastic encouragement of their teachers) who can't read an ordinary paperback without a dictionary and a whole lot of guessing.

Milton's numbers are also inconsistent with those of several other reputable researchers, as seen in this figure by Françoise Kusseling and Wilfried Decoo:



My proposed numbers are much more in line with those from Schmitt, who used roughly the same methodology: He looked at the C2 level descriptors, figured out how much vocabulary would be needed to reach those rather demanding standards, and then used coverage data to estimate vocabulary size. In my case, I could also calibrate my numbers against a vocabulary size estimate I made around the time I passed a B2 exam.

Milton's numbers are based on a 5,000-word XLex exam given to actual students. The students' CEFR levels were determined by teachers who had assigned them to "streams for study at each of the CEFR levels." To me, this seems to lack rigor and to encourage CEFR level inflation. A better strategy would be to assess student levels using actual CEFR assessment exams such as the TCF.








Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.