Elexi Senior Member United Kingdom Joined 5565 days ago 938 posts - 1840 votes Speaks: English* Studies: French, German, Latin
| Message 41 of 319 08 April 2014 at 1:26pm | IP Logged |
I am glad shapd mentioned Professor Nation - he gives a clue to why it is so difficult to
develop a vocabulary size necessary to get high levels of comprehension - his findings
are that it takes 5-16 separate exposures to a word for it to stick. To get a vocab of
8000 words and so get into the 95% level of vocabulary needed for comprehension, that
makes (at the highest) 128,000 exposures. Now many of us use a raft of methods -
Assimil, Anki, Quizlet, extensive reading, LR etc to get such exposure - such methods
seem to be frowned upon by most language teachers I read or talk to and certainly seem
to be in the realm of geeks like us, not the average 16-18 year old.
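As a quick check on that arithmetic, here is a minimal sketch using only the figures quoted above (8000 words, 5-16 exposures each). The function and variable names are made up for illustration.

```python
# Rough exposure budget, using the figures quoted in the post above.
TARGET_WORDS = 8000           # vocabulary size mentioned in the post
EXPOSURES_PER_WORD = (5, 16)  # low and high ends of the quoted range


def exposure_budget(words, exposures_per_word):
    """Return the total number of exposures needed for `words` items."""
    return words * exposures_per_word


low = exposure_budget(TARGET_WORDS, EXPOSURES_PER_WORD[0])
high = exposure_budget(TARGET_WORDS, EXPOSURES_PER_WORD[1])
print(f"{low:,} to {high:,} exposures")  # 40,000 to 128,000 exposures
```

So even at the optimistic end of the range, the budget is still in the tens of thousands of exposures.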
Of course, such a high level is not necessary for colloquial language use, but it does
seem to be necessary for comprehension of a wide range of texts.
Edited by Elexi on 08 April 2014 at 1:53pm
2 persons have voted this message useful
|
emk Diglot Moderator United States Joined 5532 days ago 2615 posts - 8806 votes Speaks: English*, FrenchB2 Studies: Spanish, Ancient Egyptian Personal Language Map
| Message 42 of 319 08 April 2014 at 1:49pm | IP Logged |
s_allard wrote:
None of these tests actually counts how many words you can actually use. They estimate the number of
words for which you know one definition, based on a sample derived from a frequency list. This means that if you
know word number 100 on the list, the assumption is that you know the 99 preceding words. If you know word
2000 then you know the previous 1999 words, etc. What this means is that with relatively few questions you can
guesstimate how many words a person "knows". This is not the same as answering a question for every word. |
|
|
No, this is perfectly ordinary statistics. First, you divide up the words in a language by frequency. For example, you could use buckets from 0–999, 1000–1999, 2000–2999, and so on. Then, you choose a random selection of words from each bucket. Let's say the user knows 29 of 30 words chosen from the first bucket, 25 of 30 words from the second bucket, 19 of 30 words from the third bucket, and so on. Using this information, you can construct a very reasonable estimate of passive vocabulary size. The math has been extremely well-understood for the better part of a century, and it even allows you to calculate confidence intervals.
Frankly, it's not even interesting to argue about whether random sampling can be used to construct reasonable estimates. If you're curious, see your nearest online course in statistics.
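For anyone who wants to see the bucket arithmetic spelled out, here is a minimal sketch under stated assumptions. The bucket size (1000) and the 29/25/19 out of 30 results come from the example above; the function name, the simple binomial variance and the normal-approximation interval are assumptions chosen for illustration, not necessarily what any particular test site does.

```python
import math

# Bucket size and sample results are taken from the example in the post;
# the names and the normal approximation are illustrative assumptions.
BUCKET_SIZE = 1000
SAMPLE_SIZE = 30
known_per_bucket = [29, 25, 19]  # words recognised out of 30, per bucket


def estimate_vocabulary(known_counts, bucket_size, sample_size, z=1.96):
    """Stratified estimate of vocabulary size with an approximate interval."""
    estimate = 0.0
    variance = 0.0
    for known in known_counts:
        p = known / sample_size
        estimate += bucket_size * p
        # Binomial variance of the sample proportion, scaled up to the bucket.
        variance += bucket_size ** 2 * p * (1 - p) / sample_size
    std_error = math.sqrt(variance)
    interval = (estimate - z * std_error, estimate + z * std_error)
    return estimate, std_error, interval


estimate, std_error, interval = estimate_vocabulary(
    known_per_bucket, BUCKET_SIZE, SAMPLE_SIZE
)
print(f"about {estimate:.0f} words known in the first three buckets")
print(f"approximate 95% interval: {interval[0]:.0f} to {interval[1]:.0f}")
```

With only three buckets this covers the first 3000 words; a real test would keep adding buckets until the recognition rate drops close to zero.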
However, I do agree that these tests only tell you about passive vocabulary, not active vocabulary. And even passive vocabulary is hard to define. Two possible improvements for these tests: (1) include "fake" words, and penalize users for selecting them, and (2) require users to provide actual evidence they understand the meaning, perhaps by asking them to choose which word should be used in a given context. Various other studies have used these approaches in the past.
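To make the "fake words" idea concrete, here is a small sketch of one simple way such a penalty could work: treat the share of fake words a user ticks as a false-alarm rate and discount the recognition rate accordingly. The (h - f) / (1 - f) correction and the function name are assumptions for illustration; the post does not say which penalty any given test uses.

```python
def corrected_hit_rate(real_known, real_total, fake_selected, fake_total):
    """Adjust the recognition rate on real words by the false-alarm rate on fake words."""
    hit_rate = real_known / real_total
    false_alarm_rate = fake_selected / fake_total
    if false_alarm_rate >= 1.0:
        return 0.0  # every fake word was ticked, so no answer can be trusted
    # Simple correction for guessing: (h - f) / (1 - f), floored at zero.
    return max(0.0, (hit_rate - false_alarm_rate) / (1 - false_alarm_rate))


# Hypothetical result: 25 of 30 real words ticked, 2 of 10 fake words ticked.
print(round(corrected_hit_rate(25, 30, 2, 10), 3))
```

A user who never ticks a fake word keeps their raw score, while someone who ticks everything ends up at zero.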
s_allard wrote:
All these issues come to a head when one is preparing for language tests. There is this idea that the higher the
level the bigger the vocabulary. This is true in a sense but when I see figures such as one should know 20000
words to pass a C2 exam, I think this is totally wrong. |
|
|
Let's flip it around the other way. Assuming that a user has actually passed a C2 exam, how many words do they actually know? We can answer this, at least in general terms. Let's take the IELTS exams, which are supposedly pretty well-calibrated CEFRL exams. They publish a mapping between IELTS scores and CEFR levels, based on their own research:
We can see that C2 typically maps to IELTS 8 or 9. Now, given that somebody has actually taken IELTS and decided to try testyourvocab.com (yes, there's a sampling bias here), here are their average passive vocabulary sizes, estimated using word recognition without penalties for "recognizing" non-existent words, and with no requirement to supply a definition:
So given this particular measuring stick, we can say with some degree of confidence that if you score 5,000 on testyourvocab.com, you're relatively unlikely to score 8 or 9 on an IELTS exam.
shapd wrote:
There seems to be some confusion about how the vocabulary sizes in the original paper were derived. The paper itself is ambiguous, but as emk said, James Milton has developed a standard test using the first 5000 words in a frequency list. Part of the reason for this cut off is that it is very difficult to derive accurate frequencies above this number without huge corpuses of the right kind of material.
The test is a recognition, i.e. passive, test of a sample of the full list, so it is expressed as number of words/5000. A score of 3000 does NOT mean that the subject only knows 3000 words. The test has been shown to be reliable and reproducible, so can be used diagnostically. It will be less accurate as the level of competence increases, as a native speaker will presumably reach the 5000 limit, but it is useful for the lower levels. All the figures in the paper are from Milton's work, so his comments are all internally consistent. |
|
|
Yes, it's really difficult to build good corpora above 5,000 words. But the testyourvocab.com corpus is based on the British National Corpus (with both news and transcribed speech), and it uses a relatively sophisticated methodology, so this isn't just a random internet test put together by a bored programmer. It's clearly designed to be a reproducible, internally consistent metric, albeit one which gives relatively high numbers because of its lack of fake words and other internal controls.
And this is where my concern with James Milton's work comes in. I trust his results for the CEFRL A1–B1 range, because the 5,000 word cap can only introduce so much error at those levels. But I had 80+% coverage of the top 5,000 when I sat the DELF B2 (estimated by sampling pages in Routledge's dictionary), and I had to guess a lot, though I scored fairly well. And when I run into unknown words in a French text today, they tend to be nowhere near the top 5,000.
Personally, if somebody has a passive French vocabulary of 5,000 words, I would recommend against sitting the DALF C2. Given the graphs above, and my personal estimates using frequency dictionaries over the years, I would recommend a bit more study first.
8 persons have voted this message useful
| Gemuse Senior Member Germany Joined 4082 days ago 818 posts - 1189 votes Speaks: English Studies: German
| Message 43 of 319 08 April 2014 at 4:33pm | IP Logged |
Regarding vocab, I've read papers that say that the avg college freshman has about
17,500 word families.
Test for English:
http://my.vocabularysize.com/
1 person has voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 44 of 319 08 April 2014 at 5:09pm | IP Logged |
I won't get into a discussion of sampling theory for vocabulary size right now, but I think that what must be
clearly established is that these common tests such as the one used here are very crude and, above all, indicate
at best words that one can minimally recognize. What exactly is knowing a word? And what about the ability to
use the word?
Again I should point out that this kind of test does not measure the actual size of the vocabulary of a sample of
users, which would be really interesting. Instead we are looking at a test of sampled vocabulary and the results
are extrapolated to all users.
And what does this mean for those of us who want to write exams such as the CEFR? Should you run out and buy
a frequency dictionary and study the first 20,000 words in language x if you want to sit the C1 exam and then
the next 10,000 words for the C2 level? I personally think that this is a totally wrong strategy, but people should
do what they want.
If you want to answer the question, how many words do you have to be able to recognize and be able to use at a
given test level, it would be more useful to look at what is called text coverage.
For a discussion of all these issues, here is
an excellent article
Here is a first quote:
"The good news for second language learners and second language teachers is that a small number of the words
of English occur very frequently and if a learner knows these words, that learner will know a very large proportion
of the running words in a written or spoken text. Most of these words are content words and knowing enough of
them allows a good degree of comprehension of a text."
But here is the really juicy quote:
"The significance of this information is that although there are well over 54,000 word families in English, and
although educated adult native speakers know around 20,000 of these word families, a much smaller number of
words, say between 3,000 to 5,000 word families is needed to provide a basis for comprehension. It is possible
to make use of a smaller number, around 2,000 to 3,000 for productive use in speaking and writing."
I highly recommend the article because it discusses in detail many of the issues we gloss over here.
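Since text coverage is the key notion here, a minimal sketch of how it can be computed: the share of running words in a text that belong to a set of known words. The function name, the crude tokenizer and the toy data are assumptions for illustration, and the sketch counts word forms rather than word families.

```python
import re


def text_coverage(text, known_words):
    """Share of running words (tokens) in `text` that appear in `known_words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in known_words)
    return covered / len(tokens)


# Toy data: 12 running words, of which "mat" and "sofa" are unknown.
known = {"the", "cat", "sat", "on", "a"}
sample = "The cat sat on the mat. A cat sat on a sofa."
print(f"{text_coverage(sample, known):.0%}")  # 83%
```

Run against a real frequency list and a real text, the same calculation shows how quickly coverage rises with the first few thousand words and how slowly it rises after that.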
Now back to our CEFR tests at the C level. How many word families do you have to know? It would seem
somewhere around 5000 or 6000 is all you need. Is learning a list of the most common 6000 words in French a
guarantee of passing the C2 exam in, let's say, French or Spanish? Certainly not.
The CEFR does not test for vocabulary size. It tests for the ability to use the language. This is where grammar,
syntax and interaction skills come into play. Because we are so obsessed with vocabulary size as a reflection of
education, culture and intelligence, we tend to forget that it is how words are used that is really important.
Where are the studies of grammar mastery? How much grammar does one have to know at a given CEFR level?
And how does one measure the ability to make distinctions of meaning using prepositions? And how do you
measure the use of idioms?
Indeed, my experience with the CEFR exams, at least for Spanish, is that at the higher levels the issue isn't
vocabulary size - of course more is better - but the ability to play with the language fluently and confidently. I
would even go so far as to say that grammatical precision is more important than vocabulary. What's the point of
having a huge vocabulary if your verb tenses are all wrong or you are stumbling over prepositions or relative
pronouns?
3 persons have voted this message useful
| Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6597 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
| Message 45 of 319 08 April 2014 at 6:08pm | IP Logged |
Ideally, someone wanting to take a CEFR exam should read and listen so much that they can cope with comprehension tasks effortlessly. But many learners prefer to sweat and learn words in isolation and consume as little native materials as possible.
Did you see my post at all? The Finnish CEFR exams don't test for vocabulary size, but they do test you on a sample word list. For each word, you have three answers to choose from. You don't need to explain your choice or to define the words, and you can rely on associations. The right answers are from the same semantic field, and they can be synonyms, hypernyms, perhaps even antonyms etc.
The notion of head words differs from language to language. Finnish, German and Esperanto can mix and match noun stems as they please. To some extent English does this too - is "computer chair" a separate vocabulary item from its parts? What about "wallpaper"?
As for your final mostly rhetorical question, well, of course a horrible grammar can hinder communication, but so can glaring vocabulary holes. There's a tricky balance between showing how you can work around them and showing how much you do know.
And let me just remind you once again that casual learners tend to like learning vocabulary more than grammar, and the same applies to those who are learning out of necessity. But many of those driven by passion and curiosity actually enjoy grammar, including a large percentage of HTLAL'ers.
4 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 46 of 319 08 April 2014 at 8:37pm | IP Logged |
emk wrote:
...
So given this particular measuring stick, we can say with some degree of confidence that if you score 5,000 on
testyourvocab.com, you're relatively unlikely to score 8 or 9 on an IELTS exam.
...
Personally, if somebody has a passive French vocabulary of 5,000 words, I would recommend against sitting the
DALF C2. Given the graphs above, and my personal estimates using frequency dictionaries over the years, I would
recommend a bit more study first.
|
|
|
How large a vocabulary do you need to pass exams? Before answering this, it should be noted that no exam tests
for vocabulary size. Nowhere in the CEFR documentation is there anything about required vocabulary for given
levels. I am not as familiar with the IELTS exam but the wikipedia entry does not mention vocabulary size at all.
One should also note that the CEFR is an achievement type test, i.e. it tests for a given level whereas the IELTS is
an assessment test that returns a score indicating level of proficiency. The IELTS has an Academic Version for use
by educational institutions and the General Training version.
Since these exam systems do not measure vocabulary, what do they measure? Here is the wikipedia entry for
Band 8, Very Good User:
"Has full operational command of the language with only occasional unsystematic inaccuracies and
inappropriacies. Misunderstandings may occur in unfamiliar situations. Handles complex detailed argumentation
well."
What is the minimum receptive (passive) and productive (active) vocabulary size that will allow us to handle this
situation? There are two schools of thought here. The Maximalist school says you need at least a huge receptive
vocabulary in the 20,000 word families range and a big productive vocabulary - I haven't seen many precise
figures.
The Minimalist school, of which I'm a proponent, says that a relatively small receptive vocabulary of 5000-6000 word
families is enough because it provides adequate text coverage. As for productive vocabulary, a relatively small
figure in the 1500-2000 range can be enough.
It should be pointed out that there are still today major methodological issues about what constitutes a word and
how to measure vocabulary. Idioms are the big problem here. We know that the verb GET enters in many
compound constructions and idioms like "to get the hang of." How do we count these? Or the French adverb "tout
à fait." Is that a separate word or just three words?
The fundamental idea behind the Minimalist approach is that the exam does not measure vocabulary size; it
measures the ability to manipulate or handle the language using the vocabulary. While a large vocabulary is
probably better than a small one, skill in using language is not solely dependent on using many different word
families. Indeed, the ability to communicate effectively with a smaller vocabulary is indicative of good language
skills. Fine shades of meaning and nuances can be rendered in many ways and not only by adding more words.
Here is what the speaking test is like:
"Speaking
The speaking test contains three sections. The first section takes the form of an interview during which
candidates may be asked about their hobbies, interests, reasons for taking IELTS exam as well as other general
topics such as clothing, free time, computers and the internet or family. In the second section candidates are
given a topic card and then have one minute to prepare after which they must speak about the given topic. The
third section involves a discussion between the examiner and the candidate, generally on questions relating to
the theme which they have already spoken about in part 2. This last section is more abstract, and is usually
considered the most difficult."
Again, how many words does one need for this test? What is the examiner looking for? Good vocabulary is
obviously important, but good in the sense of appropriate and accurate. I suggest that good pronunciation,
grammatical precision, fluency (lack of hesitation and stumbling), good collocations and accurate use of idioms
are the key indicators that the examiner is looking for.
Unlike emk, I believe that a 5000-word receptive vocabulary and a solid 2000-word productive vocabulary are all
you need for a C1 level test. The more the better of course, but what I'm saying is that instead of trying to cram
10000 words into your head when preparing for the test, you might be better off making sure that the 2000
word productive foundation is rock-solid.
3 persons have voted this message useful
| Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6597 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
| Message 47 of 319 08 April 2014 at 11:25pm | IP Logged |
Maybe I'm wrong, but I'd say both me and emk and pretty much everyone but you is focusing on the vocabulary needed to actually be at that level, as described by the CEFR portfolio/checklists. A serious language learner should aim to pass the test comfortably, without relying on the similarity to sample tasks etc. If passing is possible with a smaller vocabulary*, that's a shortcoming of the test, and nothing to be enthusiastic about if you really care about CEFR and don't like when it's devalued.
*But this depends hugely on the test. You won't pass the Finnish CEFR vocabulary test with 5000 words under your belt, believe me. BTW, Dialang diagnostic tests also include both a vocabulary placement test (prior to any main test), and a specific vocabulary test.
Also, I agree that it may be enough to deliberately learn 2000 words actively and 5000 passively. Especially for an English speaker learning a Romance language. But you can't just ignore the cognates/borrowings like that. No way.
Neither are all CEFR tests purely achievement-based. I don't know how many times I've told you that the Finnish test is divided into basic/medium/high, where the possible results are X1, X2 and below X1, for each level. These are the results in each area, btw. Obviously you can't pass a level unless all skills are at this level or higher, but you can still show a potential employer that you are B1 in reading and writing, for example, even if you fail the rest. There are jobs where this is good enough.
Also, for many learners it's much easier to determine which range they're in than which specific test they are ready for. For example, I needed to listen to just one sample task to decide that B2 is too easy and boring for me.
Of course it's just one exam, and I don't know how many countries handle CEFR as sensibly as Finns do, but please stop generalizing based on the French and Spanish exams you're familiar with. Both of them are available in Finland btw, the principles/structure are the same as for Finnish :D You just need to be able to handle the instructions in Finnish or Swedish. So you can't even generalize about ALL Spanish/French CEFR exams.
I also have major reservations about the "rock-solid foundation". If you show variety, small errors in basic words can be forgiven. If you make it obvious that you have a relatively small vocabulary, you're driving yourself into a corner with no leeway for mistakes.
See for example how two of Shekhtman's principles are "decorating" and "showing off" (I forgot the exact wording).
Edited by Serpent on 09 April 2014 at 12:23am
4 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 48 of 319 09 April 2014 at 4:55am | IP Logged |
The main reason that people are so interested in measuring vocabulary size is the belief that both receptive and
productive vocabulary is an indicator of overall proficiency. If this is true then language testing could be
considerably simplified. One could just test for vocabulary.
Think of how all those complicated CEFR and IELTS exams could be reduced to tests of vocabulary lists. I'm sure
there are some people who are now thinking along those lines, but there are some major methodological issues
to be resolved before we get there.
One of the very basic issues is how to define a word. I won't attempt to repeat all the arguments about what
constitutes a word but it is certainly much more complicated than just anything between two white spaces.
And then there is the whole issue of how meaning is connected to words. And what it means to know a word.
Does that mean being able to recognize a word, to know a definition or to be able to use it in various situations?
I'm very skeptical about measurements of receptive vocabulary because of all these caveats. When I read that a
typical adult university-educated English-speaker has a vocabulary of 30,000 to 40,000 word families in English,
I really don't know what this means.
For all these reasons I think the measurement of productive vocabulary is more interesting and reliable. Whereas
with receptive vocabulary we have to rely on complicated sampling schemes to estimate at best vocabulary sizes,
with productive vocabulary, it's relatively easy to measure what people actually use. Here the concept of depth of
knowledge is useful, as outlined here from wikipedia:
"Depth of knowledge
The differing degrees of word knowledge imply a greater depth of knowledge, but the process is more complex
than that. There are many facets to knowing a word, some of which are not hierarchical so their acquisition does
not necessarily follow a linear progression suggested by degree of knowledge. Several frameworks of word
knowledge have been proposed to better operationalise this concept. One such framework includes nine facets:
orthography - written form
phonology - spoken form
reference - meaning
semantics - concept and reference
register - appropriacy of use
collocation - lexical neighbours
word associations
syntax - grammatical function
morphology - word parts"
When you look at actual productive vocabulary usage in various situations what is striking is how small it is
relative to receptive vocabulary. Daily conversations may use only a few hundred words whereas in an academic
environment it will be much bigger. But people only use a fraction of the words they claim to know. And the big
caveat here is how well words are used. Some people use words accurately and appropriately; they are easily
understood whereas other people use words poorly and are not easily understood. But they may be using the
same words.
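As a rough illustration of measuring what people actually use, here is a minimal sketch that counts running words and distinct word forms in a transcript. The function name, tokenizer and sample sentence are made up for illustration, and the count says nothing about how well the words are used.

```python
import re
from collections import Counter


def productive_profile(transcript):
    """Return (running words, distinct word forms, three most frequent forms)."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts), counts.most_common(3)


# Hypothetical snippet of a learner's speech.
total, distinct, top = productive_profile(
    "I get it. I get the hang of it now, I think."
)
print(total, distinct)  # 12 running words, 8 distinct forms
```

Note that "get" is counted once here even though it appears both alone and inside the idiom "get the hang of", which is exactly the counting problem raised earlier in the thread.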
I think we are too hung up with receptive vocabulary. Probably because the numbers are big. Productive
vocabulary, the stuff that you can really use, and preferably well, is, in my opinion, more difficult to acquire.
Receptive vocabulary takes care of itself through exposure. You learn words as you encounter them. Speaking
them or writing them well is the real challenge. And even when it comes to exam preparation, I believe that
productive vocabulary, the stuff that comes out of your mouth or your writing hand is more critical.
4 persons have voted this message useful
|