s_allard Triglot Senior Member Canada Joined 5429 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 65 of 319 10 April 2014 at 4:21am | IP Logged |
I just want to point out an excellent article co-authored by Paul Nation, arguably the preeminent researcher on
vocabulary size testing in English. 7. An investigation of the
lexical dimension of the IELTS Speaking Test.
The reason I'm so excited about this paper is that it studies the real output of candidates at various levels of
proficiency. Instead of wild claims of what people should or do know, here we have trustworthy numbers of the
words that people really use. I draw the readers' attention to page 10 where the authors present the average
number of distinct words according to the different proficiency bands. At the highest band represented, Band 8
(out of 9) the average output is 1491.0 different words. As is to be expected, this number decreases as we go
down the proficiency scale.
The lesson here is that regardless of all kinds of numbers concerning receptive vocabulary necessary to sit these
tests, when it comes to actual speaking proficiency, only a small number of words are actually used. Before
somebody starts yelling that productive vocabulary is normally much smaller than receptive vocabulary, I wish to
say that my point here is that it seems possible to achieve the next-to-highest proficiency level with only about
1500 words of productive vocabulary.
A great thing about the article is that it discusses at length some of the qualitative issues that go with
vocabulary. In particular there is an excellent discussion of use of formulaic language such as collocations and
idioms.
Edited by s_allard on 10 April 2014 at 1:35pm
2 persons have voted this message useful
| Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6596 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
| Message 66 of 319 10 April 2014 at 5:56am | IP Logged |
I don't think anyone disagrees that you can pass an exam by using only 1500 words. The question is how many more words you need to have available - to be at the actual level or even just to pass a different year's test with the same ease.
As for the original article, the main point was whether the GCSE, A-Levels and the level upon graduation match the official standards. Given the difficulty of making students take actual CEFR tests repeatedly, I think estimating vocabulary size is an acceptable method of comparison. Do you really think the learners in question reach C1?
Edited by Serpent on 10 April 2014 at 5:57am
2 persons have voted this message useful
| Ari Heptaglot Senior Member Norway Joined 6581 days ago 2314 posts - 5695 votes Speaks: Swedish*, English, French, Spanish, Portuguese, Mandarin, Cantonese Studies: Czech, Latin, German
| Message 67 of 319 10 April 2014 at 7:13am | IP Logged |
From the study linked to by s_allard:
Quote:
As noted in the results of the statistical analyses, the candidates at Band 8 produced substantially more words as a group than did those at lower proficiency levels. However, the quality of their vocabulary use was also distinctive. This was reflected partly in their confident use of low frequency vocabulary items, particularly those associated with their employment or their leisure interests. |
|
|
Who the heck would manage to cram their entire active vocabulary into a single test? Naturally the number of words actually used during a test is only a portion of the total number of words known.
3 persons have voted this message useful
| Jeffers Senior Member United Kingdom Joined 4908 days ago 2151 posts - 3960 votes Speaks: English* Studies: Hindi, Ancient Greek, French, Sanskrit, German
| Message 68 of 319 10 April 2014 at 8:38am | IP Logged |
Ari wrote:
From the study linked to by s_allard:
Quote:
As noted in the results of the statistical analyses, the candidates at Band 8 produced substantially more words as a group than did those at lower proficiency levels. However, the quality of their vocabulary use was also distinctive. This was reflected partly in their confident use of low frequency vocabulary items, particularly those associated with their employment or their leisure interests. |
|
|
Who the heck would manage to cram their entire active vocabulary into a single test? Naturally the number of words actually used during a test is only a portion of the total number of words known. |
|
|
In addition, notice that the total sample size is only 22,366. If you were to raise the sample size to 100,000 you would no doubt find a larger number of words used.
2 persons have voted this message useful
| luke Diglot Senior Member United States Joined 7204 days ago 3133 posts - 4351 votes Speaks: English*, Spanish Studies: Esperanto, French
| Message 69 of 319 10 April 2014 at 10:06am | IP Logged |
Ari wrote:
From the study linked to by s_allard:
Quote:
As noted in the results of the statistical analyses, the candidates at Band 8 produced substantially more words as a group than did those at lower proficiency levels. However, the quality of their vocabulary use was also distinctive. This was reflected partly in their confident use of low frequency vocabulary items, particularly those associated with their employment or their leisure interests. |
|
|
Who the heck would manage to cram their entire active vocabulary into a single test? Naturally the number of words actually used during a test is only a portion of the total number of words known. |
|
|
Bingo!
2 persons have voted this message useful
|
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6702 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
| Message 70 of 319 10 April 2014 at 10:35am | IP Logged |
I have located the thread where I described my three-month word count: "All I need to know is 2500 words". There is one curious detail (which I had almost forgotten): I had to divide the corpus into three parts, and at first I only calculated statistics on the first part. Result: 1200 unique words. Then I remembered the next two pages, and the number rose to 2400 - and not to 3 x 1200. This is logical since a larger sample will include more repetitions, and therefore an enormous sample won't contain a correspondingly ginormous number of unique words - however I do know that my used active vocabulary in- and outside HTLAL must be considerably higher than the 2400, and my potentially active vocabulary must be even higher - but nobody can say how much higher, not even myself.
But soon the old study will be overtaken by a new and larger one. I have copied the content of my Multiconfused Log to something like a dozen Word files, and yesterday evening I sorted the contents of two of those files according to language used, and those two files alone (which contain roughly 8 months of data from November 2008) contain more than 30.000 English words and just as many in other languages, if not more. I evidently can't deliver a result right now, but with the new material I'll be able to show the relation between sample size and unique words, and at least in principle I'll also be able to couple frequency and coverage - at least for votre humble serviteur, when I write in my log (which of course is different from me when I am involved in discussions).
There is another aspect: when I'm through all the Word files I'll not only have an immense corpus in English, but also sizeable corpora in at least German, French and a few languages more. And if I - as planned - divide my English corpus into chunks so that I can make a curve for the relationship between corpus size and unique words found, then it should in principle be possible to assess the expected number of unique words for corpus sizes that correspond to those I have for, for instance, Dutch and Latin.
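The chunk-by-chunk curve described above could be sketched roughly as follows. This is only an illustration in Python, not the procedure actually used for the log: the tokenizer is a crude assumption (lowercased runs of letters, no lemmatization or word families), and the names tokenize, type_growth and the sample sentence are made up for the example.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a deliberately crude definition of 'word')."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())

def type_growth(tokens, chunk_size):
    """Return (tokens_seen, unique_words_seen) after each successive chunk of the corpus."""
    seen = set()
    curve = []
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        seen.update(chunk)
        curve.append((start + len(chunk), len(seen)))
    return curve

# Invented sample text, only to show the shape of the output.
sample = ("the cat sat on the mat and the dog sat on the rug "
          "while the cat and the dog looked at the mat")
tokens = tokenize(sample)
for n_tokens, n_types in type_growth(tokens, chunk_size=6):
    print(n_tokens, n_types)
```

On a real corpus one would read the text from the sorted files and use a much larger chunk_size; the point is only that the second number in each pair grows more slowly than the first as the sample gets bigger.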
Of course all this should have been done on a scientific basis and for more than one person (who happens to be the researcher), but while we wait for the professionals my study will at least be a step in the direction of getting some hardcore facts behind the discussions about the size of active vocabularies.
PS: See also the parallel post in my multiconfused log page 448.
Edited by Iversen on 10 April 2014 at 12:09pm
5 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5429 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 71 of 319 10 April 2014 at 1:28pm | IP Logged |
Jeffers wrote:
Ari wrote:
From the study linked to by s_allard:
Quote:
As noted in the results of the statistical analyses, the candidates at Band 8 produced substantially more
words as a group than did those at lower proficiency levels. However, the quality of their vocabulary use was also
distinctive. This was reflected partly in their confident use of low frequency vocabulary items, particularly those
associated with their employment or their leisure interests. |
|
|
Who the heck would manage to cram their entire active vocabulary into a single test? Naturally the number of
words actually used during a test is only a portion of the total number of words known. |
|
|
In addition, notice that the total sample size is only 22,366. If you were to raise the sample size to 100,000 you
would no doubt find a larger number of words used. |
|
|
Now we have some real meat for a discussion. There are three kinds of vocabulary in play here:
1. The receptive vocabulary that is the set of distinct word families that one can recognize (know one definition
of)
2. The productive vocabulary that one can recall instantly and use correctly.
3. The measured vocabulary used in a specific context, i.e. a test, writings over a given period, language spoken
over a month, etc.
For 1 and 2, we use estimates based on various tests. I won't argue at this point over the validity of these
estimates but I should point out that to my knowledge, because of logistical barriers there are no studies that
have actually counted receptive and productive vocabulary. I also believe that there are fundamental
methodological issues of definition of what is a word that make these studies problematical but that is another
debate.
While we have lots of studies and figures for 1 and 2, we have very few for 3. Although there are certainly others
out there, for the time being we have Iversen's and this one from Nation and Read. In both cases, what is striking
is that the numbers seem relatively low.
Nobody is arguing that these figures represent the entire productive vocabulary of these individuals. What we see
is that in a specific context of use, HTLAL posts and an oral proficiency test, these are some indications of real
working vocabulary sizes.
One could imagine other contexts, such as all the words that one has used in crossword puzzles over the last 10
years. But what we have here is numbers that are more relevant to us. If I only had the courage, I would
undertake a study of my own HTLAL posts, but I'm afraid that will have to be a retirement project.
The key finding in the Nation study is that you can pass the IELTS Band 8 speaking proficiency test using fewer
than 1500 words. That's an average for 15 individuals speaking for the allowed duration of the test for a total
number of 22,366 tokens. More individuals speaking for a total of 100,000 tokens would not have made a
difference.
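For anyone who wants to try this kind of count on their own material, here is a minimal Python sketch of the bookkeeping involved: running words (tokens) and distinct words (types) per speaker, averaged over a group. The two transcripts are invented, the tokenizer is a crude assumption that ignores word families, and lexical_output is a hypothetical helper name, not anything taken from the report.

```python
import re
from statistics import mean

def tokenize(text):
    """Split text into lowercase word tokens (crude: no lemmatization, no word families)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())

def lexical_output(transcripts):
    """Summarize tokens (running words) and types (distinct words) for a group of speakers."""
    tokenized = [tokenize(t) for t in transcripts]
    per_speaker = [(len(toks), len(set(toks))) for toks in tokenized]
    return {
        "total_tokens": sum(n for n, _ in per_speaker),
        "mean_tokens": mean(n for n, _ in per_speaker),
        "mean_types": mean(k for _, k in per_speaker),
        # Distinct words across the whole group, counted once each.
        "pooled_types": len(set().union(*tokenized)),
    }

# Invented transcripts, one string per speaker.
transcripts = [
    "I like my job because I like people",
    "My job is hard but I like it",
]
summary = lexical_output(transcripts)
for name, value in summary.items():
    print(name, value)
```

Keeping the token and type figures side by side makes it easy to see which of the two a given average refers to.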
Those who read the report will have noticed that the authors highlight how the high-performing individuals used
a combination of high-frequency and low-frequency words. This is a measure of sophistication. They also used
more formulaic language.
In my opinion the significance of all this is that it gives the lie to all these extravagant claims about the number of
words one needs to pass these kinds of tests. Mutatis mutandis, I would say that you could probably pass a C1
oral proficiency language test using fewer than 1800 words. I said using, not knowing.
Regular readers of my posts know that I have been saying this all along. You don't have to know a vast number of
words to speak a language. You have to master the core high-frequency elements that contain most of the
grammatical function words and then add whatever low-frequency specialized and content-rich words that are
needed.
2 persons have voted this message useful
|
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6702 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
| Message 72 of 319 10 April 2014 at 2:01pm | IP Logged |
I would formulate it differently: you can travel around with very few words (5 words: where, toilet, buy, that-one, please), and it is actually difficult to find opportunities to use an extremely large number of words - as shown by my own pilot test from 2006. BUT unless you have a solid reservoir of potentially active words and expressions you will have trouble formulating complex or specialized thoughts, and unless you have an even larger stock of passive words and expressions you will have trouble understanding genuine speech (and even more: writing) because precisely those 5 or 10% of the words you don't know are those that contain the essence of the content.
I agree with s_allard that people actually can get by using very few unique words, but I can also see that the languages I can read fluently without a lexicon or translation are those where I know 20.000 words or more according to my own word counts. In languages with less (like Russian and Latin) I run into so many obstacles that it disturbs the easy flow.
PS: Shakespeare's output is evaluated here - around 29.000 words in his collected works. Not bad, but due to the different structure of Swedish their national icon Strindberg allegedly comes out with 119.000 words if his works are run through the same program. Which just goes to show that you have to be very careful with your definitions.
Edited by Iversen on 10 April 2014 at 2:33pm
4 persons have voted this message useful
|