montmorency Diglot Senior Member United Kingdom Joined 4828 days ago 2371 posts - 3676 votes Speaks: English*, German Studies: Danish, Welsh
| Message 129 of 210 21 August 2012 at 2:46pm | IP Logged |
I mentioned David Crystal, who is always interesting to read, even if he doesn't provide
easy answers usually. This is a fairly old article (1987). Nevertheless it might cover
some useful points, regarding English:
http://www.davidcrystal.com/DC_articles/English83.pdf
Edited by montmorency on 21 August 2012 at 2:49pm
1 person has voted this message useful
| Serpent Octoglot Senior Member Russian Federation serpent-849.livejour Joined 6597 days ago 9753 posts - 15779 votes 4 sounds Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish
| Message 130 of 210 21 August 2012 at 2:58pm | IP Logged |
Peregrinus wrote:
Serpent wrote:
But 5000 words? pfffft. atama warui did say it's probably 5000 for Japanese and 2500 for a European language! And in most languages you get a significant fraction of these 2500 words for free, with the notable exceptions of Finnish, Icelandic etc. If the language is even more closely related to one you speak/understand, you get even more at a discount, and you totally should either use a course aimed at speakers of this language or dive straight in and get tons of input - among other things, in order to add the theoretically known words to your real passive vocab. To state it more explicitly: the importance of a "proper language course" depends on how similar the language is to your native one and other languages you speak. |
|
|
Serpent,
I am basing that figure, for the reasons I gave, on the level one needs for 95%+ coverage for extensive reading (though of course one can still use ER with less; the process just won't be as fast, with more lookups required). And that 5000ish figure *includes* cognates, even those that have to be reasoned out a little, and is for languages closely related to English. And regarding Japanese, I seem to remember atama warui and others positing an even higher figure like 8000+.
Since the courses most often recommended here, like Assimil and FSI, seem to take one only to the 2000+ mark, the question is whether intensive vocabulary study is a faster and more efficient way to reach 5000 or so than a slower process via extensive methods, where one may not be getting above 90% coverage (again including cognates); i.e., which is the better way to bridge the gap.
This issue of vocabulary coverage levels needed for a certain percentage coverage for extensive methods is an interesting one that has been discussed previously in threads as well as here, but not really the point of this particular thread though I was the one who started talking about ER. Perhaps it would be of benefit to discuss it in a separate thread and try to nail down both general requirements estimated by experts like Dr. Arguelles, and those specific to various languages. |
|
|
The thing is that cognates/borrowings from the same source aren't evenly distributed between the "levels". The core vocabulary is full of irregularities and surprises, while the "newspaper words" and "themed vocab" are usually more international. The first 2000 will be less transparent, but even there you can learn a lot from real texts if you know the content and/or grammar.
IDK, for me it's not a choice of what to do after completing a coursebook. It's a choice of when and how much to supplement the genuine materials with more formal/traditional resources. And the choice of the media is even more important. E.g. in Germanic languages some words can be easier to guess in writing and some from the sound of them, so it's a good idea to use both if just one isn't enough (or you don't feel you're at the right level to limit yourself to reading).
2 persons have voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 131 of 210 21 August 2012 at 3:29pm | IP Logged |
Iversen wrote:
...
s_allard wrote:
To my knowledge, there are very few cases of individuals who have systematically measured the number of words they use over a given time. This basically means wearing a recording device for, let's say a week, and creating a record of every word spoken. I know that some scientists have done this. But @iversen is one of the few people who has attempted to do this with emails if I recall correctly. |
|
|
You are almost there - I counted the number of unique words I had used in a three months period here at HTLAL (and landed at around 2400 'headwords'). |
|
|
Since @mooby has presented my position fairly well, I will not repeat myself. Let's move on.
I think it's very interesting to note that @iversen arrived at only 2400 headwords in three months here at HTLAL. This is measured active vocabulary. We all know that he has a much larger passive vocabulary, measuring in the tens of thousands. What is the significance of a number like 2400?
I will certainly not answer for him. But I suspect that for most people, the true active vocabulary, especially spoken, is quite low and reflects the nature of their daily activities.
Let's say I also use 2400 headwords in my posts. But they are not identical to @iversen's 2400, although they share a common core of around 1500. Now, if we were to pool the posts of all the contributors in English, how many headwords would we have in our data set? Let's say 8000.
This means that for us to "understand" each other we all have to have a passive vocabulary of at least 8000 words. No one person has to have an active vocabulary of 8000 words. Maybe some people use only 500 words (notice I didn't say 300). Others may use 6000.
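To make the pooling argument concrete, here is a minimal sketch in Python. It assumes each contributor's active vocabulary is already available as a set of headwords. The contributor names and word lists are invented for illustration, and `pooled_vocabulary` is a hypothetical helper, not anything used on the forum.

```python
def pooled_vocabulary(vocabularies):
    """Return (pooled, core) for a dict mapping contributor -> set of headwords.

    pooled: every headword used by at least one contributor (set union)
    core:   headwords used by every contributor (set intersection)
    """
    sets = list(vocabularies.values())
    pooled = set().union(*sets)
    core = set.intersection(*sets) if sets else set()
    return pooled, core


# Toy data, invented for illustration only.
vocabularies = {
    "contributor_a": {"language", "word", "learn", "corpus", "headword"},
    "contributor_b": {"language", "word", "learn", "speak", "accent"},
    "contributor_c": {"language", "word", "learn", "read", "corpus"},
}

pooled, core = pooled_vocabulary(vocabularies)
print(len(pooled), "pooled headwords;", len(core), "in the common core")

for name, words in vocabularies.items():
    # Words a contributor never uses actively but still has to recognize
    # in order to follow everyone else's posts.
    passive_only = pooled - words
    print(name, "uses", len(words), "and must recognize", len(passive_only), "more")
```

The pooled set is always at least as large as any single contributor's set, which is the sense in which the passive requirement exceeds any one person's active count.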
What is the minimum number of words that you need to be considered a serious contributor to HTLAL? I'm not asking what is the minimum number of words you need to understand HTLAL with 95% coverage.
I won't dare put a figure down because the heavens might fall on me. But in all seriousness, I suggest that the question is basically unimportant. What is important is how the words are used.
This is a whole different can of worms because we now have to look at issues of syntactic construction, number of mistakes, intelligibility, long-windedness, sophistication, etc that are not easily measured.
By the way, these sorts of figures probably explain why the vocabulary size necessary for the C1-C2 exams seems so low, as remarked by @emk. An active vocabulary of 4500 words for the French C2 or 5000 for the English C2 using the Cambridge exams may strike us as low, but in reality isn't that reflective of real usage?
Edited by s_allard on 21 August 2012 at 4:09pm
4 persons have voted this message useful
|
Iversen Super Polyglot Moderator Denmark berejst.dk Joined 6703 days ago 9078 posts - 16473 votes Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian Personal Language Map
| Message 132 of 210 21 August 2012 at 4:45pm | IP Logged |
There is still one caveat. If all members here at HTLAL collectively have used 8000 words during a 3 month period, it still doesn't mean that you are safe with 8000 passive words, because you can be fairly certain that it won't be the same 8000 words. How many more you need is anybody's guess, but let's just assume that the number of words you need is distinctly above 10,000. And then we haven't taken into account that you also need words for discussions with spouses, kids, colleagues and sundry professionals outside HTLAL and maybe even for reading highbrow literature like Shakespeare or Barbara Cartland.
The lesson I draw from this is that you need to learn a lot of words passively, and you need to do it as early as possible in your language learning career - even if that means that you have to spend time on things like SRS or wordlists instead of just hanging out with your friends or watching TV.
But getting a large passive vocabulary is only half the story, because you also have to activate some of those words, which implies that you must find a way to use them. And there, hanging out with your chums is an excellent way, provided that you can convince them to speak in your target language about a vast array of topics. For those who have fewer (relevant) personal contacts, writing stuff or speaking through Skype are good alternatives. But even with these activities it will take time to activate a substantial amount of your passive words, and it is not at all unlikely that you only use 300 different words in three months' time in a new language. But this doesn't mean that 300 words are in any sense enough. They are only a fraction of the words you should be ready to use, and these are again only a tiny fraction of the words you should be prepared to recognize.
My strategy is to boost my passive vocabulary massively, because I then gain access to genuine materials where I can learn how to use 'my' words. I don't expect to use them all from day one. And that's diametrically opposed to the teaching methods where a teacher supplies the pupils with words in small graduated doses, but assumes that they remember them all and can use them all without notice.
Edited by Iversen on 21 August 2012 at 5:02pm
5 persons have voted this message useful
|
emk Diglot Moderator United States Joined 5532 days ago 2615 posts - 8806 votes Speaks: English*, FrenchB2 Studies: Spanish, Ancient Egyptian Personal Language Map
| Message 133 of 210 21 August 2012 at 5:15pm | IP Logged |
s_allard wrote:
By the way, these sorts of figures probably explain why the vocabulary
size necessary for the C1-C2 exams seems so low, as remarked by @emk. An active
vocabulary of 4500 words for the French C2 or 5000 for the English C2 using the
Cambridge exams may strike us as low, but in reality isn't that reflective of real usage?
|
|
|
I can easily believe that an active vocabulary of 4000–5000 words is enough to pass
many C2 exams. But those 4500–5000 word estimates came from that Miller book you
mentioned, and he's trying to count passive vocabulary, not active vocabulary. To be
precise, he's using X-Lex, which tests students' ability to discriminate between words
and non-words. (Frankly, that's a pretty weird metric, now that I think about it. But
it's definitely closer to "passive" than to "active".)
Also note that you can't accurately judge the size of my active vocabulary by counting
all the words I use in (say) a year. Consider the following hypothetical example: A
friend of mine who draws comic books shows me some illustrations with murky dark
backgrounds and sharply illuminated, three-dimensional figures. I say, "Hey, cool,
that's a really neat chiaroscuro effect you've got there." Now, you could have recorded
everything I said or wrote for the past 5 years and never detected that I know
"chiaroscuro". But if I see that particular artistic effect, the word is on the tip of
my tongue. And Iversen's active vocabulary is surely greater than the 2,500 words he
actually used here on HTLAL, especially given his love of scientific publications.
Anyway, here's another fun piece of data. It's a pie chart showing how big your
vocabulary needs to be in order to understand movie subtitles:
[Pie chart: Percentage of Words Understood, by vocabulary frequency (based on movie and TV subtitles)]
(Methodology: I used the same 50K-word subtitle data sets that I used in post 75, and
stemmed the words using the Snowball English stemmer, which is more accurate than the
Porter stemmer I used before. "1–300" means the 300 most common words, "301–1000" means
the 700 words after that. After stemming, the data set covered 33,355 root forms.
Obviously, a lot of the rarer words in this data set will be proper names and
misspellings.)
Or in tabular form:
Quote:
Range      Cumulative coverage
1..300     73.45%
1..1000    86.11%
1..2500    92.77%
1..5000    96.17%
1..10000   98.27%
1..20000   99.47%
1..33355   100.00% |
|
|
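For anyone who wants to try this on their own data, a table like the one above could be computed roughly as follows. This is only a minimal sketch, not the exact script behind these numbers. It assumes the subtitles are available as one plain-text string, and it uses lowercasing as a stand-in for stemming; a real stemmer (for instance the Snowball English stemmer mentioned above) could be passed in through the `stem` parameter. The function name `coverage_table` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def coverage_table(text, cutoffs, stem=str.lower):
    """Cumulative % of running words covered by the N most frequent forms.

    `stem` maps a token to its root form. The default (lowercasing) is only
    a placeholder assumption; swap in a proper stemmer for real use.
    """
    tokens = [stem(t) for t in re.findall(r"[A-Za-z']+", text)]
    # Token counts per root form, sorted from most to least frequent.
    ranked = [count for _, count in Counter(tokens).most_common()]
    total = sum(ranked)
    if total == 0:
        return [(cutoff, 0.0) for cutoff in cutoffs]
    return [(cutoff, 100.0 * sum(ranked[:cutoff]) / total) for cutoff in cutoffs]


# Tiny invented sample; a real run would use a large subtitle corpus.
sample = "The cat saw the dog and the dog saw the bird"
for cutoff, percent in coverage_table(sample, [1, 3, 6]):
    print(f"1..{cutoff}  {percent:.2f}%")
```

Note that coverage here is measured over running words (tokens), so a handful of very frequent forms accounts for a large share of the total.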
So if you want to know 98% of the words before doing extensive TV watching, you'll need
a vocabulary between 5,000 and 10,000 words. Honestly, I think that's a bit high: You
could probably get a lot out of TV while knowing less than 98% of the words. I also
suspect that the numbers are much better for a single TV series, because you will
quickly pick up the vocabulary used by a show.
Interestingly, one of the criteria for C2 is "can understand with ease virtually
everything heard or read". I don't think you could claim this if you knew 5000 words.
Judging from the table above, that would only give you 96.17% of the words in an
average TV show or movie, which is pretty bad. Either 10,000 or 20,000 words looks much
more promising: That would be 98.27% or 99.47%, respectively.
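The arithmetic behind that judgment can be spelled out with a short sketch that uses only the figures from the table above. The function names are made up for illustration, and the lookup only picks among the vocabulary sizes listed in the table; it does not interpolate between them.

```python
# (vocabulary size, cumulative coverage in %) copied from the table above.
COVERAGE = [
    (300, 73.45),
    (1000, 86.11),
    (2500, 92.77),
    (5000, 96.17),
    (10000, 98.27),
    (20000, 99.47),
    (33355, 100.00),
]


def smallest_listed_size(target_percent, table=COVERAGE):
    """Smallest vocabulary size in the table that reaches the target coverage."""
    for size, percent in table:
        if percent >= target_percent:
            return size
    return None  # target is higher than anything in the table


def unknown_per_thousand(coverage_percent):
    """Unknown running words per 1,000 words at a given coverage."""
    return (100.0 - coverage_percent) * 10.0


for size, percent in COVERAGE:
    print(f"{size:>6} words: about {unknown_per_thousand(percent):.0f} unknown per 1,000")

print("98% coverage needs at most", smallest_listed_size(98.0), "words")
```

Seen this way, the gap between 96.17% and 99.47% is the difference between dozens of unknown words per thousand and a handful.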
I guess this just goes to show that I'm not convinced by Miller's claim that you can
pass a real C2 exam while only recognizing 4,500 words. I've calculated things about 5
different ways, and I keep seeing indications that C2 should fall somewhere between
10,000 and 20,000 words of passive vocabulary. And this corresponds nicely with typical
vocabulary sizes after several years of full-time immersion, if I remember the research
correctly.
I find these numbers pretty fascinating, because they explain so much about my
listening comprehension and what I need to do to improve it.
4 persons have voted this message useful
| Peregrinus Senior Member United States Joined 4492 days ago 149 posts - 273 votes Speaks: English*
| Message 134 of 210 21 August 2012 at 5:39pm | IP Logged |
Iversen wrote:
And that's diametrically opposed to the teaching methods where a teacher supplies the pupils with words in small graduated doses, but assumes that they remember them all and can use them all without notice.
|
|
|
The teachers I remember from high school and college, across four different languages (all living ones), relied almost solely on passive methods for testing; only one, in a test personalized for me, required an active speaking component. I don't think I even remember their playing any tapes during an exam to test listening comprehension.
Also, I wonder how many students these days use SRS to study vocabulary throughout a semester and may actually retain the majority of that knowledge past cramming for exams.
@emk,
Thanks again for taking the time to produce another excellent analysis. Is there a way to factor in Serpent's point on cognates/loanwords on a language-by-language basis?
1 person has voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 135 of 210 21 August 2012 at 5:55pm | IP Logged |
Although @iversen and I have had some of these discussions before, it is always refreshing to debate with someone serious and methodical. I won't revisit this question of 300 words because I don't believe that it's important.
Where I think @iversen and I disagree fundamentally is how we see the learning path that leads to language proficiency. I tend to emphasize actual listening and speaking skills and less reading or writing. So, right away my lexical targets are lower because we know that fewer words are used in the spoken than in the written language.
More important, however, is the fact that an emphasis on speaking forces one to confront directly issues of how to produce meaningful utterances spontaneously and correctly with decent pronunciation.
My approach to this is to specifically concentrate on mastering the structural elements that will allow me to put the words together instantly and properly. Vocabulary is part of this, of course, but I believe it will expand spontaneously to fill the need.
This means that I'll concentrate on the core elements of the language: the 100 most important verbs and the most important conjugations, word order, the most important prepositions, known pitfalls and common mistakes, difficult spots that I have identified, discourse markers, register markers, degrees of politeness, elements of casual speech and slang, regional particularities, etc. I emphasize lots of repeated listening with transcripts and working with a native speaker who corrects me on the spot and can answer all my questions about real usage.
I find that I can have fun with relatively little knowledge. I am able to easily handle transactions with confidence in Spanish stores locally because I have practiced these interactions specifically. I wouldn't call this touristy phrase-book stuff like pointing at menus. It doesn't take much to be able to interact with a shopkeeper or a waiter. I can say what the other person expects to hear and I have a good idea what the person is going to say. How many words does it take to do that? Not many, but it helps to be able to use them smoothly, correctly and with a decent accent.
When I need to enhance my speaking skills with more vocabulary, I do what everybody does. I'll read, make flashcards and lists, etc. I also will deliberately do things like watch a cooking show or a sports broadcast to see how a specialized vocabulary is used.
So, I think @iversen and I are heading in the same direction on parallel paths with different individual emphases and styles. I don't see a problem.
1 person has voted this message useful
| s_allard Triglot Senior Member Canada Joined 5430 days ago 2704 posts - 5425 votes Speaks: French*, English, Spanish Studies: Polish
| Message 136 of 210 21 August 2012 at 6:23pm | IP Logged |
emk wrote:
s_allard wrote:
By the way, these sorts of figures probably explain why the vocabulary
size necessary for the C1-C2 exams seems so low, as remarked by @emk. An active
vocabulary of 4500 words for the French C2 or 5000 for the English C2 using the
Cambridge exams may strike us as low, but in reality isn't that reflective of real usage?
|
|
|
I can easily believe that an active vocabulary of 4000–5000 words is enough to pass
many C2 exams. But those 4500–5000 word estimates came from that Miller book you
mentioned, and he's trying to count passive vocabulary, not active vocabulary. To be
precise, he's using X-Lex, which tests students' ability to discriminate between words
and non-words. (Frankly, that's a pretty weird metric, now that I think about it. But
it's definitely closer to "passive" than to "active".)
Also note that you can't accurately judge the size of my active vocabulary by counting
all the words I use in (say) a year. Consider the following hypothetical example: A
friend of mine who draws comic books shows me some illustrations with murky dark
backgrounds and sharply illuminated, three-dimensional figures. I say, "Hey, cool,
that's a really neat chiaroscuro effect you've got there." Now, you could have recorded
everything I said or wrote for the past 5 years and never detected that I know
"chiaroscuro". But if I see that particular artistic effect, the word is on the tip of
my tongue. And Iversen's active vocabulary is surely greater than the 2,500 words he
actually used here on HTLAL, especially given his love of scientific publications.
.... |
|
|
Who said that @iversen had an active vocabulary of 2,500 words? I simply pointed out that he himself counted 2400 headwords in his posts over a three month period here at HTLAL. Nothing more.
But there's an interesting methodological question here. If you can't accurately judge the size of an individual's vocabulary by recording everything he said for a year or 5 years, just how would you go about it?
I think most scientists would use some sort of sampling method. We take a cohort of x number of people and follow their lexical behaviour for a given period that we think is sufficient for purposes of this study. Of course, we'll miss those cases where a person uses a word like chiaroscuro outside our sample period.
Does this missing word invalidate our estimate of the vocabulary size of our sample? No, we simply say that we believe that the estimated active vocabulary size of our subjects is xx with an error factor of n percent. Basic statistics.
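As a rough illustration of what "xx with an error factor of n percent" could look like, here is a minimal sketch in Python. The per-subject counts are invented for illustration, the function name is hypothetical, and the margin uses a simple normal approximation (z = 1.96), which is an assumption and is only rough for a cohort this small. It captures sampling variation between subjects only; it says nothing about words a subject knows but did not happen to use during the recording period.

```python
import math
import statistics


def estimate_with_margin(observed_counts, z=1.96):
    """Mean observed vocabulary size and an approximate 95% margin of error.

    observed_counts: distinct headwords recorded per subject in the sample period.
    z: critical value for the normal approximation (assumption: 1.96 for ~95%).
    """
    n = len(observed_counts)
    mean = statistics.mean(observed_counts)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(observed_counts) / math.sqrt(n)
    return mean, z * sem


# Invented counts for a hypothetical cohort of six subjects.
counts = [2400, 1900, 3100, 2600, 2200, 2800]
mean, margin = estimate_with_margin(counts)
print(f"estimated active vocabulary: {mean:.0f} +/- {margin:.0f} headwords "
      f"({100 * margin / mean:.0f}% error)")
```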
My only reason for using the statistic provided by @iversen himself was to show what an actual study of measured active vocabulary would look like. As I said, @iversen is actually the only person I know here at HTLAL who has done this.
What is the alternative to this? What most studies of individual usage do is simply ask people if they use the word. We can all see the limitations of that method.
1 person has voted this message useful
|