
Is counting your vocabulary size useless?



Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6703 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian
Personal Language Map

 
 Message 73 of 210
20 August 2012 at 10:53am | IP Logged 
s_allard wrote:
Here is a snippet taken from the site: http://www.englishforums.com/English/TranscriptLightHeartedConversation/dwnbp/post.htm. It lasts about a minute. No big vocabulary here. Look at the number of times the words "do, the, think, you, should, love, I" are repeated. Pretty simple stuff...


I put that snippet into Word, cleaned it up and put each word on a line of its own. Then I put the result into Excel and removed the duplicates. All in all this short piece of text contains 113 different words (from a running text of 252 words), including two abbreviated proper names, some compound words like "you'd" and "you've", and abbreviations like "'cause", all of which a learner must learn as lexical units. So just this short snippet uses more than a third of the famous 300 words.
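
The Word-and-Excel procedure above can also be sketched in a few lines of Python. This is only a minimal sketch: the tokenizing rule (lowercase everything, keep apostrophes so that "you'd" and "'cause" stay single units) and the function names are assumptions, not the exact cleanup used for the 113/252 count, and a leading quotation mark in front of a word would wrongly stick to it.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Apostrophes inside or in front of a word are kept, so "you'd" and
    "'cause" each count as one lexical unit.
    """
    # Normalize curly apostrophes to straight ones before matching.
    text = text.lower().replace("\u2019", "'").replace("\u2018", "'")
    return re.findall(r"'?[a-z]+(?:'[a-z]+)*", text)

def count_words(text):
    """Return (running words, different words) for a text."""
    tokens = tokenize(text)
    return len(tokens), len(set(tokens))

sample = "What, and you'd drive one of those, would you? I would drive one, yeah."
running, different = count_words(sample)
print(running, different)  # prints: 14 11
```

Counting inflected forms such as "drive" and "drives" as one word would need an extra stemming or lemmatizing step, which this sketch leaves out.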

Then I took the first 252 words from Shakespeare's sonnets and did exactly the same thing to them. Result: 163 different words (including 3 words with a regular genitive "'s"). In other words, Shakespeare seems to be using roughly 1½ times as many different words as SM and JK in their conversation. Basing a conclusion on just two short texts would be foolhardy, but right now I haven't got time to take the logical next step, namely to count unique words AND, not least, repetitions across samples in a lot of text samples from SM/JK and from the Bard respectively. That would show once and for all whether Shakespeare only uses one and a half times as many different words as SM and JK discussing transport, BBC fees and a few other things. Or, seen from the opposite direction: whether two random persons in a random conversation actually use 2/3 of the number of different words used by the UK's greatest poet in some of his most important works.

Diligent literati with computers have calculated that "In his complete works, Shakespeare used 31,534 different words and a grand total of 884,647 words counting repetitions" (Bennett, Briggs, Triola, Second Edition, Addison Wesley Longman, 2002). As you see, the percentage of different words is lower across such a large corpus than it is in a short text (actually you can estimate the number of 'extra' words if Shakespeare had written 884,647 words more - see the link for details). In principle SM and JK would keep the same proportion of different words relative to Shakespeare in large samples as they do in small ones. So let's assume that the two guys discuss until they have uttered 884,647 words (or died of sheer exhaustion). The big burning question then is whether they start repeating themselves, or whether at the end of the exercise they actually have used around two thirds of the number of words used by Shakespeare, i.e. at least 20,000 different words (including proper names, contractions etc.).
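
To make the arithmetic behind these figures explicit, here is a small sketch that computes the share of different words in each of the three counts quoted in this post. The labels and the function name are mine; the numbers are the ones given above.

```python
# Figures quoted in this post: (different words, running words).
samples = {
    "SM/JK conversation": (113, 252),
    "Shakespeare, first 252 words of the sonnets": (163, 252),
    "Shakespeare, complete works": (31534, 884647),
}

def type_token_ratio(different, running):
    """Share of the running words that are different words."""
    return different / running

for name, (different, running) in samples.items():
    print(f"{name}: {type_token_ratio(different, running):.1%}")

# Ratio between the two 252-word samples (the "roughly 1.5 times" above).
sonnets_vs_conversation = 163 / 113
print(f"{sonnets_vs_conversation:.2f}")
```

The two short samples come out at about 45% and 65% different words, the complete works at under 4%, which is why ratios from texts of very different lengths cannot be compared directly.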

I don't know the answer, although my guess is that they start repeating themselves before Shakespeare. But I'm also fairly sure that they end up using far more than 300 words.

Edited by Iversen on 20 August 2012 at 1:37pm

5 persons have voted this message useful



Peregrinus
Senior Member
United States
Joined 4492 days ago

149 posts - 273 votes 
Speaks: English*

 
 Message 74 of 210
20 August 2012 at 12:17pm | IP Logged 
s_allard wrote:

As for demonstrating the scientific validity of my position, I think it is everywhere around us. I see it every day. The people who are able to speak easily and correctly are not the ones with the biggest vocabulary. They are the ones who have learned to actually speak.



Anecdotal "evidence" in general doesn't demonstrate anything scientifically, although it does provide a source of hypotheses. The exception *possibly* being such assessments made by experts like Prof. Arguelles.

BTW, how do you know the vocabulary level of those people?
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5532 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 75 of 210
20 August 2012 at 1:21pm | IP Logged 
s_allard wrote:
I think that one of the reasons many people are so skeptical about the
number of words needed to conduct a conversation is that they have never studied real
conversations or looked at a transcript of a real conversation. Here is a snippet taken
from the site: http://www.englishforums.com/English/TranscriptLightHeartedConversation/dwnbp/post.htm.
It lasts about a minute.



SM: You can see these pictures online if you care. As a Radio One DJ, eh, do you think
that I should have a better car. And if so, what would you suggest?
JK: Well, I think, seeing as you’ve been playing my single, I think that you should
have a better car and more money. In fact, I think that you should have more money…
SM: What, do I get some of the royalties?
JK: …and the BBC should - Don’t worry, even I don’t get those, I owe them so much down
there, but, ehm, - video cost, dear boy, video cost - …but I’ll tell you what, ehm, I
think you should, ehm, I think you should have a company car.
SM: Yeah, they don’t do them.
JK: But why don’t you get one of these little electric bubble things everyone is
driving around…
SM: What, a Smart car?
JK: No, not the Smart car, you know the really tiny little that you plug it in.
Other: Oh, I’d love to, I’d just love to see one of those.
SM: Like, like, like when you charge your phone, one of them.
JK: Yeah, exactly, yeah. It’s only like, I don’t know, three quid a month to run it, or
something.
SM: What, and you’d drive one of those, would you?
JK: I would drive one, yeah, if I lived in London. D’you know, when I go into London,
‘cause I drive around a scooter, yeah. Drive a little Vespa.


I did a little experiment with this conversation. Methodology:

1. I took a list of the 50,000 most-used words, based
on movie subtitles. This should be mostly conversations, plus a tiny amount of
narration. Movie dialog isn't completely natural, but it's probably better than any
other list I could use.

2. I converted each word to a Porter Stem. This means I
treat inflected forms (run/runs, horse/horses) as the same word. This simulates a basic
knowledge of grammar.

3. I recalculated word frequencies using the Porter stems.

4. I allowed the following international brands and speech noises as "freebies": BBC
London Vespa eh ehm.
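
The four steps above can be sketched roughly as follows. This is not the script behind the results below: the word counts are invented for illustration, `toy_stem` is a crude stand-in that only strips a final "s" (a real run would use an actual Porter stemmer and the 50,000-word subtitle list), and contractions are not handled.

```python
import re
from collections import Counter

# Hypothetical word counts standing in for the subtitle frequency list (step 1).
WORD_COUNTS = {
    "you": 900, "i": 850, "the": 800, "a": 700, "do": 400, "think": 300,
    "car": 120, "drive": 100, "cars": 60, "drives": 40, "scooter": 5,
}

# Brands and speech noises allowed as freebies (step 4).
FREEBIES = {"bbc", "london", "vespa", "eh", "ehm"}

def toy_stem(word):
    """Crude stand-in for a Porter stemmer: strips a final "s" only (step 2)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def stem_ranking(word_counts):
    """Sum the counts of all forms sharing a stem and rank the stems (step 3)."""
    totals = Counter()
    for word, count in word_counts.items():
        totals[toy_stem(word)] += count
    return [stem for stem, _ in totals.most_common()]

def mask(text, known_stems):
    """Replace every word whose stem is not known (and not a freebie) with X's."""
    def replace(match):
        word = match.group(0)
        key = word.lower()
        if key in FREEBIES or toy_stem(key) in known_stems:
            return word
        return "X" * len(word)
    return re.sub(r"[A-Za-z]+", replace, text)

ranking = stem_ranking(WORD_COUNTS)
known = set(ranking[:7])  # pretend the learner knows the 7 most frequent stems
print(mask("I think you drive a scooter, a little Vespa.", known))
# prints: I think you XXXXX a XXXXXXX, a XXXXXX Vespa.
```

Raising the cutoff from 7 to 8 stems lets "drive" through, which is the same effect as moving from one vocabulary size to the next in the masked transcripts.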

Here's what the conversation looks like with 300 words:

Quote:
SM: You can see these XXXXXXXX XXXXXX if you care. As a XXXXX One XX, eh, do you
think that I should have a better car. And if so, what would you XXXXXXX?
JK: Well, I think, seeing as you've been playing my XXXXXX, I think that you should
have a better car and more money. In XXXX, I think that you should have more money…
SM: What, do I get some of the XXXXXXXXX?
JK: …and the BBC should - Don't worry, even I don't get those, I XXX them so much down
there, but, ehm, - XXXXX XXXX, XXXX boy, XXXXX XXXX - …but I'll tell you what, ehm, I
think you should, ehm, I think you should have a XXXXXXX car.
SM: Yeah, they don't do them.
JK: But why don't you get one of these little XXXXXXXX XXXXXX things XXXXXXXX is
XXXXXXX around…
SM: What, a XXXXX car?
JK: No, not the XXXXX car, you know the really XXXX little that you XXXX it in.
Other: Oh, I'd love to, I'd just love to see one of those.
SM: Like, like, like when you XXXXXX your XXXXX, one of them.
JK: Yeah, XXXXXXX, yeah. It's only like, I don't know, three XXXX a XXXXX to run it, or
something.
SM: What, and XXXXX XXXXX one of those, would you?
JK: I would XXXXX one, yeah, if I lived in London. XXXXX know, when I go into London,
‘XXXXX I XXXXX around a XXXXXXX, yeah. XXXXX a little Vespa.


Frankly, that looks like a pretty frustrating conversation.

Here's what it looks like with 1000 words:

Quote:
SM: You can see these pictures XXXXXX if you care. As a Radio One XX, eh, do you
think that I should have a better car. And if so, what would you XXXXXXX?
JK: Well, I think, seeing as you've been playing my single, I think that you should
have a better car and more money. In fact, I think that you should have more money…
SM: What, do I get some of the XXXXXXXXX?
JK: …and the BBC should - Don't worry, even I don't get those, I owe them so much down
there, but, ehm, - XXXXX XXXX, dear boy, XXXXX XXXX - …but I'll tell you what, ehm, I
think you should, ehm, I think you should have a company car.
SM: Yeah, they don't do them.
JK: But why don't you get one of these little XXXXXXXX XXXXXX things everyone is
driving around…
SM: What, a Smart car?
JK: No, not the Smart car, you know the really XXXX little that you XXXX it in.
Other: Oh, I'd love to, I'd just love to see one of those.
SM: Like, like, like when you charge your phone, one of them.
JK: Yeah, exactly, yeah. It's only like, I don't know, three XXXX a month to run it, or
something.
SM: What, and you'd drive one of those, would you?
JK: I would drive one, yeah, if I lived in London. XXXXX know, when I go into London,
‘cause I drive around a XXXXXXX, yeah. Drive a little Vespa.


You could manage one-on-one conversation at this level with some cooperation and
pantomime.

And 2,500:

Quote:
SM: You can see these pictures XXXXXX if you care. As a Radio One XX, eh, do you
think that I should have a better car. And if so, what would you suggest?
JK: Well, I think, seeing as you've been playing my single, I think that you should
have a better car and more money. In fact, I think that you should have more money…
SM: What, do I get some of the XXXXXXXXX?
JK: …and the BBC should - Don't worry, even I don't get those, I owe them so much down
there, but, ehm, - video cost, dear boy, video cost - …but I'll tell you what, ehm, I
think you should, ehm, I think you should have a company car.
SM: Yeah, they don't do them.
JK: But why don't you get one of these little electric XXXXXX things everyone is
driving around…
SM: What, a Smart car?
JK: No, not the Smart car, you know the really tiny little that you XXXX it in.
Other: Oh, I'd love to, I'd just love to see one of those.
SM: Like, like, like when you charge your phone, one of them.
JK: Yeah, exactly, yeah. It's only like, I don't know, three XXXX a month to run it, or
something.
SM: What, and you'd drive one of those, would you?
JK: I would drive one, yeah, if I lived in London. XXXXX know, when I go into London,
‘cause I drive around a XXXXXXX, yeah. Drive a little Vespa.


Now you're down to a handful of "excuse me?" questions.

And 5,000:

Quote:
SM: You can see these pictures XXXXXX if you care. As a Radio One XX, eh, do you
think that I should have a better car. And if so, what would you suggest?
JK: Well, I think, seeing as you've been playing my single, I think that you should
have a better car and more money. In fact, I think that you should have more money…
SM: What, do I get some of the XXXXXXXXX?
JK: …and the BBC should - Don't worry, even I don't get those, I owe them so much down
there, but, ehm, - video cost, dear boy, video cost - …but I'll tell you what, ehm, I
think you should, ehm, I think you should have a company car.
SM: Yeah, they don't do them.
JK: But why don't you get one of these little electric bubble things everyone is
driving around…
SM: What, a Smart car?
JK: No, not the Smart car, you know the really tiny little that you plug it in.
Other: Oh, I'd love to, I'd just love to see one of those.
SM: Like, like, like when you charge your phone, one of them.
JK: Yeah, exactly, yeah. It's only like, I don't know, three XXXX a month to run it, or
something.
SM: What, and you'd drive one of those, would you?
JK: I would drive one, yeah, if I lived in London. D'you know, when I go into London,
‘cause I drive around a XXXXXXX, yeah. Drive a little Vespa.


Not actually significantly different than 2,500 words, oddly enough.

At 10,000 words, you finally pick up "DJ", "online", "quid", "royalties" and "scooter",
giving you the whole conversation.

Conclusion: 1,000 words might just give you A2-level survival skills, and 2,500 words
is probably enough for a B1-level conversation. But you need a full 10,000 words to
understand every word of this conversation.
9 persons have voted this message useful



montmorency
Diglot
Senior Member
United Kingdom
Joined 4828 days ago

2371 posts - 3676 votes 
Speaks: English*, German
Studies: Danish, Welsh

 
 Message 76 of 210
20 August 2012 at 1:22pm | IP Logged 
Fascinating stuff.

It occurs to me that some other popular and influential authors that might be worth
analysing would be Charles Dickens and E.M. Forster.

Of more recent authors, I'm not sure, but some names will probably suggest themselves.

Someone who might already have done research like this, or at least know where it has
been done is David Crystal.


Edit: In my log entry for yesterday, which I posted today but drafted offline yesterday
(before I'd seen Iversen's and emk's most recent posts), I'd written that I'd want to
be nearer 3000 words than 300, but it now seems like even that is nowhere near enough!

Edited by montmorency on 20 August 2012 at 1:32pm

1 person has voted this message useful



maydayayday
Pentaglot
Senior Member
United Kingdom
Joined 5219 days ago

564 posts - 839 votes 
Speaks: English*, German, Italian, SpanishB2, FrenchB2
Studies: Arabic (Egyptian), Russian, Swedish, Turkish, Polish, Persian, Vietnamese
Studies: Urdu

 
 Message 77 of 210
20 August 2012 at 2:33pm | IP Logged 
I'd been concerned about my Spanish vocabulary for a long time, so I bought the Barron's thematic vocabulary builder and digested that. The front cover says 'more than 10,000 words', but when I came to read Harry Potter I knew enough of the context to work out that la lechuza was an owl, and that wasn't in the 10k words.

I've stopped worrying.


1 person has voted this message useful



fiolmattias
Triglot
Groupie
Sweden
geocities.com/fiolma
Joined 6689 days ago

62 posts - 129 votes 
Speaks: Swedish*, English, Arabic (Written)

 
 Message 78 of 210
20 August 2012 at 3:19pm | IP Logged 
Iversen wrote:

Shakespeare used 31,534 different words and a grand total of 884,647 words counting
repetitions


We Swedes need 119,288 words to be able to read August Strindberg's complete works :)

Edited by fiolmattias on 20 August 2012 at 3:19pm

1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6703 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian
Personal Language Map

 
 Message 79 of 210
20 August 2012 at 4:57pm | IP Logged 
I have read that number too. So basically most Swedes can't read Strindberg, or...?

Apart from that I would like to see the criteria for that word count - it could be that all inflections etc. are counted separately.
1 person has voted this message useful



s_allard
Triglot
Senior Member
Canada
Joined 5430 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 80 of 210
20 August 2012 at 5:20pm | IP Logged 
I love this debate because it goes to the very heart of what we mean by speaking a language. Let me first remind everybody that the original question as stated in the title of the topic was whether counting words is useful or not. I have not seen anything that has led me to change my position that for most people it is of no importance.

Now that the debate has veered towards the minimum vocabulary to speak a language, let's clarify some fundamental issues.

1. For some unknown reason, a few people here believe that I have suggested that 300 words are all you need to speak a language to C1 proficiency. How silly! If you need to pass any of those tests, spoken or written, you need more than that.

But I'm not talking about taking tests, I'm talking about the ability to have a simple conversation in a somewhat native-like manner.

2. In passing, I'll point out that I use the word fluency in its technical sense in linguistics. Many old-time readers here know that this is a pet peeve of mine. Fluency refers to ease of speaking and frequency of hesitation. Fluency is not a synonym of proficiency except in the world of language marketing.

3. The minimum or threshold we are talking about refers to active or productive vocabulary. It goes without saying that your passive vocabulary, meaning the words you recognize or think you recognize, is much greater.

4. We are talking about the spoken and not the written language. We are not discussing how many words you need to read Harry Potter or a newspaper from cover to cover. We are talking about the ability to sustain something like a 5-minute conversation naturally with a native speaker over a cup of coffee or a beer. If you talk for two days non-stop with this same person, 300 words will not be enough. But I'm not talking about that.

Let me remind people that the authentic unscripted spoken language is very different from the written form. Readers who have taken the time to look at transcriptions in any language will note that real spoken language is quite "busy" and very often hard to understand because it relies heavily on sound to convey meaning.

Much of the so-called spoken language in the media, and this would include plays and movies, is not authentic unscripted language. It is represented language that sounds realistic. I'm sure many readers here use telenovelas, movies and other tv programs with subtitles to practice their target language. This is an excellent idea, but be aware that the language, as realistic as it sounds, is not the real thing.

One of the striking features of spoken language is its repetitive nature. I've always been struck by the statistic that across a large set of samples of spoken French, 38 words account for 50% of all occurrences. I know some people will rush to say that this means you don't know one out of every two words and that to get to 95% coverage you will need 10000+ words. That's true, but the fact remains that a very tiny number of words account for the bulk of usage.
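
For anyone who wants to check a coverage figure like this on their own transcripts, here is a minimal sketch. The function name and the toy English sentence are assumptions for illustration; it is not the French corpus behind the 38-word statistic.

```python
from collections import Counter

def words_needed_for_coverage(tokens, target):
    """Return how many of the most frequent words cover `target` of the tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / total >= target:
            return rank
    return len(counts)

# Toy sample: 15 running words, 11 different words.
tokens = ("i think you should i think you should have a better car "
          "and more money").split()

print(words_needed_for_coverage(tokens, 0.50))  # prints: 4
print(words_needed_for_coverage(tokens, 0.95))  # prints: 11
```

Even in this toy sample the pattern is visible: 4 words cover half of the running text, but all 11 are needed to reach 95%.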

5. A vocabulary of 300 words will limit the range of topics one can talk about. I agree totally. But that's not the point. This brings up the important issue of how vocabulary studies are conducted. I won't raise the issue of the definition of a word that I've already discussed.

All studies are based on a range of samples, either spoken or written, or some combination of the two. We then have a data base of tokens that are reduced to some basic forms. Then the words are counted and, very importantly, weighted according to their distribution across the samples. Words that appear more widely are given more importance than words that are less widely distributed.

Given the fact we all know that the core vocabulary of any language is very small, the number of different words used expands not so much as the length of a given sample but as the number of samples increases.

This is precisely why the number of words required for coverage in these vocabulary studies expands geometrically. So, if I take 1000 5-minute conversations, I would probably need, let's say 10000 words (or choose your figure) to provide 90% coverage.

But take one of those samples, that of two teenagers talking about their friends. I'm sure they can have a 5-minute conversation with less than 500 words. That is the point of that snippet of British conversation that I provided. Could these two gentlemen talk about that subject for five minutes with basically those same words? Of course, they can.

Some time ago here at HTLAL I discussed a quick study I did of verbs in a Spanish telenovela. I don't remember the exact figures, but around 70 verbs were used in a 54-minute episode. Those 70 verbs would not take me very far when reading the newspaper El Pais, but those same verbs kept coming back in all the subsequent episodes of the telenovela. How many different verbs are there in a year's programming of that telenovela? I don't know. But I suspect that it's closer to 300 than to 3000. More importantly, you could probably enjoy the entire series very well with only 150 verbs and let the others go by.
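
The claim in this point, that vocabulary grows with the number and variety of samples rather than with the length of one sample, can be illustrated with a short sketch. The three sample sentences are invented for the purpose and the function name is an assumption.

```python
def vocabulary_growth(samples):
    """Cumulative number of different words after each sample is added."""
    seen = set()
    growth = []
    for sample in samples:
        seen.update(sample.lower().split())
        growth.append(len(seen))
    return growth

samples = [
    "i think you should have a better car",  # first conversation
    "i think you should have more money",    # same topic, mostly the same words
    "the bank sent the royalties by post",   # new topic, mostly new words
]
print(vocabulary_growth(samples))  # prints: [8, 10, 16]
```

The second sample on the same topic adds only 2 new words, while the third, on a different topic, adds 6.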

6. As for @emk's sophisticated transformations, I may be thick-headed, but I don't see how one can arrive at the astounding conclusion that you need to "know" 10,000 words to understand a text of 113 different words. Here is a conversation of 113 different words, not 300. If these two individuals had kept on talking for 5 minutes, maybe they would have reached 300. I don't see why we need any kind of complex math to interpret the fact that only 113 words were used.

Similarly, @Iversen's calculations demonstrate perfectly well that across a range of samples you arrive at a large vocabulary size. Have I said anything different?

What the snippet I gave illustrates is that one can have a one-minute conversation with 113 words and probably a five-minute conversation with less than 300 words. That's all I'm saying.

What is true, of course, is that you have to know how to use these words properly. If you're having problems with the verb "to do," you can't talk like this. And maybe this is what @emk is alluding to in that statistical labyrinth. Yes, you do need a fundamental knowledge of the language to be able to use it well. But the fact remains that only 113 words were used in this conversation.


7. The fundamental issue, I believe, is not the number of words but how you put them together. To say that 1000 words will give you barely A2-level survival skills is in my opinion missing the point. Sure, you may not pass an A2 exam with a thousand words, but am I to believe that a person who really masters the 1000 most common words cannot have a decent conversation in French? Or that this person could not do well in a store or a restaurant in Montreal? Or that this person could not chat with someone on the bus? This is ridiculous. There are things that this person could not discuss, but, really, how many people use more than a thousand different words a day in their native language?

8. I am convinced that many people do not know how language really works. As I have said repeatedly, when I speak of 300 words as a threshold or core, I mean that these words are among the richest in uses and meanings. These are the key words in the language. Master these and the world opens up to you. Then, all you have to do is add more words.

If you believe that the best way to enhance your spoken language in English is to study Charles Dickens or Shakespeare, that reading Cervantes will give your spoken Spanish a boost or that Marcel Proust will improve your French conversation skills, all I can say is good luck.

On the other hand, I think you would do much better to memorize 10 minutes of a good talk show or soap opera. That is if you want to really speak the language.

When I travel to Germany, Holland and Scandinavia, I always marvel at the level of English language skills of the population in general. Maybe the quality of language instruction is outstanding, but I think that it's the exposure to English-language media with subtitles and lots of English-language pop music that is the biggest factor.

9. Maybe the fundamental question is whether your priority is actually speaking the language with natives. This may not be the case for very legitimate reasons. But if you want to be able to sustain some kind of interactive ability, there are a number of key linguistic skills to be mastered. Many people spontaneously pick these up through immersion and contact with native speakers because they correlate what they hear with what they see and imitate accordingly.

At the same time, so many people study a language endlessly with a variety of methods, books, tapes, CDs, software, etc. and have a huge vocabulary. But they still can't really speak well. And this is what it boils down to: show me what you can do. Open your mouth and let's see what comes out.

Edited by s_allard on 20 August 2012 at 8:05pm



4 persons have voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.