
How many words to speak?

Language Learning Forum : General discussion
patrickwilken
Senior Member
Germany
radiant-flux.net
Joined 4283 days ago

1546 posts - 3200 votes 
Studies: German

 
 Message 265 of 309
23 September 2014 at 11:03am
fiolmattias wrote:
This page (http://blog.self.li/post/20854405575/how-to-understand-harry-potter-any-language) says that the Spanish translation contains 12,000 different words (of course debatable, but way more than 2,500), and this (http://www.amazon.com/Unofficial-Harry-Potter-Vocabulary-Builder-ebook/dp/B0032D97RO) book contains the 3,000 hardest, so it must contain more than 2,500 words in total.


Perhaps the more interesting figure is that a broad analysis of English corpora showed that 8000-9000 word families were sufficient for 98% coverage (roughly the level needed for unassisted comprehension) of an incredibly wide variety of written text.

This article is worth a look:

Nation, I. “How Large a Vocabulary Is Needed For Reading and Listening?” Canadian Modern Language Review/ La Revue Canadienne Des Langues Vivantes 63, no. 1 (September 1, 2006): 59–82. doi:10.3138/cmlr.63.1.59.

https://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/2006-How-large-a-vocab.pdf
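Nation's figures are about coverage: the share of running words (tokens) in a text that belong to the N most frequent items of a frequency list. A minimal sketch of that calculation, assuming an already-tokenized text and a frequency-ranked word list. It counts raw word forms rather than the word families Nation uses, and the toy data is made up:

```python
from collections import Counter


def coverage(tokens, known):
    """Fraction of the running words (tokens) that are in the set `known`."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def words_needed(tokens, ranked_words, target=0.98):
    """Smallest N such that the top N entries of `ranked_words` cover at least
    `target` of the tokens; None if the whole list falls short."""
    if not tokens:
        return 0
    counts = Counter(tokens)
    covered = 0
    for n, word in enumerate(ranked_words, start=1):
        covered += counts[word]  # Counter returns 0 for absent words
        if covered / len(tokens) >= target:
            return n
    return None


# Toy data (hypothetical): 8 tokens and a 6-word "frequency list".
tokens = "the cat sat on the mat the end".split()
ranked = ["the", "on", "cat", "sat", "mat", "end"]
print(coverage(tokens, set(ranked[:2])))  # 0.5
print(words_needed(tokens, ranked))       # 6
```

Coverage is measured on tokens, not on distinct words, which is why a handful of very common words already covers a large share of any text.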

Edited by patrickwilken on 23 September 2014 at 11:04am

1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5282 days ago

2615 posts - 8806 votes 
Speaks: English*, French B2
Studies: Spanish, Ancient Egyptian

 
 Message 266 of 309
23 September 2014 at 12:30pm
robarb wrote:
emk wrote:

1. 300 to 500 words: You can establish communication if you learn subject-specific vocabulary in advance.
2. 1000 to 1500 words: You can probably manage to talk about a lot of concrete things, using workarounds (A2?).
3. 2000 to 3000 words: You can be broadly competent at real-world tasks in familiar domains (B1?).
4. 5000 to 7000 words: You can debate abstract subjects semi-intelligently if you take your time (B2?).

This is a little harsh at the high levels, I'd say. Only 5000 unique words are fair game for the highest level of the Chinese HSK, and those are passive, and you don't need all of them to pass. I would guess that if you had as many as 7000 words you're comfortable using actively (and corresponding other skills), then a B2-level speaking test would be absolutely a piece of cake, a C1 very doable, and a C2 not out of the question (but you'd need to not make grammar mistakes, which is a separate question).

My vocabulary numbers above are passive, and they're for the DELF/DALF series of CEFR exams. As Serpent has already noted, the DELF/DALF exams may be a bit harder than some other European CEFR exams.

Running my software over the various sample exams, I'm starting to notice one thing: The DELF & DALF exams seem to assume that Romance/Latin cognates are easy. If you speak English, the DELF B1 reading comprehension questions are appropriate for B1. If your native language were Middle Egyptian, however, the B1 reading questions would be absolutely brutal, and far better suited to a hard B2 exam.

Let me show you what I mean. We'll take another look at that "DELF B1 with 400 book-oriented words" example I mentioned earlier. Here are the unknown words:

Quote:
adaptation, agir, aider, association, aujourd'hui, causer, centrer, chômage, combattre, débuter, diminué, emploi, faveur, fondre, gestion, handicapé, handicaper, insertion*, intégration, liguer, marqué, mobiliser, neuvième, personnes, physique, polémiquer, préjuger, professionnel, repris, réussir, ruer, selon, semaine, sixième, slogan, victime

I've marked the words with no easy English cognates in bold. And the pattern is quite clear to my eyes: the boldfaced words might not be in the top 400 in French, but they're all still really common (except ruer, which is a bug in my software). The non-boldfaced words are actually surprisingly hard in many cases. In fact, I don't think many of these words really belong on a B1 exam for our hypothetical ancient Egyptian learning French.

If you look at various sample reading comprehension exercises on these exams, you can see the pattern:

- DELF B1: Relatively easy articles from normal newspapers.
- DELF B2: Articles from "literary" newspapers on subjects like bushmeat and African deforestation.
- DELF C1: Essays written in the 1800s contrasting different schools of literature, and similarly annoying things.

At the B2 and C1 levels, the speaking prompts aren't that much simpler than the reading comprehension questions, and so you'll need a pretty big passive vocabulary just to understand the proposed topic and answer questions from the examiners. Of course, your active vocabulary might only be a third the size of your passive vocabulary, but if you used it reasonably well, you'd do just fine.

Getting back to the Chinese HSK, I think we can see why their hardest exam gives much smaller vocabulary numbers than the upper-level French exams:

- The "official" mappings between the HSK and the CEFR are unrealistic. HSK 6 probably does not represent the same level of competence as the DALF C2.
- The DELF/DALF exams are probably slightly harder than other European CEFR exams.
- The DELF/DALF exams assume 50% is a passing score, so they use fairly hard texts.
- The DELF/DALF exams assume that Latin cognates are easy, even when they're pretty rare.
- Chinese relies heavily on compound words (AFAIK), whereas English and the Romance languages tend to import new word stems from Latin and Greek.

I sat the DELF B2 with an estimated passive vocabulary of 5,000 to 7,000 words, and I passed by a very comfortable margin. I think you could probably still scrape out a pass with a passive vocabulary of 4,000 words. But keep in mind that an English speaker gets something like 3,500 near-exact cognates in the top 6,000 words.

robarb wrote:
Also, for what it's worth, you are implying that concrete language is less demanding than abstract language. While children do go through them in that order, it doesn't necessarily apply to adult learners: it's easy for me in French to say things like "there's simply no way to demonstrate that a political system will work using theory alone; look at communism, it was impossible to predict what would happen until people tried to carry it out." However, I have no idea how to say "pillowcase," "mop," "windowsill" or "headphone jack."

By "concrete subjects", I don't mean "random things found around the house," which can indeed be difficult. I mean something more like, "I can talk about what I did today" or "I can reschedule a party with a friend." This is opposed to more abstract subjects, which might include opinions on teenage rebellion, gender relations at work, or environmental issues.

It's really amazing how much the Latin/Romance discount helps English speakers, isn't it?
3 persons have voted this message useful



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6347 days ago

9753 posts - 15779 votes 
Speaks: Russian*, English, Finnish C1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 267 of 309
23 September 2014 at 12:45pm
So do they basically assume anyone taking the exam speaks English?
1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5282 days ago

2615 posts - 8806 votes 
Speaks: English*, French B2
Studies: Spanish, Ancient Egyptian

 
 Message 268 of 309
23 September 2014 at 1:19pm
Serpent wrote:
So do they basically assume anyone taking the exam speaks English?

Well, a Romance language is probably even better than English. Let's take those two paragraphs of the DELF B1 and make a list of everything that doesn't appear in the top 3000 French words (I also checked obvious related forms by hand this time).

Quote:
adaptation, association, centrer, chômage, diminuer, gestion, handicaper, insertion, intégration, liguer, mobiliser, polémiquer, préjuger, slogan

Once again, I've marked the words with no easy English cognates in bold. If we know the top 3000 French words plus English, we have two unknown words in two paragraphs. If our native language is Middle Egyptian, we have 14 unknown words. That's not fair for somebody sitting a B1 exam with a 3,000 word passive vocabulary and a good knowledge of how to derive words using suffixes.
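emk's procedure (take the words of a passage, drop everything in the top 3000 of a French frequency list, then sort the leftovers by whether an English cognate exists) can be sketched in a few lines. This is a minimal illustration, not emk's software: `beyond_top_n` and `split_by_cognate` are hypothetical helpers, the frequency list is a tiny toy stand-in, and the cognate set stands for a hand judgment like his bold marking. Checking related word forms, which emk did by hand, is not attempted.

```python
def beyond_top_n(words, ranked_words, n):
    """Distinct words of the passage that are NOT among the top `n` entries
    of a frequency-ranked list, in alphabetical order."""
    top = set(ranked_words[:n])
    return sorted({w for w in words if w not in top})


def split_by_cognate(unknown, cognates):
    """Split unknown words into (has an easy cognate, has none).
    `cognates` is a hand-made set of words judged to have an easy cognate."""
    with_cognate = [w for w in unknown if w in cognates]
    without = [w for w in unknown if w not in cognates]
    return with_cognate, without


# Toy data (hypothetical), not the actual DELF passage or a real top-3000 list.
ranked = ["le", "de", "un", "être", "avoir", "personne", "emploi", "semaine"]
passage = ["le", "chômage", "de", "personne", "handicaper", "emploi", "slogan", "mobiliser"]
cognates = {"handicaper", "mobiliser", "slogan"}  # illustrative judgments only

unknown = beyond_top_n(passage, ranked, n=8)
print(unknown)  # ['chômage', 'handicaper', 'mobiliser', 'slogan']
print(split_by_cognate(unknown, cognates))  # (['handicaper', 'mobiliser', 'slogan'], ['chômage'])
```

For a reader with English, only the second list counts as truly unknown; for a reader without a cognate language, both lists do, which is the gap emk is pointing at.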

Now, I'd love to know what that list looks like to a monolingual German speaker, or to a monolingual Mandarin speaker. Do words like association and handicaper filter into non-Romance languages via global English?

If you truly have no cognate discounts, I'd guess that the DELF B1 reading comprehension exam actually requires B2 comprehension. (And I'd guess the B2 exam requires C1 comprehension.) This is probably one of the reasons it's hard to generalize the CEFR outside of Standard Average European: It's too easy for test designers to assume anything covered by the European Sprachbund is "easy", and design their tests accordingly. So a Mandarin speaker who sits a European CEFR exam may get a skewed notion of what should be easy for a B1 student.
2 persons have voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6453 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

 
 Message 269 of 309
23 September 2014 at 2:04pm
robarb wrote:

I would guess that if you had as many as 7000 words you're comfortable using actively (and corresponding other skills), then a B2-level speaking test would be absolutely a piece of cake, a C1 very doable, and a C2 not out of the question (but you'd need to not make grammar mistakes, which is a separate question). Also, for what it's worth, you are implying that concrete language is less demanding than abstract language. While children do go through them in that order, it doesn't necessarily apply to adult learners: it's easy for me in French to say things like "there's simply no way to demonstrate that a political system will work using theory alone; look at communism, it was impossible to predict what would happen until people tried to carry it out." However, I have no idea how to say "pillowcase," "mop," "windowsill" or "headphone jack."


I have counted my passive words in many languages, and I have estimated the number of distinct words I have actually used in a fairly large English sample (though small compared to the corpora used by researchers). But I have no idea how I could make a fact-based assessment of my active vocabulary, and I have actually reached the conclusion that 'activeness' isn't a binary characteristic but more like a likelihood function of some sort, which makes it even more difficult to study.

In fact, even 'passive vocabulary' is a fuzzy notion, because some words are more or less guessable. But here you can at least grab a dictionary and start counting, or you can take one of the internet-based tests, although these all apparently rely on frequency considerations of the kind that favor 'learned' words at the expense of original vocabulary and simple down-to-earth terms like "pillowcase" and "windowsill".

The only thing I do know is that I can recall not only a larger number of words in my 'strong' languages than in my weak ones, but also a larger percentage of my passive words. Part of the problem is that knowing a lot of words passively in their dictionary form isn't enough: you also need to know the morphology of each word plus enough syntax to construct sentences. Otherwise you can't use the words you do remember. So recall training is necessary, and it is probably less important whether you do it with a teacher or mentor as part of a conversation or alone by making videos, writing essays or just thinking, though motivation may be a factor here. At least the faculty to recall words seems to be partly a general faculty, i.e. it is less important which precise words you try to recall: the activity will also benefit other words in your passive vocabulary.

Edited by Iversen on 23 September 2014 at 2:46pm

1 person has voted this message useful



s_allard
Triglot
Senior Member
Canada
Joined 5180 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 270 of 309
23 September 2014 at 2:49pm
Now that robarb has explained overfitting, I think I have a better understanding. If I know only the vocabulary of one Harry Potter book, I would have a problem understanding a book by a different author. For example, I would have some difficulty reading a work by Charles Dickens. But suppose I'm not interested in Charles Dickens and all I want to read is one Harry Potter book; that vocabulary will suit me fine.

So we have a very lively conversation on what being triplets is like in a French family. Let's say 130 unique words. If I know only these 130 words, I would not be able to have a conversation on a different topic without adding new words. That's pretty much a given. I agree, and have never said anything different. But the fact remains that two native speakers can have a lively conversation using only 130 words. We assume they know other words.

If we look at this conversation in terms of the typical word coverage analysis, we use a large set of conversations (or French film subtitles) to build a word frequency list, and we see that the words in this particular conversation occupy ranks up to 6000 in that list. Therefore, 100% coverage of this conversation requires knowing the top 6000 words. This is true. I totally understand this methodology.
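The coverage methodology described here yields two different numbers for the same conversation: how many distinct words it uses, and how deep into a frequency list you must go to include all of them (the highest rank among its words). A minimal sketch, assuming a hypothetical `rank_of` dictionary that maps each word to its 1-based rank in a reference frequency list; the ranks below are toy values, not real subtitle data:

```python
def unique_words(tokens):
    """Distinct word forms used in a conversation."""
    return set(tokens)


def depth_for_full_coverage(tokens, rank_of):
    """Highest frequency rank among the conversation's words, i.e. how many
    top-ranked words you must know to cover 100% of it.
    Returns None if some word is missing from the list."""
    ranks = [rank_of.get(w) for w in unique_words(tokens)]
    if any(r is None for r in ranks):
        return None
    return max(ranks, default=0)


# Toy data (hypothetical ranks).
rank_of = {"je": 1, "être": 2, "on": 3, "maman": 850, "triplé": 6000}
conv = ["je", "être", "triplé", "on", "maman", "je"]
print(len(unique_words(conv)))                # 5
print(depth_for_full_coverage(conv, rank_of))  # 6000
```

A single rare, topic-specific word sets the depth, which is why a short conversation can "require" a long frequency list even though it uses very few distinct words.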

What I find intriguing is the next logical conclusion: participation in this conversation of 130 words demands a knowledge of 6000 words. Or the claim that 130 words are insufficient for a conversation unless everything is learned by rote in advance (I may be exaggerating a bit).

My simple question is: why do I have to know 5870 other words for this conversation?

The answer would seem to be that these other words are necessary for talking about other topics. Now, we all know that talking about a different topic requires more vocabulary. So let's take another conversation from the France Bienvenue website. If we combine the unique words of the two conversations, we are now up to, let's say, 200 words. Then, as we add other conversations, the unique word count goes up.

When you look at the actual contents of the conversations (keeping in mind that the phonetic aspect also conveys a lot of information), we see, as always, a very small number of grammatical words and common content words, the very high-frequency components, plus small numbers of topic-specific words. So there is considerable overlap between conversations at the high-frequency level and very little at the topic-specific level. This means that as we add conversations to our dataset, the vocabulary range will expand.
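The growth pattern described here (a shared high-frequency core plus topic-specific words, so the combined unique-word count keeps rising as conversations are added) can be tracked with a running union of word sets. A minimal sketch; `cumulative_unique_counts` is a hypothetical helper and the three conversations are made-up toy snippets, not the France Bienvenue transcripts:

```python
def cumulative_unique_counts(conversations):
    """After each conversation is added, report (distinct words seen so far,
    words in this conversation that were new)."""
    seen = set()
    report = []
    for tokens in conversations:
        words = set(tokens)
        new = words - seen   # topic-specific words not met before
        seen |= words        # shared core words add nothing here
        report.append((len(seen), len(new)))
    return report


# Toy conversations (hypothetical).
convs = [
    "on mange le pain le matin".split(),
    "on mange au restaurant le soir".split(),
    "le lycée militaire le matin".split(),
]
print(cumulative_unique_counts(convs))  # [(5, 5), (8, 3), (10, 2)]
```

The overlapping core ("on", "mange", "le", "matin") is counted once, so each new conversation adds only its topic words to the total.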

These conversations are typical examples of informal, relaxed interactions discussing everyday topics: what do you have for breakfast, what was life like at a military lycée, what is it like to be triplets, etc. Linguistically, they are simple compared to written formal French and very different from the literary language. The sentence structures are very simple, with lots of idiomatic expressions and local references that require explanation by the authors of the website.

The main point where I think I differ from emk and robarb is that I believe the tiny core of structures that we see at work here, supplemented with whatever vocabulary is necessary, will allow the learner to deal with any situation. I have put that figure somewhat arbitrarily at 300 words; I didn't invent it, but I'll take responsibility for it.

This is not to say take the first 300 words from a French film subtitle frequency list. Heavens no. That would definitely not work. Instead, let's take four conversations that give us a unique word count of 300 and see what we can do with that. We know we can at least have four conversations.

If you take those 300 words and compare them to all the words of a set of French film subtitles, you may get, let's say, 50% coverage. So what? Does that mean that the four conversations here are unreal?

But aren't these four conversations learned by heart, so that at the slightest deviation we are stuck? Obviously not. There's no learning of lines by heart here. We see the language being used well. Suppose a fifth topic comes up: someone wants to talk about working in a restaurant. What do I do? Do I suddenly stop talking and head for the door? No. First of all, all the common vocabulary and skills are transferable. Secondly, I do what we see the speakers doing in all the conversations here: I ask questions. I learn new vocabulary on the spot.

If you read through the dialogues, you see that they are all basically an interviewer learning about a topic from the person being interviewed.

The key idea here is that the speakers do not have to learn thousands of unnecessary words beforehand in order to speak. There is no need to know 6000 words to have a conversation like one of these.





Edited by s_allard on 23 September 2014 at 2:54pm

1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6453 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

 
 Message 271 of 309
23 September 2014 at 3:29pm
There may be more efficient ways to learn vocabulary than by dragging words out of a teacher one by one and then asking for explanations, but the idea reflects a typical mindset about language learning: you can't learn thousands of words in one session, so you take them stepwise and let a teacher decide the order. This happens partly by choosing a topic with relevant ultra-simple materials and partly by supplying explanations. And if you accept the things the kids do in school as speaking, then OK, those kids speak. At least they say something, hopefully. And that's part of the concept: even a pupil with 300 words is pushed to say something in a typical language class.

But in real life you can't be sure what the next topic is and which words your interlocutor will use (apart from a small group of very common words, including the typical grammar words), so either you choose situations where the topic is given or you are in trouble up to your ears and end up speaking English or whatever the local tourist koiné is. Is that fun? I don't think so, although I confess that I have pointed to lions in Kenya and Tanzania while using the word simba. But I could do that without too many misgivings, because I didn't see myself as a budding speaker of KiSwahili, just as another dumb tourist who had picked up a dozen or so words. If I had really decided to learn the language, I would have felt ashamed if I couldn't have a real conversation (at the simple end of the scale), and I wouldn't have been able to have discussions over a broad and not yet specified range of topics without having learnt several thousand words first. Even if I didn't use them myself, I would need them in order to understand the answers from local people.

That being said, there is an alternative between the 300-words-1-topic extreme at one end of the spectrum and being a fluent C2+ speaker at the other, and that's the idea of language islands (in the plural). You will probably have to explain who you are and why you have tried to learn language X, so find out how to say something sensible about those things and learn the necessary vocabulary. And if you are going to speak to hotel receptionists and restaurant waiters, you can also guess what sentences you might need to cover those areas. OK, learn the relevant vocabulary. The point, which s_allard may also be hinting at, is that in a situation where your vocabulary is deficient, you'll be better served by focussing your vocabulary learning on specific areas and leaving others for later.

Edited by Iversen on 23 September 2014 at 3:47pm

3 persons have voted this message useful



patrickwilken
Senior Member
Germany
radiant-flux.net
Joined 4283 days ago

1546 posts - 3200 votes 
Studies: German

 
 Message 272 of 309
23 September 2014 at 3:50pm
Well this is completely depressing. I found a set of vocabulary tests at the University of Leipzig for different languages (German, English, Japanese, Arabic, Portuguese, Russian, French, Spanish, Italian), which test your knowledge (passive/active) of the 5000 most common words in each language.

As expected, for receptive knowledge of English I score essentially 100% at all levels (2/30 errors at the 2000-word level for some reason; perhaps I misread the questions or rushed things).

For German, however, I do quite badly. For the 1000, 2000, 3000, 4000 and 5000 word levels I get 28/30, 17/30, 20/30, 18/30 and 13/30 respectively (I didn't finish all the questions for the last one, so that score would perhaps have been a bit higher).

Using the raw scores, that suggests a passive vocabulary of about 3200 of the 5000 most frequent words in German.
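The 3200 figure follows if each level is treated as a band of 1000 words sampled by 30 items, so a band contributes (score / 30) × 1000 words. That reading of the Leipzig tests is an assumption made for this sketch, and `estimate_known_words` is a hypothetical helper:

```python
def estimate_known_words(scores, items_per_level=30, words_per_level=1000):
    """Extrapolate per-band test scores to a word count: each band of
    `words_per_level` words is assumed to be sampled by `items_per_level` items."""
    return sum(scores) * words_per_level / items_per_level


german = [28, 17, 20, 18, 13]  # 1000- to 5000-word levels, out of 30 each
print(estimate_known_words(german))  # 3200.0
```

With 96 of 150 items correct, this is simply 64% of 5000. Since the 5000-level test was not finished, that band's contribution may be understated.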

The tests can be found here: http://www.itt-leipzig.de/static/startseiteeng.html

Edited by patrickwilken on 23 September 2014 at 3:51pm



2 persons have voted this message useful










Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.