
How much time studying vocabulary?

s_allard
Triglot
Senior Member
Canada
Joined 5223 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 161 of 350
18 May 2015 at 3:53pm | IP Logged 
Iversen wrote:
If you study 5 languages and want a passive vocabulary of some 20.000 headwords then it may
sound totally overwhelming that you should have to learn 100.000 headwords all in all. Luckily there is such a
thing as guessable words, including international loanwords. I have counted many strange things, but not the
percentage of words in any given language which are unique to that language. But similar words in related
languages and loanwords are so numerous that the total number of items you have to learn will be much less
than 100.000 unique headwords (or whatever that corresponds to in terms of word families). And if you set your
target lower than the 50.000 mentioned by some you don't need to stay 50 years abroad per language.

...    

I strongly disagree with these estimates of the number of headwords one needs to know passively and actively in
a language to achieve good levels of performance in a foreign language. We should first keep in mind that these
figures such as 10k, 15k, 20k and even 40k of passive vocabulary are not derived by actually measuring the
vocabulary of speakers. They are derived from studies based on measurements of numbers of words required for
levels of text coverage of data sets of texts such as books, film dialogues, telenovelas, etc. Each individual
sample may have a small vocabulary but when you combine a large number of samples, the sum vocabulary
needed to cover all the samples expands considerably.

Somewhere in this thread or the parallel one, we learned that the book Alice in Wonderland contained around
1760 unique words. That's not a lot. I would think that the Harry Potter books individually contain around that.
Let's say that you could read all the Harry Potter books with 2000 words because the author has a certain style
and active vocabulary aimed at a young audience.

The problem is that combining the unique words of Alice in Wonderland plus Harry Potter produces a set that is
larger than any of the individual books. If you were to look at 100 books covering a wide span of years and
genres you arrive at a humongous figure that a reader must know in order to read all these works comfortably,
i.e. with a text coverage of around 95% or more.
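
To make the point about combined vocabularies and text coverage concrete, here is a minimal sketch. The two "books" are made-up placeholder strings, not real excerpts, and the helper names `tokens` and `coverage` are just illustrative. It counts distinct spellings, not word families.

```python
import re

def tokens(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(known_words, text):
    """Fraction of the running words in `text` that appear in `known_words`."""
    toks = tokens(text)
    return sum(t in known_words for t in toks) / len(toks)

# Two tiny made-up "books" (placeholders, not real excerpts).
book_a = "the cat sat on the mat and the cat slept"
book_b = "the wizard sat on the broom and the owl slept"

vocab_a = set(tokens(book_a))
vocab_b = set(tokens(book_b))
combined = vocab_a | vocab_b  # vocabulary needed to read both books

print("unique words in A:", len(vocab_a))
print("unique words in B:", len(vocab_b))
print("unique words in A and B combined:", len(combined))
print(f"coverage of B knowing only A's words: {coverage(vocab_a, book_b):.0%}")
```

The combined set is larger than either individual set but smaller than their sum, because the most frequent words are shared between the two texts.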

This is all true but the fact of the matter remains that you only need 1760 words for Alice in Wonderland or 2000
for Harry Potter. So, if the only book you read in your life is a Harry Potter, a 2000-word passive vocabulary will
do you fine. In fact, around 70% of those Harry Potter words are in all the other books, i.e. if you know only 1400
English words you can cover a good part of every book ever written in English literature. As for comprehension
and enjoyment that's another question that I'll look at in a minute.

This means, of course, that the size of one's passive vocabulary depends on the amount of reading, yes, but also
on the breadth, i.e. the range of different genres and eras. In English, if you combine a 19th century author like
Charles Dickens with a contemporary figure like Dan Brown, you are in for quite a ride in terms of vocabulary.

The other side of the equation is what does comprehension mean. A book like Alice in Wonderland is a lot more
than the sum of 1760 words. If you just studied those words with one-word translations and nothing else, could
you pick up the book and start enjoying it? Most of the words would look familiar of course but you would be in a
daze most of the time because you have never seen the words actually used in context.

Since this is a piece of great literature, to really understand this work you have to have a very good knowledge of
the English language, including grammar and the idiomatic use of vocabulary including metaphors and
connotations.

Similarly, if you read and enjoyed only one Harry Potter book in your life you would have a very good overall
knowledge of modern written fiction English. And if you could actually use those 2000 words as they are used in
a Harry Potter book, your written English would be outstanding. Most native speakers can't write like that.

Many readers here will jump up and say: "What if you want to read a wide range of books or write about many
different subjects besides how to become a teenage magician?" Well, obviously, you need more vocabulary. So,
you get what you need. The more you read, the more your passive vocabulary will expand. But the core
knowledge you get from that one Harry Potter will take you very far because, as @iversen has correctly pointed
out, much vocabulary is guessable or made up of loanwords and cognates.

I believe that you could read just one or two books and learn to express yourself in a sophisticated manner that
will take you wherever you want to go. Concretely, this may mean reading one novel in your target language
quite a few times to really assimilate its contents instead of trying to read many different novels.

As I have said many times, I believe that this whole business of counting words is a great waste of time. Yes, one
can count the number of Anki entries or the lines in an Excel spreadsheet but other than a sense of personal
satisfaction for having processed a number, I don't see the value of all this.

In fact, I think much of this debate can be discouraging to language learners. To say that you need a passive
knowledge of 20000 words in a language like French or Spanish is somewhat irresponsible. For
French for example, if you have a good mastery of the language used in a French translation of a Harry Potter,
you would have a head start on any work of modern French fiction. Notice I said "good mastery" of the language.
I did not say master the vocabulary.

The vocabulary one needs is determined by the task at hand. In order to read Alice in Wonderland all I need is 1760
words. That's it. I'm not reading Moby Dick or Great Expectations. The day I want to read those books, I'll learn
the vocabulary I need.

Edited by s_allard on 18 May 2015 at 4:22pm

1 person has voted this message useful



patrickwilken
Senior Member
Germany
radiant-flux.net
Joined 4326 days ago

1546 posts - 3200 votes 
Studies: German

 
 Message 162 of 350
18 May 2015 at 4:27pm | IP Logged 
smallwhite wrote:

You reminded me ;) You asked me in the poll thread what I could do with my 8k words, and I wrote a long message about my first 200 days. I asked you back what your first 200 days were like and was hoping to hear from you. Your aim above is exactly the same as mine so you're really the only person whom I can compare the efficiency of what I did against.


I'm sorry, I don't have a great recollection of where I was at 200 days, but somewhere in B1-land I guess.

According to my notes, which are all guesstimates, I was B1+ at 1-year. After 2-years I estimated my level as B2-spoken/B2-reading-C1-listening. Now I am at B2-spoken/C1-reading/listening.

It doesn't sound like a lot has happened in the last year, but I can really see substantial improvements in my ability to read, watch movies and engage in conversations.

I'll write up a summary in my blog in a couple of weeks once I reach the three year mark.
2 persons have voted this message useful



rdearman
Senior Member
United Kingdom
rdearman.org
Joined 5029 days ago

881 posts - 1812 votes 
Speaks: English*
Studies: Italian, French, Mandarin

 
 Message 163 of 350
18 May 2015 at 5:03pm | IP Logged 
@s_allard

I have mentioned word counts before here:
rdearman wrote:
I do have a copy of Shogun electronically, so after a couple of quick hacks I've determined there are 437,199 words and only 20,227 unique words. This is a very raw result and I haven't sanity checked for punctuation, Japanese language usage, etc, but you could say 20k unique words in Shogun as a valid result. So if you're going to read Shogun (English version) then you need a vocabulary of 20,000 words. I did the same analysis for "Of Mice and Men", and there are 29,760 words with only 2,977 unique words. So the lesson here is; Read Steinbeck.


But I do have a copy of all the Harry Potter books, although they are in Italian. Having done the same hacking to get unique words vs combined word count:

All the HP books combined in Italian have 1,100,803 words and there are 45,337 unique words across all seven books. So your estimate of 2000 words for Harry Potter is a wild underestimate. Even if we suppose that 50% of those words are names or made-up words for magic, or are guessable, or are loanwords and cognates, you still require a passive vocabulary of about 22,000 words in Italian. However, it does depend on the writer; see, for example, Steinbeck's "Of Mice and Men" quoted above.

I decided to do some other examples based on the books I have available.
Jules Verne - 20000 Lieues Sous Les Mers: | total words: 146,349 | unique words: 17,039 (11%)
Jules Verne - De La Terre A La Lune: | total words: 56,149 | unique words: 10,111 (18%)
Guy De Maupassant - La Main Gauche: | total words: 37,192 | unique words: 7,634 (20.5%)
Isaac Asimov - Némésis (French): | total words: 121,610 | unique words: 11,499 (9.4%)
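
For anyone who wants to reproduce this kind of count, here is a rough sketch of one way to do it. This is not the actual script used for the figures above; the function name `count_words` and the sample sentence are invented for illustration. It counts every distinct spelling as a separate word, so inflected forms are not merged.

```python
import re
from collections import Counter

def count_words(text):
    """Return (total running words, number of distinct spellings) for a text."""
    # [^\W\d_]+ matches runs of letters only, including accented ones like é.
    toks = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(toks)
    return len(toks), len(counts)

# Invented sample sentence, standing in for the full text of a book.
sample = "Le chat parle. Les chats parlent, et le chien parle aussi."
total, unique = count_words(sample)
print(f"total words: {total} | unique words: {unique} ({unique / total:.1%})")
```

For a real book you would read the file into a string first, e.g. with `open(path, encoding="utf-8").read()`. On a sample this short the percentage of unique words is very high; it drops as the text gets longer.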

So for any of these French books you need a passive vocabulary ranging from 7,634 to 17,039 words. Even with an error margin of 50% your estimates for 2000 words are wrong. But what we can see is different authors have a different vocabulary range, and Isaac Asimov who was always berated for his simple vocabulary is a good author to read if you're a beginner, with only 9.4% unique words.

Put another way, 90.6% of the book Némésis by Isaac Asimov is repetition, and a waste of time if you're trying to advance your French vocabulary. So bulking up on vocabulary through reading really depends on what authors you read. But regardless, it is inefficient when you are only getting an 11-20% increase in vocabulary. Of course books are more entertaining than SRS, I'll give you that.

EDIT: put | in to make numbers easier to read.


Edited by rdearman on 18 May 2015 at 5:05pm

1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6496 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian
Personal Language Map

 
 Message 164 of 350
18 May 2015 at 5:16pm | IP Logged 
My estimates of my passive vocabulary are actually based on counting headwords on representative pages in a number of dictionaries. So when I say that I feel comfortable reading in languages where I know some 20.000 headwords or more and need a dictionary below that, that statement has nothing to do with coverage - it is just some fairly hard figures combined with unquantified knowledge about my own behaviour.

However I have also counted words in my own writings here at HTLAL, and now I have at long last found the figures. In 2009 I compiled a corpus of some 15000 word forms which after some analysis ended up as a mere 2400 unique headwords. In 2014 I made two corpora of 36304 and 36868 word forms respectively, and after the usual treatment in a spreadsheet these numbers eventually became 3498 and 3914 headwords, with a grand total of 5433 headwords - i.e. the overlap was 1979 headwords. Or in other words: roughly half the headwords in each sample were new. But it is also worth noticing that the total number of headwords in a sample of 74.000 word forms (5433) was only slightly more than double the number found in a sample of 15.000 word forms (2400 headwords). So I wouldn't find ten times as many headwords with a corpus clocking in at 700.000 word forms.
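
The arithmetic behind these figures can be checked in a few lines. This sketch only uses the numbers quoted above; the variable names are made up for the illustration.

```python
# Headword counts from the two 2014 samples and their combined total.
sample_a = 3498
sample_b = 3914
combined = 5433

# Inclusion-exclusion: headwords found in both samples.
overlap = sample_a + sample_b - combined

# Share of each sample that does not occur in the other one.
share_new_a = (sample_a - overlap) / sample_a
share_new_b = (sample_b - overlap) / sample_b

# Growth of corpus size versus growth of headword count, 2009 to 2014.
forms_2009, heads_2009 = 15000, 2400
forms_2014 = 36304 + 36868
corpus_growth = forms_2014 / forms_2009
headword_growth = combined / heads_2009

print("overlap:", overlap)
print(f"new headwords in sample A: {share_new_a:.1%}")
print(f"new headwords in sample B: {share_new_b:.1%}")
print(f"corpus grew {corpus_growth:.2f}x, headwords grew {headword_growth:.2f}x")
```

The last line shows the point: the corpus is almost five times larger, but the headword count only a bit more than doubles.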

NB: not even 'words actually used' can be taken as a measure of active vocabulary. I may have written a word once shortly after having learnt it, but there is no guarantee that I would be able to remember it now if I needed it. Active vocabulary is a volatile little fellah - it goes up and down with your mood, wakefulness and recent activities. Passive vocabulary is much more dependable and constant, and here size does matter. Just count the number of times you see words in quite ordinary texts which aren't found in even a pretty large dictionary. If 30.000 or 40.000 words aren't enough for a dictionary, why should a few thousand be enough for you?

Edited by Iversen on 18 May 2015 at 5:27pm

3 persons have voted this message useful



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6390 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 165 of 350
18 May 2015 at 6:21pm | IP Logged 
smallwhite wrote:

Serpent wrote:
Can you reply about the Chinese forum btw? Has anyone tried your method and what were the results?


I think you've misunderstood. You must be referring to this:

smallwhite wrote:
You see, I write similar things on another (Chinese) forum, and I've always received positive responses: fellow learners thanking me for the encouragement, forum moderators highlighting my posts and rewarding me with forum points, etc. That the same things are seen as discouraging here takes a little getting used to.


The subject matter was discouraging newbies and that's what I was referring to. I meant I write about adding 8k words in 4 mths, etc high bar stuff (and how I did them) and fellow learners thanked me blah blah blah. I meant I received positive responses there, and didn't expect that writing the same things here would generate negative responses.


I know. I just assumed that out of the many people, someone tried to do the same thing. The problem is not just discouraging newbies but feeding their reluctance to use native materials. Basically, you're different from them in the sense that most *want* to read/watch stuff but put it off until they are "ready", done with all the textbooks they can find etc. You don't seem interested in the L2 content but accept that reading is needed, which is an extremely uncommon combination here on HTLAL. Maybe in China this attitude to a language as an academic subject is more widespread, idk.

By your method I'm just referring to your whole study plan for vocab, including the 8k/2s/SRS thing and the mining from books (that may or may not be left unfinished). If you want to narrow it down to a core method and the details that may vary, be my guest.

Edited by Serpent on 18 May 2015 at 6:22pm

2 persons have voted this message useful



s_allard
Triglot
Senior Member
Canada
Joined 5223 days ago

2704 posts - 5425 votes 
Speaks: French*, English, Spanish
Studies: Polish

 
 Message 166 of 350
18 May 2015 at 6:42pm | IP Logged 
rdearman wrote:
@s_allard

I have mentioned word counts before here:
rdearman wrote:
I do have a copy of Shogun electronically, so after a couple of quick hacks I've determined
there are 437,199 words and only 20,227 unique words. This is a very raw result and I haven't sanity checked for
punctuation, Japanese language usage, etc, but you could say 20k unique words in Shogun as a valid result. So if
you're going to read Shogun (English version) then you need a vocabulary of 20,000 words. I did the same
analysis for "Of Mice and Men", and there are 29,760 words with only 2,977 unique words. So the lesson here is;
Read Steinbeck.


But I do have a copy of all the Harry Potter books although they are in Italian. Having done the same hacking to
get uniq words vs combined word count:

All the HP books combined in Italian have 1,100,803 words and there are 45,337 unique words across all seven
books. So your estimate of 2000 words for Harry Potter is wildly underestimated. Even if we supposed 50% those
words are names, or made-up words for magic, are guessable or made up of loanwords and cognates. You still
require a passive vocabulary of 22,000 words in Italian. However it does depend on the writer. For example
Steinbeck's "Of Mice and Men" quoted above.

I decided to do some other examples based on the books I have available.
Jules Verne - 20000 Lieues Sous Les Mers: | total words: 146,349 | unique words: 17,039 (11%)
Jules Verne - De La Terre A La Lune: | total words: 56,149 | unique words: 10,111 (18%)
Guy De Maupassant - La Main Gauche: | total words: 37,192 | unique words: 7,634 (20.5%)
Isaac Asimov -Némésis (French): | total words: 121,610 | unique words: 11,499 (9.4%)

So for any of these French books you need a passive vocabulary ranging from 7,634 to 17,039 words. Even with
an error margin of 50% your estimates for 2000 words are wrong. But what we can see is different authors have a
different vocabulary range, and Isaac Asimov who was always berated for his simple vocabulary is a good author
to read if you're a beginner, with only 9.4% unique words.

Put another way, 90.6% of the book Némésis by Isaac Asimov is repetition, and a waste of time if you're trying to
advance your French vocabulary. So bulking up on vocabulary through reading, really depends on what authors
you read. But regardless it is inefficient when you are only getting a 11-20% increase in vocabulary. Of course
books are more entertaining than SRS, I'll give you that.

EDIT: put | in to make numbers easier to read.

Before we rush to conclusions here, there are a couple of points to be clarified. When we are talking about unique
words, are we talking about word forms or word families? I always used word families because I believe that that
is the most accurate figure because it eliminates much of the duplication. This is the term that Paul Nation uses
all the time. Word family counts are much smaller than word form counts. Very few programs can automatically
generate word families, especially in languages other than English.

I suggest that readers have a look at the site
Vocabulary Analysis of Project Gutenberg

The definition of unique words is not clearly specified but certainly means word forms. These are also older
works out of copyright.

One notices that the longer the work, the larger the unique vocabulary. In this article, the books are divided into
over and under 30000 words. Some works are more dense, i.e. more unique words per unit of words and others
are less dense.

In the longer works in the least dense vocabulary category and over 30000 words, we see that a major work like
Moll Flanders by Daniel Defoe contains 139300 words and 6139 unique word forms. How many word families
does that represent? Let's say 2/3 or around 4100 word families.

To come back to Harry Potter, I will admit that I was wrong to estimate 2000 word families for the entire series
but I should point out that the HP series contains a lot of proper names and fantasy words for the subject matter.
I would still maintain that any one book and especially the first one, Harry Potter and the Philosopher's Stone,
uses a relatively small vocabulary. A good article on all this is Paul Nation's:


How large a vocabulary is needed for reading and listening?


My figures may have been off, but the fact remains that to understand fully any one work you only need a small
vocabulary relative to that of many works. So the question isn't how many words do you need to understand all of
the Harry Potter books. It's how much do you need to know to understand one of them.

Something that hasn't been much studied is the number of words for informal conversation. I certainly don't have
any figures but I would think that relative to written literature, the number of word families necessary to have a
fluent 30-minute conversation about an everyday subject with a native speaker is quite tiny. But having that kind
of conversation can be very challenging for the language learner.

4 persons have voted this message useful



Jeffers
Senior Member
United Kingdom
Joined 4702 days ago

2151 posts - 3960 votes 
Speaks: English*
Studies: Hindi, Ancient Greek, French, Sanskrit, German

 
 Message 167 of 350
18 May 2015 at 6:45pm | IP Logged 
Let's be careful about comparing apples and oranges. The count of 1743 for Alice in Wonderland is a count of word families, defined very specifically as: "the base form of a word plus its inflected forms (third person -s, -ed, -ing, plural -s, possessive -s, comparative -er and superlative -est) plus derived forms made from certain uses of the following affixes (-able, -er, -ish, -less, -ly, -ness, -th, -y, non-, un-, -al, -ation, -ess, -ful, -ism, -ist, -ity, -ize, -ment, in-)."

I imagine rdearman's word counts count types [corrected from tokens, thanks to Daegga] (essentially, things that are spelled differently). In English this will inflate the count significantly. With a language like Italian or French the count inflation would be off the chart, because parler, parle, parles, parlons, parlez, parlent, parlions, parliez, etc, all count as separate words. Incidentally, emk had a tool to sort this out so that all the recognized forms were counted as a single word.
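
To show how much the counting unit matters, here is a toy sketch. It is not emk's tool and not a real stemmer; `ENDINGS`, `toy_stem` and `group_into_families` are made-up names, and the ending list is a deliberately tiny assumption that only handles the regular -er verb forms listed above.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny list of French verb endings, longest first.
ENDINGS = ("ions", "iez", "ons", "ent", "ez", "er", "es", "e")

def toy_stem(word):
    """Strip the first matching ending, keeping a stem of at least 3 letters."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= 3:
            return word[: -len(ending)]
    return word

def group_into_families(types):
    """Map each toy stem to the set of spellings (types) that share it."""
    families = defaultdict(set)
    for t in types:
        families[toy_stem(t)].add(t)
    return dict(families)

forms = ["parler", "parle", "parles", "parlons",
         "parlez", "parlent", "parlions", "parliez"]
families = group_into_families(forms)

print(len(set(forms)), "types ->", len(families), "family")
```

A count of types reports eight words here, while a count of families reports one. Real text needs a proper lemmatizer or stemmer, since a naive ending list like this will wrongly merge or split plenty of words.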

Edited by Jeffers on 18 May 2015 at 7:35pm

1 person has voted this message useful



daegga
Tetraglot
Senior Member
Austria
lang-8.com/553301
Joined 4314 days ago

1076 posts - 1792 votes 
Speaks: German*, EnglishC2, Swedish, Norwegian
Studies: Danish, French, Finnish, Icelandic

 
 Message 168 of 350
18 May 2015 at 6:55pm | IP Logged 
Jeffers wrote:

I imagine rdearman's word counts count tokens (essentially, things that are spelled differently). In English this will inflate the count significantly. With a language like Italian or French the count inflation would be off the chart, because parler, parle, parles, parlons, parlez, parlent, parlions, parliez, etc, all count as separate words. Incidentally, emk had a tool to sort this out so that all the recognized forms were counted as a single word.


You mean types. Tokens is what his "total wordcount" is based on.
Sorry for nitpicking, but this mixture of terminology gets confusing very quickly and might not be as obvious as the CPU/computer case example.

Afaik emk used SnowballStemmer, which has its own problems. From rdearman's description, he probably used a manual script to do the most obvious stemming for him. This is not very accurate either, but certainly better than just counting types. But yes, the numbers seem to be slightly inflated.

Edited by daegga on 18 May 2015 at 7:11pm



1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.