
Proved: 8k words are enough to cover 100%!

14 messages over 2 pages: 1 2  Next >>
Hashimi
Senior Member
Oman
Joined 6258 days ago

362 posts - 529 votes 
Speaks: Arabic (Written)*
Studies: English, Japanese

 
 Message 1 of 14
04 November 2014 at 5:24pm | IP Logged 

Only 8,000 words are enough to provide 98% - 100% coverage of authentic texts.

We know that the top 100 words in English, for example, account for approximately 50% of all written texts. But according to Dr. Paul Nation and other researchers, at least 40,000 words are needed to provide 98% coverage for most general texts in English. This is bad news for language learners, because various studies demonstrated that to effectively guess new words from context there has to be at least 95% coverage. Dr. Nation identifies this coverage level as the minimum required to read texts for meaning. He cites the 98% level as the absolute "minimum" for guessing in context.

Does this mean that an English learner should know 40,000 words at least before he can read authentic texts and guess the meaning of unknown words from context?

Even if he understands 98% of all words, this means that he will encounter THOUSANDS of unknown words in a typical book! (1 unknown word on every 2nd line of text, 16 unknown words per page).

The good news is that practically, these calculations are wrong!

In 2003, the Japan Association of College English Teachers (JACET) published "the JACET List of 8000 Basic Words" (hereafter JACET8000). The JACET8000 is a radically new word list designed for all English learners in Japan. This list is based on two kinds of corpora: the British National Corpus and the JACET8000 Subcorpus.

One committee member of JACET developed a program, "JACET8000 Level Marker," which analyzes the words in actual English texts and indicates their word ranks. Using this program, three authentic texts, taken from newspapers and governmental documents, were examined. The words in boldface are those not included in the JACET8000. The first example is shown below:

Quote:
Text 1. A news article in The New York Times (from nytimes.com):

With hopes for finding anyone alive in the rubble fast fading, teams from a number of countries, including the United States and Switzerland, scaled back their search and rescue efforts and turned their attention to the thousands of survivors, many of them suddenly homeless and bereaved. With the surge of assistance, the tiny airstrip that serves this ancient backwater on the Silk Road was clogged.

State-run television said 16,000 people had been buried in the three days since the predawn quake in what had once been a city of more than 80,000.


Here, according to a mechanical calculation, the text coverage rate is 93%. However, among the six unregistered words, two are numerals and one is a proper noun. This suggests that the JACET8000 covers almost all of the words in the newspaper article, except for three difficult words: "bereaved," "airstrip," and "clogged."

The second example is taken from the Telegraph newspaper and the Scholastic Aptitude Test (SAT), one of the most common college admission tests in the USA. Below is part of the reading material in the "Critical Reading" section:

Quote:
Text 2. A Reading Text in the SAT:

The domestic cat is a contradiction. No other animal has developed such an intimate relationship with humanity, while at the same time demanding and getting such independent movement and action.

The cat manages to remain a tame animal because of the sequence of its upbringing. By living both with other cats (its mother and littermates) and with humans (the family that has adopted it) during its infancy and kittenhood, it becomes attached to and considers that it belongs to both species. It is like a child that grows up in a foreign country and as a consequence becomes bilingual.


In this case, the coverage rate of the JACET8000 is as high as 98%. Although "littermate" is a difficult word whose meaning we can hardly guess from its morphological constituents, "kittenhood" would be easily understood as a variation of "kitten," which is ranked at level 7. This shows that the JACET8000 covers almost all of the words in the SAT reading section.

The third sample is taken from the US President's Address at the United Nations General Assembly in 2003:

Quote:
Text 3. President's Address at the UN Assembly (from whitehouse.gov):

Last month, terrorists brought their war to the United Nations itself. The U.N. headquarters in Baghdad stood for order and compassion -- and for that reason, the terrorists decided it must be destroyed. Among the 22 people who were murdered was Sergio Vieira de Mello. Over the decades, this good and brave man from Brazil gave help to the afflicted in Bangladesh, Cyprus, Mozambique, Lebanon, Cambodia, Central Africa, Kosovo, and East Timor, and was aiding the people of Iraq in their time of need. America joins you, his colleagues, in honoring the memory of Senor Vieira de Mello.


A mechanical calculation shows that only 82% of the words are covered by the JACET8000. However, if we put aside the proper nouns and numerals as in the case of a newspaper article, all words in Text 3 are included in the JACET8000.
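The difference between the "mechanical" rate and the rate with numerals and proper nouns set aside can be sketched roughly as follows. This is only an illustration: the function name, the toy sentence, and the tiny word list are assumptions standing in for the real JACET8000 list and the Level Marker program, neither of which is reproduced here.

```python
import re

def coverage(text, known_words, ignore=frozenset(), skip_numerals=False):
    """Share of running words in `text` found in `known_words`.

    `ignore` holds tokens (e.g. proper nouns) to leave out of the count;
    `skip_numerals` leaves out tokens that start with a digit.
    """
    tokens = re.findall(r"[A-Za-z]+|\d[\d,]*", text)
    kept = []
    for tok in tokens:
        if skip_numerals and tok[0].isdigit():
            continue
        if tok in ignore:
            continue
        kept.append(tok.lower())
    if not kept:
        return 0.0
    covered = sum(1 for tok in kept if tok in known_words)
    return covered / len(kept)

# Toy data, NOT the JACET8000 list.
sample = "The tiny airstrip was clogged in 2003 near Baghdad"
known = {"the", "tiny", "was", "in", "near"}

mechanical = coverage(sample, known)
adjusted = coverage(sample, known, ignore={"Baghdad"}, skip_numerals=True)
print(f"mechanical: {mechanical:.0%}, adjusted: {adjusted:.0%}")
```

Dropping numerals and names shrinks the denominator, so the adjusted rate is never lower than the mechanical one as long as the dropped tokens were all counted as unknown.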

Our analysis of three example passages proves that the JACET8000 effectively covers the vocabulary in not only easy reading materials but also authentic contemporary English texts.


1 person has voted this message useful





emk
Diglot
Moderator
United States
Joined 5531 days ago

2615 posts - 8806 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian
Personal Language Map

 
 Message 2 of 14
04 November 2014 at 5:45pm | IP Logged 
Hashimi wrote:
Even if he understands 98% of all words, this means that he will encounter THOUSANDS of unknown words in a typical book! (1 unknown word on every 2nd line of text, 16 unknown words per page).

The good news is that practically, these calculations are wrong!

Um, yes, those calculations are indeed wrong. If you know 98% of the words on a page, that means you are unfamiliar with 2%. If we assume 350 words per page, then 2% of 350 words gives us 7 unknown words. Not 16.
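As a quick check of this arithmetic, here is a minimal Python sketch. The function name is illustrative, and the 800-word page is an assumed value, shown only because it is the page length at which 98% coverage would give the figure of 16.

```python
def unknown_per_page(coverage_pct, words_per_page):
    """Expected number of unknown running words on one page."""
    return words_per_page * (100 - coverage_pct) / 100

print(unknown_per_page(98, 350))  # 7.0
print(unknown_per_page(98, 800))  # 16.0
```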

1 person has voted this message useful



Elexi
Senior Member
United Kingdom
Joined 5564 days ago

938 posts - 1840 votes 
Speaks: English*
Studies: French, German, Latin

 
 Message 3 of 14
04 November 2014 at 6:00pm | IP Logged 
Where does Professor Nation say you need 40,000 words?

In the linked article, Nation says exactly what the OP stated - 'If we take 98% as the
ideal coverage, a 8,000–9,000 word-family vocabulary is needed for dealing with written
text, and 6,000–7,000 families for dealing with spoken text'. I am citing here from the
Conclusion.

http://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/2006-How-large-a-vocab.pdf

4 persons have voted this message useful



Cabaire
Senior Member
Germany
Joined 5598 days ago

725 posts - 1352 votes 

 
 Message 4 of 14
04 November 2014 at 6:01pm | IP Logged 
With the surge of assistance, the tiny ***** that serves this ancient backwater on the Silk Road was *****.

I think it is proved that I cannot understand this sentence.

1 person has voted this message useful



Ezy Ryder
Diglot
Senior Member
Poland
youtube.com/user/Kat
Joined 4348 days ago

284 posts - 387 votes 
Speaks: Polish*, English
Studies: Mandarin, Japanese

 
 Message 5 of 14
04 November 2014 at 6:05pm | IP Logged 
Have you checked a paragraph or two from a novel?
@Cabaire, a single sentence might be more difficult to understand than a whole paragraph or
chapter, because you lack quite a lot of the context.

Edited by Ezy Ryder on 04 November 2014 at 6:07pm

2 persons have voted this message useful



Hashimi
Senior Member
Oman
Joined 6258 days ago

362 posts - 529 votes 
Speaks: Arabic (Written)*
Studies: English, Japanese

 
 Message 6 of 14
04 November 2014 at 6:48pm | IP Logged 

Quote:
If you know 98% of the words on a page, that means you are unfamiliar with 2%. If we assume 350 words per page, then 2% of 350 words gives us 7 unknown words. Not 16.


Why do you assume 350 words per page?!

I'm not talking about graded readers or Harry Potter-like fiction!

There are 600 to 1000 words per page in most academic books.

http://anycount.com/WordCountBlog/how-many-words-in-one-page/


Even if there were 7 unknown words per page, this means 2100 unknown words in a 300-page book.
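The book-level totals can be worked out the same way. This is a rough sketch with an illustrative function name; it assumes a uniform 98% coverage on every page and counts occurrences of unknown words, not distinct words.

```python
def unknown_in_book(coverage_pct, words_per_page, pages):
    """Total unknown word occurrences in a book at a fixed coverage rate."""
    return round(words_per_page * pages * (100 - coverage_pct) / 100)

# 350 words/page is the figure assumed earlier in the thread;
# 600 and 1000 are the bounds quoted here for academic books.
for words_per_page in (350, 600, 1000):
    print(words_per_page, unknown_in_book(98, words_per_page, 300))
```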


***

Quote:
@Cabaire, a single sentence might be more difficult to understand than a whole paragraph or
chapter, because you lack quite a lot of the context.


That's right.

I myself don't know the meaning of "bereaved" and "clogged", but I understand that paragraph clearly because of the context.


3 persons have voted this message useful



Serpent
Octoglot
Senior Member
Russian Federation
serpent-849.livejour
Joined 6596 days ago

9753 posts - 15779 votes 
4 sounds
Speaks: Russian*, English, FinnishC1, Latin, German, Italian, Spanish, Portuguese
Studies: Danish, Romanian, Polish, Belarusian, Ukrainian, Croatian, Slovenian, Catalan, Czech, Galician, Dutch, Swedish

 
 Message 7 of 14
04 November 2014 at 6:55pm | IP Logged 
emk wrote:
Hashimi wrote:
Even if he understands 98% of all words, this means that he will encounter THOUSANDS of unknown words in a typical book! (1 unknown word on every 2nd line of text, 16 unknown words per page).

The good news is that practically, these calculations are wrong!

Um, yes, those calculations are indeed wrong. If you know 98% of the words on a page, that means you are unfamiliar with 2%. If we assume 350 words per page, then 2% of 350 words gives us 7 unknown words. Not 16.

Also, 7 unknown words on an individual page won't necessarily amount to 700 different unknown words in a 100-page book, because the same unknown words tend to come up again.
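To illustrate the distinction between unknown occurrences and distinct unknown words, here is a small sketch. The function name, the toy sentence, and the toy word list are all assumptions made up for the example.

```python
from collections import Counter

def unknown_counts(tokens, known_words):
    """Return (unknown occurrences, distinct unknown words)."""
    unknown = Counter(
        t.lower() for t in tokens if t.lower() not in known_words
    )
    return sum(unknown.values()), len(unknown)

# Toy data: "airstrip" and "clogged" each appear twice.
tokens = "the airstrip was clogged and the airstrip stayed clogged".split()
known = {"the", "was", "and", "stayed"}

occurrences, distinct = unknown_counts(tokens, known)
print(occurrences, distinct)  # 4 2
```

Multiplying a per-page figure by the page count estimates the first number; the second, which is what a learner actually has to look up, is typically smaller.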
2 persons have voted this message useful



luke
Diglot
Senior Member
United States
Joined 7204 days ago

3133 posts - 4351 votes 
Speaks: English*, Spanish
Studies: Esperanto, French

 
 Message 8 of 14
04 November 2014 at 7:43pm | IP Logged 
Hashimi wrote:
I myself don't know the meaning of "bereaved" and "clogged", but I understand that paragraph clearly because of the context.


I believe you.

You bring up a point from Professor Arguelles's talk on reading vocabulary. What he says is that with less than 98% text coverage, one gradually loses the thread of a story. He also said that with 91-97% coverage one may think they are getting most of the story, but they are only getting the gist and they may not realize their comprehension isn't quite as good as they imagine.


4 persons have voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.