
Algorithm: words needed to know % of text

Language Learning Forum : Learning Techniques, Methods & Strategies
translator2
Senior Member
United States
Joined 6854 days ago

848 posts - 1862 votes 
Speaks: English*

 
 Message 9 of 23
03 July 2011 at 5:28pm
Iversen wrote:
Knowing the first 38% or so is like having the empty shelves ready in a supermarket, but all the goodies which make you come there are in the upper 62%


Nice one!
1 person has voted this message useful



zuneybunny
Diglot
Newbie
United States
turkishtrip.wordpres
Joined 4872 days ago

32 posts - 52 votes 
Speaks: English, Mandarin*
Studies: Spanish, Turkish

 
 Message 10 of 23
04 July 2011 at 5:25am
Quote:
For an English-speaker, Dutch vocabulary is fairly transparent. Not so Turkish
for a Mandarin speaker!

I'm actually an English speaker. Yes, I did put down Mandarin as my first native language
(I was born in China), but I live in the US and mainly speak English :)

Quote:
Out of curiosity, approximately how many words would you need to know to be able
to read, say, 85% of the text?

You would need 6105!
Obviously, the higher the percentage you want, the more words you need, and the count grows very steeply. For example, you
only need 2227 words for 70% of the text, which is roughly 1/3 of what you need for 85%!
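Figures like these come from a cumulative count: sort the word types by frequency and keep adding the most frequent ones until their occurrences make up the target share of all tokens. A minimal Python sketch of that calculation, assuming the word counts are already in a collections.Counter (the function name words_needed is hypothetical, not taken from the shared script):

```python
from collections import Counter

def words_needed(counts, target):
    """Number of most frequent word types whose occurrences together make up
    at least `target` (a fraction between 0 and 1) of all tokens."""
    total = sum(counts.values())
    covered = 0
    # most_common() yields (word, count) pairs from most to least frequent.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= target * total:
            return rank
    return len(counts)  # target not reachable (e.g. empty counts or target > 1)

sample = "the cat sat on the mat and the dog sat on the log"
print(words_needed(Counter(sample.split()), 0.5))  # prints 3 ("the", "sat", "on")
```

Run on the counts from a whole novel with targets of 0.70 and 0.85, this gives numbers of the kind quoted above; the exact values depend on how the text is split into words (for example, whether "Harry" and "Harry's" count as one word).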

But as some have replied, this isn't really a percentage of comprehension. It's just nice
to know :D
1 person has voted this message useful



leosmith
Senior Member
United States
Joined 6485 days ago

2365 posts - 3804 votes 
Speaks: English*
Studies: Tagalog

 
 Message 11 of 23
04 July 2011 at 7:47am
Iversen wrote:
Knowing the first 38% or so is like having the empty shelves ready in a supermarket, but all the
goodies which make you come there are in the upper 62%

I think it's better to turn all of these into swim-related:
Knowing the first 38% or so is like going to do the 100-meter crawl, but without your torso, head, or left arm.
2 persons have voted this message useful



jean-luc
Senior Member
France
Joined 4895 days ago

100 posts - 150 votes 
Speaks: French*
Studies: German

 
 Message 12 of 23
04 July 2011 at 10:03am
As a side note, would you be willing to share your Python script? I would like to try it on some texts I have.
2 persons have voted this message useful



zuneybunny
Diglot
Newbie
United States
turkishtrip.wordpres
Joined 4872 days ago

32 posts - 52 votes 
Speaks: English, Mandarin*
Studies: Spanish, Turkish

 
 Message 13 of 23
04 July 2011 at 4:42pm
Quote:
As a side note, would you be willing to share your Python script? I would like to
try it on some texts I have.

Of course :)

http://codepad.org/9T0Ew6Cd

Replace "harrypotter.txt" with whatever your input text file is. It'll create a file
named "freqlist.txt" after you run it.
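The script itself is only linked, not reproduced in the thread. To give a rough idea of what a frequency-list script like this does, here is a minimal Python 3 sketch: it lowercases the text, splits it into words, counts them, and writes a ranked list with each word's count and the cumulative percentage of the text covered so far. The tokenizing regex, the tab-separated output layout, and the name build_freqlist are assumptions, not details of the linked script.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters (Unicode-aware in Python 3), with
# internal apostrophes allowed, so "don't" stays one word.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def build_freqlist(in_path, out_path="freqlist.txt"):
    """Count word frequencies in in_path and write a ranked list to out_path.

    Each output line is: rank, word, count, cumulative % of all tokens.
    """
    with open(in_path, encoding="utf-8") as f:
        text = f.read().lower()
    counts = Counter(WORD_RE.findall(text))
    total = sum(counts.values())
    covered = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for rank, (word, n) in enumerate(counts.most_common(), start=1):
            covered += n
            out.write(f"{rank}\t{word}\t{n}\t{100 * covered / total:.2f}%\n")
    return counts
```

Calling build_freqlist("harrypotter.txt") would write freqlist.txt to the current directory. The last column answers the thread's question directly: the rank at which it first reaches 85% is the number of words needed to cover 85% of the text.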
4 persons have voted this message useful



zerothinking
Senior Member
Australia
Joined 6307 days ago

528 posts - 772 votes 
Speaks: English*

 
 Message 14 of 23
04 July 2011 at 5:44pm
The content words, which carry most of the meaning, are the rarer words. Knowing 80% of
the words on the page does not mean understanding 80% of the text. Every language learner
discovers this sooner or later. I got a rude awakening about how much more I had to
learn when I first opened a French novel.
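One way to see this is to blank out every word outside the most frequent types and look at what survives. A minimal sketch with a hypothetical helper, mask_rare_words; for self-containment it ranks words using the passage itself, whereas a more realistic test would rank words from a large corpus and then mask a separate passage:

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters; Unicode-aware in Python 3

def mask_rare_words(text, target=0.8, mask="____"):
    """Keep the smallest set of most frequent word types whose occurrences
    cover at least `target` of the tokens in `text`; blank out all others."""
    tokens = WORD_RE.findall(text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    known, covered = set(), 0
    for word, n in counts.most_common():
        if covered >= target * total:
            break
        known.add(word)
        covered += n

    def repl(match):
        word = match.group(0)
        return word if word.lower() in known else mask

    # Substitute on the original text so case and punctuation are preserved.
    return WORD_RE.sub(repl, text)

print(mask_rare_words("The cat and the dog and the bird", target=0.6))
```

In the toy example, the words that survive are "the" and "and", and the blanks are exactly the nouns that say what the sentence is about. On real prose, the high-coverage set likewise tends to be dominated by function words.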
1 person has voted this message useful



jean-luc
Senior Member
France
Joined 4895 days ago

100 posts - 150 votes 
Speaks: French*
Studies: German

 
 Message 15 of 23
04 July 2011 at 10:25pm
zuneybunny wrote:

Of course :)


Thanks a lot, it works really well!

I just had to add « and » to the regexp (and # -*- coding: utf-8 -*-
to the header) to use it on my German text.
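For anyone adapting a script like this to German or French text, here is a minimal sketch of the punctuation-stripping approach, assuming the regexp is a character class of punctuation to remove (the linked script's exact regexp isn't shown in the thread, so PUNCT and tokenize are hypothetical). It adds the guillemets « » ‹ › and the German quotation marks „ “ ‚ ‘. The coding header matters in Python 2, where a source file containing non-ASCII characters needs it; Python 3 reads source files as UTF-8 by default.

```python
import re

# Hypothetical punctuation set, including guillemets, German quotation marks,
# and en/em dashes (written as \u2013 and \u2014).
PUNCT = ".,;:!?()[]\"'«»‹›„“”‚‘’-\u2013\u2014…"
# re.escape makes characters such as "[", "]" and "-" literal inside the class.
PUNCT_RE = re.compile("[" + re.escape(PUNCT) + "]")

def tokenize(text):
    """Replace punctuation with spaces, lowercase, and split on whitespace."""
    return PUNCT_RE.sub(" ", text).lower().split()

print(tokenize("»Guten Tag«, sagte sie. „Schön!“"))
```

Because the apostrophes ' and ’ are in the set, contractions such as geht's are split into two tokens. Remove them from PUNCT if you would rather keep those together.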
1 person has voted this message useful



Cainntear
Pentaglot
Senior Member
Scotland
linguafrankly.blogsp
Joined 5946 days ago

4399 posts - 7687 votes 
Speaks: Lowland Scots, English*, French, Spanish, Scottish Gaelic
Studies: Catalan, Italian, German, Irish, Welsh

 
 Message 16 of 23
05 July 2011 at 8:38am
zuneybunny wrote:
Quote:
For an English-speaker, Dutch vocabulary is fairly transparent. Not so Turkish
for a Mandarin speaker!

I'm actually an English speaker. Yes, I did put down Mandarin as first native language
(born in China), but I live in the US and mainly speak English :)

Fair enough, sorry.
(But the same still holds true of Turkish for English speakers anyway...)

zuneybunny wrote:
Quote:
As a side note, would you be willing to share your Python script? I would like to
try it on some texts I have.

Of course :)

http://codepad.org/9T0Ew6Cd

Replace "harrypotter.txt" with whatever your input text file is. It'll create a file
named "freqlist.txt" after you run it.

That is extremely useful.

Many, many thanks.


1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.