
Algorithm: words needed to know % of text

zuneybunny
Diglot
Newbie
United States
turkishtrip.wordpres
Joined 4872 days ago

32 posts - 52 votes 
Speaks: English, Mandarin*
Studies: Spanish, Turkish

 
 Message 1 of 23
02 July 2011 at 9:19pm
I read that someone was reading Harry Potter to learn Turkish.

This got me curious, because I'm learning Turkish as well, and I can't stand not
knowing how much Turkish I'd need before I could kinda tackle the book.

So... I pulled together a simple Python program that calculates the # of words needed
to understand a % of a book. Here are the results.

The words below are listed in order of how frequently they appear in the Turkish
edition of Harry Potter and the Sorcerer's Stone (Harry Potter ve Felsefe Taşı).

The list shows each word, followed by the cumulative percentage of the text you would
recognize if you had studied every word up to that point. As you can see, studying about
200 words covers about 37% of all the words in the book.

NOTE: this is a very ROUGH estimate. I did not filter out people's names (doing so
would make the percentage go DOWN), nor did I merge words that differ only in tense
(doing so would make the percentage go UP).

It kind of surprises me that you don't need that many words to get a ROUGH
understanding of the book.

What do you guys think??

1.bir 2%
2.harry 4%
3. 5%
4.dedi 6%
5.de 8%
6.da 8%
7.ama 9%
8.diye 10%
9.bu 11%
10.o 11%
11.ne 12%
12.ron 12%
13.birs 13%
14.ey 13%
15.daha 14%
16.hagrid 14%
17.için 15%
18.kadar 15%
19.mi 16%
20.sonra 16%
21.onu 17%
22.hermione 17%
23.profesör 17%
24.bile 18%
25.gibi 18%
26.ki 18%
27.harrynin 19%
28.hiç 19%
29.öyle 19%
30.çok 19%
31.onun 20%
32.en 20%
33.bütün 20%
34.nasil 20%
35.ona 21%
36.ben 21%
37.vernon 21%
38.yok 21%
39.yine 21%
40.vardi 22%
41.sen 22%
42.bunu 22%
43.var 22%
44.tek 22%
45.snape 22%
46.dumbledore 23%
47.hemen 23%
48.imdi 23%
49.degil 23%
50.pek 23%
51.içinde 24%
52.önce 24%
53.oldu 24%
54.iki 24%
55.oldugunu 24%
56.ya 24%
57.dogru 25%
58.ve 25%
59.sanki 25%
60.büyük 25%
61.mcgonagall 25%
62.bakti 25%
63.harryye 26%
64.potter 26%
65.mr 26%
66.dudley 26%
67.neville 26%
68.eniste 26%
69.baska 26%
70.malfoy 27%
71.her 27%
72.artik 27%
73.bana 27%
74.sordu 27%
75.biri 27%
76.on 27%
77.iyi 27%
78.evet 28%
79.tam 28%
80.biraz 28%
81.ilk 28%
82.uzun 28%
83.sadece 28%
84.beni 28%
85.yere 28%
86.sonunda 29%
87.yoktu 29%
88.birkaç 29%
89.gözlerini 29%
90.üstüne 29%
91.herkes 29%
92.gün 29%
93.üç 29%
94.belki 29%
95.onlari 30%
96.gece 30%
97.ansizin 30%
98.seni 30%
99.fark 30%
100.üstünde 30%
101.çünkü 30%
102.harryyle 30%
103.petunia 30%
104.kendi 30%
105.herhalde 30%
106.geldi 31%
107.bagirdi 31%
108.son 31%
109.ronun 31%
110.eyler 31%
111.etti 31%
112.zaten 31%
113.sesle 31%
114.quirrell 31%
115.hagridin 31%
116.güzel 31%
117.gördü 32%
118.burada 32%
119.basladi 32%
120.sana 32%
121.mrs 32%
122.ise 32%
123.senin 32%
124.kere 32%
125.orada 32%
126.hers 32%
127.fisildadi 32%
128.dursley 32%
129.agir 33%
130.garip 33%
131.dört 33%
132.atti 33%
133.aldi 33%
134.çocuk 33%
135.kendini 33%
136.hayir 33%
137.harryyi 33%
138.zaman 33%
139.madam 33%
140.gözleri 33%
141.eyi 34%
142.döndü 34%
143.böyle 34%
144.bak 34%
145.boyuna 34%
146.quidditch 34%
147.onlara 34%
148.küçük 34%
149.kimse 34%
150.gryffindor 34%
151.degildi 34%
152.büyücü 34%
153.büyü 34%
154.yerde 34%
155.kim 35%
156.hadi 35%
157.wood 35%
158.snapein 35%
159.hiçbirs 35%
160.hep 35%
161.anda 35%
162.öteki 35%
163.yoksa 35%
164.yeni 35%
165.yaninda 35%
166.weasley 35%
167.siz 35%
168.piril 35%
169.olsa 35%
170.neden 36%
171.kocaman 36%
172.hâlâ 36%
173.hizla 36%
174.göz 36%
175.gitti 36%
176.bakalim 36%
177.teyze 36%
178.mu 36%
179.elini 36%
180.acaba 36%
181.tasi 36%
182.miydi 36%
183.misin 36%
184.karanlik 36%
185.dev 36%
186.basini 37%
187.çikardi 37%
188.yüz 37%
189.yanina 37%
190.yani 37%
191.siyah 37%
192.sinif 37%
193.gerek 37%
194.duydu 37%
195.biz 37%
196.ayaga 37%
197.çikti 37%
198.yana 37%
199.onlar 37%
200.duruyordu 37%
201.basina 37%

The program can be found here:
http://codepad.org/9T0Ew6Cd

Replace "harrypotter.txt" with whatever your input text file is. It'll create a file
named "freqlist.txt" after you run it.
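
For readers who cannot open the link, here is a minimal sketch of how such a count could be done. It is not the script from the codepad link; `tr_lower`, `coverage_table` and `write_freqlist` are made-up names, and the apostrophe handling is an assumption chosen so that "Harry'nin" comes out as "harrynin" like in the list above. Fragments in the list such as "birs", "ey" and "imdi" look like words that lost their ş, so this sketch reads the file as UTF-8 and uses a Unicode-aware pattern.

```python
import re
from collections import Counter


def tr_lower(text):
    """Lowercase with the Turkish dotted/dotless i handled explicitly."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def coverage_table(text):
    """Return [(word, cumulative_percent), ...], most frequent word first."""
    # Assumption: drop apostrophes so "Harry'nin" is counted as one word.
    text = text.replace("'", "").replace("\u2019", "")
    # \w+ is Unicode-aware in Python 3, so ş, ğ, ı, ö, ü, ç stay inside words.
    words = re.findall(r"\w+", tr_lower(text))
    total = len(words)
    table = []
    running = 0
    for word, count in Counter(words).most_common():
        running += count
        table.append((word, 100.0 * running / total))
    return table


def write_freqlist(in_path="harrypotter.txt", out_path="freqlist.txt"):
    """Write lines shaped like '1.bir 2%', one per distinct word."""
    with open(in_path, encoding="utf-8") as src:
        table = coverage_table(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        for rank, (word, pct) in enumerate(table, start=1):
            dst.write(f"{rank}.{word} {pct:.0f}%\n")
```

Calling `write_freqlist()` with no arguments uses the two file names mentioned above. Names and inflected forms are still counted as separate words, so the same caveats from the NOTE apply.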

Edited by zuneybunny on 05 July 2011 at 3:52pm

3 persons have voted this message useful



Volte
Tetraglot
Senior Member
Switzerland
Joined 6374 days ago

4474 posts - 6726 votes 
Speaks: English*, Esperanto, German, Italian
Studies: French, Finnish, Mandarin, Japanese

 
 Message 2 of 23
02 July 2011 at 10:32pm
You've just reinvented Zipf's law.
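
For anyone who has not met it: Zipf's law says, roughly, that a word's frequency is inversely proportional to its frequency rank. A rough sketch, assuming the idealized case where the exponent is exactly 1 and the text has V distinct words (real texts only approximate this; r, N, V and C are just symbols for this illustration):

```latex
f(r) \approx \frac{C}{r},
\qquad
\mathrm{coverage}(N)
  = \frac{\sum_{r=1}^{N} 1/r}{\sum_{r=1}^{V} 1/r}
  = \frac{H_N}{H_V}
  \approx \frac{\ln N + \gamma}{\ln V + \gamma}
```

Here H_N is the N-th harmonic number and γ ≈ 0.577. Under this assumption coverage grows only logarithmically with N, so the first few hundred words buy a large share of the text and every further percent costs more words than the last.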

3 persons have voted this message useful



Cabaire
Senior Member
Germany
Joined 5534 days ago

725 posts - 1352 votes 

 
 Message 3 of 23
02 July 2011 at 11:13pm
If you know 38% of the words in a text, your comprehension of it is nearly nil. You need 90~95%.

Read for example your own text with a comprehension of two thirds of the words:

"This * me *, because I'm * * as well, and I can't * the *
of not * how much * I should * before I could * * the *.

So... I * * a * * * that * the # of * needed
to * a % of a *, here are the *
"

The lesson to draw: content words are the rarer words.
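
A masked passage like the one above can be produced mechanically. This is only a sketch: `mask_unknown` is a made-up helper, and treating apostrophes as part of a word is an assumption that suits English contractions.

```python
import re


def mask_unknown(text, known_words):
    """Replace every word not in known_words with '*', keeping punctuation."""
    known = {w.lower() for w in known_words}

    def replace(match):
        word = match.group(0)
        return word if word.lower() in known else "*"

    # [\w']+ keeps contractions such as "I'm" together as one word.
    return re.sub(r"[\w']+", replace, text)


sample = "This got me curious, because I'm learning Turkish"
print(mask_unknown(sample, ["this", "me", "because", "i'm"]))
```

Feeding it the top N words of a frequency list as `known_words` shows what a given coverage percentage actually looks like on the page.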
8 persons have voted this message useful



zuneybunny
 
 Message 4 of 23
03 July 2011 at 2:29am
Ok, I see what you mean: understanding the keywords is more important than the
sheer number of words you know.

But still, I think knowing 38% of the words makes reading drastically more enjoyable,
because now you only have to look up 1 or 2 keywords per sentence, instead of
constantly looking up unimportant words like "and", "or", etc.
1 person has voted this message useful





Iversen
Super Polyglot
Moderator
Denmark
berejst.dk
Joined 6638 days ago

9078 posts - 16473 votes 
Speaks: Danish*, French, English, German, Italian, Spanish, Portuguese, Dutch, Swedish, Esperanto, Romanian, Catalan
Studies: Afrikaans, Greek, Norwegian, Russian, Serbian, Icelandic, Latin, Irish, Lowland Scots, Indonesian, Polish, Croatian

 
 Message 5 of 23
03 July 2011 at 8:29am
Knowing the first 38% or so is like having the empty shelves ready in a supermarket, but all the goodies that make you go there are in the remaining 62%.
4 persons have voted this message useful



Orangetuesday
Newbie
United Kingdom
orangeeasy.blogspot.
Joined 4828 days ago

1 posts - 1 votes
Studies: Mandarin, Dutch

 
 Message 6 of 23
03 July 2011 at 10:45am
Perhaps it depends on how you do it.

I've been using some audio books in Dutch. The first book I used was Danny the World Champion, by Roald Dahl.

I certainly didn't know 37% of the words when I first listened to it. Instead, I listened for the story and I was able to get a grasp of it. This grasping of the story made it enjoyable for me.

Audio's great for moving you along. Also, you get used to common words quickly.

I've probably listened to some parts of Danny the World Champion 6-7 times now. And when I'm reading and listening, I get much more of the detail. I haven't used the Dutch dictionary that much (perhaps 15-odd words).
1 person has voted this message useful



Cainntear
Pentaglot
Senior Member
Scotland
linguafrankly.blogsp
Joined 5946 days ago

4399 posts - 7687 votes 
Speaks: Lowland Scots, English*, French, Spanish, Scottish Gaelic
Studies: Catalan, Italian, German, Irish, Welsh

 
 Message 7 of 23
03 July 2011 at 11:33am
Orangetuesday,

For an English-speaker, Dutch vocabulary is fairly transparent. Not so Turkish for a Mandarin speaker!

You're making the same mistake as Assimil -- you're taking an idea that works well for "easy" language pairs (e.g. French<->Italian) and suggesting that it is generally applicable.
1 person has voted this message useful



jimbo
Tetraglot
Senior Member
Canada
Joined 6229 days ago

469 posts - 642 votes 
Speaks: English*, Mandarin, Korean, French
Studies: Japanese, Latin

 
 Message 8 of 23
03 July 2011 at 3:02pm
zuneybunny wrote:
So... I pulled together a simple Python program that calculates the # of words needed
to understand a % of a book, here are the results.


Cool. Thanks. I haven't started studying Turkish yet but hope to sometime.

Out of curiosity, approximately how many words would you need to know to be able to read, say, 85% of the text?

(Since you have the Python script all set up, might as well ask you to run it a couple more times.) Cheers.
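
The question can be turned into a small function. This is a sketch, not the original script: `words_needed` is a made-up name, and the plain `lower()` call is a simplification (Turkish İ/I would need special handling). No figure for the 85% mark is given here, since that depends on running it against the actual text.

```python
import re
from collections import Counter


def words_needed(text, target_percent):
    """Smallest number of top-frequency words whose counts reach target_percent."""
    # Simplification: plain lower() does not handle Turkish dotted/dotless i.
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    running = 0
    for n, (_, count) in enumerate(Counter(words).most_common(), start=1):
        running += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= target_percent * total:
            return n
    return 0  # only reached for an empty text


toy = "a a a b b c"
print(words_needed(toy, 50), words_needed(toy, 85))
```

On the toy string the most frequent word alone covers 50%, while 85% needs all three distinct words, which is the same long-tail effect in miniature.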

Edited by jimbo on 03 July 2011 at 3:21pm



1 person has voted this message useful


