
Algorithm: words needed to know % of text

Language Learning Forum : Learning Techniques, Methods & Strategies
emkaos
Diglot
Newbie
Germany
Speaks: German*, English
Studies: Korean

 
 Message 17 of 23
06 July 2011 at 2:58am
Cool.

Maybe you want to check out http://www.nltk.org/
It's a Python library for analyzing natural language.
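If all you need is a frequency list, Python's standard library can already do the counting without installing NLTK. Below is a minimal sketch. It assumes plain-text input and uses a deliberately crude tokenizer: a regex that treats each run of letters as one word. NLTK's `FreqDist` and its tokenizers are the more robust option. The names `tokenize` and `frequency_list` are only illustrative.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase `text` and return its runs of letters as words.

    The pattern matches Unicode letters, so words such as "Sorĉisto" stay
    whole. Digits and punctuation are dropped, and an apostrophe splits a
    word in two ("don't" becomes "don" and "t"), which is crude but usually
    good enough for counting.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first (ties keep first-seen order)."""
    return Counter(tokenize(text)).most_common()


if __name__ == "__main__":
    sample = "The cat sat on the mat. The mat was flat."
    for rank, (word, count) in enumerate(frequency_list(sample), start=1):
        print(rank, word, count)
```

For a whole book, you would read the file with `open(path, encoding="utf-8").read()` and pass the result to `frequency_list`.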



Cainntear
Pentaglot
Senior Member
Scotland
linguafrankly.blogsp
Speaks: Lowland Scots, English*, French, Spanish, Scottish Gaelic
Studies: Catalan, Italian, German, Irish, Welsh

 
 Message 18 of 23
06 July 2011 at 7:58am
emkaos wrote:
Cool.

Maybe you want to check out http://www.nltk.org/
It's a Python library for analyzing natural language.

Ooh... just the thing....
Time for me to learn Python finally....



Doitsujin
Diglot
Senior Member
Germany
Speaks: German*, English

 
 Message 19 of 23
06 July 2011 at 11:10am
Cainntear wrote:
Ooh... just the thing....
Time for me to learn Python finally....

Alternatively, you could download the (Python-based) TextStat, which should be sufficient for most language learners' needs. Not only can it analyze .txt and .html files, it can also generate frequency lists, KWICs (keyword-in-context concordances), etc. from websites and newsgroups.

It doesn't come with a help file, but a short English user guide for an earlier version is available for download.
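A KWIC (keyword-in-context) view lists every occurrence of a word together with a few words of context on either side. This is often the quickest way to work out what an unknown word means. The sketch below is a rough illustration of the idea and is not TextStat's own code. It assumes tokens are separated by whitespace and uses a fixed context width. The `kwic` function and its `width` parameter are hypothetical.

```python
def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one line per occurrence of `keyword` (case-insensitive).

    Each line shows up to `width` whitespace-separated tokens of context on
    each side, with the matched token wrapped in square brackets.
    """
    tokens = text.split()
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore surrounding punctuation so "Dorothy," still matches "dorothy".
        if token.strip(".,;:!?\"'()").lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


if __name__ == "__main__":
    text = "Dorothy lived in Kansas. The Scarecrow met Dorothy on the road."
    for line in kwic(text, "dorothy", width=2):
        print(line)
```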

Edited by Doitsujin on 06 July 2011 at 11:35am




jimbo
Tetraglot
Senior Member
Canada
Speaks: English*, Mandarin, Korean, French
Studies: Japanese, Latin

 
 Message 20 of 23
06 July 2011 at 12:51pm
Oh, too cool.



Lianne
Senior Member
Canada
thetoweringpile.blog
Speaks: English*
Studies: Esperanto, Toki Pona, German, French

 
 Message 21 of 23
07 July 2011 at 1:15am
I'm so happy I stumbled across this thread! I just ran it on La Mirinda Sorĉisto de Oz (The Wonderful Wizard of Oz in Esperanto). It's 35029 words long (including the title and a little foreword), with 4859 unique words. Looking at the top of the list, I already know the top 18 well (except Doroteo, which I realised means Dorothy, so that doesn't count). Number 19 is birdotimigilo. I had to look that one up. It's obvious to me now that I know... it means scarecrow! I love that that's in the top 20 words of this book. So now I can talk about scarecrows in Esperanto, and who doesn't want that? The next word I didn't know was number 38, lignohakisto, which means a person who chops wood.

I think I'll use this list to choose the vocabulary I'll learn for the next little bit, so that the result will be me being able to read this story. That'll be my incentive!
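The figures in the post above can be read straight off a frequency list: total running words, unique words, and the rank of each word you don't know yet. Below is a minimal sketch that assumes you keep the words you already know in a Python set. `text_stats` and `unknown_by_rank` are hypothetical helper names. The exact counts depend on how the tokenizer splits words, so different tools may report slightly different totals for the same book.

```python
import re
from collections import Counter


def text_stats(text: str) -> tuple[int, int, list[tuple[str, int]]]:
    """Return (running words, unique words, frequency list) for `text`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    freq = Counter(tokens).most_common()
    return len(tokens), len(freq), freq


def unknown_by_rank(freq: list[tuple[str, int]], known: set[str]) -> list[tuple[int, str]]:
    """List (rank, word) for every word not in `known`, most frequent first.

    Ranks are 1-based positions in the full frequency list, so the first
    entry shows how far down the list you get before hitting a word you
    still need to learn.
    """
    return [
        (rank, word)
        for rank, (word, _count) in enumerate(freq, start=1)
        if word not in known
    ]


if __name__ == "__main__":
    # Toy Esperanto sentence for illustration, not text from the actual book.
    sample = "la leono kaj la birdotimigilo kaj la lignohakisto"
    tokens, types, freq = text_stats(sample)
    print(tokens, types)  # 8 5
    print(unknown_by_rank(freq, {"la", "kaj", "leono"}))
```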

Edited by Lianne on 07 July 2011 at 1:16am




zuneybunny
Diglot
Newbie
United States
turkishtrip.wordpres
Speaks: English, Mandarin*
Studies: Spanish, Turkish

 
 Message 22 of 23
07 July 2011 at 5:22am
Quote:
I think I'll use this list to choose the vocabulary I'll learn for the next little bit, so that the result will be me being able to read this story. That'll be my incentive!

Yup, glad to know you're taking advantage of the program!

I think it's a great way to prepare for reading a book in a foreign language: tackle the 50-100 most common words first. It'll make reading much easier!
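How much of a text the top 50-100 words actually cover varies from text to text, but it is easy to measure. Below is a minimal sketch of the calculation this thread is about. It assumes you already have the word counts sorted from most to least frequent, which is the order `Counter.most_common()` returns. The coverage of the top N words is their combined count divided by the total number of running words. The number of words needed for a target percentage is the smallest N whose coverage reaches that target. The function names are illustrative and are not taken from the original poster's program.

```python
from collections import Counter
from itertools import accumulate


def coverage_of_top(counts: list[int], n: int) -> float:
    """Fraction of all running words covered by the `n` most frequent words.

    `counts` holds one count per distinct word, sorted in descending order.
    """
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0


def words_needed(counts: list[int], target: float) -> int:
    """Smallest number of top words whose combined coverage reaches `target`.

    `target` is a fraction (0.95 means 95 % of the running words). Returns
    len(counts) if the target is never reached, e.g. when target > 1.
    """
    total = sum(counts)
    if total == 0:
        return 0
    for n, running in enumerate(accumulate(counts), start=1):
        if running / total >= target:
            return n
    return len(counts)


if __name__ == "__main__":
    tokens = "the cat sat on the mat the mat was flat".split()
    counts = [count for _word, count in Counter(tokens).most_common()]
    print(coverage_of_top(counts, 2))  # 0.5 ("the" + "mat" = 5 of 10 words)
    print(words_needed(counts, 0.9))   # 6
```

`words_needed` uses the same `running / total` ratio as `coverage_of_top`, so the two functions always agree on where a threshold is crossed.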



fnord
Triglot
Groupie
Switzerland
Speaks: German*, Swiss-German, English
Studies: Luxembourgish, Dutch

 
 Message 23 of 23
08 July 2011 at 4:18am
Cabaire wrote:
"This * me *, because I'm * * as well, and I can't * the *
of not * how much * I should * before I could * * the *.

So... I * * a * * * that * the # of * needed
to * a % of a *, here are the *
"

The lesson to draw: content words are the rarer words.

On the other hand, the rarer words are also often the easier ones to infer. This is, of course, only true for languages that are somehow related and thus share a greater number of cognates and loanwords.

Let's take your example and pretend that, in addition to the given words and a really basic grasp of English pronunciation and grammar, I knew only my native German. I'd try to fill in the dots (or rather the asterisks here) by matching the most fitting cognates and loanwords.


learn = lernen
Turkish = Türkisch
fact = Fakt
Turkish = Türkisch
tackle = (loanword in sports)
book = Buch

simple = simpel
program = Programm
calculate = kalkulieren
word = Wort
book = Buch
result = Resultat

This gives:

This * me *, because I'm learning Turkish as well, and I can't * the fact
of not * how much Turkish I should * before I could * tackle the book.

So... I * * a simple * program that calculates the # of words needed
to * a % of a book, here are the results.


Not too bad, is it?

One of the supposedly still "unknown" words is "to know", which we can safely assume any reader knows, as it's one of the most common verbs in English (it even appears twice in the example). From context it would also be rather easy to guess that "pulling together" a simple program means something like "making" one.

Granted, English and German are more closely related than most language pairs. And there's always the danger of running into false friends that hinder rather than help ("curious" is a great example). But in general, my German, English, Latin and a tiny bit of French are often quite helpful when it comes to getting the gist of texts in an Italic and/or Germanic language. Whether that alone makes for a good read is another question. ;-)
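The guessing game above can be imitated mechanically. Keep every word the reader already knows or can map to a native-language cognate, and replace everything else with an asterisk. Below is a toy sketch that assumes a small hand-made cognate table (a few entries from the list above) and a small set of "given" function words. Detecting real cognates automatically is much harder than this. A plain lookup table also cannot tell a true cognate from a false friend such as "curious", so the table has to be curated by hand. `COGNATES`, `GIVEN` and `mask_unknown` are hypothetical names.

```python
import re

# A few English -> German pairs from the list above (hand-picked, hypothetical table).
COGNATES = {
    "learning": "lernen",
    "turkish": "Türkisch",
    "fact": "Fakt",
    "book": "Buch",
    "simple": "simpel",
    "program": "Programm",
    "calculates": "kalkulieren",
    "words": "Wort",
    "results": "Resultat",
}

# Words the reader is assumed to know already (hypothetical toy set).
GIVEN = {
    "this", "me", "because", "i'm", "as", "well", "and", "i", "can't",
    "the", "of", "not", "how", "much", "should", "before", "could",
    "so", "a", "that", "to", "here", "are", "needed",
}

# A word is a run of ASCII letters with an optional apostrophe part ("I'm", "can't").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def mask_unknown(text: str, known: set[str], cognates: dict[str, str]) -> str:
    """Replace every word that is neither known nor a listed cognate with '*'."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        lower = word.lower()
        return word if lower in known or lower in cognates else "*"

    return WORD.sub(replace, text)


if __name__ == "__main__":
    print(mask_unknown("I can't stand the fact", GIVEN, COGNATES))
    print(mask_unknown("I pulled together a simple program", GIVEN, COGNATES))
```

Multi-word guesses like "pulling together" are beyond a word-by-word lookup like this one. They still have to come from context.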


Edited by fnord on 08 July 2011 at 4:24am






Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.