emkaos Diglot Newbie Germany Joined 5215 days ago 9 posts - 19 votes Speaks: German*, English Studies: Korean
Message 17 of 23, 06 July 2011 at 2:58am
Cool.
Maybe you want to check out http://www.nltk.org/
It's a Python library for analyzing natural language.
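As a rough idea of the most basic thing such a library does, here is a minimal word-frequency list. NLTK ships its own tokenizers and a `FreqDist` class for this. To avoid the dependency, this sketch uses only the Python standard library, and its tokenizer (runs of Unicode letters, lowercased) is a simplifying assumption. For example, it splits "don't" into "don" and "t".

```python
import re
from collections import Counter

def frequency_list(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent words in `text` with their counts."""
    # [^\W\d_]+ matches runs of letters only (Unicode-aware), so digits,
    # underscores and punctuation are dropped; lowercasing merges "The"/"the".
    words = re.findall(r"[^\W\d_]+", text.lower())
    # most_common() sorts by count, descending; ties keep first-seen order.
    return Counter(words).most_common(top)

print(frequency_list("the cat and the hat and the bat", 2))
```

The call above prints `[('the', 3), ('and', 2)]`.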
Cainntear Pentaglot Senior Member Scotland linguafrankly.blogsp Joined 5946 days ago 4399 posts - 7687 votes Speaks: Lowland Scots, English*, French, Spanish, Scottish Gaelic Studies: Catalan, Italian, German, Irish, Welsh
Message 18 of 23, 06 July 2011 at 7:58am
emkaos wrote:
Cool.
Maybe you want to check out http://www.nltk.org/
It's a python library to analyze natural language
Ooh... just the thing....
Time for me to learn Python finally....
Doitsujin Diglot Senior Member Germany Joined 5255 days ago 1256 posts - 2363 votes Speaks: German*, English
Message 19 of 23, 06 July 2011 at 11:10am
Cainntear wrote:
Ooh... just the thing....
Time for me to learn Python finally....
Alternatively, you could download TextStat (Python-based), which should be sufficient for most language learners' needs. Not only can it analyze .txt and .html files, it can also generate frequency lists, KWICs (keyword-in-context concordances), etc. from websites and newsgroups.
It doesn't come with a help file, but a short English user guide for an earlier version can be downloaded here.
Edited by Doitsujin on 06 July 2011 at 11:35am
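For anyone wondering what a KWIC is, here is a minimal keyword-in-context sketch: every occurrence of a keyword is shown with a few words of context on either side. This is a generic illustration, not TextStat's implementation or output format. The letters-only tokenizer (which drops punctuation) and the `width` parameter are assumptions made for the example.

```python
import re

def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword-in-context: one line per occurrence of `keyword` (matched
    case-insensitively), with up to `width` words of context on each side
    and the keyword itself in square brackets."""
    tokens = re.findall(r"[^\W\d_]+", text)  # letters only, punctuation dropped
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # strip() removes the stray space when one side has no context
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

for line in kwic("The Scarecrow and the Tin Woodman followed the scarecrow home",
                 "scarecrow", width=2):
    print(line)
```

This prints `The [Scarecrow] and the` and `followed the [scarecrow] home`.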
jimbo Tetraglot Senior Member Canada Joined 6229 days ago 469 posts - 642 votes Speaks: English*, Mandarin, Korean, French Studies: Japanese, Latin
Message 20 of 23, 06 July 2011 at 12:51pm
Oh, too cool.
Lianne Senior Member Canada thetoweringpile.blog Joined 5050 days ago 284 posts - 410 votes Speaks: English* Studies: Esperanto, Toki Pona, German, French
Message 21 of 23, 07 July 2011 at 1:15am
I'm so happy I stumbled across this thread! I just ran it on La Mirinda Sorĉisto de Oz (The Wonderful Wizard of Oz in Esperanto). It's 35029 words long (including the title and a short foreword), with 4859 unique words. Looking at the top of the list, I already know the top 18 well (except Doroteo, which I realised means Dorothy, so that doesn't count). Number 19 is birdotimigilo. I had to look that one up. It's obvious to me now that I know... it means scarecrow! I love that that's in the top 20 words of this book. So now I can talk about scarecrows in Esperanto, and who doesn't want that? The next word I didn't know was number 38, lignohakisto, which means a person who chops wood.
I think I'll use this list to choose the vocabulary I'll learn for the next little bit, so that I'll end up being able to read this story. That'll be my incentive!
Edited by Lianne on 07 July 2011 at 1:16am
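The workflow above (count the words, count the unique words, then walk down the frequency list to find the first words you don't know) can be sketched in a few lines. This is a hypothetical helper, not part of the program discussed in the thread. The `known` set stands in for a learner's own vocabulary, and because the tokenizer is a simplifying assumption, its counts won't exactly match those of other tools.

```python
import re
from collections import Counter

def study_list(text: str, known: set[str], limit: int = 5):
    """Return (token count, distinct word count, unknown words), where the
    unknown words are the first `limit` entries of the frequency list that
    are not in `known`, each given as (rank, word, count) with a 1-based
    rank in the full frequency list."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # also matches ĉ, ŝ, ŭ, ...
    ranked = Counter(words).most_common()           # most frequent first
    unknown = [(rank, w, c) for rank, (w, c) in enumerate(ranked, start=1)
               if w not in known]
    return len(words), len(ranked), unknown[:limit]

sample = "la leono kaj la birdotimigilo kaj la lignohakisto"
print(study_list(sample, known={"la", "kaj", "leono"}))
```

On this toy sample it prints `(8, 5, [(4, 'birdotimigilo', 1), (5, 'lignohakisto', 1)])`: 8 tokens, 5 distinct words, and the first unknown word sits at rank 4.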
zuneybunny Diglot Newbie United States turkishtrip.wordpres Joined 4872 days ago 32 posts - 52 votes Speaks: English, Mandarin* Studies: Spanish, Turkish
Message 22 of 23, 07 July 2011 at 5:22am
Quote:
I think I'll use this list to choose the vocabulary I'll learn for the next little bit, so that the result will be me being able to read this story. That'll be my incentive!
Yup, glad to know you're taking advantage of the program!
I think tackling the 50-100 most common words first is a great way to prepare for reading a book in a foreign language. It'll make reading much easier!
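To check how far the top 50-100 words actually get you in a given book, you can compute the share of running text they cover and, conversely, how many top words you need to reach a target coverage. The sketch below is a minimal illustration under two assumptions. Coverage is measured over tokens, so very frequent function words count many times. Inflected forms count as separate words, since there is no lemmatization.

```python
import re
from collections import Counter

def _tokens(text: str) -> list[str]:
    # Letters-only, lowercased tokenizer (a simplifying assumption).
    return re.findall(r"[^\W\d_]+", text.lower())

def coverage_of_top(text: str, n: int) -> float:
    """Percentage of all tokens accounted for by the `n` most frequent words."""
    words = _tokens(text)
    if not words:
        return 0.0
    top = Counter(words).most_common(n)
    return 100 * sum(count for _, count in top) / len(words)

def words_for_coverage(text: str, percent: float) -> int:
    """Smallest number of words, taken from the top of the frequency list,
    whose occurrences make up at least `percent` % of all tokens."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    words = _tokens(text)
    covered = 0
    for n, (_, count) in enumerate(Counter(words).most_common(), start=1):
        covered += count
        # Compare as covered/total >= percent/100 without float division.
        if covered * 100 >= percent * len(words):
            return n
    return 0  # only reached when the text contains no words

text = "a a a a b b b c c d"
print(coverage_of_top(text, 2), words_for_coverage(text, 90))
```

For this toy text it prints `70.0 3`: the top two words cover 70% of the tokens, and three words are needed to reach 90%.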
fnord Triglot Groupie Switzerland Joined 4968 days ago 71 posts - 124 votes Speaks: German*, Swiss-German, English Studies: Luxembourgish, Dutch
Message 23 of 23, 08 July 2011 at 4:18am
Cabaire wrote:
"This * me *, because I'm * * as well, and I can't * the *
of not * how much * I should * before I could * * the *.
So... I * * a * * * that * the # of * needed
to * a % of a *, here are the *"
The lesson to draw: content words are the rarer words.
On the other hand, the rarer words are often also the easier ones to infer. This is, of course, only true for languages that are somewhat related and therefore share a larger number of cognates and loanwords.
Let's take your example and pretend that, in addition to the given words and a really basic grasp of English pronunciation and grammar, I knew only my native German. I'd try to fill in the dots (or rather, the asterisks here) by matching the most fitting cognates and loanwords.
learn = lernen
Turkish = Türkisch
fact = Fakt
Turkish = Türkisch
tackle = (loanword in sports)
book = Buch
simple = simpel
program = Programm
calculate = kalkulieren
word = Wort
book = Buch
result = Resultat
This gives:
This * me *, because I'm learning Turkish as well, and I can't * the fact
of not * how much Turkish I should * before I could * tackle the book.
So... I * * a simple * program that calculates the # of words needed
to * a % of a book, here are the results.
Not too bad, is it?
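A masked text like the one in the quote can be generated mechanically: every word outside a given vocabulary is replaced with an asterisk. The helper and the vocabulary in this sketch are hypothetical and made up for illustration. Adding the English words that have German cognates (the matches listed above) to `known` turns the first masked version into the second one.

```python
import re

# A "word" is a run of letters, optionally joined by apostrophes (I'm, can't).
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def mask_unknown(text: str, known: set[str]) -> str:
    """Replace every word not in `known` (compared in lowercase) with '*',
    leaving punctuation, symbols, spacing and line breaks untouched."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        return word if word.lower() in known else "*"
    return WORD.sub(repl, text)

basic = {"i'm", "as", "well", "the", "of", "a", "to"}
sentence = "I'm learning Turkish as well, to calculate the # of words."
print(mask_unknown(sentence, basic))
print(mask_unknown(sentence, basic | {"learning", "turkish", "calculate", "words"}))
```

The first call prints `I'm * * as well, to * the # of *.` and the second prints the sentence unchanged.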
One of the supposedly still "unknown" words remaining is "to know", which we can safely assume any reader knows, since it's one of the most common verbs in English; it even appears twice in the example. From context, it would be fairly easy to guess that "pulling together" a simple program roughly means something like "making" one.
Granted, English and German are more closely related than almost any other two languages. And there's always the danger of running into false friends that hinder rather than help (English "curious" vs. German "kurios", meaning "odd", is a great example). But generally my German, English, Latin and a tiny bit of French are quite helpful when it comes to getting the gist of texts in Italic and/or Germanic languages. Whether that alone makes for a good read is another question. ;-)
Edited by fnord on 08 July 2011 at 4:24am