wojtasskorcz Diglot Newbie Poland Joined 4561 days ago 6 posts - 6 votes Speaks: Polish*, English Studies: German, Spanish
| Message 1 of 6 27 August 2012 at 7:51pm | IP Logged |
Hey guys,
I'm looking for a tool that seems to be hard to find on Google: something that could
deconjugate German words (e.g. brauche -> brauchen). It may seem like a silly thing to
look for, so let me explain why I need it.
My goal now is to start reading some German literature (something easy at first), and I
would like to make it easier by counting how often each word occurs in the book I am
about to read and first getting acquainted with the most frequent ones. The problem is
that the generated list is:
1. Bloated with different forms of the same words, so it's difficult to find the
ones I don't know.
2. Unreliable, because a word that can be conjugated into more forms than another
will probably end up further down the list: each of its forms is not so frequent on
its own, although summed up they may make it a frequent word (I hope that's
understandable).
Now to the point: I'd like to first replace each word with its "main" form (e.g.
the infinitive for verbs) and then count the occurrences. To do this, I need a
table in a text format, something like this:
brauche -> brauchen
brauchst -> brauchen
...
Or a program that would deconjugate the words it is given as input. Do you have any
idea where I could find something like that?
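To make the idea concrete, here is a minimal Python sketch of the "substitute, then count" step, assuming such a table already exists in the "form -> main form" text format shown above. The function names, the sample table, and the sample sentence are made up for illustration; lowercasing every word is a simplification that ignores the capitalization of German nouns.

```python
import re
from collections import Counter

def parse_lemma_table(text):
    """Parse lines such as 'brauche -> brauchen' into a dict {form: main form}."""
    table = {}
    for line in text.splitlines():
        if "->" not in line:
            continue  # skip blank or malformed lines
        form, lemma = (part.strip() for part in line.split("->", 1))
        if form and lemma:
            table[form.lower()] = lemma
    return table

def count_lemmas(text, table):
    """Count words after replacing each known form with its main form."""
    # [^\W\d_]+ matches runs of letters, including ä, ö, ü and ß.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Words missing from the table are counted as they are.
    return Counter(table.get(word, word) for word in words)

# Hypothetical sample table in the format from the post.
TABLE_TEXT = """
brauche -> brauchen
brauchst -> brauchen
braucht -> brauchen
"""

table = parse_lemma_table(TABLE_TEXT)
counts = count_lemmas("Ich brauche Zeit. Du brauchst Zeit. Er braucht Ruhe.", table)
print(counts.most_common(3))
```

With a table like this, the three forms of "brauchen" are counted as a single entry instead of three separate rare ones.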
If that's not possible, another solution would be to get ordinary conjugation tables
and write a program to invert them, but I'd only go to that trouble as a last
resort.
Thanks in advance
Edited by Fasulye on 28 August 2012 at 7:08am
|
Hampie Diglot Senior Member Sweden Joined 6657 days ago 625 posts - 1009 votes Speaks: Swedish*, English Studies: Latin, German, Mandarin
| Message 2 of 6 27 August 2012 at 7:54pm | IP Logged |
I know there is a programme that can do that for Latin, and I know of one that can do it for Akkadian. However,
it takes a lot of work to make one; it's not something you can do quickly.
|
wojtasskorcz Diglot Newbie Poland Joined 4561 days ago 6 posts - 6 votes Speaks: Polish*, English Studies: German, Spanish
| Message 3 of 6 27 August 2012 at 7:59pm | IP Logged |
Well, it all depends on the resources you have. If I had some huge conjugation tables
in an easily machine-readable format (let's say .txt), I'd see no problem in inverting
them so I could deconjugate words. But I think such tables are not available either.
Everything nowadays is wrapped in some fancy HTML and is not so easily extractable
by computer programs. Although this might be an interesting challenge :)
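For illustration, inverting such tables could look roughly like the Python sketch below. The input format (one verb per line, infinitive before the colon, forms after it) is purely an assumption, as are the function name and the tiny sample data. Each form maps to a set of infinitives, because one form can belong to more than one verb.

```python
from collections import defaultdict

# Hypothetical plain-text table: "infinitive: form form form ..."
CONJUGATIONS = """
brauchen: brauche brauchst braucht brauchen
sein: bin bist ist sind seid
hören: höre hörst hört gehört
gehören: gehöre gehörst gehört
"""

def invert_conjugations(text):
    """Turn {infinitive: forms} lines into a lookup {form: set of infinitives}."""
    inverse = defaultdict(set)
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        lemma, forms = line.split(":", 1)
        lemma = lemma.strip()
        for form in forms.split():
            inverse[form].add(lemma)
    return inverse

inverse = invert_conjugations(CONJUGATIONS)
print(inverse["brauchst"])       # one candidate
print(sorted(inverse["gehört"])) # two candidates: the form is ambiguous
```

The ambiguous case ("gehört" belongs to both "hören" and "gehören") is where a plain lookup table runs out: it cannot tell which verb is meant without looking at the sentence.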
|
Majka Triglot Senior Member Czech Republic kofoholici.wordpress Joined 4655 days ago 307 posts - 755 votes Speaks: Czech*, German, English Studies: French Studies: Russian
| Message 4 of 6 27 August 2012 at 8:59pm | IP Logged |
What you are looking for is called a "lemmatizer", and it is often part of a POS tagger (part-of-speech tagger).
I am using TreeTagger and it works very well for French. TreeTagger works for German and several other languages too.
You can use it with a simple list of words, but ideally you give it complete sentences: depending on where a word stands, the tagger decides which of several possible forms it is. Like any automated process, it isn't 100% accurate, but it works well enough.
I am using it to generate word lists from complete texts (books, articles, etc.) and also to generate parallel texts, where the first row is the original text and the row below is the same text in base form with the verb tenses marked.
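As a rough sketch of the word-list step: TreeTagger writes one token per line with three tab-separated columns (token, part-of-speech tag, lemma), so counting lemmas from its output takes only a few lines of Python. The sample below is hand-written for illustration, not real tagger output, and the function name is made up. Skipping tags that start with "$" assumes the German STTS tagset, where those mark punctuation, and the "<unknown>" fallback assumes that is how a word without a known lemma is marked.

```python
from collections import Counter

def lemma_frequencies(tagged_text):
    """Count lemmas in tab-separated 'token<TAB>tag<TAB>lemma' lines."""
    counts = Counter()
    for line in tagged_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        token, tag, lemma = parts
        if tag.startswith("$"):
            continue  # assumption: STTS punctuation tags start with "$"
        if lemma == "<unknown>":
            lemma = token  # assumption: fall back to the word as written
        counts[lemma] += 1
    return counts

# Hand-written sample in the three-column format (illustrative only).
SAMPLE = (
    "Ich\tPPER\tich\n"
    "brauche\tVVFIN\tbrauchen\n"
    "Zeit\tNN\tZeit\n"
    ".\t$.\t.\n"
    "Du\tPPER\tdu\n"
    "brauchst\tVVFIN\tbrauchen\n"
    "Ruhe\tNN\tRuhe\n"
    ".\t$.\t.\n"
)

counts = lemma_frequencies(SAMPLE)
for lemma, n in counts.most_common():
    print(n, lemma)
```

Because the tagger sees whole sentences, the lemma column already reflects its choice between possible readings, which a plain form-to-infinitive table cannot do.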
|
wojtasskorcz Diglot Newbie Poland Joined 4561 days ago 6 posts - 6 votes Speaks: Polish*, English Studies: German, Spanish
| Message 5 of 6 27 August 2012 at 9:30pm | IP Logged |
Wow! That's amazing! I mean AMAZING! And it supports so many languages, and I
don't have to develop anything myself (apart from counting frequencies, which is 5 mins of
work)! I don't know how to thank you, Majka!
Thank you thank you thank you :)
|
limey75 Senior Member United Kingdom germanic.eu/ Joined 4397 days ago 119 posts - 182 votes Speaks: English* Studies: German, Norwegian, Old English
| Message 6 of 6 10 November 2012 at 5:01am | IP Logged |
Try Verbix:
http://www.verbix.com/
|