
Color-coding part of speech

  Tags: Colors/Colours
 Language Learning Forum : Learning Techniques, Methods & Strategies
14 messages over 2 pages
luhmann
Senior Member
Brazil
Joined 5092 days ago

156 posts - 271 votes 
Speaks: Portuguese*
Studies: Mandarin, French, English, Italian, Spanish, Persian, Arabic (classical)

 
 Message 9 of 14
21 March 2014 at 3:46pm
For tagging gender in French, you could use the database from Lexique.org, a dictionary of inflected forms that includes grammatical information.
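A minimal sketch of such a gender lookup, assuming a tab-separated Lexique export whose header row includes the columns `ortho` (word form), `cgram` (grammatical category, with `NOM` for nouns) and `genre` (`m` or `f`). Those column names are an assumption to check against the file you actually download, and the sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows in the assumed Lexique-style layout (tab-separated,
# header with "ortho", "cgram" and "genre"). For the real file, pass
# open("path/to/lexique.tsv", encoding="utf-8") instead of io.StringIO.
SAMPLE = (
    "ortho\tcgram\tgenre\n"
    "maison\tNOM\tf\n"
    "livre\tNOM\tm\n"
    "livre\tNOM\tf\n"
    "livre\tVER\t\n"
)

def load_noun_genders(handle):
    """Map each noun form to the set of genders listed for it."""
    genders = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep nouns only, and skip rows with an empty gender column.
        if row["cgram"] == "NOM" and row["genre"]:
            genders[row["ortho"]].add(row["genre"])
    return dict(genders)

def tag_gender(word, genders):
    """Return the gender(s) joined by '/', e.g. 'f', or 'f/m' when the
    form is listed with both genders; '?' if the word is unknown."""
    found = genders.get(word.lower(), set())
    if not found:
        return "?"
    return "/".join(sorted(found))

genders = load_noun_genders(io.StringIO(SAMPLE))
for w in ["maison", "livre", "chat"]:
    print(w, tag_gender(w, genders))
```

Ambiguous nouns such as *livre* (le livre "book", la livre "pound") come back as `f/m`, which is a useful signal to give them their own color rather than guess.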

You may also want to play with treebanks:
http://en.wikipedia.org/wiki/Treebank
2 persons have voted this message useful



chokofingrz
Pentaglot
Senior Member
England
Joined 4948 days ago

241 posts - 430 votes 
Speaks: English*, French, Spanish, German, Italian
Studies: Russian, Japanese, Catalan, Luxembourgish

 
 Message 10 of 14
21 March 2014 at 11:34pm
Yaan wrote:

There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger. Here is an example:


I find this library pretty interesting and I might have a play with it this weekend, because I am intrigued by the challenge of parsing Russian text to extract some useful "root" vocabulary words rather than all the genitives and datives that Lingocracy wants me to learn!
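Since the thread is about color-coding parts of speech, here is a minimal sketch of the coloring step, assuming the tagger output is a list of `(token, tag)` pairs with Penn Treebank tags, which is the shape `nltk.pos_tag` returns for English text. The color scheme and the function name `colorize` are arbitrary choices for illustration.

```python
import html

# Arbitrary color scheme keyed on the first two letters of a Penn
# Treebank tag, so e.g. NN, NNS, NNP and NNPS all count as nouns.
COLORS = {
    "NN": "blue",    # nouns
    "VB": "red",     # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "green",   # adjectives: JJ, JJR, JJS
    "RB": "orange",  # adverbs: RB, RBR, RBS
}

def colorize(tagged):
    """Turn (token, tag) pairs into HTML with colored <span>s.
    Tokens whose tag has no color are left plain."""
    parts = []
    for token, tag in tagged:
        text = html.escape(token)  # keep '<', '&' etc. from breaking the HTML
        color = COLORS.get(tag[:2])
        if color:
            parts.append(f'<span style="color:{color}">{text}</span>')
        else:
            parts.append(text)
    return " ".join(parts)

# Hand-written tags, in the same (token, tag) shape as nltk.pos_tag output.
tagged = [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), ("quietly", "RB")]
print(colorize(tagged))
```

To feed it real tags, something like `nltk.pos_tag(nltk.word_tokenize(text))` should work once NLTK and its tokenizer and tagger data are installed; the pairs above are written by hand so the sketch runs without those downloads.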
1 person has voted this message useful



Yaan
Triglot
Groupie
France
Joined 3833 days ago

61 posts - 88 votes 
Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

 
 Message 11 of 14
22 March 2014 at 3:05am
chokofingrz wrote:
Yaan wrote:

There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger. Here is an example:


I find this library pretty interesting and I might have a play with it this weekend, because I am intrigued by the challenge of parsing Russian text to extract some useful "root" vocabulary words rather than all the genitives and datives that Lingocracy wants me to learn!


Here is the NLTK class that may help you find the stem: nltk.stem.snowball.RussianStemmer
It seems that NLTK's Snowball stemmers are based on another project called Snowball; you can try the stemmer here: http://text-processing.com/demo/stem/

However, the stem is not exactly what we want as language learners. For example, in English the stem of "removing" is "remov", but we want to learn "remove".

What you are looking for seems to be the lemma.
Wikipedia wrote:
In morphology and lexicography, a lemma (plural lemmas or lemmata) is the canonical form, dictionary form, or citation form of a set of words (headword). In English, for example, run, runs, ran and running are forms of the same lexeme, with run as the lemma.

Wikipedia entry for Lemma: http://en.wikipedia.org/wiki/Lemma_(morphology)
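The difference between a stem and a lemma can be sketched in a few lines. `toy_stem` below is a deliberately crude suffix stripper, not Snowball or any NLTK stemmer, and `LEMMAS` is a hand-made lookup table; both are hypothetical and exist only to show why a learner wants the lemma.

```python
# Toy suffix stripper: a crude stand-in for a real stemmer, used only to
# show that a stem need not be a dictionary word.
SUFFIXES = ("ing", "ed", "s")

def toy_stem(word):
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least 3 letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hand-made lemma table: every listed form points to its headword.
LEMMAS = {
    "removing": "remove", "removed": "remove", "removes": "remove",
    "running": "run", "runs": "run", "ran": "run",
}

def lemmatize(word):
    # Unknown forms are returned unchanged.
    return LEMMAS.get(word, word)

for w in ["removing", "running", "ran"]:
    print(w, toy_stem(w), lemmatize(w))
```

The stemmer produces non-words such as `remov` and `runn` and cannot relate `ran` to `run`, whereas the lookup table returns the dictionary headword in every case, as long as the form is listed.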


Edited by Yaan on 22 March 2014 at 3:05am

2 persons have voted this message useful



Doitsujin
Diglot
Senior Member
Germany
Joined 5079 days ago

1256 posts - 2363 votes 
Speaks: German*, English

 
 Message 12 of 14
22 March 2014 at 8:31am
chokofingrz wrote:
I find this library pretty interesting and I might have a play with it this weekend, because I am intrigued by the challenge of parsing Russian text to extract some useful "root" vocabulary words rather than all the genitives and datives that Lingocracy wants me to learn!

You might find this GitHub website helpful. It hosts English, French, Italian, German, Spanish, Portuguese, Polish and Russian inflection lists, which make reducing inflected forms to their canonical forms relatively easy.

The entries have the following format:

Code:
окно: окно, окна, окна, окон, окну, окнам, окно, окна, окном, окнами, окне, окнах

As you can see, the list contains some redundant entries. However, the author also provided an undocumented Ruby script that apparently removes these entries. Since I'm not familiar with Ruby, I'm not 100% sure, though.
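A minimal Python sketch for reading lines in this `lemma: form, form, ...` format, dropping the duplicate forms and building a reverse index from each inflected form to its lemma. It assumes every entry follows the format shown above; it is an independent sketch, not a port of the author's Ruby script. The reverse index stores a set because one form can belong to several lemmas.

```python
from collections import defaultdict

# One entry in the format shown above: "lemma: form1, form2, ..."
SAMPLE = "окно: окно, окна, окна, окон, окну, окнам, окно, окна, окном, окнами, окне, окнах"

def parse_line(line):
    """Split one entry into its lemma and its forms, dropping duplicate
    forms while keeping the order in which they first appear."""
    lemma, _, rest = line.partition(":")
    forms = [f.strip() for f in rest.split(",") if f.strip()]
    return lemma.strip(), list(dict.fromkeys(forms))

def build_reverse_index(lines):
    """Map each inflected form to the set of lemmas it can come from."""
    index = defaultdict(set)
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        lemma, forms = parse_line(line)
        for form in forms:
            index[form].add(lemma)
    return index

lemma, forms = parse_line(SAMPLE)
print(lemma, forms)                          # 9 distinct forms remain
print(build_reverse_index([SAMPLE])["окнами"])
```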

4 persons have voted this message useful



Yaan
Triglot
Groupie
France
Joined 3833 days ago

61 posts - 88 votes 
Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

 
 Message 13 of 14
22 March 2014 at 12:02pm
Doitsujin wrote:
You might find this GitHub website helpful. It hosts English, French, Italian, German, Spanish, Portuguese, Polish and Russian inflection lists, which make reducing inflected forms to their canonical forms relatively easy.


Very interesting resource! Thank you for sharing that :)
I wonder what the source of these dictionaries is and what methodology was used to build them. It would be great if more languages were supported.

The method used is a bit "brute force": every possible inflected form is listed, which leads to huge dictionary files. The Polish and Russian dictionaries are 57 MB and 64 MB respectively.
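With files of that size, one option is to avoid indexing the whole dictionary: stream it line by line and keep only the forms that occur in the text you are studying. A minimal sketch, assuming the same `lemma: form, form, ...` line format as the example above; `load_needed` and `needed_forms` are hypothetical names, and the small in-memory file stands in for the real one.

```python
import io

def load_needed(handle, needed_forms):
    """Stream a 'lemma: form, form, ...' file line by line and keep only
    forms that occur in `needed_forms`, so the full multi-megabyte
    dictionary is never held in memory as one big index."""
    index = {}
    for line in handle:
        lemma, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "lemma:" prefix
        for form in rest.split(","):
            form = form.strip()
            if form in needed_forms:
                index.setdefault(form, set()).add(lemma.strip())
    return index

# Tiny stand-in for the real file; with the real one use
# open(path, encoding="utf-8") instead of io.StringIO.
dictionary_file = io.StringIO(
    "окно: окно, окна, окон, окнам\n"
    "дом: дом, дома, домов\n"
)
print(load_needed(dictionary_file, {"окон", "дома"}))
```

The trade-off is that the file is re-read for every new text; if you process many texts, building the full index once and saving it may be faster.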
1 person has voted this message useful



chokofingrz
Pentaglot
Senior Member
England
Joined 4948 days ago

241 posts - 430 votes 
Speaks: English*, French, Spanish, German, Italian
Studies: Russian, Japanese, Catalan, Luxembourgish

 
 Message 14 of 14
22 March 2014 at 4:10pm
Doitsujin wrote:

You might find this GitHub website helpful. It hosts English, French, Italian, German, Spanish, Portuguese, Polish and Russian inflection lists, which make reducing inflected forms to their canonical forms relatively easy.


Great, this is precisely what I was attempting to piece together in a massive spreadsheet last night!

I also have a POS-tagged frequency list of Russian lemmas from here, so I'm ultimately hoping to parse some text, stem the words, and match them against the lemmas to generate a frequency-sorted vocabulary list.

If I get anywhere I'll share the results here.
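The pipeline described above can be sketched as follows. Instead of stemming, this version maps each word form straight to its lemma through a form-to-lemma table (such as one built from the inflection lists above), then counts lemmas; the optional `rank` argument stands in for each lemma's position in a general frequency list and only breaks ties. The table, the ranks and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical form -> lemma table; in practice built from inflection lists.
FORM_TO_LEMMA = {
    "окно": "окно", "окна": "окно", "окном": "окно",
    "дом": "дом", "дома": "дом", "доме": "дом",
}

# Lowercase Cyrillic words; ё lies outside the а-я range, so it is listed separately.
WORD_RE = re.compile(r"[а-яё]+")

def vocabulary(text, form_to_lemma, rank=None):
    """Return (lemma, count) pairs, most frequent in `text` first.
    Ties are broken by `rank` (lower = more common in general), with
    unranked lemmas last; unknown forms are kept as-is for manual review."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(form_to_lemma.get(w, w) for w in words)
    rank = rank or {}
    return sorted(counts.items(),
                  key=lambda item: (-item[1], rank.get(item[0], float("inf"))))

text = "В доме одно окно. Окно дома смотрит на окна соседнего дома."
for lemma, n in vocabulary(text, FORM_TO_LEMMA, rank={"окно": 1, "дом": 2}):
    print(lemma, n)
```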


1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.