luhmann Senior Member Brazil Joined 5343 days ago 156 posts - 271 votes Speaks: Portuguese* Studies: Mandarin, French, English, Italian, Spanish, Persian, Arabic (classical)
| Message 1 of 14 19 March 2014 at 1:40pm | IP Logged |
Hi,
I have recently developed a method that can make reading much easier and, by the by, may give you the learning benefits synaesthesia is touted to have.
It consists of getting your reading material from a POS-tagged corpus, then processing it so that different classes of words show in different colours, thus making sentence structure transparent at first glance.
In my first experience with this, my reading (of Classical Chinese) became much easier. The corpus I'm using not only marks part of speech, but also identifies person and place names, which is a huge help.
My script (in Python 2.x):
http://pastebin.com/cyx1xYz4
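For readers who just want the gist, here is a minimal sketch of the general idea in Python 3. It is not the script linked above. It assumes the corpus has already been parsed into (word, tag) pairs, and the tag names, the colour palette and the function name colourise are all made up for illustration; a real corpus will have its own tag set.

```python
import html

# Hypothetical tag names and palette, chosen only for illustration.
# Replace them with the tag set your corpus actually uses.
COLOURS = {
    "NOUN": "#1f77b4",
    "VERB": "#d62728",
    "ADJ": "#2ca02c",
    "PERSON": "#9467bd",
    "PLACE": "#ff7f0e",
}
DEFAULT_COLOUR = "#000000"  # used for any tag not listed above


def colourise(tagged_tokens, sep=" "):
    """Return HTML with each (word, tag) pair wrapped in a coloured span."""
    spans = []
    for word, tag in tagged_tokens:
        colour = COLOURS.get(tag, DEFAULT_COLOUR)
        spans.append(
            '<span style="color:%s">%s</span>' % (colour, html.escape(word))
        )
    return sep.join(spans)


# Example input: a hand-made tagged sentence, not taken from any corpus.
sample = [("Paris", "PLACE"), ("is", "VERB"), ("big", "ADJ")]
print(colourise(sample))
```

For a language written without spaces, such as Chinese, passing sep="" keeps the tokens joined. The output can be saved to an .html file and opened in a browser.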
POS tagged corpora I plan to use in the future:
http://ancientchinese.sinica.edu.tw/
http://early_mandarin.ling.sinica.edu.tw/
http://ece.ut.ac.ir/dbrg/bijankhan/ (for Persian)
Edited by luhmann on 19 March 2014 at 2:46pm
6 persons have voted this message useful
|
emk Diglot Moderator United States Joined 5542 days ago 2615 posts - 8806 votes Speaks: English*, French B2 Studies: Spanish, Ancient Egyptian
| Message 2 of 14 19 March 2014 at 1:51pm | IP Logged |
Great idea!
I keep on thinking that I ought to color-code a French corpus to indicate gender, with additional boldface for endings that could be used to predict the gender of nouns and adjectives. I really like your idea of starting from a pre-tagged corpus, which would certainly make the results more accurate.
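A rough sketch of what that could look like, assuming the corpus already supplies a gender tag ("m" or "f") for each word. The list of endings is only a handful of commonly cited hints with known exceptions, and the names GENDER_ENDINGS and mark_gender are invented for this example.

```python
import html

# Illustrative only: a few endings often cited as gender hints in French.
# These are heuristics with exceptions, not a complete or authoritative list.
GENDER_ENDINGS = {
    "m": ("age", "ment", "isme"),
    "f": ("tion", "ette", "ure"),
}
GENDER_COLOURS = {"m": "blue", "f": "deeppink"}


def mark_gender(word, gender):
    """Colour a word by its gender tag and bold a matching hint ending."""
    colour = GENDER_COLOURS.get(gender)
    if colour is None:
        # No gender tag (e.g. a conjunction): leave the word uncoloured.
        return html.escape(word)
    body = html.escape(word)
    # Try longer endings first so the most specific match wins.
    for ending in sorted(GENDER_ENDINGS[gender], key=len, reverse=True):
        if word.lower().endswith(ending) and len(word) > len(ending):
            stem, tail = word[:-len(ending)], word[-len(ending):]
            body = "%s<b>%s</b>" % (html.escape(stem), html.escape(tail))
            break
    return '<span style="color:%s">%s</span>' % (colour, body)


# Hand-tagged example words; a real run would read the tags from the corpus.
for word, gender in [("nation", "f"), ("garage", "m"), ("plage", "f")]:
    print(mark_gender(word, gender))
```

Note that "plage" is coloured as feminine but gets no bold ending: the bold is applied only when the ending agrees with the tagged gender, so exceptions to the rule of thumb stand out as plain coloured words.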
2 persons have voted this message useful
|
luke Diglot Senior Member United States Joined 7215 days ago 3133 posts - 4351 votes Speaks: English*, Spanish Studies: Esperanto, French
| Message 3 of 14 19 March 2014 at 3:41pm | IP Logged |
I like that idea too. Especially marking gender in French, which isn't always obvious, at least when compared to Spanish.
1 person has voted this message useful
|
Retinend Triglot Senior Member Spain Joined 4318 days ago 283 posts - 557 votes Speaks: English*, German, Spanish Studies: Arabic (Written), French
| Message 4 of 14 19 March 2014 at 11:43pm | IP Logged |
I do this by hand with highlighters. With German I was more elaborate, with 4
"dimensions" of colour-coded annotation: gender (blue, yellow, pink), verb-preposition
collocation (green), plural type (red) and case (orange; marking the case endings which
I had the most trouble with at a given point). It all evolved from simply marking
gender. With Spanish now I'm marking gender with blue and pink, and marking with red
the morphological deviation from the infinitive, along with type of infinitive. I make
the annotations at a later date from the original handwriting, and when I annotate
along one dimension, I focus only on that task. This adds a richer analysis of material
that I have already internalized. And it pushes it deeper. I never recalled with
accuracy the plurals, genders or verb-preposition collocations in German before I
employed these glossing techniques on top of my usual shadowing and writing activities.
Also, I don't commonly re-read my own notebook. The activity alone was enough to fix
these details in my memory. The repetitious nature of the mass of vocabulary meant that
I got sick of highlighting certain words and hence they became "obvious." Also, my
awareness of the "shortcuts" to ascertaining gender was sharpened. The same for
plurals. And for the conceptual framework of German prepositions. All due to the colours
and the PROCESS.
I suppose that the effect of this colour coding is fully analogous to the computerized
version that you're all talking about, although I can definitely say that the physical
involvement of the pen and highlighter approach, and the sentimental, aesthetic
investment that you have with your own written pages, is something I would never
sacrifice personally.
The fact you have to do the annotations yourself is a good way of going deep into
already digested material and metaphorically ironing out the small kinks you have in
your knowledge of them. I recommend it to everyone as a very cheap trick to incorporate
into your self-study habits.
7 persons have voted this message useful
|
Glarus Girl Groupie United Kingdom Joined 4585 days ago 50 posts - 108 votes Speaks: English* Studies: German, Swiss-German
| Message 5 of 14 20 March 2014 at 1:04am | IP Logged |
There is a Chrome app, Genusly, that highlights gender in yellow,
blue and pink for German, which helps to spot accusative and dative. It only works in Gmail, I think, but I also
use it to check other things, then just cut and paste elsewhere. The colours do not show up once you've sent
the email.
I have just started to use highlighters to check that my writing has the subject, verb, time, manner and place
in the right order. Makes for a bright, messy rough copy but helps enormously!
4 persons have voted this message useful
|
dmaddock1 Senior Member United States Joined 5443 days ago 174 posts - 426 votes Speaks: English* Studies: Italian, Esperanto, Latin, Ancient Greek
| Message 6 of 14 20 March 2014 at 2:42pm | IP Logged |
I've been meaning to try something like this with Latin for years. Transitioning from graded readers to original content, especially poetry, means getting used to less forgiving word order. Ambiguous case endings are so much harder to correctly identify when jumbled around for poetic effect. I'm going to give the highlighter method a go.
1 person has voted this message useful
|
Yaan Triglot Groupie France Joined 4084 days ago 61 posts - 88 votes Speaks: French*, English, Mandarin Studies: Spanish, Esperanto
| Message 7 of 14 21 March 2014 at 10:31am | IP Logged |
Very interesting initiative! I also find it useful to have that kind of POS colour code while learning languages.
Since tagged texts are quite rare, I think that POS taggers that work on untagged plain text may have more potential. I found some Python libraries that can do that, and they may interest you.
There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger. Here is an example:
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
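To connect this with the colour idea, here is a small sketch that takes a tagged list like the one above and prints it with ANSI terminal colours. It does not call NLTK itself, so it runs without any tagger data installed. The two-letter grouping of Penn Treebank tags and the colour choices are my own assumptions, and the names ANSI, colour_for and render are invented for the example.

```python
# Coarse word classes keyed by the first two letters of the Penn Treebank tag:
# NN* nouns, VB* verbs, JJ* adjectives, RB* adverbs. Colours are arbitrary.
ANSI = {
    "NN": "\033[34m",  # blue
    "VB": "\033[31m",  # red
    "JJ": "\033[32m",  # green
    "RB": "\033[33m",  # yellow
}
RESET = "\033[0m"


def colour_for(tag):
    """Return the ANSI colour code for a tag, or "" if the class is unlisted."""
    return ANSI.get(tag[:2], "")


def render(tagged):
    """Join (word, tag) pairs into one line, colouring the listed classes."""
    out = []
    for word, tag in tagged:
        code = colour_for(tag)
        out.append(code + word + RESET if code else word)
    return " ".join(out)


# The tagged sentence from the example above, pasted in as plain data.
tagged = [("And", "CC"), ("now", "RB"), ("for", "IN"), ("something", "NN"),
          ("completely", "RB"), ("different", "JJ")]
print(render(tagged))
```

In a terminal that supports ANSI codes, the adverbs, the noun and the adjective each show in their own colour, while "And" and "for" stay in the default colour.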
It seems that it won't work for Chinese out of the box, but NLTK can be trained. Here is an example:
https://github.com/a33kuo/postagger_zh
Edited by Yaan on 21 March 2014 at 10:31am
2 persons have voted this message useful
|
Doitsujin Diglot Senior Member Germany Joined 5330 days ago 1256 posts - 2363 votes Speaks: German*, English
| Message 8 of 14 21 March 2014 at 12:01pm | IP Logged |
Yaan wrote:
There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger ...
Unfortunately, there are not that many free tagged corpora available. And those that do exist usually don't contain gender information. :-(
1 person has voted this message useful
|