
Color-coding part of speech

  Tags: Colors/Colours
 Language Learning Forum : Learning Techniques, Methods & Strategies
14 messages over 2 pages: 1 2  Next >>
Senior Member
Joined 3730 days ago

156 posts - 271 votes 
Speaks: Portuguese*
Studies: Mandarin, French, English, Italian, Spanish, Persian, Arabic (classical)

 Message 1 of 14
19 March 2014 at 1:40pm | IP Logged 
I have recently developed a method that can make reading much easier, and, by the by, it may give you the learning benefits synaesthesia is touted to have.

It consists of getting your reading material from a POS-tagged corpus, then processing it so that different classes of words show in different colours, thus making sentence structure transparent at first glance.

In my first experience with this, it made my reading (of Classical Chinese) much easier. The corpus I'm using not only marks part of speech, but also identifies person and place names, which is a huge help.
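
A minimal sketch of the processing step described above, not luhmann's actual script. It assumes Python 3 and a corpus that stores tokens as whitespace-separated word/TAG pairs; the tag names and colours in TAG_COLOURS are hypothetical placeholders that would have to be replaced with the tag set of the real corpus.

```python
import html

# Hypothetical tag-to-colour mapping (an assumption for illustration).
# Adapt the keys to the tag set of the corpus actually used.
TAG_COLOURS = {
    "NOUN": "blue",
    "VERB": "red",
    "ADJ": "green",
    "PROPN": "purple",  # person and place names
}
DEFAULT_COLOUR = "black"


def colourize(tagged_text, sep="/"):
    """Turn 'word/TAG word/TAG ...' into HTML with one coloured span per token."""
    spans = []
    for token in tagged_text.split():
        # rpartition splits on the LAST separator, so words containing "/" survive.
        word, found, tag = token.rpartition(sep)
        if not found:
            # Token carries no tag at all: show it in the default colour.
            word, tag = token, ""
        colour = TAG_COLOURS.get(tag, DEFAULT_COLOUR)
        spans.append('<span style="color:%s">%s</span>' % (colour, html.escape(word)))
    return " ".join(spans)


if __name__ == "__main__":
    print(colourize("Socrates/PROPN reads/VERB old/ADJ books/NOUN"))
```

The output can be saved to an .html file and opened in a browser. For a language written without spaces, such as Chinese, joining the spans with an empty string instead of a space is probably the better choice.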

My script (in Python 2.x):

POS tagged corpora I plan to use in the future: (for Persian)

Edited by luhmann on 19 March 2014 at 2:46pm

6 persons have voted this message useful

United States
Joined 3929 days ago

2615 posts - 8805 votes 
Speaks: English*, FrenchB2
Studies: Spanish, Ancient Egyptian

 Message 2 of 14
19 March 2014 at 1:51pm | IP Logged 
Great idea!

I keep on thinking that I ought to color-code a French corpus to indicate gender, with additional boldface for endings that could be used to predict the gender of nouns and adjectives. I really like your idea of starting from a pre-tagged corpus, which would certainly make the results more accurate.
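
A rough sketch of how that gender idea could look, assuming Python 3 and a corpus that already supplies a gender label ("m" or "f") for each noun or adjective. The endings in PREDICTIVE_ENDINGS and the colours are illustrative assumptions only; each of these endings has exceptions, and a real list would have to be derived from the corpus itself.

```python
import html

# Illustrative assumptions: a handful of endings that tend to predict gender.
PREDICTIVE_ENDINGS = {
    "f": ("tion", "ette"),
    "m": ("ment", "isme"),
}
GENDER_COLOURS = {"m": "blue", "f": "deeppink"}


def mark_gender(word, gender):
    """Colour a word by gender and put a gender-predicting ending in bold."""
    colour = GENDER_COLOURS.get(gender)
    if colour is None:
        # No gender information: leave the word unstyled.
        return html.escape(word)
    body, ending = word, ""
    for suffix in PREDICTIVE_ENDINGS.get(gender, ()):
        if word.lower().endswith(suffix) and len(word) > len(suffix):
            body, ending = word[:-len(suffix)], word[-len(suffix):]
            break
    inner = html.escape(body)
    if ending:
        inner += "<b>%s</b>" % html.escape(ending)
    return '<span style="color:%s">%s</span>' % (colour, inner)


if __name__ == "__main__":
    sample = [("la", None), ("nation", "f"), ("et", None), ("le", None), ("moment", "m")]
    print(" ".join(mark_gender(w, g) for w, g in sample))
```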
2 persons have voted this message useful

Senior Member
United States
Joined 5602 days ago

3133 posts - 4350 votes 
Speaks: English*, Spanish
Studies: Esperanto, French

 Message 3 of 14
19 March 2014 at 3:41pm | IP Logged 
I like that idea too. Especially marking gender in French, which isn't always obvious, at least when compared to Spanish.
1 person has voted this message useful

Senior Member
Spain
Joined 2705 days ago

283 posts - 557 votes 
Speaks: English*, German, Spanish
Studies: Arabic (Written), French

 Message 4 of 14
19 March 2014 at 11:43pm | IP Logged 
I do this by hand with highlighters. With German I was more elaborate, with 4
"dimensions" of colour-coded annotation: gender (blue, yellow, pink), verb-preposition
collocation (green), plural type (red) and case (orange; marking the case endings which
I had the most trouble with at a given point). It all evolved from simply marking
gender. With Spanish now I'm marking gender with blue and pink, and marking with red
the morphological deviation from the infinitive, along with type of infinitive. I make
the annotations at a later date from the original handwriting, and when I annotate
along one dimension, I focus only on that task. This adds a richer analysis of material
that I have already internalized. And it pushes it deeper. I never recalled with
accuracy the plurals, genders or verb-preposition collocations in German before I
employed these glossing techniques on top of my usual shadowing and writing activities.
Also, I don't commonly re-read my own notebook. The activity alone was enough to fix
these details in my memory. The repetitious nature of the mass of vocabulary meant that
I got sick of highlighting certain words and hence they became "obvious." Also, my
awareness of the "shortcuts" to ascertaining gender was sharpened. The same for
plurals. And for the conceptual framework of German prepositions. All due to the colours
and the PROCESS.

I suppose that the effect of this colour coding is fully analogous to the computerized
version that you're all talking about, although I can definitely say that the physical
involvement of the pen and highlighter approach, and the sentimental, aesthetic
investment that you have with your own written pages, is something I would never
sacrifice personally.

The fact you have to do the annotations yourself is a good way of going deep into
already digested material and metaphorically ironing out the small kinks you have in
your knowledge of them. I recommend it to everyone as a very cheap trick to incorporate
into your self-study habits.

7 persons have voted this message useful

Glarus Girl
United Kingdom
Joined 2972 days ago

50 posts - 108 votes 
Speaks: English*
Studies: German, Swiss-German

 Message 5 of 14
20 March 2014 at 1:04am | IP Logged 
There is an app for Chrome, Genusly, that highlights the gender in yellow,
blue and pink for German, which helps to spot accusative and dative. It only works in Gmail, I think, but I also
use it to check other things, then just cut and paste elsewhere. The colours do not show up once you've sent
the email.

I have just started to use highlighters to check that my writing has the subject, verb, time, manner and place
in the right order. Makes for a bright, messy rough copy but helps enormously!
4 persons have voted this message useful

Senior Member
United States
Joined 3830 days ago

174 posts - 426 votes 
Speaks: English*
Studies: Italian, Esperanto, Latin, Ancient Greek

 Message 6 of 14
20 March 2014 at 2:42pm | IP Logged 
I've been meaning to try something like this with Latin for years. Transitioning from graded readers to original content, especially poetry, means getting used to less forgiving word order. Ambiguous case endings are so much harder to correctly identify when jumbled around for poetic effect. I'm going to give the highlighter method a go.
1 person has voted this message useful

Joined 2471 days ago

61 posts - 88 votes 
Speaks: French*, English, Mandarin
Studies: Spanish, Esperanto

 Message 7 of 14
21 March 2014 at 10:31am | IP Logged 
Very interesting initiative! I also find it useful to have that kind of POS colour code while learning languages.

Since tagged texts are quite rare, I think that POS taggers that work on untagged plain text may have more potential. I found some Python libraries that can do that and they may interest you.

There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger. Here is an example:
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
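
To connect this with the colour-coding idea, here is a small sketch (Python 3, no NLTK needed to run it) that takes a list of (word, tag) pairs like the one above and prints it with ANSI terminal colours. The grouping of Penn Treebank tags by prefix and the colour choices are my own assumptions, not anything NLTK prescribes.

```python
# ANSI escape codes for a few terminal colours.
ANSI = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
    "blue": "\033[34m",
}
RESET = "\033[0m"

# Penn Treebank tag prefix -> colour (arbitrary choices for illustration).
PREFIX_COLOURS = [
    ("NN", "blue"),    # nouns: NN, NNS, NNP, NNPS
    ("VB", "red"),     # verbs: VB, VBD, VBG, ...
    ("JJ", "green"),   # adjectives: JJ, JJR, JJS
    ("RB", "yellow"),  # adverbs: RB, RBR, RBS
]


def colour_for(tag):
    """Return the colour name for a tag, or None if the tag is not highlighted."""
    for prefix, colour in PREFIX_COLOURS:
        if tag.startswith(prefix):
            return colour
    return None


def render(tagged):
    """Join (word, tag) pairs into one string, wrapping highlighted words in ANSI codes."""
    out = []
    for word, tag in tagged:
        colour = colour_for(tag)
        out.append(ANSI[colour] + word + RESET if colour else word)
    return " ".join(out)


if __name__ == "__main__":
    tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
              ('completely', 'RB'), ('different', 'JJ')]
    print(render(tagged))
```

Words whose tags are not in the table (here 'And' and 'for') are left uncoloured, which keeps function words visually in the background.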

It seems that the tagger won't work for Chinese out of the box, but NLTK can be trained. Here is an example:

Edited by Yaan on 21 March 2014 at 10:31am

2 persons have voted this message useful

Senior Member
Joined 3717 days ago

1255 posts - 2362 votes 
Speaks: German*, English

 Message 8 of 14
21 March 2014 at 12:01pm | IP Logged 
Yaan wrote:
There is a Python library called NLTK (Natural Language Toolkit). It has a lot of functionality, including a POS tagger ...

Unfortunately, there are not that many free tagged corpora available. And those that do exist usually don't contain gender information. :-(

1 person has voted this message useful

Copyright 2020 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.