lwtproject Pentaglot Senior Member Netherlands https://learning-wit Joined 4841 days ago 149 posts - 264 votes Speaks: French, Dutch*, German, English, Mandarin Studies: Italian
| Message 249 of 355 23 January 2012 at 1:44pm | IP Logged |
Luk wrote:
Only one thing: I tried to use it with Modern Greek and couldn't.

Did you try
RegExp Word Characters = \x{0370}-\x{03FF}\x{1F00}-\x{1FFF}
as explained at http://lwt.sourceforge.net/#langsetup
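To illustrate what such a setting does, here is a minimal Python sketch. It is not LWT's own code: it assumes the setting ends up inside a regex character class, and it rewrites the `\x{0370}` style escapes as Python's `\u0370`. The `split_words` helper is hypothetical.

```python
import re

# The two ranges from the setting above: Greek and Coptic (U+0370-U+03FF)
# and Greek Extended (U+1F00-U+1FFF), written with Python's \u escapes.
GREEK_WORD_CHARS = "\u0370-\u03FF\u1F00-\u1FFF"


def split_words(text: str, word_chars: str = GREEK_WORD_CHARS) -> list[str]:
    """Return the runs of characters that fall inside the word-character class."""
    return re.findall(f"[{word_chars}]+", text)


# Anything outside the class (spaces, punctuation, Latin letters) separates words.
print(split_words("\u03b1\u03b2\u03b3, abc \u03b4"))
```

If the ranges are missing or wrong, Greek letters fall outside the class and no Greek words are found, which may be what "couldn't" looked like in practice.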
|
lwtproject Pentaglot Senior Member Netherlands https://learning-wit Joined 4841 days ago 149 posts - 264 votes Speaks: French, Dutch*, German, English, Mandarin Studies: Italian
| Message 250 of 355 23 January 2012 at 1:46pm | IP Logged |
atama warui wrote:
I really wish I could use it for Japanese, but I don't get how MeCab works :/ Copying and pasting
after importing to LingQ results in clutter.

Did you look here?
http://forum.koohii.com/viewtopic.php?id=8603
|
lwtproject Pentaglot Senior Member Netherlands https://learning-wit Joined 4841 days ago 149 posts - 264 votes Speaks: French, Dutch*, German, English, Mandarin Studies: Italian
| Message 251 of 355 23 January 2012 at 1:50pm | IP Logged |
If you are wondering how to set up a specific language in LWT,
please also see this spreadsheet with setup parameters for different languages:
http://bit.ly/xwF3EJ
Edited by lwtproject on 23 January 2012 at 1:51pm
|
hrhenry Octoglot Senior Member United States languagehopper.blogs Joined 5079 days ago 1871 posts - 3642 votes Speaks: English*, SpanishC2, ItalianC2, Norwegian, Catalan, Galician, Turkish, Portuguese Studies: Polish, Indonesian, Ojibwe
| Message 252 of 355 23 January 2012 at 3:11pm | IP Logged |
Thank you very much! The dictionary I wanted to use (http://www.targmne.com/) is giving me some trouble, so for now at least, I'm just using Google Translate for both word and sentence translations.
R.
==
|
atama warui Triglot Senior Member Japan Joined 4650 days ago 594 posts - 985 votes Speaks: German*, English, Japanese
| Message 253 of 355 23 January 2012 at 8:40pm | IP Logged |
Yes. However, MeCab produces output in (I think) ANSI, which messes up the text. I have no clue how to use this for Japanese, except for putting spaces in manually - which is a horror for longer texts. Can't be helped, it seems.
See: http://forum.koohii.com/viewtopic.php?id=9283&action=new
Edited by atama warui on 23 January 2012 at 8:41pm
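If the garbling really does come from the encoding, re-encoding the MeCab output as UTF-8 before pasting it may help. The following is a minimal Python sketch under the assumption that the output is in cp932 (Shift-JIS, which is what "ANSI" usually means on a Japanese Windows system). The thread does not confirm the actual encoding, and the function name is hypothetical.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp932") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    # Strict decoding (the default) raises UnicodeDecodeError if the guessed
    # encoding is wrong, instead of silently producing garbage.
    return raw.decode(source_encoding).encode("utf-8")


# Simulated tokenizer output: space-separated tokens in cp932 (an assumption).
sample = "私 は 学生 です".encode("cp932")
print(reencode_to_utf8(sample).decode("utf-8"))
```

The spaces between tokens survive the conversion, so the result could be pasted as is.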
|
hrhenry Octoglot Senior Member United States languagehopper.blogs Joined 5079 days ago 1871 posts - 3642 votes Speaks: English*, SpanishC2, ItalianC2, Norwegian, Catalan, Galician, Turkish, Portuguese Studies: Polish, Indonesian, Ojibwe
| Message 254 of 355 23 January 2012 at 8:51pm | IP Logged |
I have a question (and suggestion) about something I've already run into and would like to see implemented.
Do you have any plans to be able to include other types of media besides audio along with the text? I ask because I just input a piece of text that references people and objects in a picture. It would be nice to be able to have that loaded as well as the text when reading.
R.
==
|
Quabazaa Tetraglot Senior Member United States Joined 5558 days ago 414 posts - 543 votes Speaks: English*, Spanish, German, French Studies: Japanese, Korean, Maori, Scottish Gaelic, Arabic (Levantine), Arabic (Egyptian), Arabic (Written)
| Message 255 of 355 24 January 2012 at 11:19am | IP Logged |
Thank you for this wonderful tool! It's really amazing, I've wanted to be able to use a
LingQ-style tool with Arabic for a long time now. Benny, many thanks to you as well, the
online version you released is great!
I got my settings working fine for Arabic as outlined in the spreadsheet - but is there
any way to separate (the) “ال” (and) “و” from the start of words? It is frustrating to
have to enter every word twice (with or without the) or to put spaces in a text between
the words and the connectors. So what do I put into LWT settings for it to recognise
these as not part of the words?
|
lwtproject Pentaglot Senior Member Netherlands https://learning-wit Joined 4841 days ago 149 posts - 264 votes Speaks: French, Dutch*, German, English, Mandarin Studies: Italian
| Message 256 of 355 24 January 2012 at 12:18pm | IP Logged |
Quabazaa wrote:
So what do I put into LWT settings for it to recognise
these as not part of the words?

At the moment you seem to be using:
RegExp Word Characters = -ۿݐ-ݭﭐ-﷼ﹰ-ﻼ
and
RegExp Split Sentences = .،!؟؛:
In detail, the characters recognized as word characters come from the following ranges:
-ۿ \x{0600}-\x{06FF}
ݐ-ݭ \x{0750}-\x{076D}
ﭐ-﷼ \x{FB50}-\x{FDFC}
ﹰ-ﻼ \x{FE70}-\x{FEFC}
Please look at these lists to see all characters in these 4 ranges:
Overview
http://en.wikipedia.org/wiki/Arabic_alphabet
In detail:
http://www.unicode.org/charts/PDF/U0600.pdf
http://www.unicode.org/charts/PDF/U0750.pdf
http://www.unicode.org/charts/PDF/UFB50.pdf
http://www.unicode.org/charts/PDF/UFE70.pdf
If you want to remove characters from these 4 ranges, you have to modify the ranges so that the unwanted characters
are no longer in them.
Example: instead of specifying a range like A-I, you can also specify a string of allowed characters, like: ABCDEFGHI.
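As a rough illustration of that idea, the Python sketch below expands the four ranges into an explicit string of characters and leaves out the unwanted ones. It is not LWT code; the helper names are hypothetical, and it assumes the resulting string is used inside a regex character class.

```python
import re

# The four ranges listed above, as (first, last) code points.
ARABIC_RANGES = [(0x0600, 0x06FF), (0x0750, 0x076D), (0xFB50, 0xFDFC), (0xFE70, 0xFEFC)]


def explicit_word_chars(ranges, excluded=""):
    """Expand code-point ranges into a string of single characters, minus `excluded`."""
    return "".join(
        chr(cp)
        for first, last in ranges
        for cp in range(first, last + 1)
        if chr(cp) not in excluded
    )


def word_pattern(chars):
    """Build a regex that matches runs of the given characters."""
    return re.compile("[" + re.escape(chars) + "]+")


# Drop waw (U+0648) from the word characters, then split a sample text.
without_waw = word_pattern(explicit_word_chars(ARABIC_RANGES, excluded="\u0648"))
print(without_waw.findall("\u0648\u0643\u062a\u0627\u0628"))
```

Note that a character removed this way stops being a word character everywhere, not only at the start of a word, so a word with that letter in the middle would be split as well.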
|