Hashimi Senior Member Oman Joined 6258 days ago 362 posts - 529 votes Speaks: Arabic (Written)* Studies: English, Japanese
| Message 1 of 8 23 March 2010 at 1:07pm | IP Logged |
A new tool from GoogleLabs to add missing diacritics to Arabic text:
http://tashkeel.googlelabs.com/
5 persons have voted this message useful
|
Teango Triglot Winner TAC 2010 & 2012 Senior Member United States teango.wordpress.com Joined 5555 days ago 2210 posts - 3734 votes Speaks: English*, German, Russian Studies: Hawaiian, French, Toki Pona
| Message 2 of 8 23 March 2010 at 2:17pm | IP Logged |
I don't suppose you know what kind of accuracy Google achieves overall?
Back in 2006, for Cambridge University, I made a system that looks very much the same: it combines Support Vector Machines with statistical machine learning tools and Buckwalter's Morphological Analyser. I published a thesis on it, making it public domain, and then contacted Google in great excitement, but they just weren't interested at the time.
My aim was to provide a free tool that could assist all future linguists and learners of Arabic and help semi-automate the construction of annotated Arabic corpora and databases. It's since been employed in several leading Arabic research projects, both in the UK and in Egypt. And if I remember correctly, my final system achieved over 93% accuracy on words without case endings across almost a million words, which I was quite delighted with back then.
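For anyone who wants to compare tools on a figure like that, here is a minimal Python sketch of a word-level accuracy measure that ignores case endings. It is only an illustration, not the system described above and not Google's method. The function names (`strip_tashkeel`, `drop_final_marks`, `word_accuracy`) are made up for this example, and it assumes the case ending is simply whatever marks sit on the last letter of each word.

```python
# Arabic short-vowel marks and related signs (tashkeel):
# U+064B..U+0652 (fathatan through sukun) plus U+0670 (superscript alef).
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkeel(text):
    """Return text with all diacritics removed (i.e. plain unvowelled text)."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


def drop_final_marks(word):
    """Remove the marks attached to the last letter only.

    Simplifying assumption: the case ending is whatever sits on the final
    letter. This also drops a word-final shadda and misses tanween that is
    written before a trailing alef.
    """
    return word.rstrip("".join(TASHKEEL))


def word_accuracy(predicted, reference, ignore_case_endings=True):
    """Fraction of whitespace-separated words whose diacritics match exactly."""
    pred_words = predicted.split()
    ref_words = reference.split()
    if not ref_words or len(pred_words) != len(ref_words):
        raise ValueError("texts must be non-empty and have the same number of words")
    if ignore_case_endings:
        pred_words = [drop_final_marks(w) for w in pred_words]
        ref_words = [drop_final_marks(w) for w in ref_words]
    correct = sum(p == r for p, r in zip(pred_words, ref_words))
    return correct / len(ref_words)
```

To use it, you would take a fully vowelled reference text, run `strip_tashkeel` on it, feed the result to the tool, and pass the tool's output and the reference to `word_accuracy`.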
Edited by Teango on 23 March 2010 at 2:22pm
1 person has voted this message useful
|
translator2 Senior Member United States Joined 6918 days ago 848 posts - 1862 votes Speaks: English*
| Message 3 of 8 23 March 2010 at 2:19pm | IP Logged |
I tried it with the Arabic CNN site (http://arabic.cnn.com/) and it works great (inserts the vowel marks, etc.), but can a native speaker give an opinion regarding the accuracy of the vowels?
1 person has voted this message useful
|
Al-Irelandi Senior Member United Kingdom Joined 5534 days ago 111 posts - 177 votes Speaks: English*
| Message 4 of 8 23 March 2010 at 6:38pm | IP Logged |
Quite useful. It obviously doesn't attempt to make i3raab of the endings and give them their tashkeelaat. That would need another algorithm.
Edited by al-Irlandee on 23 March 2010 at 6:39pm
1 person has voted this message useful
|
ANK47 Triglot Senior Member United States thearabicstudent.blo Joined 7096 days ago 188 posts - 259 votes Speaks: English*, Arabic (Written), Arabic (classical)
| Message 5 of 8 24 March 2010 at 8:37am | IP Logged |
Wow, that site will be very useful for people just starting out in Arabic, before they know the basic patterns of how things are voweled. After several months of exposure you can be reasonably sure of how most words will be pronounced, but at the beginning, if a word isn't voweled for you, you'll have no idea how to say it. I remember vocabulary lists when I was learning Arabic that were just the word written with no vowels. Lists like that are practically useless to beginners unless there's audio to go along with them. Anyway, I looked at the site and it seems to work quite well. If you need 100% accuracy for something professional I wouldn't use it, but it's a great helper for learning.
1 person has voted this message useful
|
Woodpecker Triglot Senior Member United States Joined 5810 days ago 351 posts - 590 votes Speaks: English*, Arabic (Written), Arabic (Egyptian) Studies: Arabic (classical)
| Message 6 of 8 24 March 2010 at 10:19am | IP Logged |
That's a pretty amazing resource, thank you.
1 person has voted this message useful
|
ehmoda Newbie United States Joined 5225 days ago 2 posts - 2 votes Speaks: English
| Message 7 of 8 04 August 2010 at 8:27pm | IP Logged |
Teango wrote:
I don't suppose you know what kind of accuracy Google achieves overall?
Back in 2006, for Cambridge University, I made a system that looks very much the same: it combines Support Vector Machines with statistical machine learning tools and Buckwalter's Morphological Analyser. I published a thesis on it, making it public domain, and then contacted Google in great excitement, but they just weren't interested at the time.
My aim was to provide a free tool that could assist all future linguists and learners of Arabic and help semi-automate the construction of annotated Arabic corpora and databases. It's since been employed in several leading Arabic research projects, both in the UK and in Egypt. And if I remember correctly, my final system achieved over 93% accuracy on words without case endings across almost a million words, which I was quite delighted with back then. |
1 person has voted this message useful
|
ehmoda Newbie United States Joined 5225 days ago 2 posts - 2 votes Speaks: English
| Message 8 of 8 04 August 2010 at 8:28pm | IP Logged |
Hashimi wrote:
A new tool from GoogleLabs to add missing diacritics to Arabic text:
http://tashkeel.googlelabs.com/
Teango, could you give me your email? I am very interested in the software you developed. I am also a researcher and I need that software. Whenever you see my post, please send me an email at oehmoda@gmail.com
1 person has voted this message useful
|