luhmann Senior Member Brazil Joined 5331 days ago 156 posts - 271 votes Speaks: Portuguese* Studies: Mandarin, French, English, Italian, Spanish, Persian, Arabic (classical)
Message 1 of 3, 26 March 2014 at 2:01am
Hi,
I just wrote a quick script that turns the parallel corpora available at http://opus.lingfil.uu.se/ into a list of sentences of progressive difficulty, based on the frequency of the words they contain.
Here's my code: http://pastebin.com/f9Kzzsd9 (Python 2.x)
Usage: Download the corpus of your choice in Moses format and extract it into the same folder where you saved the source code. Change the file names in the first two lines of the code, then run it.
Et voilà.
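For anyone unfamiliar with the Moses format: as distributed on OPUS, it is a pair of plain-text files, one per language, with one sentence per line, where line N of one file is aligned with line N of the other. The sketch below (Python 3, not taken from the pastebin script) shows how such a pair can be read into sentence pairs. The function names `pair_lines` and `read_moses_pair` are hypothetical, and UTF-8 encoding is an assumption.

```python
from itertools import zip_longest


def pair_lines(src_lines, tgt_lines):
    """Pair up line-aligned sentences from two iterables of lines.

    Raises ValueError if one side runs out before the other, since the
    alignment is only meaningful when both sides have the same length.
    """
    pairs = []
    for n, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            raise ValueError("files are not line-aligned (mismatch at line %d)" % n)
        # Drop the trailing newline but keep the sentence text untouched.
        pairs.append((src.rstrip("\n"), tgt.rstrip("\n")))
    return pairs


def read_moses_pair(src_path, tgt_path, encoding="utf-8"):
    """Read a Moses-format corpus: two files, one sentence per line (UTF-8 assumed)."""
    with open(src_path, encoding=encoding) as src, open(tgt_path, encoding=encoding) as tgt:
        return pair_lines(src, tgt)
```

For example, `read_moses_pair("corpus.en-fr.fr", "corpus.en-fr.en")` would return (French, English) pairs; those file names are placeholders, so substitute the two files from your own download.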
Edited by luhmann on 26 March 2014 at 2:17am
juman Diglot Senior Member Sweden Joined 5216 days ago 101 posts - 129 votes Speaks: Swedish*, English Studies: French
Message 2 of 3, 26 March 2014 at 8:14pm
Interesting... How do you calculate the score that tells whether a sentence is easy or not?
luhmann
Message 3 of 3, 27 March 2014 at 1:16am
For now, I consider only the frequency of the rarest word in each sentence. Sorting by that score introduces new vocabulary smoothly, one word at a time, as you advance.
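The scoring described above can be sketched as follows, in Python 3 and independently of the original script. Each sentence is scored by the corpus frequency of its rarest word, and sentences are sorted so that the highest score (the most common "rarest word") comes first. The regex tokenizer, the lowercasing, the tie-breaking by original corpus order, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of word characters, punctuation ignored.
TOKEN_RE = re.compile(r"\w+")


def tokenize(sentence):
    return TOKEN_RE.findall(sentence.lower())


def sort_by_difficulty(sentences):
    """Return sentences ordered from easiest to hardest.

    Score = corpus frequency of the rarest word in the sentence, so a
    higher score means an easier sentence.
    """
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def rarest_word_frequency(sentence):
        tokens = tokenize(sentence)
        # Sentences with no word tokens get 0 and therefore sort last.
        return min(freq[t] for t in tokens) if tokens else 0

    # sorted() is stable even with reverse=True, so ties keep corpus order.
    return sorted(sentences, key=rarest_word_frequency, reverse=True)
```

Because the sort is on the minimum frequency, every word in the sentences seen so far is at least as frequent as the current sentence's score, so rare words only show up after the common ones have been introduced. For a parallel corpus, one would presumably score the side in the language being studied and carry each translation along with its sentence; that detail is also an assumption here.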