17 messages over 3 pages: 1 2 3 Next >>
Rob Tickner Senior Member New Zealand Joined 4490 days ago 126 posts - 158 votes Speaks: English* Studies: GermanB1, French, Swedish
| Message 1 of 17 12 August 2012 at 12:45am | IP Logged |
Hi,
(previous username tomsawyer, account got messed up with the recent hacking)
Having finished the German and French FSI courses in the past, and knowing their value,
I started the Swedish FSI course yesterday, only to find that the poor legibility of
the typewritten PDF hinders reading of the Swedish text (it's sometimes difficult to
tell apart å and ä and a, for example).
Having a search around, I found this fellow:
http://www.ielanguages.com/fsi/fsiproject.html who seems to be trying to convert
several of the FSI course PDFs into HTML. I emailed him, but haven't heard back yet.
While I appreciate his efforts, I can't help thinking that a .TXT file format would be
much better than HTML - e.g. if you wanted to make an eBook out of it.
I ran the FSI Swedish course through Tesseract (an OCR engine), and it will probably
require about 20 - 30 hours of reformatting, fixing spelling mistakes, etc. to get to a
useful state.
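For anyone curious what part of that reformatting could look like when scripted, here is a minimal sketch in Python. It assumes the Tesseract output has already been saved as plain text; the function name `clean_ocr_text` and the specific cleanup rules are illustrative assumptions, not the steps actually used on the Swedish course. It will not fix misread letters (å vs. ä vs. a), which still need proofreading by eye.

```python
import re
import unicodedata


def clean_ocr_text(raw: str) -> str:
    """Apply a few mechanical cleanups to raw OCR output (hypothetical helper)."""
    # Normalise to NFC so "a" + combining ring becomes the single letter "å".
    text = unicodedata.normalize("NFC", raw)
    # Rejoin words hyphenated across a line break: "läs-\ner" -> "läser".
    # Caveat: this also joins genuine hyphenated compounds split at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop trailing spaces and tabs at the end of each line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"
```

A script like this only handles the mechanical part; the spelling corrections are where most of the hours would go.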
My question: have others encountered legibility issues on these courses, to the point
that a .TXT OCR'd file would indeed be useful?
If enough people are interested, I'll do the work required, and post the .TXT file in a
public place for download (on my web server). If other people are interested in OCR'ing
some of the other manuals, I would be happy to put them up on my server too.
Thanks,
Rob.
Edited by Rob Tickner on 12 August 2012 at 12:46am
2 persons have voted this message useful
| Hampie Diglot Senior Member Sweden Joined 6661 days ago 625 posts - 1009 votes Speaks: Swedish*, English Studies: Latin, German, Mandarin
| Message 2 of 17 12 August 2012 at 2:53am | IP Logged |
An OCR'd PDF file is nice too: it's searchable and you can copy small chunks out of it. Making it into pure text will
take a long time and you'll have to reformat it... a lot...
1 person has voted this message useful
| tarvos Super Polyglot Winner TAC 2012 Senior Member China likeapolyglot.wordpr Joined 4709 days ago 5310 posts - 9399 votes Speaks: Dutch*, English, Swedish, French, Russian, German, Italian, Norwegian, Mandarin, Romanian, Afrikaans Studies: Greek, Modern Hebrew, Spanish, Portuguese, Czech, Korean, Esperanto, Finnish
| Message 3 of 17 12 August 2012 at 7:53am | IP Logged |
I don't find the Swedish FSI course that illegible. But if you can make it better, by all
means do.
1 person has voted this message useful
| maydayayday Pentaglot Senior Member United Kingdom Joined 5221 days ago 564 posts - 839 votes Speaks: English*, German, Italian, SpanishB2, FrenchB2 Studies: Arabic (Egyptian), Russian, Swedish, Turkish, Polish, Persian, Vietnamese Studies: Urdu
| Message 4 of 17 12 August 2012 at 9:47am | IP Logged |
Thank you for volunteering. Go for it! I am sure there will be a lot of people interested. The sound quality of the FSI materials disappointed me. Do you plan to do anything with the sound?
1 person has voted this message useful
| Rob Tickner Senior Member New Zealand Joined 4490 days ago 126 posts - 158 votes Speaks: English* Studies: GermanB1, French, Swedish
| Message 5 of 17 12 August 2012 at 10:34am | IP Logged |
I don't mind the audio quality, as long as I scoop the bass out of it with my speakers,
else it sounds a little muffled. I know my way around Audacity, but don't really have the
skills to improve the audio at this time. The best I could probably do is coax a few
Swedish backpackers into re-recording it. Swedish backpackers, if you're out there, free
accommodation in outback Australia for a few chapters of FSI!
1 person has voted this message useful
| maydayayday Pentaglot Senior Member United Kingdom Joined 5221 days ago 564 posts - 839 votes Speaks: English*, German, Italian, SpanishB2, FrenchB2 Studies: Arabic (Egyptian), Russian, Swedish, Turkish, Polish, Persian, Vietnamese Studies: Urdu
| Message 6 of 17 12 August 2012 at 10:51am | IP Logged |
I did some work on the FSI Spanish materials where I transcribed the text and cleaned up the sound, linking the two together. Was fun but work got busy so that fell off the to-do list.
1 person has voted this message useful
| Majka Triglot Senior Member Czech Republic kofoholici.wordpress Joined 4659 days ago 307 posts - 755 votes Speaks: Czech*, German, English Studies: French Studies: Russian
| Message 7 of 17 12 August 2012 at 12:39pm | IP Logged |
I did think about converting the French course to epub for my reader (the new Pocketbook touch). But I decided against it - the pdf is not that bad and the reader can crop the white borders. And the French course is searchable (I hope I downloaded it as such).
One tip for you - the free pdf-exchange reader can do this work for you - additional languages are here to download.
It works well, and if the source text has a clear layout, there is no need for Tesseract. One simply presses "ocr the text" and lets the program run.
The other piece of free software is STDU Viewer, which has some nifty features and allows you to export a text file. But with embedded text, one needs a lot of work with reformatting. I find it easier to copy and paste parts of the text from the pdf directly, opening both files next to each other.
Again, in case of FSI French I decided against it. But I did convert some textbooks (parts of them, without the exercises) to epub, mainly because I wanted to use text-to-speech.
1 person has voted this message useful
| iguanamon Pentaglot Senior Member Virgin Islands Speaks: Ladino Joined 5264 days ago 2241 posts - 6731 votes Speaks: English*, Spanish, Portuguese, Haitian Creole, Creole (French)
| Message 8 of 17 12 August 2012 at 1:11pm | IP Logged |
FSI isn't the only public domain US government language learning resource out there. Many, if not most, of the DLI Courses are in desperate need of a rehabilitation project. I OCR'ed and de-skewed the Portuguese Basic Course with my Adobe 9 PDF software and improved it a lot, but the cleanup only goes so far. The old DLI courses were typewritten. They need to be re-transcribed. Perhaps if a project could be crowd-sourced with each person doing a few pages... whole volumes could be greatly improved.
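As a rough illustration of the "each person doing a few pages" idea, the sketch below splits a volume's pages into contiguous, near-equal blocks, one per volunteer. The function name `assign_pages`, the page count, and the volunteer names are all hypothetical; no such project or tool is described in this thread.

```python
from typing import Dict, List


def assign_pages(total_pages: int, volunteers: List[str]) -> Dict[str, range]:
    """Split pages 1..total_pages into contiguous blocks, one per volunteer."""
    if not volunteers:
        raise ValueError("at least one volunteer is required")
    # Everyone gets `base` pages; the first `extra` volunteers get one more.
    base, extra = divmod(total_pages, len(volunteers))
    assignments: Dict[str, range] = {}
    start = 1
    for index, name in enumerate(volunteers):
        count = base + (1 if index < extra else 0)
        assignments[name] = range(start, start + count)
        start += count
    return assignments


# Hypothetical usage: a 10-page sample shared among three people.
for name, pages in assign_pages(10, ["anna", "ben", "carla"]).items():
    print(name, list(pages))
```

Contiguous blocks keep each volunteer inside one lesson or drill where possible, which should make the transcribed pieces easier to stitch back together.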
Edited by iguanamon on 12 August 2012 at 2:43pm
2 persons have voted this message useful