Defense Language Institute Russian (Language Programs, Books & Tapes) Language Learning Forum

Defense Language Institute Russian
Tags: DLI | Russian

Ericounet Senior Member France yojik.eu Joined 5435 days ago 157 posts - 414 votes Studies: English, German, Russian	Message 17 of 24 31 July 2011 at 2:43pm

Hi,

I just put a part of DLI Russian, "Individual weapons training", on my website The Yojik. Some lessons from the Basic Course are already done, but I have to check them before publishing: I did the formatting some time ago, and I learned a lot about DocBook after formatting FSI, FSI 1973 and Verbs of Motion, so I have to make some changes.

You'll find it in PDF and EPUB format. I can also generate the documents in .doc (Microsoft) format, but there will be a footer with "xmlmind" on every page, as I use their (free) tool to do the job. (They ask NOT to remove this footer.) If you want it in another format, tell me. The Russian text does not have stress marks for the moment (I will add them later).

If someone wants to work with me, I can put the DocBook .xml sources on my SVN server and give access to them.

Note: we could use these courses to build a perfect Russian course: for example, build Pimsleur-type lessons, Anki lists, grammar exercises and so on. A lot of work, but interesting.
DavidW Hexaglot Senior Member United Kingdom Joined 6526 days ago 318 posts - 458 votes Speaks: English*, Spanish, French, Italian, Persian, Malay Studies: Russian, Arabic (Written), Portuguese, German, Urdu	Message 18 of 24 01 August 2011 at 2:50am

It's nice work you're doing. It's a really great course and deserves some love.

I can do something with the audio. The original is poor, and only 32 kbps, but it can be improved somewhat. Obviously, if someone could get a better copy, it would be better to start with that. Here's an example of what can be done:

Original: www.omilia.org/hosted_files/orig.mp3
Improved: www.omilia.org/hosted_files/proc.mp3

This has noise reduction with Izotope RX2, some EQ (a slight boost above 4.5 kHz and a cut below 80 Hz) and compression.

For the text, OCR gives the best appearance, but is also quite time-consuming. There's a free program called 'Scan Tailor' that can do wonders to clean up the existing scans. The images must first be extracted from the PDFs. Two tools which can do this are Adobe Acrobat (not Reader) and 'somePDF' (buggy, but it sometimes manages when Acrobat fails or gives strange results). Extract the pages from the PDF as TIFF image files. These, unlike JPEG, are lossless. Compare them to the original PDF to make sure no quality has been lost. Then import them into Scan Tailor, making sure the DPI setting is set correctly there. We should decide on a standard output format, perhaps A4 size, 600 dpi black/white (not grayscale). Cleaning up the scans is less work than OCR, but you won't get the advantages of XML, or be able to reflow the text for smaller screens.

It would be good if someone could work out the structure of the course, so that we could work out what's missing and prioritise the documents. We could also arrange readings of the reading texts.

Edited by DavidW on 01 August 2011 at 5:08am
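As a rough illustration of the output format proposed above (A4, 600 dpi, black/white), the following Python sketch works out the pixel dimensions of such a page and compares the uncompressed size of a 1-bit image with an 8-bit grayscale one. The function names are made up for this example, and the size figures ignore TIFF row padding, headers and compression, so treat them as ballpark numbers only.

```python
import math

MM_PER_INCH = 25.4

def page_pixels(width_mm, height_mm, dpi):
    """Return (width_px, height_px) for a page scanned at the given DPI."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))

def raw_bytes(width_px, height_px, bits_per_pixel):
    """Approximate uncompressed image size in bytes (no row padding)."""
    return math.ceil(width_px * height_px * bits_per_pixel / 8)

# A4 is 210 mm x 297 mm.
a4 = page_pixels(210, 297, 600)
bilevel = raw_bytes(*a4, 1)   # black/white, 1 bit per pixel
gray = raw_bytes(*a4, 8)      # grayscale, 8 bits per pixel

print(a4)        # (4961, 7016)
print(bilevel)   # 4350797  (about 4.4 MB per page)
print(gray)      # 34806376 (about 35 MB per page)
```

The eightfold difference is one practical argument for choosing black/white over grayscale when the course runs to thousands of pages.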
Ericounet Senior Member France yojik.eu Joined 5435 days ago 157 posts - 414 votes Studies: English, German, Russian	Message 19 of 24 01 August 2011 at 8:53am

Hi,

I think OCR (and XML) is better because it allows different formats (PDF, EPUB, doc) and makes it possible to generate interesting things, like extracting the Russian vocabulary and getting Anki lists (or lists for whatever program people want). Here is an example of what could be done (it was only a test, and I still have to delete the duplicates): 1.pdf.

I use FineReader, and OCRing the PDFs is quite fast. Most of the work is the formatting in DocBook and the checking, but I'm used to it now and can do it fast. The only thing I lack is time :)

For the audio part, XML makes it possible to extract the phrases, and these phrases can be re-recorded with tools like shtooka-recorder or yazik-recorder. The audio files are tagged with linguistic information and can be suitable for other uses. It's possible to record 400 words in 20 minutes with these tools, which is faster than dealing with the audio already done by DLI. My wife will record the Russian parts (you can hear her voice on shtooka.net, wikidictionary and many other places). I'll record the French parts (translations of the English parts), but what will be lacking is the recording of the English parts.

Once these recordings are done, we could imagine some courses like Pimsleur, or use them with Anki, (learning) games and so on. Scripts (in the IT sense) can be used to generate these.

Yesterday, I worked on "Better Russian" (DLI): the OCR is finished, and the first chapters are already formatted. It will be finished by the end of the week.

PS: I have to create the missing pages in the 1st lesson of the DLI Basic Course. I'll be on holiday for 2 weeks starting next week, and will manage to make them then, so the course will be complete. Then I'll publish the first lessons.

Edited by Ericounet on 01 August 2011 at 11:36am
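To make the vocabulary-to-Anki idea a little more concrete, here is a minimal Python sketch of the last step: removing duplicate entries from an extracted word list and writing it out as tab-separated text, a format Anki can import. It is not the script used for the test file mentioned in the post; the function names and the two sample words are invented for illustration.

```python
import csv
import io

def dedupe(pairs):
    """Drop repeated (russian, english) pairs, keeping first occurrences in order."""
    seen = set()
    unique = []
    for russian, english in pairs:
        # Compare case-insensitively and ignore stray whitespace from OCR.
        key = (russian.strip().lower(), english.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append((russian.strip(), english.strip()))
    return unique

def to_anki_tsv(pairs):
    """Render pairs as tab-separated text, one note per line (front, back)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()

# Hypothetical sample data, not taken from the DLI course.
sample = [("винтовка", "rifle"), ("патрон", "cartridge"), ("винтовка ", "rifle")]
deck = to_anki_tsv(dedupe(sample))
print(deck)
```

Saving `deck` to a UTF-8 text file gives something that can be loaded through Anki's text import, with the first column as the front of the card and the second as the back.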
DavidW Hexaglot Senior Member United Kingdom Joined 6526 days ago 318 posts - 458 votes Speaks: English*, Spanish, French, Italian, Persian, Malay Studies: Russian, Arabic (Written), Portuguese, German, Urdu	Message 20 of 24 01 August 2011 at 3:07pm

Ah, you're a programmer, I thought so :-). Could I take a look at the XML files, to see the structure? I'll make some time to do some thinking; maybe I can offer a couple of ideas.
Ericounet Senior Member France yojik.eu Joined 5435 days ago 157 posts - 414 votes Studies: English, German, Russian	Message 21 of 24 01 August 2011 at 4:26pm

Hi,

I just put up the XML file Individual-weapons-training.xml. There were no images for this book. If you want to see it clearly, use the free software from xmlmind: xmleditor. It's written in Java, so it runs on every platform (Linux, Mac and Windows). I use it along with "oXygen Author" (this one is not free).

Next week, I'll put all the sources of the documents on my site (TIFF files, images, XML files, working files): everything that is needed to work with them.

PS: I'm not a programmer, but I wrote my first program (in assembly language) in 1976 and have never stopped since. I like languages ;)
DavidW Hexaglot Senior Member United Kingdom Joined 6526 days ago 318 posts - 458 votes Speaks: English*, Spanish, French, Italian, Persian, Malay Studies: Russian, Arabic (Written), Portuguese, German, Urdu	Message 22 of 24 01 August 2011 at 7:43pm

I took a look. I saw in the DocBook reference documentation that there are elements to point to an external audio file. But can this be linked to a specific piece of text in the document? Are there any viewers that support this feature? The DAISY format supports synchronised audio and text, as does the upcoming version of EPUB.

I thought you were storing the document in a special XML that includes special labels like 'vocab', 'grammar explanation', 'text for reading', 'drill' etc., so that software could do stuff with the files automatically, like generate flashcards. I suppose this would require extending an XML language with these special tags, which would take some thought.

At the moment, the main benefits of DocBook are:
- copy and paste
- ability to be automatically reformatted for viewing on small-screen (less than 10") devices
- improvement of appearance

To be honest, I wonder if it's worth it. The quality of the original PDFs isn't bad, and they'll look a lot better when cleaned up. If I were doing it, I'd clean up the PDFs and organise them on their own website. I'd try to get better audio recordings, or otherwise clean up the existing ones. I'd then look into producing recordings for the Russian texts in the more advanced parts of the course, which are quite interesting. All this, I think, would take much less time than OCRing the entire course, which can also introduce errors if you're not careful. That's just my opinion.

Edit: Here's what a cleaned-up PDF looks like. It took about 15 mins of work + 15 mins of processing time (once you know how...):

Original: www.omilia.org/hosted_files/orig.pdf
Processed: www.omilia.org/hosted_files/proc.pdf

Scan Tailor can correct skewed text, centre pages, despeckle etc.

Edited by DavidW on 01 August 2011 at 9:26pm
Ericounet Senior Member France yojik.eu Joined 5435 days ago 157 posts - 414 votes Studies: English, German, Russian	Message 23 of 24 01 August 2011 at 9:49pm

Hi,

DocBook is only an XML schema for formatting documents. Of course you can link to internal or external parts. There is no need to have a "viewer" for DocBook documents: the DocBook file is meant to be processed to give either other XML files (e.g. the vocabulary file I gave you), or PDF, EPUB, doc, HTML or whatever you want. Just write the XSLT file to produce what you want (many already exist): one source, multiple targets.

For example, all the Russian parts are tagged as xml:lang="ru". You don't have to mix in special labels: everything is already in DocBook. Elements have attributes, and you can invent the ones you want; after that, the processing works with these attributes. So you can tag a part as vocabulary, or drill, or ...

With images, as nice as they can be, you cannot do anything: no search, no sorting, no extracting, nothing. You can only keep the book as it is. It could be a way, but I chose another :) I know it's more work, but I like doing it.

Yes, it's possible to introduce some errors, but these are easily corrected: XML files are only text files. And the production chain is totally free (XML -> PDF or HTML or XML).

The other thing important to me is the possibility of translating the English text. With tools like OmegaT (free and multi-platform), it's an easy task (but a long one, as there are so many pages).

Your corrected pages are nice, but I prefer my way (I do it for free, in my free time). In the next few weeks, I'll publish the vocabulary lists for FSI, in many formats, so you'll see what is possible once the hard work (the formatting in DocBook) is done. This vocabulary will also be recorded.

With little stones, we can build castles. I only hope it can be useful.
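The point about processing tagged XML can be sketched in a few lines of Python using only the standard library. The sample document below is invented for illustration and is not taken from the actual course sources: it assumes DocBook 5 namespacing, Russian text marked with xml:lang="ru" as described in the post, and the standard DocBook `role` attribute used to mark a paragraph as "vocabulary" or "drill" (those two role values are an assumption, not the author's confirmed convention).

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment; the real sources may be structured differently.
SAMPLE = """<section xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>Lesson 1</title>
  <para role="vocabulary"><phrase xml:lang="ru">винтовка</phrase> rifle</para>
  <para role="drill"><phrase xml:lang="ru">Это винтовка.</phrase> This is a rifle.</para>
  <para>Plain English commentary.</para>
</section>"""

def russian_phrases(xml_text, role=None):
    """Collect text tagged xml:lang="ru", optionally only from paras with a given role."""
    root = ET.fromstring(xml_text)
    found = []
    for para in root.iter(f"{{{DOCBOOK_NS}}}para"):
        if role is not None and para.get("role") != role:
            continue
        for node in para.iter():
            if node.get(XML_LANG) == "ru":
                found.append("".join(node.itertext()).strip())
    return found

print(russian_phrases(SAMPLE))                     # ['винтовка', 'Это винтовка.']
print(russian_phrases(SAMPLE, role="vocabulary"))  # ['винтовка']
```

The same selection could be written as an XSLT stylesheet, which is closer to the toolchain described in the post; Python is used here only to keep the example self-contained.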
DavidW Hexaglot Senior Member United Kingdom Joined 6526 days ago 318 posts - 458 votes Speaks: English*, Spanish, French, Italian, Persian, Malay Studies: Russian, Arabic (Written), Portuguese, German, Urdu	Message 24 of 24 01 August 2011 at 10:33pm

I think if you have a vision of how the XML could allow the materials to be used in new ways, it could be worthwhile. In that case it would be necessary to go through the materials and think about the possibilities for each section. Doing a translation would be a good reason, but there are thousands of pages: it would be many months' work, working full time. If the only benefit is to provide the document in different formats, I personally wouldn't bother. Sure, you can't search, sort or extract, but the materials have already been carefully prepared and laid out for the student.

My intonation in English is a little odd, so I don't think I'd make a good reader. But you should be able to recruit one here: https://forum.librivox.org/viewtopic.php?t=21482

Best of luck.

Edited by DavidW on 01 August 2011 at 10:45pm
