
Digitizing FSI

pentatonic
Senior Member
United States
Joined 7247 days ago

221 posts - 245 votes 

 
 Message 1 of 237
08 March 2005 at 8:03am | IP Logged 
The FSI recorded materials are on tape and so would need to be recorded to digital, the hiss removed/lessened, compressed to even levels, and encoded to mono mp3. It would also be good to even the sound (decibel) levels to a standard, at least across each package. The printed materials would need to be scanned and converted to some format. PDF would be good but HTML or possibly both would be another option. I personally would like the text processed with OCR software for import into flashcard/learning software.
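As a rough illustration of evening the levels to a standard, the sketch below measures the RMS level of 16-bit PCM samples and computes the gain needed to bring a recording to a common target. The -20 dBFS target and the function names are assumptions made for this example, not something specified in the thread, and real audio software would also handle hiss removal and compression.

```python
import math

def rms_dbfs(samples, full_scale=32768):
    """RMS level of 16-bit PCM samples, in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (full_scale ** 2))

def gain_to_target(samples, target_dbfs=-20.0):
    """Linear gain that moves the RMS level to target_dbfs (assumed target)."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return 1.0  # nothing to scale
    return 10 ** ((target_dbfs - current) / 20)

def apply_gain(samples, gain, full_scale=32768):
    """Scale the samples and clip them to the 16-bit range."""
    lo, hi = -full_scale, full_scale - 1
    return [max(lo, min(hi, round(s * gain))) for s in samples]
```

Applying the same target to every tape in a package would give roughly consistent loudness across lessons, although loud peaks may clip if the gain is large.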

I have the hardware and software to do all of this and would be willing to do it for the German and Spanish FSI courses (not the Barron's versions, as they have been retypeset, which introduced lots of errors). However, I would have to be provided with both since I have neither.

Anyone who volunteers must realize that this represents hours and hours of work and cannot be done overnight.

Some sort of distribution system for the finished product would be necessary. BitTorrent is an option, but this site would be a good place
1 person has voted this message useful



administrator
Hexaglot
Forum Admin
Switzerland
FXcuisine.com
Joined 7376 days ago

3094 posts - 2987 votes 
12 sounds
Speaks: French*, EnglishC2, German, Italian, Spanish, Russian
Personal Language Map

 
 Message 2 of 237
08 March 2005 at 10:34am | IP Logged 
Pentatonic: you seem to know a great deal about sound. Is this your profession or a hobby? I agree that we would need to OCR the texts, both for use in other software and for faster downloads.

For file size, how many megabytes per hour of sound if we use MP3? I'd be willing to supply the space and hardware to distribute it, but I need to calculate the bandwidth since this can escalate fast.

How long per tape would you need to transfer from tapes to digital?

Would we do one file per lesson?

PDF is nice, but I think it's especially useful if you make a nice layout. If we just OCR the text and make it clear but no-frills, HTML would probably be faster and more portable.
1 person has voted this message useful



pentatonic
Senior Member
United States
Joined 7247 days ago

221 posts - 245 votes 

 
 Message 3 of 237
08 March 2005 at 12:03pm | IP Logged 
administrator wrote:
Pentatonic: you seem to know a great deal about sound. Is this your profession or a hobby?


I'm an amateur musician and do some home recording so I've spent a lot of time tweaking digital audio and have acquired some good software.

administrator wrote:
For file size, how many megabytes per hour of sound if we use MP3? I'd be willing to supply the space and hardware to distribute it, but I need to calculate the bandwidth since this can escalate fast.


Voice audio can be mono, and at a variable bitrate of 48-56 kbps (good quality) it would come to just under 400 KB per minute. I've heard acceptable-quality MP3s at lower bitrates that were just under 250 KB per minute, but I'd have to round up a codec for that, as mine all sound bad at that bitrate.
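The arithmetic behind these figures can be sketched as follows. It assumes 1 KB = 1000 bytes and treats the quoted bitrate as the average over the file. The audio hours and download count passed to the bandwidth function are hypothetical placeholders, not numbers from this thread.

```python
def kb_per_minute(bitrate_kbps):
    """Kilobytes (1000 bytes) per minute at an average bitrate in kilobits/s."""
    return bitrate_kbps * 60 / 8

def mb_per_hour(bitrate_kbps):
    """Megabytes (1000 KB) per hour of audio at the given average bitrate."""
    return kb_per_minute(bitrate_kbps) * 60 / 1000

def transfer_gb(hours_of_audio, bitrate_kbps, downloads):
    """Total transfer in GB if the whole course is downloaded `downloads` times."""
    return hours_of_audio * mb_per_hour(bitrate_kbps) * downloads / 1000

for rate in (32, 48, 56):
    print(f"{rate} kbps: {kb_per_minute(rate):.0f} KB/min, "
          f"{mb_per_hour(rate):.1f} MB/hour")

# Hypothetical load: a 40-hour course at 48 kbps fetched 500 times.
print(f"{transfer_gb(40, 48, 500):.0f} GB")
```

At 48 to 56 kbps this works out to 360 to 420 KB per minute, which matches "just under 400 KB" for a variable rate in that range, or roughly 22 to 25 MB per hour.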

administrator wrote:
How long per tape would you need to transfer from tapes to digital?


It's hard to say but obviously it would take an hour just to record an hour-long lesson. Then I'd have to process it and listen to it to make sure it's OK and find the points at which it should be split. I'd need to split the lesson up into several files and then encode it to MP3. It would probably be a little faster to encode the whole file and then split it up. Anyway, the review part will be time consuming. Most of the rest could be done while I do other things. Maybe I should do a lesson or two and see.
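One way to shortlist candidate split points before the listening pass is to look for long quiet stretches. The sketch below is only an illustration under stated assumptions: the amplitude threshold and minimum pause length are made-up values, and on hissy tapes the threshold would need tuning, so the results are suggestions to review by ear rather than final cut points.

```python
def find_silences(samples, sample_rate, threshold=500, min_silence_s=2.0):
    """Return (start_s, end_s) spans where |sample| stays below `threshold`
    for at least `min_silence_s` seconds. Both defaults are assumptions."""
    min_len = int(min_silence_s * sample_rate)
    spans = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                spans.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the recording.
    if run_start is not None and len(samples) - run_start >= min_len:
        spans.append((run_start / sample_rate, len(samples) / sample_rate))
    return spans

def split_points(spans):
    """Midpoint of each quiet span, in seconds: a candidate place to cut."""
    return [(start + end) / 2 for start, end in spans]
```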

Of course the OCR part would be time consuming as well.

administrator wrote:
Would we do one file per lesson?


No, I think it would be better to split them up by drill, etc. That way it's easy to go to a drill you need specific practice on.

administrator wrote:
PDF is nice, but I think it's especially useful if you make a nice layout. If we just OCR the text and make it clear but no-frills, HTML would probably be faster and more portable.


I was thinking PDFs from OCRed material. I agree that PDFs from images are unnecessarily big. PDFs are easy to download and print, but HTML is good.

1 person has voted this message useful



mahyar
Newbie
Canada
Joined 7201 days ago

34 posts - 31 votes
Speaks: English*
Studies: French

 
 Message 4 of 237
08 March 2005 at 2:38pm | IP Logged 
To record from tape, you need a portable tape player and a cable with a male stereo plug on each end (pretty cheap). You plug the tape player into the microphone jack on your computer, set up a sound recording from the microphone port, and press play on the tape player. Then you wait until the tape is finished and you have your digitized tape recording.

If you're willing to debind and rebind your books (or even better, if they came in a ring binder!), you can put the FSI sheets into a sheet-fed scanner. It would then take about 10 minutes of just waiting and maybe adding another stack of paper once in a while.

OCR is another passive process once you've scanned the sheets in. My 6-year-old computer can do a 400-page book overnight.

Book digitizing teams usually go by this process:
*You get a request from someone
*The scanner person gets the book and scans it in.
*The scanner then posts it somewhere for the general public to proof-read and digitize.
*One or more proofreaders work with both the original scanned images and the OCRed text. If they encounter a non-obvious typo, they can check the scan to see what it should be. With multiple proofreaders, the job is divided up, or the OCRed text is put on a wiki or some other groupware or revision control system.
*After a good amount of proofreading has been done, the book is released in multiple formats, such as CHM, PDF, HTML, and plain text, to allow for maximum flexibility. (PDAs, for example, work best with plain text and HTML, while PDF is great for printing, HTML is great for website publishing, and the CHM format is great when you're sitting at your computer.)
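The proofreading step above can be supported by a simple comparison of the raw OCR text against a proofread version, so that every correction is visible for review. This is a minimal sketch using Python's standard `difflib` module. The function name and the German sample sentence are made up for illustration and do not come from any FSI text.

```python
import difflib

def word_corrections(ocr_text, proofread_text):
    """List (ocr_words, corrected_words) pairs where the two versions differ."""
    a = ocr_text.split()
    b = proofread_text.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

# Hypothetical sample: OCR confused a capital I with a lowercase l.
ocr = "Guten Tag, wie geht es lhnen?"
fixed = "Guten Tag, wie geht es Ihnen?"
for before, after in word_corrections(ocr, fixed):
    print(f"{before!r} -> {after!r}")
```

A list like this lets a second proofreader check only the changed words against the scan instead of rereading the whole page.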

I've seen the entire Pimsleur Japanese course (all 90 of the 30-minute units, including their "readings" recordings) encoded clearly in 450 MB in the Ogg Vorbis format. If people want to convert it to MP3s or AACs for their portable players, we can also make a guide on how to do so.
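For comparison with other bitrate figures, the average bitrate implied by that Pimsleur encoding can be worked out directly. This sketch assumes 1 MB = 1,000,000 bytes and that the 450 MB covers exactly 90 units of 30 minutes. The function name is made up for this example.

```python
def average_kbps(total_mb, units, minutes_per_unit):
    """Average bitrate in kilobits/s for a course of the given total size."""
    seconds = units * minutes_per_unit * 60
    kilobits = total_mb * 1000 * 8  # MB -> KB -> kilobits
    return kilobits / seconds

print(f"{average_kbps(450, 90, 30):.1f} kbps")
```

That comes to roughly 22 kbps averaged over 45 hours of audio.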

You can also use BitTorrent to solve the bandwidth problem. So if the files do become really popular, it wouldn't be a problem.

We can also look to Project Gutenberg for help with copyright and scanning. Their FAQ doesn't cover the specific government case that we are talking about with the FSI, so sending them an email at help_AT_pglaf.org (replace the _AT_ with an @) could help clear up the USA copyright issue.

Edited by mahyar on 08 March 2005 at 2:49pm

1 person has voted this message useful



pentatonic
Senior Member
United States
Joined 7247 days ago

221 posts - 245 votes 

 
 Message 5 of 237
08 March 2005 at 3:33pm | IP Logged 
mahyar wrote:
To record from tape, you need a portable tape player and a cable with a male stereo plug on each end (pretty cheap). You plug the tape player into the microphone jack on your computer, set up a sound recording from the microphone port, and press play on the tape player. Then you wait until the tape is finished and you have your digitized tape recording.


It depends on what level of quality you are willing to accept. These tapes were recorded in the '60s. They are uneven and have lots of hiss. You could certainly speed things up by dumping an unedited, 30-minute-long MP3 out there and letting users deal with the details, but I think it would be better to spare them that. These are not tapes you listen to once and put away.

mahyar wrote:
If you're willing to debind and rebind your books (or even better, if they came in a ring binder!), you can put the FSI sheets into a sheet-fed scanner. It would then take about 10 minutes of just waiting and maybe adding another stack of paper once in a while.


That's a good idea and would make scanning easier. Unfortunately, I don't have a sheet feeder for my scanner.

mahyar wrote:
OCR is another passive process once you've scanned the sheets in. My 6-year-old computer can do a 400-page book overnight.


Sorry, but I think this is not a realistic view of the current state of OCR software. It has come a long way but there are still lots of errors and sometimes you can just type things by hand and be as productive. You still have to spell check and correct errors. We're talking about language courses so the text needs to be as error-free as possible. That's my main complaint with the Barron's series. They were retypeset and that introduced a lot of errors. How is someone who doesn't know the language supposed to catch such errors?

I like your suggestions/comments on book digitizing teams. That would be a good thing if we could team up on conversions.

As far as audio formats, I think MP3s are the way to go. Ogg Vorbis and AAC are better formats but the truth is that most mainstream people don't even know what they are, even though a lot are unknowingly using Apple's version of AAC when they download from iTunes. MP3s are playable from practically all portable players and computer media players, and converting from one compressed format to another results in quality degradation.


Edited by pentatonic on 08 March 2005 at 7:01pm

1 person has voted this message useful



administrator
Hexaglot
Forum Admin
Switzerland
FXcuisine.com
Joined 7376 days ago

3094 posts - 2987 votes 
12 sounds
Speaks: French*, EnglishC2, German, Italian, Spanish, Russian
Personal Language Map

 
 Message 6 of 237
08 March 2005 at 4:57pm | IP Logged 
You do seem to know the ropes!

Format - I think MP3 is the way to go: it's the most common format and everybody can play it now.

Distribution - I checked out BitTorrent, it seems like a smart, non-profit, collaborative way of distributing files while minimizing server bandwidth drain.

OCR - I've done a 19th century book (on this site) and can confirm it's far from automatic. I like the idea of the feeder, but most of the time is spent comparing each character the OCR software is unsure about with the scan itself. It's not an impossible task, but you need to know both English and the target language.

Audio - I'd say Pentatonic knows his stuff, and a better source recording will make a nicer shared resource. After all, if we do it once for all language enthusiasts to share, let's try to do it well if possible. The idea of breaking down each lesson is good. We could then have a directory structure such as FSI_German/I/lesson01/04drill.torrent etc., so that somebody who wishes to download the entire lesson 01 could do so, then play each track in the right order using an MP3 player.
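A directory layout like this can be generated consistently with a small helper. The sketch below follows the FSI_German/I/lesson01/04drill.torrent pattern from the post. The function name, the track labels, and the two-digit zero padding are assumptions made for illustration.

```python
from pathlib import PurePosixPath

def track_path(course, volume, lesson, track, label, ext="mp3"):
    """Build a path such as FSI_German/I/lesson01/04drill.mp3.

    Zero-padded numbers keep files in play order when sorted by name.
    """
    return (PurePosixPath(course) / volume
            / f"lesson{lesson:02d}" / f"{track:02d}{label}.{ext}")

# Hypothetical track labels for one lesson.
labels = ["dialog", "notes", "vocab", "drill"]
for number, label in enumerate(labels, start=1):
    print(track_path("FSI_German", "I", 1, number, label))
```

Because the track number comes first and is padded, a player that sorts by file name plays the lesson in the intended order, even past track 9.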
1 person has voted this message useful



heartburn
Senior Member
United States
Joined 7207 days ago

355 posts - 350 votes 
Speaks: English*
Studies: Spanish

 
 Message 7 of 237
08 March 2005 at 5:52pm | IP Logged 
I'm no expert, but I've recorded, edited and encoded lots of lessons and other audio material. I've also OCRed tons of Spanish articles and stories. I'm very comfortable with this stuff and I'd be willing to help.
1 person has voted this message useful



ElComadreja
Senior Member
Philippines
bibletranslatio
Joined 7238 days ago

683 posts - 757 votes 
2 sounds
Speaks: English*
Studies: Spanish, Portuguese, Latin, Ancient Greek, Biblical Hebrew, Cebuano, French, Tagalog

 
 Message 8 of 237
08 March 2005 at 9:46pm | IP Logged 
ooh, ooooh, get someone to read those articles already on the computer! In a mumbly, non distinct sort of way.


1 person has voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.