heartburn Senior Member United States Joined 7207 days ago 355 posts - 350 votes Speaks: English* Studies: Spanish
| Message 9 of 237 08 March 2005 at 10:44pm | IP Logged |
Actually, the audio is a gimme! Most of the articles are from two magazines. Think Spanish! is for beginners and Puerta del Sol is more advanced. I posted more details here:
http://how-to-learn-any-language.com/forum/forum_posts.asp?TID=92&PN=0&TPN=2
They arrive faster than I can get through them. I have quite a backlog.
I'm sorry to say, the recordings are quite clear.
|
mahyar Newbie Canada Joined 7201 days ago 34 posts - 31 votes Speaks: English* Studies: French
| Message 10 of 237 09 March 2005 at 2:05am | IP Logged |
You're right about the audio and such.
I know that several photocopying places have high-speed sheet-feed scanners (photocopiers are the same thing, except they print too) at something like 0.03 euro a page (so 400 pages would cost you 12 euros, not bad at all). I can't find the service in my semi-small town, though, but a place like the USA or a big town in Europe must definitely have them.
I know that OCR software is not perfect. I guess I should have said the initial OCR process, where the computer does the first big chunk of digitizing.
I've had good results with ABBYY FineReader for OCRing and proofreading. (I've actually proofread about 100 pages of a badly scanned book with it.) It has a file format where the OCRed results and the original scans are both easily accessible, and proofreading is really easy as a result. We could then split up the job amongst ourselves using that dual format. It doesn't have to be a one-person job.
If the scanned font is legible, I think just knowing how to type the taught language correctly into a computer would be sufficient to proofread the scanned results, since all you're doing is a simple one-to-one copy, not a translation.
From my experience with BitTorrent, I've found that you should really split the torrent files into big chunks like "All of FSI Spanish I" or "All of FSI Spanish I-IV plus Programmatic". If you don't put them in big chunks and instead divide them up into really small bits, as in your example, a dispersion effect tends to happen where the resources (people with complete files) are too spread out for anyone to get anything near complete. Plus, people who are used to BitTorrent kinds of things have been doing 3GB downloads for a while now anyway.
You can still divide the actual files in the fine-grained way you specified, but make the .torrent downloadable bits into big chunks. BitTorrent can make an entire folder with thousands of items in it available as one downloadable unit. Think of it almost as a .zip file.
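To make the "fine-grained files, coarse torrents" idea concrete, here is a minimal Python sketch. It only plans which files would go into which bundle; it does not create any .torrent files. The folder and file names are hypothetical examples, and the rule "one bundle per top-level course folder" is an assumption for illustration.

```python
from collections import defaultdict


def group_into_bundles(paths):
    """Group per-lesson files by their top-level course folder.

    Each key of the result stands for one large downloadable bundle;
    the individual lesson files stay separate inside it.
    """
    bundles = defaultdict(list)
    for path in paths:
        course = path.split("/", 1)[0]  # text before the first slash
        bundles[course].append(path)
    return {course: sorted(files) for course, files in bundles.items()}


# Hypothetical file layout, for illustration only.
files = [
    "FSI Spanish I/lesson02.mp3",
    "FSI Spanish I/lesson01.mp3",
    "FSI Spanish II/lesson01.mp3",
]

bundles = group_into_bundles(files)
for course, members in bundles.items():
    print(f"{course}: {len(members)} file(s) in one bundle")
```

The point of the design is that the unit people share is the whole course folder, while the files inside it remain one per lesson.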
We really need to check with the Project Gutenberg people, before we start, whether this would be OK copyright-wise. I would do it myself, but I don't know the specifics of the FSI program as well as you guys do. If the copyright is cleared, then we can enlist the help of all of the Project Gutenberg volunteers (a large number of people) to help with digitizing and proofreading these FSI courses.
Edited by mahyar on 09 March 2005 at 2:14am
|
administrator Hexaglot Forum Admin Switzerland FXcuisine.com Joined 7376 days ago 3094 posts - 2987 votes 12 sounds Speaks: French*, EnglishC2, German, Italian, Spanish, Russian Personal Language Map
| Message 11 of 237 09 March 2005 at 2:37am | IP Logged |
For large files we can probably find a compromise. I'm not sure I'd like people to start downloading huge files that they are not going to use. Although I'd favor a one-file-per-lesson format, I take your point about the efficiency of BitTorrent. Perhaps we could do 5-lessons-per-file or slightly bigger, so that people who just wish to try their hand at the language could do it.
I think the big work is on the audio side; the OCR is probably easier to manage. If one of us could do the scanning, another could do the OCR, and we could split the correction; that's a good idea.
Once we have started, it will probably be possible to translate the programs into other languages. For instance, FSI German could have the English parts translated into French so that a Frenchman could use it to learn German. Of course some parts of the program have probably been designed with the specific problems of the English speaker in mind, but I don't think this would detract from the interest of the course. And we would not need perfect target-language speakers to do so, only fellow enthusiasts who have a reasonable voice in the 'From' language.
I think it would be terrific to be able to give free access to these great courses not only to English speakers, but to others as well.
Heartburn, which language would you be interested in digitizing?
Pentatonic, you seem to be the most knowledgeable about sound. Would you be able to post some guidelines as to how other forum members might go about digitizing FSI tapes, such as what software and minimum hardware would be recommended? I think we need to achieve a minimum standard for those files, and with a technical 'whitepaper' we could probably all work on it and produce enough files.
In case somebody has an interest in Mandarin Chinese, I have access to about 100 tapes of Defense Language Institute drills and dialogs. It's great material and could be made much more usable if digitized. I've never seen it anywhere in the trade and it came straight from NTIS.
Edited by administrator on 09 March 2005 at 6:11am
|
Lunatic Newbie Joined 7200 days ago 1 posts - 1 votes
| Message 12 of 237 09 March 2005 at 6:17am | IP Logged |
Admin, you can specify which files you wish to download using BitTorrent.
So, for example, if I wanted only the first lesson, I could select just that file and download only the ones I want.
As a general rule, BitTorrent is more effective with large files than with smaller ones.
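For anyone curious how picking a single file works, here is a rough Python sketch. In a multi-file torrent, the files are treated as one continuous stream of bytes that is cut into fixed-size pieces, so choosing a file means fetching the pieces that overlap its byte range. The file sizes and piece length below are made-up toy numbers, and the function name is just for illustration.

```python
def pieces_for_file(file_sizes, index, piece_length):
    """Return the piece indices that overlap the file at position `index`.

    `file_sizes` lists the sizes (in bytes) of the files in torrent order.
    """
    size = file_sizes[index]
    if size == 0:
        return range(0)  # an empty file needs no pieces
    start = sum(file_sizes[:index])  # byte offset where the file begins
    end = start + size               # byte offset just past its last byte
    first = start // piece_length
    last = (end - 1) // piece_length
    return range(first, last + 1)


# Toy numbers for illustration: three "lessons" and 16-byte pieces.
sizes = [10, 25, 5]
piece_length = 16

for i in range(len(sizes)):
    print(f"file {i} needs pieces {list(pieces_for_file(sizes, i, piece_length))}")
```

Because pieces do not stop at file boundaries, selecting one lesson can also pull in a small part of its neighbours.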
|
administrator Hexaglot Forum Admin Switzerland FXcuisine.com Joined 7376 days ago 3094 posts - 2987 votes 12 sounds Speaks: French*, EnglishC2, German, Italian, Spanish, Russian Personal Language Map
| Message 13 of 237 09 March 2005 at 6:31am | IP Logged |
Lunatic, welcome to the forum!
I am not too well informed about peer-to-peer networks, but you and other forum members seem to be. So let me list the problems I face so you can tell us which options are best:
-Distributing MP3 files that run for many hours
-Letting users access meaningful parts of the files directly (like a 'Scene Selection' on a DVD)
-Not letting quick-buck operators take advantage of our collaborative effort to take the files and sell them commercially
-Allowing people to download either one or two lessons OR the whole 15 lessons (for instance)
-Keeping the bandwidth needed on the server at an acceptable level - this site is costing me already and although I am willing to let people benefit from free knowledge, I can't actually pay the costs of their downloading myself. Each gigabyte of traffic costs several dollars, so we need to find a way to keep it down.
-Not compromising the server security with spammy software that opens connections to anybody and lets them do what they want on the box
BitTorrent looks like it's designed to work in this context. However, if we distributed files for relatively rare languages such as Modern Greek, would there ever be two people downloading this particular file at the same time? If not, would BitTorrent bring any benefits?
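A rough way to put numbers on the bandwidth question, as a back-of-the-envelope Python sketch. Every figure here is an assumption: the file size, the number of downloads, the price per gigabyte, and above all `peer_fraction`, the share of the data that downloaders get from each other rather than from the server. For a rarely downloaded language, that share could well be close to zero.

```python
def server_cost(file_gb, downloads, peer_fraction, dollars_per_gb):
    """Estimate what the server pays for a number of downloads.

    `peer_fraction` is the assumed share (0.0 to 1.0) of all transferred
    data that comes from other downloaders instead of the server.
    """
    if not 0.0 <= peer_fraction <= 1.0:
        raise ValueError("peer_fraction must be between 0.0 and 1.0")
    server_gb = file_gb * downloads * (1.0 - peer_fraction)
    return server_gb * dollars_per_gb


# Hypothetical figures: a 1 GB course, 100 downloads, 3 dollars per GB.
for share in (0.0, 0.5, 1.0):
    cost = server_cost(1.0, 100, share, 3.0)
    print(f"peers supply {share:.0%} of the data: server pays about ${cost:.2f}")
```

With no peers sharing, the server pays for every download in full, so the saving depends entirely on how often other people have the file available.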
|
heartburn Senior Member United States Joined 7207 days ago 355 posts - 350 votes Speaks: English* Studies: Spanish
| Message 14 of 237 09 March 2005 at 9:50am | IP Logged |
Of course, I'd prefer to do Spanish. But I'd be ok doing any language that uses the Roman alphabet if I don't have to proofread.
Unfortunately, I only own the Barron's and Platiquemos versions of the Spanish program.
administrator wrote:
-Not letting quick-buck operators take advantage of our collaborative effort to take the files and sell them commercially
|
This one might be tough. I'm not a lawyer, but if the material is already public domain we might have no control over this. It is the only reason why we'd be able to do this in the first place.
Edited by heartburn on 09 March 2005 at 9:57am
|
administrator Hexaglot Forum Admin Switzerland FXcuisine.com Joined 7376 days ago 3094 posts - 2987 votes 12 sounds Speaks: French*, EnglishC2, German, Italian, Spanish, Russian Personal Language Map
| Message 15 of 237 09 March 2005 at 10:18am | IP Logged |
There is a derived copyright for compilation work. For instance we could not take remastered tapes from commercial releases of FSI and rip them and offer them for free.
If we brand each file in a clear way saying where it came from and that it cannot be sold, I think it should work.
|
heartburn Senior Member United States Joined 7207 days ago 355 posts - 350 votes Speaks: English* Studies: Spanish
| Message 16 of 237 09 March 2005 at 8:46pm | IP Logged |
I've been thinking about this copyright thing a little. The more I think about it, the more I think, "Why not let them use the files?" Here's my reasoning...
The goal is to make these programs freely available to everyone, right? When that happens, what will become of the companies who already repackage these programs? Some, like Platiquemos, might be ok because of the value that they add. Others, like AudioForum, will find themselves charging money for an inferior product.
In order to resell something that can otherwise be downloaded for free, the resellers will need to add value. That generally means that they will be making the programs better. Some people are willing to pay for something extra. Ultimately, the quality of commercial language programs will have to increase.
I'm envisioning something like a Free Software license. Maybe it could be based on the Apache license, or the BSD license, or something like that.
If we are doing this out of the goodness of our hearts anyway, why not be really good?
Edited by heartburn on 09 March 2005 at 8:49pm
|