13 messages over 2 pages: 1 2 Next >>
antibozo Newbie United States fsi.antibozo.net/ Joined 4240 days ago 9 posts - 24 votes Speaks: English* Studies: Spanish, Swedish, Portuguese
| Message 1 of 13 | 17 April 2013 at 10:52am |
I've been working on OCR-ing the FSI Swedish student text, using the scanned document that is hosted on fsi-language-courses.org, but it's problematic. The problem is that the Swedish part of the text is set in a sans serif font, and, at the resolution of that scan, 'i', 'l', and 'I' are practically indistinguishable in a lot of places. I expect 'l' and 'I' to present a problem no matter what, but, with a higher-resolution scan, 'i' might be more easily distinguished.
I initially experimented with Tesseract, but tuning it was looking like a time-consuming problem, so i shelled out for a copy of ABBYY FineReader, which is somewhat easier to train. But there are still a lot of problem areas, and i don't think it's the fault of the OCR software.
So, does anyone have a higher-resolution scan of the FSI student text? Or, if not, can someone suggest a way to come up with one (short of finding a hard copy and scanning it myself)?
My ultimate objective is to generate an ebook version that will be more legible on a Kindle than the scanned PDF, which is pretty hard to read. So if anyone knows of an existing example of that, that would be a nice alternative.
Edited by antibozo on 17 April 2013 at 10:53am
1 person has voted this message useful
| tarvos Super Polyglot Winner TAC 2012 Senior Member China likeapolyglot.wordpr Joined 4706 days ago 5310 posts - 9399 votes Speaks: Dutch*, English, Swedish, French, Russian, German, Italian, Norwegian, Mandarin, Romanian, Afrikaans Studies: Greek, Modern Hebrew, Spanish, Portuguese, Czech, Korean, Esperanto, Finnish
| Message 2 of 13 | 17 April 2013 at 11:49am |
I think the only scan on the net is the standard one, and I had no trouble reading it (it's the only Swedish textbook I've used).
1 person has voted this message useful
| antibozo Newbie United States fsi.antibozo.net/ Joined 4240 days ago 9 posts - 24 votes Speaks: English* Studies: Spanish, Swedish, Portuguese
| Message 3 of 13 | 17 April 2013 at 1:10pm |
tarvos> I had no trouble reading it
On a Kindle?
1 person has voted this message useful
| tarvos Super Polyglot Winner TAC 2012 Senior Member China likeapolyglot.wordpr Joined 4706 days ago 5310 posts - 9399 votes Speaks: Dutch*, English, Swedish, French, Russian, German, Italian, Norwegian, Mandarin, Romanian, Afrikaans Studies: Greek, Modern Hebrew, Spanish, Portuguese, Czech, Korean, Esperanto, Finnish
| Message 4 of 13 | 17 April 2013 at 1:17pm |
I don't own a Kindle; I was using a laptop. But the quality is the same across the board... I don't think my laptop screen is that much better or worse than a Kindle's.
Edited by tarvos on 17 April 2013 at 1:17pm
1 person has voted this message useful
| antibozo Newbie United States fsi.antibozo.net/ Joined 4240 days ago 9 posts - 24 votes Speaks: English* Studies: Spanish, Swedish, Portuguese
| Message 5 of 13 | 17 April 2013 at 1:27pm |
Trust me. It's hard to read on a Kindle. And it should be obvious that an ebook version would be superior to a PDF scan for a number of reasons, including the ability to use the OCRed text to caption the audio.
I don't understand the intended usefulness of your response.
2 persons have voted this message useful
| tarvos Super Polyglot Winner TAC 2012 Senior Member China likeapolyglot.wordpr Joined 4706 days ago 5310 posts - 9399 votes Speaks: Dutch*, English, Swedish, French, Russian, German, Italian, Norwegian, Mandarin, Romanian, Afrikaans Studies: Greek, Modern Hebrew, Spanish, Portuguese, Czech, Korean, Esperanto, Finnish
| Message 6 of 13 | 17 April 2013 at 1:55pm |
Of course the ebook version would be superior, but that presumes you have one ;) If your ebook version is just a copy of the PDF, then there'll be little difference. I know it's an old document, but I think the uploaded version is the best available, short of getting hold of the book and scanning it yourself.
Edited by tarvos on 17 April 2013 at 1:56pm
1 person has voted this message useful
| Hampie Diglot Senior Member Sweden Joined 6658 days ago 625 posts - 1009 votes Speaks: Swedish*, English Studies: Latin, German, Mandarin
| Message 7 of 13 | 17 April 2013 at 2:35pm |
I think you'll have to correct the iIl manually.
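The manual pass could probably be narrowed down with a word list. What follows is only a rough sketch, not something tested against the FSI scan: it assumes you have a Swedish word list to check against (the small `lexicon` set and the `candidates` function are hypothetical names made up for illustration). For each OCRed token it tries every i/I/l substitution and keeps the spellings that are real words, so only tokens with zero or several matches need a human look.

```python
from itertools import product

# Characters the OCR pass tends to confuse in the sans serif Swedish text.
CONFUSABLE = "iIl"


def candidates(token, lexicon):
    """Return the lexicon words reachable from token by swapping i/I/l."""
    # Each position is either a fixed character or any of the confusable ones.
    slots = [CONFUSABLE if ch in CONFUSABLE else ch for ch in token]
    found = []
    for combo in product(*slots):
        word = "".join(combo)
        if word in lexicon and word not in found:
            found.append(word)
    return found


if __name__ == "__main__":
    # Hypothetical stand-in for a real Swedish word list.
    lexicon = {"till", "vill", "liv", "inte"}
    for token in ["tiII", "vlIl", "Iiv", "lnte", "xIx"]:
        print(token, "->", candidates(token, lexicon))
```

The search grows as 3^n in the number of confusable letters per token, which is fine for single words. Capitalised words and proper names would need extra handling that this sketch leaves out.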
1 person has voted this message useful
| antibozo Newbie United States fsi.antibozo.net/ Joined 4240 days ago 9 posts - 24 votes Speaks: English* Studies: Spanish, Swedish, Portuguese
| Message 8 of 13 | 18 April 2013 at 2:31am |
To clarify: here's what i would consider a useful response:
"Hey, i know this person who has a hard copy of the book; get in touch with him or her."
"Hey, i have a hard copy of the book and would be willing to scan it for you."
"Hey, you can purchase such-and-such version of the course to get a higher-resolution PDF."
[No response at all, because the prospective poster has no useful information to offer, and doesn't want to simply make noise restating what is already implied or explicit in the original query.]
And here's what i would not consider a useful response:
"I don't personally know where or how to get a better version. Therefore, you'll have to do it manually."
"I don't personally know where or how to get a better version. Therefore, there must be no way to get a better version."
"Get a copy of the book and scan it yourself." [i.e., precisely the response i specifically requested *not* to receive, because i already know that's one option; i stated as much in the original query.]
So, can we start over now? Does anyone know where or how to get a higher-resolution scan of the Swedish text? If not, that's cool; there's no need to say so.
2 persons have voted this message useful