Kyle Corrie Senior Member United States Joined 4815 days ago 175 posts - 464 votes
| Message 1 of 20 12 October 2011 at 11:15pm | IP Logged |
If you're like me, a lot of the reading you do is from eBooks you download from the
internet. They're obviously very useful, but a major drawback is that most of the time
they are actually just image files that have been scanned from the pages of the actual
book and inserted into a PDF.
They are still readable, yes. However, it would be much more convenient if they were in
an actual 'editable text format'. That would mean you could add and remove text as you
see fit, and select text for quicker word look-up and so forth.
This led me to look for a solution and my search pointed me to OCR (Optical Character
Recognition). With OCR programs you can easily insert a PDF file, image file, or pretty
much anything with words on it into the program and it will spit out a text document of
what was on those pages for you.
If you're wondering what you could use this for, here are a few ideas: 1) If you use a
service like LingQ, you could upload your own scanned PDF books to dissect. 2) You could
also use the similarly useful service 'Learning with Texts', which helps keep track of
learned words. 3) You could simply copy and paste unknown words into your preferred
dictionary rather than typing them in manually.
Pretty much just use your imagination.
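As one example of idea 3), here is a minimal Python sketch that pulls a list of words to look up out of the text an OCR program gives you. The function names and the `known` word set are made up for illustration; they are not part of any OCR program or service mentioned here.

```python
import re
import unicodedata
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count how often each word appears in OCR output."""
    # Normalize so that 'a' + combining umlaut becomes the single character 'ä'.
    text = unicodedata.normalize("NFC", text)
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscores).
    # lower() is used instead of casefold() so that 'ß' is not turned into 'ss'.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words)


def unknown_words(text: str, known: set) -> list:
    """Return words not in `known`, most frequent first."""
    freqs = word_frequencies(text)
    return [word for word, _ in freqs.most_common() if word not in known]


if __name__ == "__main__":
    sample = "Der Hund läuft. Der Hund schläft."
    print(unknown_words(sample, known={"der"}))
```

The resulting list can be pasted into a dictionary or flashcard tool one word at a time, starting with the most frequent words.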
----------
So, for anyone that would be interested... here are some instructions.
1. You need to download an OCR program. The one I'll be using here is FileCenter 7
(http://www.lucion.com/downloads.html). If you don't want that one, a quick Google
search will turn up many others.
2. After downloading FileCenter, open it.
3. Select the 'Edit' tab at the top of the interface.
4. Now, simply enough, just drag your preferred PDF file into the large white area and
you should now see your book.
5. Now click the down arrow on the 'Reorganize Text' button near the top middle and
select 'Send OCR Text to Word'.
And that's all there is to it. Now you have an editable book whereas before you
couldn't select any words.
----------
I hope some people find this as useful as I have.
4 persons have voted this message useful
|
montmorency Diglot Senior Member United Kingdom Joined 4814 days ago 2371 posts - 3676 votes Speaks: English*, German Studies: Danish, Welsh
| Message 2 of 20 13 October 2011 at 10:40pm | IP Logged |
Kyle,
Thanks for this. The free OCR program I have been trying out (originally not for
language learning purposes) does not seem to handle non-English (e.g. German)
characters very well. (Maybe there is an add-on I can look for.)
Does FileCenter handle the major European language character sets OK, do you know?
The other problem is, as others have occasionally pointed out, that scans of books one
can find on the web are often of rather poor quality for this purpose.
Cheers,
M.
1 person has voted this message useful
|
georgiqg Triglot Newbie Spain Joined 4890 days ago 36 posts - 50 votes Speaks: Bulgarian*, Spanish, English Studies: German, Russian
| Message 3 of 20 14 October 2011 at 1:14am | IP Logged |
@Kyle Corrie, have you tried it with PDF files that have text in more than one language? For example, a PDF that contains some text in German and some in Spanish. Would that software recognize both German characters (ß, ä, etc.) and Spanish characters (ñ, ó, etc.)?
Thanks. ;)
-- Georgi -
1 person has voted this message useful
|
jerrypettit Groupie United States Joined 6012 days ago 79 posts - 103 votes Speaks: English*
| Message 4 of 20 14 October 2011 at 2:18am | IP Logged |
FileCenter 7 isn't free, is it?
Any recommended free OCR software?
1 person has voted this message useful
|
Kyle Corrie Senior Member United States Joined 4815 days ago 175 posts - 464 votes
| Message 5 of 20 14 October 2011 at 3:34am | IP Logged |
@montmorency
I have actually been using this mainly for German so far and it doesn't miss a single
character. All your ßs and umlauts will show up in the Word document with no problem.
----------
@georgiqg
I have tried German and Spanish PDFs now and it has handled all special characters
flawlessly, so I don't see why it would make a difference if they were both in the same
document.
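If you want to check this on your own files, here is a rough Python sketch that counts the special characters in the text an OCR program produced. The character list and function names are my own assumptions for illustration, not anything built into FileCenter.

```python
import unicodedata
from collections import Counter

# Assumed list of characters worth checking for German and Spanish text.
EXPECTED = "äöüÄÖÜßñÑáéíóú¿¡"


def special_char_counts(text: str, expected: str = EXPECTED) -> Counter:
    """Count occurrences of each expected special character in OCR output."""
    # Normalize so decomposed forms (letter + combining mark) are counted too.
    text = unicodedata.normalize("NFC", text)
    wanted = set(expected)
    return Counter(ch for ch in text if ch in wanted)


def missing_chars(text: str, expected: str = EXPECTED) -> list:
    """Return the expected characters that never appear in the OCR output."""
    counts = special_char_counts(text, expected)
    return sorted(set(expected) - set(counts))


if __name__ == "__main__":
    sample = "Die Straße ist schön. El niño comió."
    print(special_char_counts(sample))
    print(missing_chars(sample, "ßñä"))
```

A character that you know is on the scanned page but is reported as missing is a hint that the OCR program swapped it for something else (for example 'ß' for 'B').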
----------
@jerrypettit
Unfortunately FileCenter is not a free program, but it does offer a 30-day free trial.
However, as with all things digital, with a little persistence a more unscrupulous
person should be able to find a torrent. :)
Otherwise you could scour http://freeocr.net/ for an alternative.
2 persons have voted this message useful
|
liddytime Pentaglot Senior Member United States mainlymagyar.wordpre Joined 6215 days ago 693 posts - 1328 votes Speaks: English*, Spanish, Italian, Portuguese, Galician Studies: Hungarian, Vietnamese, Modern Hebrew, Norwegian, Persian, Arabic (Written)
| Message 6 of 20 14 October 2011 at 4:29pm | IP Logged |
I'm still looking for a program that can OCR Arabic script. Anyone know of one that won't break the bank?
1 person has voted this message useful
|
Hashimi Senior Member Oman Joined 6245 days ago 362 posts - 529 votes Speaks: Arabic (Written)* Studies: English, Japanese
| Message 7 of 20 16 October 2011 at 11:59am | IP Logged |
@liddytime, ReadIRIS can OCR the Arabic script.
2 persons have voted this message useful
|
Doitsujin Diglot Senior Member Germany Joined 5306 days ago 1256 posts - 2363 votes Speaks: German*, English
| Message 8 of 20 16 October 2011 at 2:20pm | IP Logged |
Hashimi, some time ago I tested ReadIRIS with a picture-perfect image of an Arabic text, created from a text printed with a standard Naskh font without any optional ligatures, and got way too many errors to consider using it with "real" texts.
What kind of documents did you use it with, and what was your average error rate?
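For anyone who wants to put a number on "error rate", one common way (not necessarily how any OCR vendor measures it) is the character error rate: the edit distance between a hand-checked reference text and the OCR output, divided by the length of the reference. A minimal Python sketch, with function names of my own choosing:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    print(character_error_rate("Straße", "Strasse"))
```

The reference has to be typed or proofread by hand, so in practice you would only do this for a page or two to compare programs.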
2 persons have voted this message useful
|