
Create editable text from scanned PDFs

  Tags: Gadget
 Language Learning Forum : Learning Techniques, Methods & Strategies
20 messages over 3 pages
Kyle Corrie
Senior Member
United States
Joined 4764 days ago

175 posts - 464 votes 

 
 Message 1 of 20
12 October 2011 at 11:15pm
If you're like me, then a lot of the reading you do is from eBooks you download from the
internet. They're obviously very useful, but a major drawback is that most of the time
they are actually just image files that have been scanned from the pages of the actual
book and inserted into a PDF.

They are still readable, yes. However, it would be much more convenient if they were in
an actual 'editable text format', meaning that the user could add and remove text as
they saw fit, as well as select text for quicker word look-up and so forth.

This led me to look for a solution and my search pointed me to OCR (Optical Character
Recognition). With OCR programs you can easily insert a PDF file, image file, or pretty
much anything with words on it into the program and it will spit out a text document of
what was on those pages for you.

If you're wondering what you could use this for, here are a few ideas: 1) If you use a
service like LingQ you would be able to upload your own content from scanned PDF books
to dissect. 2) You could also use the similarly useful service called 'Learn with
Texts' that helps keep track of learned words. 3) Simply copy and paste unknown words
into your preferred dictionary rather than having to type them manually.

Pretty much just use your imagination.

----------

So, for anyone that would be interested... here are some instructions.

1. You need to download an OCR program. The one I'll be using here is FileCenter 7
(http://www.lucion.com/downloads.html). If you don't want that one, then a quick Google
search will turn up many others.

2. After having downloaded FileCenter you obviously need to open it.

3. Select the 'Edit' tab at the top of the interface.

4. Now, simply enough, just drag your preferred PDF file into the large white area and
you should now see your book.

5. Now click the down arrow on the 'Reorganize Text' button near the top middle and
select 'Send OCR Text to Word'.

And that's all there is to it. Now you have an editable book whereas before you
couldn't select any words.
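
For anyone who would rather script this than click through a GUI, here is a rough sketch of the same idea. Note that it does not use FileCenter at all: it assumes two free command-line tools are installed, `pdftoppm` (from Poppler) to turn the scanned pages into images and `tesseract` to do the OCR. The function names, the `page` file prefix and the 300 dpi setting are my own choices for illustration, and the language codes (`deu`, `spa`, ...) only work if the matching Tesseract language data is installed.

```python
import shutil
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, image_prefix, dpi=300):
    """Build the pdftoppm call that writes one PNG image per PDF page."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(image_prefix)]


def ocr_command(image_path, output_base, languages=("eng",)):
    """Build the tesseract call; several languages are joined with '+'."""
    return ["tesseract", str(image_path), str(output_base), "-l", "+".join(languages)]


def pdf_to_text(pdf_path, work_dir, languages=("eng",)):
    """Run both tools and return the recognized text of all pages."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on PATH")
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: scanned PDF -> page images (page-1.png, page-2.png, ...)
    subprocess.run(rasterize_command(pdf_path, work_dir / "page"), check=True)
    pages = []
    for image in sorted(work_dir.glob("page*.png")):
        base = image.with_suffix("")
        # Step 2: page image -> text file next to it (tesseract appends .txt)
        subprocess.run(ocr_command(image, base, languages), check=True)
        pages.append(base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Something like `pdf_to_text("book.pdf", "ocr_pages", languages=("deu", "spa"))` would then give you one long string you can paste wherever you like. How good the result is depends heavily on the quality of the scan.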

----------

I hope some people find this as useful as I have.
4 persons have voted this message useful



montmorency
Diglot
Senior Member
United Kingdom
Joined 4763 days ago

2371 posts - 3676 votes 
Speaks: English*, German
Studies: Danish, Welsh

 
 Message 2 of 20
13 October 2011 at 10:40pm
Kyle,

Thanks for this. The free OCR program I have been trying out (originally not for
language learning purposes) does not seem to handle non-English (e.g. German)
characters very well. (Maybe there is an add-on I can look for.)

Does FileCenter handle the major European language character sets OK, do you know?


The other problem is, as others have occasionally pointed out, that scans of books one
can find on the web are often of rather poor quality for this purpose.


Cheers,
M.

1 person has voted this message useful



georgiqg
Triglot
Newbie
Spain
Joined 4839 days ago

36 posts - 50 votes
Speaks: Bulgarian*, Spanish, English
Studies: German, Russian

 
 Message 3 of 20
14 October 2011 at 1:14am
@Kyle Corrie, have you tried it with PDF files that have text in more than one language? For example, a PDF that contains some text in German and some in Spanish. Would that software recognize both German (ß, ä, etc.) and Spanish characters (ñ, ó, etc.)?
Thanks. ;)

-- Georgi -
1 person has voted this message useful



jerrypettit
Groupie
United States
Joined 5961 days ago

79 posts - 103 votes 
Speaks: English*

 
 Message 4 of 20
14 October 2011 at 2:18am
FileCenter 7 isn't free, is it?

Any recommended free OCR software?
1 person has voted this message useful



Kyle Corrie
Senior Member
United States
Joined 4764 days ago

175 posts - 464 votes 

 
 Message 5 of 20
14 October 2011 at 3:34am
@montmorency

I actually have been using this mainly for German so far and it doesn't miss a single
character. All your ßs and umlauts will show up in the Word document with no problem.

----------

@georgiqg

I have tried German and Spanish PDFs now and it has handled all special characters
flawlessly, so I don't see why it would make a difference if they were both in the same
document.

----------

@jerrypettit

Unfortunately, FileCenter is not a free program, but it does allow a 30-day free trial.
However, as with all things digital, with a little persistence a more unscrupulous
person should be able to find a torrent. :)

Otherwise you could scour http://freeocr.net/ for an alternative.
2 persons have voted this message useful



liddytime
Pentaglot
Senior Member
United States
mainlymagyar.wordpre
Joined 6164 days ago

693 posts - 1328 votes 
Speaks: English*, Spanish, Italian, Portuguese, Galician
Studies: Hungarian, Vietnamese, Modern Hebrew, Norwegian, Persian, Arabic (Written)

 
 Message 6 of 20
14 October 2011 at 4:29pm
I'm still looking for a program that can OCR Arabic script. Anyone know of one that won't break the bank?
1 person has voted this message useful



Hashimi
Senior Member
Oman
Joined 6194 days ago

362 posts - 529 votes 
Speaks: Arabic (Written)*
Studies: English, Japanese

 
 Message 7 of 20
16 October 2011 at 11:59am

@liddytime, ReadIRIS can OCR the Arabic script.


2 persons have voted this message useful



Doitsujin
Diglot
Senior Member
Germany
Joined 5255 days ago

1256 posts - 2363 votes 
Speaks: German*, English

 
 Message 8 of 20
16 October 2011 at 2:20pm
Hashimi, some time ago I tested ReadIRIS with a picture-perfect image of an Arabic text created from a text printed with a standard Naskh font without any optional ligatures and got way too many errors to consider using it with "real" texts.
What kind of documents did you use it with and what was your average error rate?
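
In case it helps to compare numbers: one common way to put a figure on "error rate" is the character error rate, i.e. the edit distance between a hand-checked reference text and the OCR output, divided by the length of the reference. The following is only a minimal sketch of that idea; the function names are made up for illustration, and it is not how ReadIRIS or any other program reports its accuracy.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions between two strings."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, recognized):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, recognized) / len(reference)
```

For example, if the reference is "Straße" and the OCR output is "Strasse", that is one substitution plus one insertion over six reference characters, so `character_error_rate("Straße", "Strasse")` comes out at 2/6, roughly 33%.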


2 persons have voted this message useful



Copyright 2024 FX Micheloud - All rights reserved
No part of this website may be copied by any means without my written authorization.