Create editable text from scanned PDFs (Learning Techniques, Methods & Strategies) Language Learning Forum

Tags: Gadget

Doitsujin
Diglot
Senior Member
Germany
Joined 5254 days ago
1256 posts - 2363 votes

Speaks: German*, English

Message 17 of 20

16 January 2012 at 9:25pm

fiziwig wrote:

The OCR is built right into Acrobat Reader, so you don't need a separate program.

Unfortunately, this information is not correct. The full version of Acrobat has a built-in OCR option, but Acrobat Reader does not. In Acrobat Reader, you can only select text if the document was already processed with an OCR program or was created from a word processor or DTP file.
You cannot select text in a PDF file whose pages contain the text only as images.

fiziwig
Senior Member
United States
Joined 4799 days ago
297 posts - 618 votes

Speaks: English*
Studies: Spanish

Message 18 of 20

17 January 2012 at 5:19pm

Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google Books. It worked quite well with http://ia700306.us.archive.org/15/items/firstspanishcour00hilluoft/firstspanishcour00hilluoft.pdf for example, and you can even highlight and copy/paste text right in the browser. These old textbooks are all images of pages.

Doitsujin

Message 19 of 20

17 January 2012 at 7:33pm

fiziwig wrote:

Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google Books. [...] These old textbooks are all images of pages.

They're PDFs with a text layer on top of the images, because Google processes all Google Books scans with an OCR program. Luckily, the vast majority of older digitized textbooks available at archive.org have such a text layer, but some older textbooks found elsewhere on the Internet don't. In those cases, don't expect Acrobat Reader to do the OCR for you.
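
If you want a quick guess at whether a given PDF has such a text layer before opening it, something like the following Python sketch might help. It is only a rough heuristic built on an assumption: a text layer needs fonts, so a PDF that never mentions a /Font resource is probably just page images. The function name may_have_text_layer is made up for this example, and the check can be fooled (encrypted files, unusual compression filters), so treat the answer as a hint, not a verdict.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF object.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def may_have_text_layer(pdf_bytes: bytes) -> bool:
    """Rough guess: does this PDF reference any font resources?

    Assumption (not a guarantee): selectable text needs a /Font entry,
    while image-only scans usually carry only /XObject images.
    """
    # Uncompressed page dictionaries mention /Font directly.
    if b"/Font" in pdf_bytes:
        return True

    # Newer PDFs may hide dictionaries inside Flate-compressed streams,
    # so try to inflate every stream and look again.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, e.g. a JPEG page image
        if b"/Font" in inflated:
            return True
    return False
```

To try it, read a file in binary mode and pass the bytes in, e.g. may_have_text_layer(open("book.pdf", "rb").read()), where book.pdf is a placeholder name. If it returns False, the file most likely needs a separate OCR program before you can select any text.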

fiziwig

Message 20 of 20

18 January 2012 at 6:24am

Doitsujin wrote:

fiziwig wrote:

Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google Books. [...] These old textbooks are all images of pages.

I see. I was obviously misunderstanding what I was seeing when I selected text in those books. I didn't realize there was an extra step involved.
