Doitsujin Diglot Senior Member Germany Joined 5254 days ago 1256 posts - 2363 votes Speaks: German*, English
| Message 17 of 20 16 January 2012 at 9:25pm | IP Logged |
fiziwig wrote:
The OCR is built right into Acrobat Reader, so you don't need a separate program. |
|
|
Unfortunately, this information is not correct. The full version of Acrobat has a built-in OCR option, but Acrobat Reader does not: in Reader you can only select text if the document was already processed with an OCR program or was created from a word processor or DTP file.
You cannot select text in a .pdf file that contains its text only as images.
2 persons have voted this message useful
|
fiziwig Senior Member United States Joined 4799 days ago 297 posts - 618 votes Speaks: English* Studies: Spanish
| Message 18 of 20 17 January 2012 at 5:19pm | IP Logged |
Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google books. It worked quite well with http://ia700306.us.archive.org/15/items/firstspanishcour00hilluoft/firstspanishcour00hilluoft.pdf for example, and you can even highlight and copy/paste text right in the browser. These old textbooks are all images of pages.
1 person has voted this message useful
|
Doitsujin Diglot Senior Member Germany Joined 5254 days ago 1256 posts - 2363 votes Speaks: German*, English
| Message 19 of 20 17 January 2012 at 7:33pm | IP Logged |
fiziwig wrote:
Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google books. [...] These old textbooks are all images of pages. |
|
|
They're PDFs with a text layer on top of the images, because Google processes all Google books with an OCR program. Luckily, the vast majority of older digitized textbooks available at archive.org have such a text layer. But some older textbooks found elsewhere on the Internet don't. In other words, don't expect Acrobat Reader to do the OCR for you in these cases.
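If you want to check whether a given scan has a text layer before opening it in Reader, here is a rough sketch in Python. It assumes the third-party pypdf package is installed; the function names and the 20-character threshold are just made up for illustration, and a page that really contains almost no text (a full-page picture, say) will be flagged as well.

```python
import sys
from typing import Iterable, List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return whatever text layer each page has (empty string if none)."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported here
    # so that the helper below also works without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [(page.extract_text() or "") for page in reader.pages]


def pages_without_text(page_texts: Iterable[str], min_chars: int = 20) -> List[int]:
    """Return the 1-based numbers of pages whose text layer looks empty."""
    # min_chars is an arbitrary cut-off, not a standard value.
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = pages_without_text(extract_page_texts(sys.argv[1]))
    if missing:
        print("Pages with no usable text layer:", missing)
    else:
        print("Every page seems to have a text layer.")
```

If this reports most or all pages as missing, the file is probably images only, and you would have to run it through a separate OCR program first.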
1 person has voted this message useful
|
fiziwig Senior Member United States Joined 4799 days ago 297 posts - 618 votes Speaks: English* Studies: Spanish
| Message 20 of 20 18 January 2012 at 6:24am | IP Logged |
Doitsujin wrote:
fiziwig wrote:
Hmmm. I didn't know that. I've only used it with some old Spanish textbooks from around 1910 that were scanned into Google books. [...] These old textbooks are all images of pages. |
|
|
They're PDFs with a text layer on top of the images, because Google processes all Google books with an OCR program. Luckily, the vast majority of older digitized textbooks available at archive.org have such a text layer. But some older textbooks found elsewhere on the Internet don't. In other words, don't expect Acrobat Reader to do the OCR for you in these cases. |
|
|
I see. I was obviously misunderstanding what I was seeing when I selected text in those books. I didn't realize there was an extra step involved.
1 person has voted this message useful
|