Gregson Helledy on 22 Feb 2005 16:48:03 -0000



[PLUG] OCR in linux


I have a .pdf file which I'd like to convert to text.
I apt-got a package called gocr (and gocr-gtk, a frontend).
gocr wants .pbm files, so I converted the .pdf to .pbm with
ImageMagick, then used gocr on it.
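
The two steps described above can be sketched as a small Python
wrapper. This is only an illustration under stated assumptions: the
helper names (build_commands, run_pipeline), the output file pattern,
and the 300 dpi setting are hypothetical and not from the original
message. ImageMagick rasterizes at 72 dpi unless told otherwise, so a
low resolution is one plausible (unconfirmed) cause of poor OCR output.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, workdir, density=300):
    """Build the ImageMagick command and a factory for per-page gocr commands.

    density=300 is an assumed value, not something the original post used.
    """
    workdir = Path(workdir)
    # A %03d pattern in the output name asks ImageMagick for one file per page.
    pbm_pattern = str(workdir / "page-%03d.pbm")
    convert_cmd = ["convert", "-density", str(density), str(pdf_path),
                   "-monochrome", pbm_pattern]

    def gocr_cmd_for(pbm_file):
        # gocr reads the image given with -i and writes text to the -o file.
        out = Path(pbm_file).with_suffix(".txt")
        return ["gocr", "-i", str(pbm_file), "-o", str(out)]

    return convert_cmd, gocr_cmd_for


def run_pipeline(pdf_path, workdir):
    """Rasterize the PDF, OCR each page, and return the concatenated text."""
    for tool in ("convert", "gocr"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    convert_cmd, gocr_cmd_for = build_commands(pdf_path, workdir)
    subprocess.run(convert_cmd, check=True)
    texts = []
    for pbm in sorted(workdir.glob("page-*.pbm")):
        subprocess.run(gocr_cmd_for(pbm), check=True)
        texts.append(pbm.with_suffix(".txt").read_text(errors="replace"))
    return "\n".join(texts)
```

Calling run_pipeline("document.pdf", "ocr-out") would need both tools
installed; build_commands alone can be used to inspect the command
lines without running anything.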

What I got from my 25-page .pdf file (which appears to be text, not
images) was a 1.7 KB text file of garbage.  I am using Libranet, which
is based on Debian stable.

I know nothing about how OCR software works and thought I'd ask
whether anyone has had luck with this package.  One thought
that occurred to me is that the creator of the document may have put
in images of text rather than text itself.  Is that possible?
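
One way to test that question is to check whether the PDF carries an
extractable text layer at all. The sketch below assumes the pdftotext
utility (shipped with xpdf and poppler) is installed; it is not
mentioned in the original message. The helper names and the threshold
of 200 characters per page are hypothetical and chosen only for
illustration.

```python
import subprocess


def extract_text_layer(pdf_path):
    """Return whatever text pdftotext can pull straight out of the PDF."""
    # "-" as the output file tells pdftotext to write to standard output.
    result = subprocess.run(["pdftotext", str(pdf_path), "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout


def has_text_layer(extracted, pages, min_chars_per_page=200):
    """Rough heuristic: enough non-whitespace characters per page.

    min_chars_per_page is an assumed threshold, not a standard value.
    """
    printable = sum(1 for ch in extracted if not ch.isspace())
    return printable >= pages * min_chars_per_page
```

If has_text_layer(extract_text_layer("document.pdf"), 25) comes back
True, the text can probably be taken directly from the extraction and
OCR is unnecessary; if it comes back False, the pages are likely
scanned images and OCR is the right approach.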

Greg Helledy


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug