Gregson Helledy on 22 Feb 2005 16:48:03 -0000
I have a .pdf file which I'd like to convert to text. I apt-got a package called gocr (and gocr-gtk, a frontend). gocr wants .pbm files, so I converted the .pdf to .pbm with ImageMagick, then ran gocr on the result. What I got from my 25-page .pdf file (which is text, not images) was a 1.7K text file of garbage.

I am using Libranet, which targets Debian stable. I know nothing about how OCR software works and thought I'd ask whether anyone has had luck with this package. One thought that occurred to me is that the creator of the document put in images of text, rather than text itself... is that possible?

Greg Helledy

Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
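For reference, the conversion described in the message (PDF to .pbm with ImageMagick, then gocr on each page) can be sketched roughly as below. This is only a sketch under assumptions, not the exact commands that were run: the function names, the `page-%03d.pbm` naming pattern, and the 300 dpi density are all made up for illustration. The density is spelled out because ImageMagick rasterizes PDFs at 72 dpi unless told otherwise, which may be too coarse for OCR.

```python
import subprocess
from pathlib import Path


def rasterize_command(pdf_path, out_dir, density=300):
    """Build the ImageMagick call that writes one .pbm per PDF page.

    The density value and the page-%03d.pbm pattern are illustrative choices.
    """
    pattern = Path(out_dir) / "page-%03d.pbm"
    return ["convert", "-density", str(density), str(pdf_path), str(pattern)]


def gocr_command(pbm_path):
    """Build the gocr call that reads one .pbm and writes a .txt beside it."""
    pbm_path = Path(pbm_path)
    return ["gocr", "-i", str(pbm_path), "-o", str(pbm_path.with_suffix(".txt"))]


def pdf_to_text(pdf_path, out_dir, density=300):
    """Run both steps and return the recognized text of all pages, in order.

    Requires the `convert` and `gocr` executables to be installed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_command(pdf_path, out_dir, density), check=True)
    pages = []
    for pbm in sorted(out_dir.glob("page-*.pbm")):
        subprocess.run(gocr_command(pbm), check=True)
        pages.append(pbm.with_suffix(".txt").read_text(errors="replace"))
    return "\n".join(pages)
```

Calling `pdf_to_text("document.pdf", "pages")` (both names hypothetical) would leave the intermediate .pbm and .txt files in `pages/`, which makes it easy to look at a single page image and judge whether it is sharp enough for OCR.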