Walt Mankowski on 22 Feb 2005 19:22:28 -0000



Re: [PLUG] OCR in linux


On Tue, Feb 22, 2005 at 11:37:55AM -0500, Gregson Helledy wrote:
> I have a .pdf file which I'd like to convert to text.
> I apt-got a package called gocr (and gocr-gtk, a frontend).
> gocr wants .pbm files, so I converted the .pdf to .pbm with
> ImageMagick, then used gocr on it.
> 
> What I got from my 25-page .pdf file (which is text, not images)
> was a 1.7K text file of garbage.  I am using Libranet, targeted
> at Debian stable.
> 
> I know nothing about how ocr software works and thought I'd ask
> whether anyone has had luck with this package.  One thought
> that occurred is that the creator of the document put in images
> of text, rather than text itself...is that possible?

Have you tried pdftotext?  It's part of the xpdf package.
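
If you want to script it, here is a minimal sketch in Python. It assumes
pdftotext is on your PATH; the file name document.pdf and the two helper
functions are hypothetical names made up for illustration, not part of xpdf.
The only option used is -layout, which asks pdftotext to try to preserve
the physical page layout.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path=None, keep_layout=False):
    """Return the argument list for a pdftotext call (nothing is executed)."""
    pdf_path = Path(pdf_path)
    # pdftotext would pick a .txt name on its own; we spell it out explicitly.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to keep the original page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def pdf_to_text(pdf_path, txt_path=None, keep_layout=False):
    """Run pdftotext and return the path of the text file it wrote."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; is the xpdf package installed?")
    cmd = build_pdftotext_command(pdf_path, txt_path, keep_layout)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])


if __name__ == "__main__":
    # Hypothetical input file; this only prints the command that would be run.
    print(" ".join(build_pdftotext_command("document.pdf", keep_layout=True)))
```

If the resulting text file comes out empty or nearly so, the pages are
probably scanned images after all, and OCR would be the next thing to try.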

Walt


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug