Kuzman Ganchev on 23 Feb 2005 03:03:41 -0000



Re: [PLUG] OCR in linux


On Tue, Feb 22, 2005 at 11:37:55AM -0500, Gregson Helledy wrote:
> I have a .pdf file which I'd like to convert to text.
> I apt-got a package called gocr (and gocr-gtk, a frontend).
> gocr wants .pbm files, so I converted the .pdf to .pbm with
> ImageMagick, then used gocr on it.

What resolution was the output file in?  Take a look at the image
(e.g. using display) and see if it looks pixelated.  OCR needs a pretty
clear, sharp image to work well.
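A minimal sketch of re-rasterizing at a higher resolution, assuming ImageMagick's `convert` and `gocr` are installed. It only builds the command lines. The 300 dpi value, the `page-%03d.pbm` naming and the grayscale/threshold step are illustrative choices, not something from the original message. `-density` has to come before the input file, because it sets the resolution used when the PDF is rendered. ImageMagick's default is 72 dpi, which is usually too coarse for OCR.

```python
def rasterize_command(pdf: str, out_pattern: str = "page-%03d.pbm",
                      dpi: int = 300) -> list[str]:
    """Build an ImageMagick command that renders each PDF page to a PBM file.

    -density is placed BEFORE the input file so it controls how the PDF is
    rasterized. The grayscale + 50% threshold step gives clean black/white
    pages without dithering (an assumed preprocessing choice).
    """
    return ["convert", "-density", str(dpi), pdf,
            "-colorspace", "Gray", "-threshold", "50%", out_pattern]


def ocr_command(pbm: str) -> list[str]:
    """Build a gocr command that reads one PBM page and prints text to stdout."""
    return ["gocr", "-i", pbm]


if __name__ == "__main__":
    # Print the commands; pass them to subprocess.run() to actually execute.
    print(" ".join(rasterize_command("doc.pdf")))
    print(" ".join(ocr_command("page-000.pbm")))
```

If the 300 dpi pages still look rough in `display`, raising `dpi` is the first thing to try.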

Oh, there are also a lot of other OCR packages.  One list is at:

http://www.linux-ocr.ekitap.gen.tr/

> I know nothing about how ocr software works and thought I'd ask
> whether anyone has had luck with this package.  

I used it a long time ago, and it was OK but certainly not great. 

> One thought 
> that occurred is that the creator of the document put in images
> of text, rather than text itself...is that possible?

It is possible (esp. if the document was scanned), but that's what OCR
is for.  If it's not a scanned file, pdftotext should be able to extract
the text directly, which works much better than doing OCR.
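The "try pdftotext first, fall back to OCR" approach might look roughly like this. It is a sketch that assumes `pdftotext` (from poppler-utils or xpdf) is on the PATH. The `needs_ocr` heuristic and its 20-character threshold are hypothetical, picked only for illustration: if almost no text comes out, the PDF is probably just page images.

```python
import shutil
import subprocess
from typing import Optional


def pdftotext_command(pdf: str) -> list[str]:
    # "-" as the output file makes pdftotext write the text to stdout.
    return ["pdftotext", pdf, "-"]


def needs_ocr(extracted: str, min_chars: int = 20) -> bool:
    """Hypothetical heuristic: fewer than min_chars non-whitespace characters
    (page-break form feeds count as whitespace) suggests an image-only PDF."""
    return len("".join(extracted.split())) < min_chars


def extract_text(pdf: str) -> Optional[str]:
    """Return the PDF's embedded text, or None if OCR looks necessary."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found (poppler-utils or xpdf)")
    result = subprocess.run(pdftotext_command(pdf), capture_output=True,
                            text=True, check=True)
    return None if needs_ocr(result.stdout) else result.stdout
```

When `extract_text` returns `None`, that's the case where rasterizing the pages and running gocr (or another OCR package) is worth the trouble.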

Kuzman 


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug