Kuzman Ganchev on 23 Feb 2005 03:03:41 -0000
On Tue, Feb 22, 2005 at 11:37:55AM -0500, Gregson Helledy wrote:
> I have a .pdf file which I'd like to convert to text.
> I apt-got a package called gocr (and gocr-gtk, a frontend).
> gocr wants .pbm files, so I converted the .pdf to .pbm with
> ImageMagick, then used gocr on it.

What resolution was the output file in? Take a look at the image
(e.g. using display) and see if it looks pixelated. OCR needs it to be
pretty clear.

Oh, there are also a lot of other packages. One list is at:
http://www.linux-ocr.ekitap.gen.tr/

> I know nothing about how ocr software works and thought I'd ask
> whether anyone has had luck with this package.

I used it a long time ago, and it was OK but certainly not great.

> One thought
> that occurred is that the creator of the document put in images
> of text, rather than text itself...is that possible?

It is possible (esp. if the document was scanned), but that's what OCR
is for. If it's not a scanned file pdftotext should be able to deal
with it (much better than doing OCR).

Kuzman

Attachment: signature.asc

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
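
For anyone who wants to script the advice in this reply (try pdftotext
first, and only fall back to ImageMagick plus gocr when the PDF has no
real text in it), here is a rough Python sketch. It is not something
from the thread. The function names, the 300 dpi default, and the
20-character threshold are all assumptions made for illustration, and
the fallback only looks at the first page. It assumes the `pdftotext`,
`convert` (ImageMagick) and `gocr` commands are on the PATH.

```python
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf: Path) -> list[str]:
    """pdftotext command that writes the extracted text to stdout ("-")."""
    return ["pdftotext", str(pdf), "-"]


def rasterize_cmd(pdf: Path, pbm: Path, dpi: int = 300) -> list[str]:
    """ImageMagick command that renders page 1 of the PDF to a PBM bitmap.

    -density is given before the input file so it applies when the PDF is
    rasterized. 300 dpi is an assumed starting point, not a value from the
    thread; raise it if the image still looks pixelated.
    """
    return ["convert", "-density", str(dpi), f"{pdf}[0]", str(pbm)]


def gocr_cmd(pbm: Path) -> list[str]:
    """gocr command that reads the bitmap and prints recognized text."""
    return ["gocr", "-i", str(pbm)]


def has_real_text(text: str, min_chars: int = 20) -> bool:
    """Heuristic (assumed threshold): a scanned PDF yields little or no
    text from pdftotext, typically just whitespace and form feeds."""
    return sum(not c.isspace() for c in text) >= min_chars


def pdf_to_text(pdf: Path, dpi: int = 300) -> str:
    """Try pdftotext first; fall back to OCR only if it finds no text."""
    result = subprocess.run(
        pdftotext_cmd(pdf), capture_output=True, text=True, check=True
    )
    if has_real_text(result.stdout):
        return result.stdout

    # No usable text layer: render to a bitmap and run OCR on it.
    pbm = pdf.with_suffix(".pbm")
    subprocess.run(rasterize_cmd(pdf, pbm, dpi), check=True)
    ocr = subprocess.run(
        gocr_cmd(pbm), capture_output=True, text=True, check=True
    )
    return ocr.stdout
```

Calling `pdf_to_text(Path("document.pdf"))` would return the text either
way. If the OCR output is poor, it is worth viewing the intermediate
.pbm file (e.g. with display) before blaming gocr, since a low-resolution
render is the more likely culprit.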