Walt Mankowski on 22 Feb 2005 19:22:28 -0000
On Tue, Feb 22, 2005 at 11:37:55AM -0500, Gregson Helledy wrote:
> I have a .pdf file which I'd like to convert to text.
> I apt-got a package called gocr (and gocr-gtk, a frontend).
> gocr wants .pbm files, so I converted the .pdf to .pbm with
> ImageMagick, then used gocr on it.
>
> What I got from my 25-page .pdf file (which is text, not images)
> was a 1.7K text file of garbage. I am using Libranet, targeted
> at Debian stable.
>
> I know nothing about how ocr software works and thought I'd ask
> whether anyone has had luck with this package. One thought
> that occurred is that the creator of the document put in images
> of text, rather than text itself...is that possible?

Have you tried pdftotext? It's part of the xpdf package.

Walt
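
For anyone who wants to script the pdftotext suggestion, here is a
minimal Python sketch. It is an illustration only, not something from
the thread: the function names, the 200-characters-per-page threshold
and the file name in the usage note are all assumptions. It relies on
two documented pdftotext behaviours: the -layout option, and a
text-file argument of "-" meaning "write to stdout". The last function
is a rough heuristic for the question raised above, namely whether the
PDF might contain images of text rather than text itself.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, keep_layout=False):
    """Return the argv list for a pdftotext call that writes to stdout."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to preserve the physical page layout
    # A text-file argument of "-" sends the extracted text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext on pdf_path and return the extracted text."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    result = subprocess.run(
        build_pdftotext_command(pdf_path, keep_layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def looks_like_scanned_images(text, page_count, min_chars_per_page=200):
    """Heuristic (an assumption, not a rule): very little extracted text
    per page suggests the PDF holds page images rather than real text."""
    return len(text.strip()) < min_chars_per_page * page_count
```

Hypothetical usage: text = extract_text("document.pdf"), then
looks_like_scanned_images(text, page_count=25). If pdftotext returns
plenty of text, no OCR step is needed at all; if it returns next to
nothing, the pages are probably images and OCR is the remaining route.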
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug