Greg Lopp on 22 Feb 2005 18:15:21 -0000



Re: [PLUG] OCR in linux


Gregson Helledy wrote:
> I have a .pdf file which I'd like to convert to text.
How about...
$ apt-get install gs-common
$ pdf2ps $FILE.pdf $FILE.ps
$ ps2ascii $FILE.ps $FILE.txt
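Since the PDF is real text rather than scanned images, pulling the text out directly like this should sidestep OCR entirely. A rough sketch that wraps those two steps in a shell function, assuming a POSIX sh and that pdf2ps and ps2ascii (from gs-common) are on your PATH. The name pdf_to_text and its exit codes are my own invention, not part of Ghostscript:

```shell
#!/bin/sh
# Sketch: PDF -> PostScript -> plain text via Ghostscript's pdf2ps and
# ps2ascii helper scripts. pdf_to_text is a hypothetical helper name.
# Usage: pdf_to_text input.pdf [output.txt]
# Exit codes (my own convention): 1 = no input, 2 = tool missing,
# 3 = conversion failed.

pdf_to_text() {
    src=$1
    # Default output: input name with .pdf swapped for .txt.
    dst=${2:-${src%.pdf}.txt}
    # Intermediate PostScript file; an existing file of this name is
    # overwritten.
    ps=${src%.pdf}.ps

    if [ ! -f "$src" ]; then
        echo "pdf_to_text: no such file: $src" >&2
        return 1
    fi

    for tool in pdf2ps ps2ascii; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "pdf_to_text: $tool not found (install Ghostscript)" >&2
            return 2
        fi
    done

    pdf2ps "$src" "$ps" || return 3
    ps2ascii "$ps" "$dst" || return 3
    rm -f "$ps"    # clean up the intermediate .ps on success
}
```

With that sourced, something like `pdf_to_text report.pdf` should leave report.txt next to the PDF if Ghostscript can read the file.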

> I apt-got a package called gocr (and gocr-gtk, a frontend).
> gocr wants .pbm files, so I converted the .pdf to .pbm with
> ImageMagick, then used gocr on it.

> What I got from my 25-page .pdf file (which is text, not images)
> was a 1.7K text file of garbage.
What did the .pbm look like? Perhaps the text of your .pdf did not survive that conversion.
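If you do want to stay with gocr, one thing worth checking is the resolution ImageMagick rendered the pages at. I believe its default for PDFs is 72 dpi, which tends to be too coarse for OCR. A rough sketch, assuming ImageMagick's convert and gocr are both installed; the 300 dpi figure and the ocr_pdf name are my assumptions, not anything gocr requires:

```shell
#!/bin/sh
# Sketch: rasterize each PDF page to PBM with ImageMagick, OCR each page
# with gocr, and concatenate the text. ocr_pdf is a hypothetical helper.
# ASSUMPTION: 300 dpi is a reasonable render density for OCR; the
# lower default resolution may be what turned the text into garbage.
# Usage: ocr_pdf input.pdf [output.txt]

ocr_pdf() {
    src=$1
    dst=${2:-${src%.pdf}.txt}
    base=${src%.pdf}

    if [ ! -f "$src" ]; then
        echo "ocr_pdf: no such file: $src" >&2
        return 1
    fi

    for tool in convert gocr; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ocr_pdf: $tool not found" >&2
            return 2
        fi
    done

    # -density must come before the input file to set the render
    # resolution; %03d yields one numbered .pbm per page.
    convert -density 300 "$src" "$base-%03d.pbm" || return 3

    : > "$dst"    # start with an empty output file
    for page in "$base"-*.pbm; do
        # gocr writes recognized text to stdout; append page by page.
        gocr -i "$page" >> "$dst" || return 3
    done
}
```

The .pbm pages are left behind on purpose, so you can open one in an image viewer and see whether the text is legible before blaming gocr. Delete them before rerunning, since stale pages with the same prefix would be picked up by the glob.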




___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug