Greg Helledy on 23 Feb 2005 17:50:28 -0000



[PLUG] OCR in linux--results


"I suspect that recent versions of KWord will do a better single pass 
importation of PDFs than the method you're using.  If you have a recent 
version of KDE, you likely have a relatively recent version of KOffice
to go with it; merely select import and try it on the PDF in question.  It 
works a lot better than most of the standalone programs."

My version of KWord (KDE 2.2.2) is apparently too old, and lacks
that feature.

"How about...
$ apt-get install gs-common
$ pdf2ps $FILE.pdf $FILE.ps
$ ps2ascii $FILE.ps $FILE.txt"

pdf2ps produced a mangled, illegible .ps file.  It reported that
the .pdf file had a corrupted EOF marker.
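
For anyone who hits the same complaint, here is a rough way to check
the end of a file by hand.  This is only a sketch: has_pdf_eof is a
name I made up, and the 1024-byte window is an assumption about how
close to the end of the file the "%%EOF" marker should sit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last 1024 bytes of the file
# contain the "%%EOF" marker that a well-formed PDF ends with.
has_pdf_eof() {
    # -a makes grep treat binary data as text (GNU and BSD grep).
    tail -c 1024 "$1" | grep -a -q '%%EOF'
}
```

Calling "has_pdf_eof somefile.pdf && echo ok" prints "ok" only when
the marker is found.  It does not say what else might be wrong with
the file.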

"What did the .pdm look like?  Perhaps the text of your .pdf did not 
survive that translation."

I had created both .pnm and .pdm files.  The .pnm looked great, an
exact duplicate of the .pdf.  The .pdm was also legible, but "fuzzy".
Surprisingly, gocr failed on both files.

"Have you tried pdftotext?  It's part of the xpdf package."

I do have xpdf, and in fact have a newer version than Debian stable
(from backports.org).  This turned out to be the silver bullet: it
worked perfectly.  The -layout option even did a half-decent job of
preserving the formatting.
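
The pdftotext step can be wrapped up roughly as follows.  This is a
minimal sketch, not the exact command line used for the document in
question: txt_name and pdf_to_text are made-up helper names, and the
output name is simply the input name with .pdf swapped for .txt.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name by swapping .pdf for .txt.
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Hypothetical wrapper: convert one PDF to text, keeping the page layout.
pdf_to_text() {
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; it ships with the xpdf package" >&2
        return 127
    fi
    out=$(txt_name "$1")
    # -layout asks pdftotext to maintain the original physical layout.
    pdftotext -layout "$1" "$out" && printf '%s\n' "$out"
}
```

Running "pdf_to_text report.pdf" would write report.txt next to the
input and print its name.  Leaving out -layout gives plain reading
order instead of the original column positions.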

I've now imported the document into OpenOffice and can update and
reformat to my heart's content.  I wanted to write this up to thank all
those who offered their help and to share my results.

Greg Helledy

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug