Rich Freeman on 6 May 2012 13:45:53 -0700

Re: [PLUG] Managing PDF Files on Linux?

On Sun, May 6, 2012 at 2:23 PM, Jack Hill <> wrote:
> I haven't done this, but it sounds like the sort of problem semantic
> desktops are targeted at solving. Maybe look at kde/dolphin/nepomuk
> <>.

Ugh, I have USE="-semantic-desktop" set for a reason...  :)

Systems like this rely on being able to search the content of files -
they usually don't involve tagging/browsing.  I really need manual
classification to ensure that everything that needs to be saved
actually ends up being saved.

I can't really see typing "unimportant stuff" into a search box,
getting 500 results, and hitting delete without looking at them.  If I
had classified files into a "save for a year" folder, I'd delete the
whole lot once the year was up without any worry.
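
To make the retention-folder idea concrete, here is a rough sketch of
the kind of cleanup I have in mind.  The one-year cutoff, the use of
file modification time as the age, and the function names are all just
assumptions for illustration, and it defaults to a dry run that deletes
nothing.

```python
import time
from pathlib import Path

# Assumption: "a year" means 365 days, measured from the file's mtime.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def expired_files(folder, max_age_seconds=SECONDS_PER_YEAR, now=None):
    """Return the files under folder last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(folder, max_age_seconds=SECONDS_PER_YEAR, now=None, dry_run=True):
    """List expired files; delete them only when dry_run is False."""
    doomed = expired_files(folder, max_age_seconds, now)
    if not dry_run:
        for path in doomed:
            path.unlink()
    return doomed
```

Calling purge() on the "save for a year" folder first with the default
dry_run=True shows what would go, and a second call with dry_run=False
actually removes it.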

> Here is a thread
> that looks like it might be relevant
> <>.

Yeah, I've checked out a few of those.  Referencer seems like
the best of them, though it doesn't really match my use case
per se.  However, I can't seem to figure out where it is actually

I've seen lots of ways to tag files, but none with integrated PDF
viewers.  I can already manually sort my files into directory trees,
so that is really solving the part of the problem I already have a
solution for.  I just want to be able to run through 100 PDF files
without double-clicking on each one, then re-selecting the file and
dragging it to the right folder.
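
The loop I'm after could be sketched roughly like this: open each PDF
in a viewer, ask for a folder name at the terminal, and move the file
there.  This is only an illustration, not an existing tool.  It assumes
xdg-open is available to launch the default viewer, the function names
are made up, and it makes no attempt to close the viewer window
afterwards.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def unique_target(dest_dir, name):
    """Pick a path in dest_dir for name, appending -1, -2, ... if taken."""
    dest_dir = Path(dest_dir)
    stem, suffix = Path(name).stem, Path(name).suffix
    candidate = dest_dir / name
    n = 1
    while candidate.exists():
        candidate = dest_dir / f"{stem}-{n}{suffix}"
        n += 1
    return candidate


def file_pdf(pdf, category, root):
    """Move pdf into root/category (created if needed); return new path."""
    dest_dir = Path(root) / category
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = unique_target(dest_dir, Path(pdf).name)
    shutil.move(str(pdf), str(target))
    return target


def triage(inbox, root, viewer="xdg-open"):
    """Show each PDF in inbox and prompt for the folder to file it under."""
    for pdf in sorted(Path(inbox).glob("*.pdf")):
        # Assumption: the viewer command returns or detaches on its own.
        subprocess.Popen([viewer, str(pdf)])
        category = input(f"{pdf.name} -> folder (blank to skip): ").strip()
        if category:
            print("moved to", file_pdf(pdf, category, root))


if __name__ == "__main__" and len(sys.argv) == 3:
    triage(sys.argv[1], sys.argv[2])
```

Run as something like "python triage.py ~/scans ~/documents" (the
script name and paths are placeholders).  Name collisions get a numeric
suffix rather than overwriting anything.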

> From your description of the problem it sounds like you have two files for
> everything that you scan: the raster as a pdf and the OCRed text as plain
> text. Is this correct? It might simplify organization to get the text
> included in the pdf.

That is probably worth a shot.  If anybody has suggestions on how to
do this I'm open to them.

Googling around suggests that watchocr might be useful, but after a
few minutes of clicking around I can't find the source to download.
Just a bunch of useless .deb files.

Odd - two projects that both seem promising but which seem to make it
very hard to find the source tarballs...

Philadelphia Linux Users Group