Rich Kulawiec on 26 Sep 2016 15:06:19 -0700


Re: [PLUG] Top-posting

On Thu, Sep 22, 2016 at 05:06:49PM -0400, Rich Freeman wrote:
> Ultimately both folders and tags are insufficient.  What you really
> want is full text indexing, in addition to indexing of folders and
> tags.  This really necessitates a server component that can look
> through your vast store of well-indexed emails and provide them as
> needed to clients when they connect.

Yes, you want full text indexing, but you also want indexing that
is header-aware, e.g., messages which match the regexp
"List-Id:.*<>" or messages between March 2, 2007
and March 22, 2007, and so on.
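
As a rough illustration of a header-aware query, here is a minimal Python
sketch that combines a header regexp with a date range. The function name
`matches`, the sample message, and the list identifier `list.example.org`
are all made up for the example (the pattern in the text above is left
as-is); a real setup would run this kind of test inside the search
engine's indexer rather than per message at query time.

```python
import re
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def matches(raw_message, header, pattern, start, end):
    """Return True if `header` matches `pattern` and Date is within [start, end]."""
    msg = message_from_string(raw_message)
    if not re.search(pattern, msg.get(header, "")):
        return False
    date_header = msg.get("Date")
    if date_header is None:
        return False
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable Date header: treat as a non-match.
        return False
    if sent.tzinfo is None:
        # Assumption: treat zone-less dates as UTC so they can be compared.
        sent = sent.replace(tzinfo=timezone.utc)
    return start <= sent <= end


# Hypothetical sample message and query window (March 2 to March 22, 2007).
SAMPLE = (
    "From: alice@example.org\n"
    "List-Id: Example list <list.example.org>\n"
    "Date: Sat, 10 Mar 2007 12:00:00 -0500\n"
    "Subject: hello\n"
    "\n"
    "body text\n"
)
START = datetime(2007, 3, 2, tzinfo=timezone.utc)
END = datetime(2007, 3, 22, 23, 59, 59, tzinfo=timezone.utc)

in_range = matches(SAMPLE, "List-Id", r"List|<list\.example\.org>", START, END)
```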

This can be done (I know, because I've done it) by using IMAP in
combination with a search engine.  Of course this requires retrieving
the messages via IMAP in order to index them, and that can be tedious.
It's better/faster done by indexing collections of mbox files and using
the filename/byte offset as the retrieval key for each message.  But if
you can't get at the messages that way: IMAP it is.
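
A minimal Python sketch of the filename/byte-offset idea follows. It
assumes a conventional mbox where each message begins with a "From "
separator line at the start of the file or after a blank line; the names
`index_mbox` and `fetch` and the sample data are invented for the
example. Files using other mbox variants or unescaped "From " lines in
bodies would need more careful parsing.

```python
import os
import tempfile


def index_mbox(path):
    """Return one (path, start, end) byte-range key per message in an mbox file."""
    starts = []
    offset = 0
    prev_blank = True  # the start of the file counts as a message boundary
    with open(path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                starts.append(offset)
            prev_blank = line in (b"\n", b"\r\n")
            offset += len(line)
    ends = starts[1:] + [offset]
    return [(path, s, e) for s, e in zip(starts, ends)]


def fetch(key):
    """Read a single message back using its (path, start, end) key."""
    path, start, end = key
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


# Usage: build a tiny two-message mbox and retrieve the second message by key.
SAMPLE = (
    b"From a@example.org Mon Sep 26 15:06:19 2016\n"
    b"Message-ID: <1@example.org>\n"
    b"\n"
    b"first body\n"
    b"\n"
    b"From b@example.org Mon Sep 26 15:07:00 2016\n"
    b"Message-ID: <2@example.org>\n"
    b"\n"
    b"second body\n"
)
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    with open(sample_path, "wb") as out:
        out.write(SAMPLE)
    keys = index_mbox(sample_path)
    second = fetch(keys[1])
```

The search engine then only needs to store the key alongside each indexed
document; retrieval is a seek and a read, with no mbox parsing at query time.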

However: I think it's critical to separate functionality here, in keeping
with the "Software Tools" concept (and the foundations of Unix)
that a tool should do one thing and do it well.  Mail really, really
should be kept in mbox format (it's served us well, it's simple, and
a LOT of tools exist to work with it) and search really, really should
be separated from it, not integrated with it, because it would be an
architectural error to do the latter.

That doesn't mean that it has to *look* separate: it can be as integrated
at the user level as anyone wants.  But it should be kept cleanly
separated internally so that if -- one day -- we decide to replace mbox
format, or we decide to swap out the search engine, or we come up with
something better than IMAP, we can change individual components of the
system without breaking everything else.
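
One way to picture that separation is as two narrow interfaces that the
rest of the system talks to. The sketch below is only an illustration in
Python: `MessageStore`, `SearchIndex`, `DictStore`, and `SubstringIndex`
are hypothetical names, and the stand-in classes are deliberately naive.
The point is that an mbox-backed or IMAP-backed store, or a real search
engine, could replace either stand-in without touching the other.

```python
from typing import Dict, Iterable, List, Protocol


class MessageStore(Protocol):
    """Anything that can list message keys and return a message by key."""
    def keys(self) -> Iterable[str]: ...
    def fetch(self, key: str) -> str: ...


class SearchIndex(Protocol):
    """Anything that can index text under a key and answer term queries."""
    def add(self, key: str, text: str) -> None: ...
    def query(self, term: str) -> List[str]: ...


class DictStore:
    """Stand-in store; an mbox- or IMAP-backed class could replace it."""
    def __init__(self, messages: Dict[str, str]) -> None:
        self._messages = messages

    def keys(self) -> Iterable[str]:
        return list(self._messages)

    def fetch(self, key: str) -> str:
        return self._messages[key]


class SubstringIndex:
    """Naive stand-in for a real search engine (case-insensitive substring)."""
    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}

    def add(self, key: str, text: str) -> None:
        self._docs[key] = text.lower()

    def query(self, term: str) -> List[str]:
        needle = term.lower()
        return sorted(k for k, text in self._docs.items() if needle in text)


def build_index(store: MessageStore, index: SearchIndex) -> None:
    """Feed every message in the store to the index."""
    for key in store.keys():
        index.add(key, store.fetch(key))


def search(store: MessageStore, index: SearchIndex, term: str) -> List[str]:
    """Query the index, then retrieve matching messages from the store."""
    return [store.fetch(key) for key in index.query(term)]


store = DictStore({"a": "Top-posting considered harmful", "b": "mbox is simple"})
index = SubstringIndex()
build_index(store, index)
hits = search(store, index, "MBOX")
```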

Oh, and if you want to overlay things like user tagging, e.g., "I want
to mark these messages as relevant to the Foo Project", then that should
*not* be done at the mbox level -- because it modifies messages.  It should
really be an overlay keyed to the Message-ID, which is putatively unique
and putatively well-formed (yeah, well, ignore Microsoft, which has gotten
it wrong for 15+ years) and can be/should be one of the items kept in the
search engine's indices anyway.  This is cleaner (because if you decide
to delete the tag, you don't modify messages, only the search engine
indices) and faster (because if you want to generate a list of all
Foo Project messages, you only need the indices and don't need to
prowl through the messages themselves).
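
A minimal sketch of such an overlay, assuming a small SQLite table stands
in for the search engine's indices. The class name `TagOverlay`, the table
layout, and the sample Message-IDs are invented for the example. Nothing
here reads or writes the mbox files; tags live entirely in the side table.

```python
import sqlite3


class TagOverlay:
    """User tags keyed by Message-ID, stored apart from the messages."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            " message_id TEXT NOT NULL,"
            " tag TEXT NOT NULL,"
            " PRIMARY KEY (message_id, tag))"
        )
        # Secondary index so listing all messages for a tag is a lookup.
        self.db.execute("CREATE INDEX IF NOT EXISTS tags_by_tag ON tags (tag)")

    def add(self, message_id, tag):
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO tags (message_id, tag) VALUES (?, ?)",
                (message_id, tag),
            )

    def remove(self, message_id, tag):
        with self.db:
            self.db.execute(
                "DELETE FROM tags WHERE message_id = ? AND tag = ?",
                (message_id, tag),
            )

    def messages_with(self, tag):
        rows = self.db.execute(
            "SELECT message_id FROM tags WHERE tag = ? ORDER BY message_id",
            (tag,),
        )
        return [row[0] for row in rows]


# Usage: tag two hypothetical messages, then untag one of them.
overlay = TagOverlay()
overlay.add("<1@example.org>", "foo-project")
overlay.add("<2@example.org>", "foo-project")
overlay.remove("<1@example.org>", "foo-project")
remaining = overlay.messages_with("foo-project")
```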

Philadelphia Linux Users Group