Christopher Barry on 26 Sep 2016 21:15:17 -0700


Re: [PLUG] Top-posting

On Mon, 26 Sep 2016 18:06:11 -0400
Rich Kulawiec wrote:

>On Thu, Sep 22, 2016 at 05:06:49PM -0400, Rich Freeman wrote:
>> Ultimately both folders and tags are insufficient.  What you really
>> want is full text indexing, in addition to indexing of folders and
>> tags.  This really necessitates a server component that can look
>> through your vast store of well-indexed emails and provide them as
>> needed to clients when they connect.  
>Yes, you want full text indexing, but you also want indexing that
>is header-aware, e.g., messages which match the regexp
>"List-Id:.*<>" or messages between March 2,
>2007 and March 22, 2007, and so on.
>This can be done (I know, because I've done it) by using IMAP in
>combination with a search engine.  Of course this requires retrieving
>the messages via IMAP in order to index them, and that can be tedious.
>It's better/faster done by indexing collections of mbox files and using
>the filename/byte offset as the retrieval key for each message.  But if
>you can't get at the messages that way: IMAP it is.
>However: I think it's critical to separate functionality here, in
>keeping with the "Software Tools" concept (and the foundations of Unix)
>that a tool should do one thing and do it well.  Mail really, really
>should be kept in mbox format (it's served us well, it's simple, and
>a LOT of tools exist to work with it) and search really, really should
>be separated from it, not integrated with it, because it would be an
>architectural error to do the latter.
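
[Editor's sketch, not part of the original message.] The approach described above (index mbox files, use filename/byte offset as the retrieval key, and keep the index header-aware) might look roughly like the following. This is a minimal illustration using only the Python standard library. The function names (message_offsets, fetch, index_mbox, search) and the in-memory list used as the "index" are assumptions made for the example, not anything the poster described using. It also assumes the classic mbox convention that a message starts at a "From " line preceded by a blank line.

```python
import re
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def message_offsets(path):
    """Return a list of (start, end) byte ranges, one per message."""
    starts = []
    pos = 0
    prev_blank = True  # the first "From " line has no blank line before it
    with open(path, "rb") as f:
        for line in f:
            if prev_blank and line.startswith(b"From "):
                starts.append(pos)
            prev_blank = line in (b"\n", b"\r\n")
            pos += len(line)
    return list(zip(starts, starts[1:] + [pos]))


def fetch(path, start, end):
    """Retrieve one raw message using (filename, byte offset) as the key."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


def index_mbox(path):
    """Build a header-aware index without modifying the mbox file."""
    index = []
    for start, end in message_offsets(path):
        raw = fetch(path, start, end)
        # Drop the "From " separator line, then parse only the headers.
        _, _, rest = raw.partition(b"\n")
        headers = BytesParser().parsebytes(rest, headersonly=True)
        try:
            sent = parsedate_to_datetime(headers["Date"]).date()
        except (TypeError, ValueError):
            sent = None  # missing or unparseable Date header
        index.append({
            "path": path,
            "offset": start,
            "end": end,
            "message_id": str(headers.get("Message-ID", "")).strip(),
            "list_id": str(headers.get("List-Id", "")),
            "date": sent,
        })
    return index


def search(index, list_id_regex=None, since=None, until=None):
    """Filter index entries by a List-Id regexp and/or a date range."""
    rx = re.compile(list_id_regex) if list_id_regex else None
    hits = []
    for entry in index:
        if rx and not rx.search(entry["list_id"]):
            continue
        if since or until:
            if entry["date"] is None:
                continue
            if since and entry["date"] < since:
                continue
            if until and entry["date"] > until:
                continue
        hits.append(entry)
    return hits
```

A query such as search(idx, since=date(2007, 3, 2), until=date(2007, 3, 22)) then touches only the index. The mbox file is read again only when fetch() is called for a hit. Here the regexp is matched against the List-Id value rather than the whole header line, and a real setup would hand these fields to an actual search engine rather than a Python list.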


>That doesn't mean that it has to *look* separate: it can be as
>integrated at the user level as anyone wants.  But it should be kept
>cleanly separated internally so that if -- one day -- we decide to
>replace mbox format, or we decide to swap out the search engine, or we
>come up with something better than IMAP, we can change individual
>components of the system without breaking everything else.
>Oh, and if you want to overlay things like user tagging, e.g., "I want
>to mark these messages as relevant to the Foo Project", then that
>should *not* be done at the mbox level -- because it modifies
>messages.  It should really be an overlay keyed to the Message-ID,
>which is putatively unique and putatively well-formed (yeah, well,
>ignore Microsoft which has gotten it wrong for 15+ years) and can
>be/should be one of the items kept in the search engine's indices
>anyway.  This is cleaner (because if you decide to delete the tag, you
>don't modify messages, only the search engine indices) and faster
>(because if you want to generate a list of all Foo Project messages,
>you only need the indices and don't need to prowl through the messages).

extremely well put.
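
[Editor's sketch, not part of the original message.] The tagging overlay described in the quoted text, keyed to the Message-ID and stored apart from the mbox files, could be sketched as below. This is only one possible shape, assuming SQLite as the side store. The class name TagOverlay, the table layout, and the method names are hypothetical and chosen for illustration. The quoted text suggests keeping this data in the search engine's own indices.

```python
import sqlite3


class TagOverlay:
    """User tags stored outside the mail store, keyed by Message-ID."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            " message_id TEXT NOT NULL,"
            " tag TEXT NOT NULL,"
            " PRIMARY KEY (message_id, tag))"
        )

    def add(self, message_id, tag):
        # Tagging writes one row here; the mbox file is never touched.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO tags VALUES (?, ?)", (message_id, tag)
            )

    def remove(self, message_id, tag):
        # Deleting a tag changes only the overlay, not the message.
        with self.db:
            self.db.execute(
                "DELETE FROM tags WHERE message_id = ? AND tag = ?",
                (message_id, tag),
            )

    def messages(self, tag):
        # Listing all messages for a tag needs only the overlay table.
        rows = self.db.execute(
            "SELECT message_id FROM tags WHERE tag = ? ORDER BY message_id",
            (tag,),
        )
        return [row[0] for row in rows]
```

Because the overlay refers to messages only by Message-ID, the mail store, the search engine, and the tag store can each be replaced without rewriting the others, which is the separation the quoted text argues for.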

Philadelphia Linux Users Group