Christopher Barry on 26 Sep 2016 21:15:17 -0700
Re: [PLUG] Top-posting
On Mon, 26 Sep 2016 18:06:11 -0400
Rich Kulawiec <rsk@gsp.org> wrote:

>On Thu, Sep 22, 2016 at 05:06:49PM -0400, Rich Freeman wrote:
>> Ultimately both folders and tags are insufficient. What you really
>> want is full text indexing, in addition to indexing of folders and
>> tags. This really necessitates a server component that can look
>> through your vast store of well-indexed emails and provide them as
>> needed to clients when they connect.
>
>Yes, you want full text indexing, but you also want indexing that
>is header-aware, e.g., messages which match the regexp
>"List-Id:.*<plug.lists.phillylinux.org>" or messages between March 2,
>2007 and March 22, 2007, and so on.
>
>This can be done (I know, because I've done it) by using IMAP in
>combination with a search engine. Of course this requires retrieving
>the messages via IMAP in order to index them, and that can be tedious.
>It's better/faster done by indexing collections of mbox files and using
>the filename/byte offset as the retrieval key for each message. But if
>you can't get at the messages that way: IMAP it is.
>
>However: I think it's critical to separate functionality here, in
>keeping with the "Software Tools" concept (and the foundations of Unix)
>that a tool should do one thing and do it well. Mail really, really
>should be kept in mbox format (it's served us well, it's simple, and
>a LOT of tools exist to work with it) and search really, really should
>be separated from it, not integrated with it, because it would be
>architectural error to do the latter. *cough*systemd*cough*
>
>That doesn't mean that it has to *look* separate: it can be as
>integrated at the user level as anyone wants. But it should be kept
>cleanly separated internally so that if -- one day -- we decide to
>replace mbox format, or we decide to swap out the search engine, or we
>come up with something better than IMAP, we can change individual
>components of the system without breaking everything else.
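
To make the filename/byte-offset idea above concrete, here is a minimal sketch in Python. It is not rsk's actual setup, which the thread does not show. The function names (index_mbox, search, fetch) and the entry layout are made up for illustration. It also assumes the mbox is well-formed, i.e. that body lines starting with "From " have been escaped, and that the file is small enough to read into memory.

```python
import re
from email.parser import BytesParser
from email.utils import parsedate_to_datetime


def index_mbox(path):
    """Build one index entry per message in an mbox file (hypothetical layout)."""
    with open(path, "rb") as f:
        data = f.read()  # simplification: a real indexer would stream the file
    # Assumption: every line starting with "From " is a message separator.
    starts = [m.start() for m in re.finditer(rb"^From ", data, re.MULTILINE)]
    entries = []
    for start, end in zip(starts, starts[1:] + [len(data)]):
        newline = data.find(b"\n", start)
        if newline == -1:
            continue  # truncated separator line, nothing to parse
        # Parse only the headers that follow the "From " separator line.
        msg = BytesParser().parsebytes(data[newline + 1:end], headersonly=True)
        try:
            date = parsedate_to_datetime(msg["Date"]).date()
        except (TypeError, ValueError):
            date = None  # missing or malformed Date header
        message_id = msg["Message-ID"]
        entries.append({
            "file": path,            # retrieval key, part 1
            "offset": start,         # retrieval key, part 2
            "length": end - start,
            "message_id": message_id.strip() if message_id else None,
            "list_id": msg["List-Id"],
            "date": date,
        })
    return entries


def search(entries, list_id_pattern=None, since=None, until=None):
    """Header-aware query: regexp on the List-Id value and/or a date range."""
    rx = re.compile(list_id_pattern) if list_id_pattern else None
    for e in entries:
        if rx and not rx.search(e["list_id"] or ""):
            continue
        if since and (e["date"] is None or e["date"] < since):
            continue
        if until and (e["date"] is None or e["date"] > until):
            continue
        yield e


def fetch(entry):
    """Retrieve the raw message using only the file name and byte offset."""
    with open(entry["file"], "rb") as f:
        f.seek(entry["offset"])
        return f.read(entry["length"])
```

With this, rsk's two example queries become search(entries, list_id_pattern=r"<plug\.lists\.phillylinux\.org>") and search(entries, since=date(2007, 3, 2), until=date(2007, 3, 22)). Note that the index never touches the mbox contents, so the mail store and the search side stay separate and either can be swapped out.
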
>
>Oh, and if you want to overlay things like user tagging, e.g., "I want
>to mark these messages as relevant to the Foo Project", then that
>should *not* be done at the mbox level -- because it modifies
>messages. It should really be an overlay keyed to the Message-ID,
>which is putatively unique and putatively well-formed (yeah, well,
>ignore Microsoft which has gotten it wrong for 15+ years) and can
>be/should be one of the items kept in the search engine's indices
>anyway. This is cleaner (because if you decide to delete the tag, you
>don't modify messages, only the search engine indices) and faster
>(because if you want to generate a list of all Foo Project messages,
>you only need the indices and don't need to prowl through the messages
>themselves).
>
>---rsk

extremely well put.

--
Regards,
Christopher
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
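
The tag overlay keyed to the Message-ID that rsk describes could look roughly like the sketch below. It assumes SQLite as the store, which nobody in the thread proposed. The class name TagOverlay and the table layout are hypothetical. The point is only that tagging and untagging touch this side table and never the mbox files.

```python
import sqlite3


class TagOverlay:
    """User tags stored beside the mail, keyed by Message-ID (hypothetical schema)."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            " message_id TEXT NOT NULL,"
            " tag TEXT NOT NULL,"
            " PRIMARY KEY (message_id, tag))"
        )
        # Lets "all messages with tag X" be answered from the index alone.
        self.db.execute("CREATE INDEX IF NOT EXISTS tags_by_tag ON tags (tag)")

    def add(self, message_id, tag):
        # Tagging the same message twice is a no-op.
        with self.db:
            self.db.execute(
                "INSERT OR IGNORE INTO tags (message_id, tag) VALUES (?, ?)",
                (message_id, tag),
            )

    def remove(self, message_id, tag):
        # Deleting a tag changes only this table; the message is untouched.
        with self.db:
            self.db.execute(
                "DELETE FROM tags WHERE message_id = ? AND tag = ?",
                (message_id, tag),
            )

    def messages_with(self, tag):
        rows = self.db.execute(
            "SELECT message_id FROM tags WHERE tag = ? ORDER BY message_id",
            (tag,),
        )
        return [row[0] for row in rows]
```

For example, overlay.add("<1@example.org>", "Foo Project") followed by overlay.messages_with("Foo Project") lists the tagged Message-IDs without opening a single message. Those IDs can then be handed to whatever index maps a Message-ID to its file name and byte offset.
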