Rich Kulawiec on 26 Sep 2016 15:06:19 -0700
Re: [PLUG] Top-posting
On Thu, Sep 22, 2016 at 05:06:49PM -0400, Rich Freeman wrote:
> Ultimately both folders and tags are insufficient. What you really
> want is full text indexing, in addition to indexing of folders and
> tags. This really necessitates a server component that can look
> through your vast store of well-indexed emails and provide them as
> needed to clients when they connect.

Yes, you want full text indexing, but you also want indexing that is header-aware, e.g., messages which match the regexp "List-Id:.*<plug.lists.phillylinux.org>", or messages between March 2, 2007 and March 22, 2007, and so on.

This can be done (I know, because I've done it) by using IMAP in combination with a search engine. Of course this requires retrieving the messages via IMAP in order to index them, and that can be tedious. It's better and faster to index collections of mbox files directly and use the filename/byte offset as the retrieval key for each message. But if you can't get at the messages that way: IMAP it is.

However: I think it's critical to separate functionality here, in keeping with the "Software Tools" concept (and the foundations of Unix) that a tool should do one thing and do it well. Mail really, really should be kept in mbox format (it's served us well, it's simple, and a LOT of tools exist to work with it), and search really, really should be separated from it, not integrated with it, because it would be an architectural error to do the latter. That doesn't mean that it has to *look* separate: it can be as integrated at the user level as anyone wants. But it should be kept cleanly separated internally so that if, one day, we decide to replace mbox format, or we decide to swap out the search engine, or we come up with something better than IMAP, we can change individual components of the system without breaking everything else.
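The mbox approach above (index the files, keep filename/byte offset as the retrieval key, answer header-aware queries from the index) can be sketched roughly as follows. This is a minimal illustration, not the setup described in the post: the function and field names are hypothetical, it assumes LF line endings and that body lines starting with "From " were escaped by the mbox writer, and it keeps the index in plain Python lists instead of a real search engine. Note that nothing is ever written back to the mbox file.

```python
import email
import re
from dataclasses import dataclass
from datetime import date
from email.utils import parsedate_to_datetime
from typing import List, Optional


@dataclass
class IndexEntry:
    path: str                  # mbox file holding the message
    offset: int                # byte offset of its "From " separator line
    length: int                # bytes up to the next separator (or EOF)
    message_id: Optional[str]  # Message-ID header, if present
    date: Optional[date]       # calendar date taken from the Date header
    header_text: str           # unfolded header block, for regexp queries


def index_mbox(path: str) -> List[IndexEntry]:
    """Scan one mbox file and record a (path, offset, length) key per message.

    Assumes LF line endings and that body lines beginning with "From "
    were escaped (">From ") when the mbox was written.
    """
    with open(path, "rb") as f:
        data = f.read()
    starts = [m.start() for m in re.finditer(rb"^From ", data, re.MULTILINE)]
    entries = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        # Drop the envelope "From " line; the header block ends at the first blank line.
        _, _, rest = data[start:end].partition(b"\n")
        header_bytes = rest.split(b"\n\n", 1)[0]
        msg = email.message_from_bytes(header_bytes)
        try:
            msg_date = parsedate_to_datetime(msg["Date"]).date()
        except (TypeError, ValueError):
            msg_date = None  # missing or unparseable Date header
        text = header_bytes.decode("ascii", errors="replace")
        # Unfold continuation lines so each header field sits on one line.
        text = re.sub(r"\n(?=[ \t])", "", text)
        entries.append(IndexEntry(path, start, end - start,
                                  msg["Message-ID"], msg_date, text))
    return entries


def search_headers(entries: List[IndexEntry], pattern: str) -> List[IndexEntry]:
    """Entries whose header block matches a regexp, e.g. a List-Id test."""
    rx = re.compile(pattern, re.MULTILINE)
    return [e for e in entries if rx.search(e.header_text)]


def search_dates(entries: List[IndexEntry], first: date, last: date) -> List[IndexEntry]:
    """Entries whose Date header falls within [first, last], inclusive."""
    return [e for e in entries if e.date is not None and first <= e.date <= last]


def fetch(entry: IndexEntry) -> bytes:
    """Retrieve the raw message using only its (path, offset, length) key."""
    with open(entry.path, "rb") as f:
        f.seek(entry.offset)
        return f.read(entry.length)
```

For example, `search_headers(index_mbox("plug.mbox"), "List-Id:.*<plug.lists.phillylinux.org>")` would return the PLUG messages in a (hypothetical) `plug.mbox`, and `search_dates(..., date(2007, 3, 2), date(2007, 3, 22))` the ones dated in that range. Because the index only stores keys and extracted header data, the mbox files, the index, and the retrieval path can each be replaced without touching the others.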
Oh, and if you want to overlay things like user tagging, e.g., "I want to mark these messages as relevant to the Foo Project", then that should *not* be done at the mbox level, because it modifies messages. It should really be an overlay keyed to the Message-ID, which is putatively unique and putatively well-formed (yeah, well, ignore Microsoft, which has gotten it wrong for 15+ years) and can be (and should be) one of the items kept in the search engine's indices anyway. This is cleaner (because if you decide to delete the tag, you don't modify messages, only the search engine indices) and faster (because if you want to generate a list of all Foo Project messages, you only need the indices and don't need to prowl through the messages themselves).

---rsk

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
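The tag overlay keyed to Message-ID, as suggested in the post, might look something like this minimal sketch. The `TagOverlay` class, its table layout, and the tag name `foo-project` are hypothetical, and SQLite stands in for whatever search engine's indices would actually hold the tags. Like the post, it takes Message-IDs at face value and does not guard against duplicate or malformed IDs.

```python
import sqlite3
from typing import List


class TagOverlay:
    """Hypothetical tag store kept apart from the mail itself.

    Tags reference messages only by Message-ID, so adding or deleting
    a tag never rewrites an mbox file.
    """

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        # The (tag, message_id) primary key doubles as the lookup index by tag.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tags ("
            " message_id TEXT NOT NULL,"
            " tag TEXT NOT NULL,"
            " PRIMARY KEY (tag, message_id))"
        )

    def add(self, message_id: str, tag: str) -> None:
        # INSERT OR IGNORE makes tagging the same message twice a no-op.
        with self.db:
            self.db.execute("INSERT OR IGNORE INTO tags VALUES (?, ?)",
                            (message_id, tag))

    def remove(self, message_id: str, tag: str) -> None:
        # Deleting a tag touches only the overlay, never the messages.
        with self.db:
            self.db.execute("DELETE FROM tags WHERE message_id = ? AND tag = ?",
                            (message_id, tag))

    def messages_with(self, tag: str) -> List[str]:
        # Answered from the overlay alone; no message is opened or scanned.
        rows = self.db.execute(
            "SELECT message_id FROM tags WHERE tag = ? ORDER BY message_id",
            (tag,))
        return [row[0] for row in rows]
```

Listing every Foo Project message is then `TagOverlay(...).messages_with("foo-project")`, and each returned Message-ID can be resolved to a mailbox location through the search index rather than by reading the mail.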