JP Vossen on 16 Aug 2010 22:25:08 -0700



[PLUG] Search Engines in a can


Last night at PLUG West, Amul asked some questions about bug trackers, email archives, and related things. From there we went off on a tangent about search engines, especially for archiving and searching email.

We mostly talked about the Apache Foundation's Lucene, Solr and Nutch:
	http://nutch.apache.org/about.html
	http://lucene.apache.org/solr/
	http://lucene.apache.org/java/docs/index.html

Related news, "Lucene and SOLR Get Commercial Support":
	http://news.slashdot.org/article.pl?sid=09/01/30/2159239

Personally, I'm not a Java fan, and all that web server/container stuff just gives me a headache.


For a pure email archive, this sounds more my speed:
http://lurker.sourceforge.net/
    Demo: http://archives.free.net.ph/splash/index.html
Description: Archive tool for mailing lists with search engine
 lurker is an archiver which can handle extremely large amounts
 of email. It is fast, intuitive, and customisable.
 .
 lurker archives your mailing lists and imports new mail.
 It includes many features like powerful fast search engine,
 chronological threading, file attachment support,
 multi-lingual support, completely customisable output etc.


There's also http://code.google.com/p/subetha/, "a modern, sophisticated mailing list manager. [... with] Searchable, threaded archives..." (But also Java/JBoss.)


Also:
	http://swish-e.org/	Swish-e (MS Office, PDF, etc. also)
	http://www.htdig.org/	ht://Dig (web-based HTML and text only)
	http://search.mnogo.ru/	Mnogosearch


Finally, for a commercial solution, a Google Appliance (if they still even sell those) or a Splunk appliance would do the trick, if you want to spend lots of money. I've heard mostly bad things about the Google box, and mostly good things about the Splunk one. We use a lot of Splunk at work and I've seen some really impressive things, but I haven't personally worked with it, nor do I really use it myself.


For whatever it's worth,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug