Toby DiPasquale on 10 Jun 2007 17:43:46 -0000
On Wed, Jun 06, 2007 at 03:52:32PM -0400, Evan Weaver wrote:
> Toby,
>
> Can you give a quick overview of why I would want to use Hadoop? I
> didn't really follow you last night, and figured we might as well have
> the discussion on-list.

This list is not really the place for this discussion, but here goes:

For you guys, using Hadoop would allow you to put your data in HDFS and then MapReduce out whatever else you wanted (specifically, the indices you are using Memcached for). Those could even still be loaded into Memcached, although you'd probably have to write some kind of adapter to get Memcached to talk to HDFS.

The thing I was really trying to get at is that Hadoop supports your current style of infrastructure but also gives you more flexibility if you need to change things. E.g. if you needed to tear up some of your data in a new way and export a new set of columns, that's one MapReduce job away from being reality. Also, it sounded like you guys weren't doing very frequent inserts and, more importantly, that random access was not the primary focus of your system, so Hadoop is a good fit for that style of computation.

Hadoop itself is modeled on core infrastructure at Google (GFS and MapReduce), as described in these papers:

http://labs.google.com/papers/gfs-sosp2003.pdf
http://labs.google.com/papers/mapreduce.html

In any case, I was just saying that you guys should take a look at it because it sounded to me like you were recreating a bunch of pieces of it when it's already in existence. I know it's written in poopy ol' Java, but despite that it's still worth a look, IMHO.

Of course, I may also have misunderstood what you were saying, so I might be wrong about it. I guess let me know what you think either way.

For more on Hadoop, I'd start off here:

http://wiki.apache.org/lucene-hadoop/ProjectDescription

-- 
Toby DiPasquale
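
To make the "one MapReduce job away" point a bit more concrete, here is a minimal in-memory sketch of the map/shuffle/reduce pattern in plain Python. It does not use Hadoop or HDFS at all; the sample records, field names, and helper functions (`map_by_field`, `reduce_to_sorted_ids`, `run_job`) are all hypothetical and only illustrate how a different index falls out of the same data by swapping the map function.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are made up for illustration.
RECORDS = [
    {"id": 1, "author": "evan", "tag": "ruby"},
    {"id": 2, "author": "toby", "tag": "hadoop"},
    {"id": 3, "author": "evan", "tag": "hadoop"},
]


def map_by_field(field):
    """Build a map function that emits (field value, record id) pairs."""
    def mapper(record):
        yield record[field], record["id"]
    return mapper


def reduce_to_sorted_ids(key, values):
    """Reduce step: collapse all ids seen for a key into one sorted list."""
    return key, sorted(values)


def run_job(records, mapper, reducer):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: gather values per key
    return dict(reducer(key, values) for key, values in groups.items())


# Same data, two different indices: only the map function changes.
index_by_author = run_job(RECORDS, map_by_field("author"), reduce_to_sorted_ids)
index_by_tag = run_job(RECORDS, map_by_field("tag"), reduce_to_sorted_ids)
```

In a real Hadoop job the map and reduce steps would run in parallel across machines and read from and write to HDFS; the resulting index could then be loaded into something like Memcached by a separate adapter, as described above.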