Toby DiPasquale on 10 Jun 2007 17:43:46 -0000


Re: [PhillyOnRails] hadoop

On Wed, Jun 06, 2007 at 03:52:32PM -0400, Evan Weaver wrote:
> Toby,
> Can you give a quick overview of why I would want to use Hadoop? I
> didn't really follow you last night, and figured we might as well have
> the discussion on-list.

This list is not really the place for this discussion, but here goes:

For you guys, using Hadoop would allow you to put your data in HDFS and
then MapReduce out whatever else you wanted (specifically, the indices you
are using Memcached for). Those could even still be loaded into Memcached,
although you'd probably have to write some kind of adapter to get
Memcached to talk to HDFS.
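
To make "MapReduce out an index" a bit more concrete, here is a rough
sketch in Python. It is only an in-memory simulation of the map, shuffle
and reduce steps and does not use Hadoop's API at all. The record layout
and the names RECORDS, map_record, reduce_ids and run_job are made up for
illustration; your real data would look different.

```python
from collections import defaultdict

# Hypothetical records, standing in for rows that would live in HDFS.
RECORDS = [
    {"id": 1, "tags": ["ruby", "rails"]},
    {"id": 2, "tags": ["java", "hadoop"]},
    {"id": 3, "tags": ["ruby", "hadoop"]},
]


def map_record(record):
    """Map step: emit one (tag, id) pair for every tag on the record."""
    for tag in record["tags"]:
        yield tag, record["id"]


def reduce_ids(tag, ids):
    """Reduce step: collapse all ids seen for a tag into a sorted list."""
    return tag, sorted(set(ids))


def run_job(records):
    """Simulate the shuffle: group mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_ids(key, values) for key, values in groups.items())


index = run_job(RECORDS)
print(index["ruby"])  # prints [1, 3]
```

The resulting dict maps each tag to the ids that carry it, which is the
kind of lookup table you could then load into Memcached. Building a
different index means swapping in a different map_record and rerunning
the job.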

The thing I was really trying to get at is that Hadoop supports your
current style of infrastructure but also gives you more flexibility if you
needed to change things. E.g. if you needed to tear up some of your data
in a new way and export a new set of columns, that's one MapReduce job
away from being reality. Also, it sounded like you guys weren't doing very
frequent inserts and, more importantly, that random access was not the
primary focus of your system, so Hadoop is a good fit for that style of

Hadoop itself is based on the core infrastructure at Google, as reflected
by these papers:

In any case, I was just saying that you guys should take a look at it
because it sounded to me like you were recreating a bunch of pieces of it
when it's already in existence. I know it's written in poopy ol' Java, but
despite that it's still worth a look, IMHO. Of course, I may also have
misunderstood what you were saying, so I might be wrong about it. I guess
let me know what you think either way.

For more on Hadoop, I'd start off here:

Toby DiPasquale