Eric on 18 Sep 2007 11:53:44 -0000

Re: [PhillyOnRails] hadoop

In case anybody else is interested....


Toby DiPasquale wrote:
> On Wed, Jun 06, 2007 at 03:52:32PM -0400, Evan Weaver wrote:
>> Toby,
>> Can you give a quick overview of why I would want to use Hadoop? I
>> didn't really follow you last night, and figured we might as well have
>> the discussion on-list.
> This list is not really the place for this discussion, but here goes:
> For you guys, using Hadoop would allow you to put your data in HDFS and
> then MapReduce out whatever else you wanted (specifically, the indices you
> are using Memcached for). Those could even still be loaded into Memcached,
> although you'd probably have to write some kind of adapter to get
> Memcached to talk to HDFS.
> The thing I was really trying to get at is that Hadoop supports your
> current style of infrastructure but also gives you more flexibility if you
> needed to change things. E.g. if you needed to tear up some of your data
> in a new way and export a new set of columns, that's one MapReduce job
> away from being reality. Also, it sounded like you guys weren't doing very
> frequent inserts and, more importantly, that random access was not the
> primary focus of your system, so Hadoop is a good fit for that style of
> computation.
> Hadoop itself is based on the core infrastructure at Google, as reflected
> by these papers:
> In any case, I was just saying that you guys should take a look at it
> because it sounded to me like you were recreating a bunch of pieces of it
> when it's already in existence. I know it's written in poopy ol' Java, but
> despite that it's still worth a look, IMHO. Of course, I may also have
> misunderstood what you were saying, so I might be wrong about it. I guess
> let me know what you think either way.
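For anyone who hasn't run into MapReduce before, here is a minimal single-process sketch in Python of the model Toby describes: a map step emits (key, value) pairs, a shuffle groups them by key, and a reduce step collapses each group. This is not Hadoop's API. Real Hadoop jobs are usually written in Java against Hadoop's own classes, read their input from HDFS, and run the map and reduce steps in parallel across a cluster. `run_mapreduce`, the sample rows, and both jobs are hypothetical illustrations. The first job builds a word-to-id index, the kind of thing you might otherwise precompute and keep in Memcached. The second slices the same rows a different way.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]


def run_mapreduce(
    records: Iterable[dict],
    mapper: Callable[[dict], Iterator[Pair]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Toy, in-memory MapReduce (illustration only, not Hadoop's API)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        # Map: each input record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect every value emitted under the same key.
            groups[key].append(value)
    # Reduce: collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical rows standing in for data stored in HDFS.
SAMPLE_ROWS = [
    {"id": 1, "title": "Rails on Hadoop", "tags": ["ruby", "java"]},
    {"id": 2, "title": "Memcached and Rails", "tags": ["ruby", "cache"]},
    {"id": 3, "title": "Hadoop MapReduce", "tags": ["java"]},
]


# Job 1: inverted index, mapping each lowercased title word to the ids containing it.
def index_mapper(row: dict) -> Iterator[Pair]:
    for word in row["title"].lower().split():
        yield word, row["id"]


def index_reducer(word: str, ids: List[int]) -> List[int]:
    return sorted(set(ids))


# Job 2: a new "column" derived from the same rows, the number of rows per tag.
def tag_mapper(row: dict) -> Iterator[Pair]:
    for tag in row["tags"]:
        yield tag, 1


def count_reducer(tag: str, ones: List[int]) -> int:
    return sum(ones)
```

Running `run_mapreduce(SAMPLE_ROWS, index_mapper, index_reducer)` builds the index, and `run_mapreduce(SAMPLE_ROWS, tag_mapper, count_reducer)` builds the per-tag counts. Producing a new view of the data only means swapping in a different mapper and reducer. Distributing that work across machines is the part Hadoop adds.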
