Re: [PLUG] Command line tools faster than Hadoop cluster

Rich Freeman on 10 Mar 2016 17:05:09 -0800

From: Rich Freeman <r-plug@thefreemanclan.net>
To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
Subject: Re: [PLUG] Command line tools faster than Hadoop cluster
Date: Thu, 10 Mar 2016 20:05:03 -0500
Reply-to: Philadelphia Linux User's Group Discussion List <plug@lists.phillylinux.org>
Sender: "plug" <plug-bounces@lists.phillylinux.org>

On Thu, Mar 10, 2016 at 6:42 PM, Eric Lucas <eric@lucii.org> wrote:
> I just throw this out for your perusal - I stumbled across it this morning
> and found it interesting.
>
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
>
> I knew about the shell behavior but never really thought of it as 'parallel'
> processing.   DOH!
>

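As an alternative to the xargs-based approach to parallel processing
in that article, you should also look at GNU Parallel. It is very
similar, but it keeps each job's output together instead of
interleaving it, and with --keep-order (-k) it prints the output in
input order, as if you had run all the jobs serially.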
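To illustrate the ordering difference, here is a minimal Python sketch.
It is only an analogy, not how either tool works internally: the jobs
run concurrently in a pool, and collecting results as they finish
resembles xargs -P, while collecting them in input order resembles
parallel --keep-order. The job() function and its sleep times are
hypothetical stand-ins for real commands.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def job(n):
    # Hypothetical stand-in for one command; later inputs finish sooner.
    time.sleep(0.05 * (4 - n))
    return f"result {n}"

inputs = [1, 2, 3]

def completion_order(items, workers=3):
    # Analogous to xargs -P: results appear in whatever order jobs finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(job, i) for i in items]
        return [f.result() for f in as_completed(futures)]

def input_order(items, workers=3):
    # Analogous to GNU Parallel with --keep-order (-k): the jobs still run
    # concurrently, but results are returned in the order of the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(job, items))

if __name__ == "__main__":
    print(completion_order(inputs))
    print(input_order(inputs))
```

Threads are enough here because the stand-in jobs only sleep; the
first list's order depends on timing, while the second always follows
the input.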

I've used it to solve map-reduce problems with pipes. For smaller
jobs, where the overhead of Hadoop is too great, it works really
well.
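The map/reduce split itself can be sketched in a few lines. This is a
hypothetical word-count example in Python, not the pipelines described
above: each chunk is "mapped" to partial counts concurrently (the role
a per-file command plays under GNU Parallel), and a final step
"reduces" the partials into one total (the role of the last stage in
the pipe). The names mapper, reducer and map_reduce are made up for
illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def mapper(lines):
    # Map step: count the words in one chunk of input lines.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reducer(total, partial):
    # Reduce step: merge one chunk's partial counts into the running total.
    return total + partial

def map_reduce(chunks, workers=4):
    # Run the mappers concurrently, then fold their results together.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(mapper, chunks)
        return reduce(reducer, partials, Counter())

if __name__ == "__main__":
    chunks = [["the cat", "the dog"], ["a cat"]]
    print(map_reduce(chunks))
```

Threads keep the sketch short; for CPU-bound mappers a
ProcessPoolExecutor (or separate processes in a shell pipeline) is the
closer match to what GNU Parallel does.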

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
