Rich Freeman on 10 Mar 2016 17:05:09 -0800



Re: [PLUG] Command line tools faster than Hadoop cluster


On Thu, Mar 10, 2016 at 6:42 PM, Eric Lucas <eric@lucii.org> wrote:
> I just throw this out for your perusal - I stumbled across it this morning
> and found it interesting.
>
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
>
> I knew about the shell behavior but never really thought of it as 'parallel'
> processing.   DOH!
>

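As an alternative to the xargs-based approach to parallel processing
described there, take a look at GNU Parallel.  It works much like
xargs, but it buffers each job's output so that lines from different
jobs are not interleaved, and with -k (--keep-order) it prints the
output in the same order as if you had run all the jobs serially.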

I've used it to solve map-reduce style problems with plain pipes.
For smaller jobs, where the overhead of Hadoop is too great, it
works really well.
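
To make the map-reduce shape concrete, here is a minimal sketch in
Python rather than shell, so it runs without GNU Parallel installed.
It is not the pipeline from the article.  The sample data and the
names CHUNKS, map_chunk, merge and map_reduce are all made up for
illustration.  Each chunk is counted independently (the map step),
the partial results come back in input order, and they are then
merged (the reduce step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical input: each string stands in for one input file's contents.
CHUNKS = [
    "white wins\nblack wins\nwhite wins\n",
    "draw\nwhite wins\n",
    "black wins\ndraw\ndraw\n",
]


def map_chunk(text):
    """Map step: count identical lines in one chunk."""
    return Counter(text.splitlines())


def merge(left, right):
    """Reduce step: combine two partial counts into a new Counter."""
    return left + right


def map_reduce(chunks, workers=4):
    # Executor.map yields results in input order, not completion order,
    # so the partial results line up with the chunks they came from.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return partials, reduce(merge, partials, Counter())


if __name__ == "__main__":
    partials, totals = map_reduce(CHUNKS)
    for label, count in sorted(totals.items()):
        print(label, count)
```

Threads keep the sketch short.  For CPU-heavy map steps a process
pool would probably be the better fit, just as xargs and GNU Parallel
start separate processes.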

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug