Rich Freeman on 10 Mar 2016 17:05:09 -0800
Re: [PLUG] Command line tools faster than Hadoop cluster
On Thu, Mar 10, 2016 at 6:42 PM, Eric Lucas <eric@lucii.org> wrote:
> I just throw this out for your perusal - I stumbled across it this morning
> and found it interesting.
>
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
>
> I knew about the shell behavior but never really thought of it as 'parallel'
> processing. DOH!
>

As an alternative to the xargs-based approach to parallel processing
used there, you should also look at GNU Parallel. It is very similar,
but it keeps each job's output together instead of interleaving it, and
with -k (--keep-order) it orders the output as if you had run all the
jobs serially. I've used it to implement map-reduce jobs using pipes.
For smaller jobs where the overhead of Hadoop is too great, it works
really well.

--
Rich
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug
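
For illustration, here is a minimal sketch of the kind of pipe-based
map-reduce described above. It is not taken from the thread: the
mapreduce_count helper, the *.txt file layout, and the search pattern
are all hypothetical, and it assumes GNU Parallel (not the moreutils
tool of the same name) is installed.

```shell
#!/bin/sh
# Sketch only: count lines matching a pattern across many files.
# mapreduce_count DIR PATTERN is a hypothetical helper for illustration.
mapreduce_count() {
    dir=$1
    pattern=$2
    # Map: GNU Parallel runs one "grep -c" job per file; each job prints
    # that file's count of matching lines.
    #   -k (--keep-order) emits results in input order, as a serial run would.
    #   -q (--quote) keeps the shell from re-interpreting the pattern.
    # Reduce: awk sums the per-file counts into a single total.
    find "$dir" -type f -name '*.txt' | sort |
        parallel -k -q grep -c -- "$pattern" {} |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Something like `mapreduce_count ./logs ERROR` would then print a single
total. For a plain sum the -k flag is not strictly needed, since
addition does not care about order; it matters when the reduce step
depends on the order of the map output.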