Julien Vehent on 4 Apr 2011 13:21:41 -0700



Re: [PLUG] gnu parallel and tar




On Mon, 4 Apr 2011 15:45:55 -0400, Austin Murphy wrote:
On Mon, Apr 4, 2011 at 2:37 PM, Julien Vehent <julien@linuxwall.info> wrote:
On Mon, 4 Apr 2011 13:21:59 -0400, Austin Murphy wrote:

I've had a good experience with lbzip2, a multi-threaded
implementation of bzip2.
....
Initial file:
$ ls -s jmeter-server-node1.log --block-size=1
689274880 jmeter-server-node1.log


=== with bzip2 ====
$ time bzip2 -z -9 jmeter-server-node1.log

real    8m33.220s
user    8m31.444s
sys   0m0.880s

$ ls -s jmeter-server-node1.log.bz2 --block-size=1
1589248 jmeter-server-node1.log.bz2
....
=== with lbzip2 ====
$ time lbzip2 -n 4 -z -9 -S jmeter-server-node1.log

real    5m37.425s
user    20m57.227s
sys   0m5.016s

$ ls -s jmeter-server-node1.log.bz2 --block-size=1
1601536 jmeter-server-node1.log.bz2
....
The compression ratio is about the same, but I'm surprised to see that while lbzip2 finishes in about 66% of the wall-clock time (roughly 1.5x faster), it also uses about 2.5x the user time of bzip2. The efficiency
per-core is a lot lower, but I'm happy to be using all my cores.


My understanding is that bzip2 is highly optimized to avoid cache
misses.  If you have too many threads running at once you might be
blowing out a shared cache.   You might try running with -n 2 or
letting it decide how many threads to run.
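
One way to act on this suggestion is to time the same input at several thread counts. The sketch below is only an illustration: the helper names `build_command` and `sweep` are made up here, and it assumes lbzip2 is on the PATH and accepts `-n` (worker threads), `-z`, `-9` and `-c` (write to stdout) as described in its man page. Output goes to /dev/null so the input file is left untouched.

```python
import subprocess
import time


def build_command(path, threads, level=9):
    """Return the lbzip2 argument list for one run (hypothetical helper)."""
    # -n: number of worker threads, -z: compress, -c: write to stdout
    return ["lbzip2", "-n", str(threads), "-z", f"-{level}", "-c", path]


def sweep(path, thread_counts=(1, 2, 3, 4)):
    """Compress `path` once per thread count and return wall-clock seconds."""
    results = {}
    for threads in thread_counts:
        start = time.perf_counter()
        # Discard the compressed stream; only the elapsed time matters here.
        subprocess.run(
            build_command(path, threads),
            stdout=subprocess.DEVNULL,
            check=True,
        )
        results[threads] = time.perf_counter() - start
    return results
```

Calling `sweep("jmeter-server-node1.log")` would return a dict mapping thread count to elapsed seconds. The first run may also pay for reading the file into the page cache, so it can be worth discarding or repeating it.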



Well that's interesting:

$ time lbzip2 -n 2 -z -9 jmeter-server-node1.log

real	5m55.924s
user	11m47.732s
sys	0m2.480s


I'm only using 2 threads and I get almost the same performance as with 4 threads.
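
To put numbers on that, here is a small sketch that turns the `time` output quoted in this thread into speedup, user-time ratio, and per-thread efficiency relative to plain bzip2. The function names and the `RUNS` table are just for illustration; the timings are copied from the runs above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style value such as '8m33.220s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# name -> (threads, real, user), copied from the runs quoted in this thread
RUNS = {
    "bzip2": (1, "8m33.220s", "8m31.444s"),
    "lbzip2 -n 2": (2, "5m55.924s", "11m47.732s"),
    "lbzip2 -n 4": (4, "5m37.425s", "20m57.227s"),
}


def summarize(runs, baseline="bzip2"):
    """Compare every run against the baseline run."""
    _, base_real, base_user = runs[baseline]
    base_real_s = to_seconds(base_real)
    base_user_s = to_seconds(base_user)
    rows = {}
    for name, (threads, real, user) in runs.items():
        speedup = base_real_s / to_seconds(real)
        rows[name] = {
            "speedup": speedup,                       # wall-clock gain
            "cpu_ratio": to_seconds(user) / base_user_s,  # total CPU spent
            "efficiency": speedup / threads,          # gain per thread
        }
    return rows


for name, row in summarize(RUNS).items():
    print(f"{name:12s} speedup {row['speedup']:.2f}x  "
          f"user time {row['cpu_ratio']:.2f}x  "
          f"per-thread efficiency {row['efficiency']:.2f}")
```

With these figures, going from 2 to 4 threads raises the speedup only from about 1.44x to about 1.52x, while user time grows from about 1.4x to about 2.5x that of bzip2, which fits the shared-cache explanation suggested earlier in the thread.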

Parallelism is hard :)


Julien
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug