gabriel rosenkoetter on Thu, 27 Feb 2003 21:30:33 -0500



Re: [PLUG] Backup to CD-RW


On Thu, Feb 27, 2003 at 07:15:51PM -0500, Paul wrote:
> I didn't read far enough down. According to the author, "Tar does not 
> handle compression very well, so I don't use compression."
> 
> Here are the compression options for my version of tar:

Presuming you don't mean that you've written your own implementation
of tar(1), you're using some GNU/Linux distribution with its default
tar; that's (probably) GNU tar[1]. I don't know precisely to what the
author's referring by suggesting that "tar's not very good with
compression". Perhaps he just means that GNU tar's performance
sucks when using zlib (which is probably actually zlib's fault).

In general though, GNU tar's performance DOES suck. On a piece of
hardware I use frequently (an Exabyte EZ17 Mammoth2 tape loader[2]),
GNU tar averages 12-15 MB/s and Schily tar averages 25 MB/s or so.
What's more, GNU tar does certain things just flat-out wrong. See
ftp://ftp.berlios.de/pub/star/alpha/STARvsGNUTAR for more on that.

> -j, -I, --bzip
> -z, --gzip, --ungzip
> -Z, --compress, --uncompress

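Each of these flags selects a different compressor. As far as I
know, GNU tar doesn't call a library routine for any of them; it
runs the matching program (bzip2, gzip, or compress) as a child
process and pipes the archive through it, so it costs about the same
as piping to the command line utility yourself. An in-process library
routine would be faster, in theory, because there'd be no context
switch between applications on the processor and no passing through
kernel memory by way of a pipe. On an SMP system, you could maybe do better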
having tar(1) write to a named pipe (or, better, a memory buffer
of some sort) and the compression utility read from it, each spinning
on its own processor, provided your compression utility can keep up
with your tar(1) (modulo reblocking or whatever in your memory
buffer, should such exist). Maybe.
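
To make the trade-off above concrete, here's a rough sketch in Python
rather than tar(1) itself. It writes the same archive twice: once
with compression done inside the archiving process, and once by
streaming an uncompressed archive through a pipe to a separate
compressor. The function names are made up for illustration, and the
second one assumes a gzip(1) binary on your PATH. It shows the two
shapes; it isn't a benchmark.

```python
import os
import subprocess
import tarfile


def tar_gz_in_process(paths, out_path):
    """Archive and compress in one process (Python's gzip module does the work)."""
    with tarfile.open(out_path, "w:gz") as archive:
        for path in paths:
            archive.add(path, arcname=os.path.basename(path))


def tar_gz_via_pipe(paths, out_path, compressor=("gzip", "-c")):
    """Stream an uncompressed archive through a pipe to a separate compressor.

    Assumes the compressor reads stdin and writes stdout, as `gzip -c` does.
    """
    with open(out_path, "wb") as out_file:
        proc = subprocess.Popen(
            compressor, stdin=subprocess.PIPE, stdout=out_file
        )
        # "w|" is tarfile's non-seekable stream mode, which a pipe requires.
        with tarfile.open(fileobj=proc.stdin, mode="w|") as archive:
            for path in paths:
                archive.add(path, arcname=os.path.basename(path))
        proc.stdin.close()  # end of input, so the compressor can finish
        if proc.wait() != 0:
            raise RuntimeError("compressor exited with a non-zero status")
```

With the second form the archiver and the compressor are separate
processes, so the kernel is free to schedule them on different
processors; the price is the trip through the pipe.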

[1] "tar" means many things to many people. It's part of the POSIX
standard, but it's also the default name for utilities implementing
(or, often, *pretending* to implement) that part of the standard.
It just means "tape archive", but there are some very definite
truths about what an archive should look like. Properly, what
POSIX.1 (that is, IEEE standard 1003.1) defines is called ustar,
and it's something mostly like this format that most things passing
themselves off as tar(1) come close to writing and reading.
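
One place the "mostly like" shows up is the header itself. Each
member starts with a 512-byte header block; the magic field sits at
offset 257 (6 bytes) and the version field at offset 263 (2 bytes).
Here's a small sketch using Python's tarfile module, not any
particular tar(1), to write a one-file archive in each format and
return those 8 bytes. The helper name header_magic and the file name
hello.txt are made up for illustration.

```python
import io
import tarfile


def header_magic(fmt):
    """Return the magic and version fields (header bytes 257..264) of a
    one-member archive written by tarfile in the given format."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as archive:
        payload = b"hello\n"
        info = tarfile.TarInfo(name="hello.txt")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    header = buf.getvalue()[:512]  # first header block
    return header[257:265]


ustar_bytes = header_magic(tarfile.USTAR_FORMAT)
gnu_bytes = header_magic(tarfile.GNU_FORMAT)
print(ustar_bytes, gnu_bytes)
```

The ustar form is "ustar" plus a NUL, followed by the version "00";
the GNU form that tarfile writes is "ustar" plus two spaces and a NUL,
which spills into the version field. A reader can tell the two apart
from these bytes alone.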

[2] Humorous note: I'd have linked to (or at least listed
advertised performance numbers from) Exabyte's page about this
device, but all of http://www.exabyte.com/ seems to be hosed
right now because their ColdFusion server has died. If you don't
work with me, you probably won't find this quite as funny as you
would otherwise (though, if you've ever dealt with CF on the server
side, perhaps you will).

-- 
gabriel rosenkoetter
gr@eclipsed.net
