Jason M. Lenthe on 10 Nov 2004 00:27:03 -0000



Re: [PLUG] CD image packer


On Tue, 2004-11-09 at 17:48, Jeff Abrahamson wrote:
> I want to dump files onto CD's for backup purposes.  I have more files
> than will fit on a single CD, but I want each CD to stand unto itself
> (in case of loss of some of the others).  Ideally, each CD would
> simply have a fragment of my file system on it or at least a single
> but whole tar archive.

I've been thinking about an automated CD backup system a little lately
myself (though not too much).

> Does anyone know a tool that does this?  I haven't found one (google,
> freshmeat).

I don't, but I do have some other (probably) useless but interesting
information.

> In a way, I hope there's not, as it sounds like fun to code.  The
> family of algorithms behind it is called bin packing algorithms, and
> the problem (in general) is NP-complete.

Interesting observation.  Indeed, optimal bin packing is NP-complete, but
not to worry!  If the average size of your files is small in comparison
to the CD capacity, the trivial algorithm should be efficient (the
trivial algorithm being packing the files in the arbitrary order they
come in, starting a new CD whenever the current one fills up).  It's
like filling a trash can with sand versus filling it with basketballs.
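For concreteness, here's a minimal sketch of that trivial ("next-fit")
approach in Python -- not the tool you asked for, just an illustration of
the algorithm, with sizes and capacity in whatever unit you like:

```python
def pack_next_fit(sizes, capacity):
    """Pack file sizes into CDs in arrival order (next-fit):
    start a new CD whenever the current file doesn't fit."""
    cds = []                 # list of CDs, each a list of file sizes
    current, used = [], 0    # the CD being filled and its used space
    for size in sizes:
        if used + size > capacity:
            if current:
                cds.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        cds.append(current)
    return cds
```

For example, pack_next_fit([3, 3, 5, 2, 4], 7) yields [[3, 3], [5, 2], [4]].
A smarter variant (first-fit decreasing: sort the files largest-first and
put each on the first CD with room) wastes even less, but for small files
it hardly matters.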

I just determined that my home directory has 4915 files at an average
size of 156.6 KB.  The wasted space on each full CD is bounded by the
size of the one file that didn't fit, so if you do the math assuming a
CD holds 700 MB of data and ignoring filesystem overhead, the average
packing efficiency (excluding the last CD in the backup set) for my home
directory using the trivial algorithm would be over 99.9%...not too
shabby!  My largest file was 124.3 MB, which yields a pathological
worst-case packing efficiency of about 82% for my home directory.
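Spelled out, the arithmetic behind those bounds looks like this (a rough
sketch using my figures and the assumed 700 MB capacity):

```python
CD_MB = 700.0                  # assumed CD data capacity, MB
avg_file_mb = 156.6 / 1024     # average file size (156.6 KB) in MB
largest_mb = 124.3             # largest single file, MB

# Next-fit wastes less than one file's worth of space per full CD
# (the file that didn't fit), so efficiency >= 1 - waste / capacity.
typical_eff = 1 - avg_file_mb / CD_MB   # waste ~ one average-size file
worst_eff = 1 - largest_mb / CD_MB      # waste ~ one largest file
print(f"typical >= {typical_eff:.2%}, worst case > {worst_eff:.1%}")
# -> typical >= 99.98%, worst case > 82.2%
```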

I hope the average size of your files is small.

I calculated the average file size of my home directory using a
handy-dandy CLI program that I created called dstat (short for
Descriptive Statistics) which can be found at
http://home.comcast.net/~lenthe/dstat.cc.  With dstat the following
command did the trick:

    ls -alR ~ | egrep -v '^d' | awk '{print $5;}' | dstat

Someone should let me know if there's an easier way to calculate the
average size of a bunch of files.

Sincerely,
Jason



___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug