Steve Litt via plug on 5 Nov 2020 03:17:55 -0800


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: [PLUG] Copying files to multiple volumes


Hi Bhasker,

I have another idea, and my idea has the advantage that you don't need
to pre-guess the number of volumes necessary.

* Make a list of all the files with their sizes and directories, and
  sort them by filesize descending.

* Make your first volume.

* LABEL toploop

* If the list has a file smaller than the remaining capacity of the
  volume, put that file in the volume and remove that file from the
  list. Else make a new volume and do no copy.

* If write error (file didn't fit after all), make a new volume and do
  no copy.

* Obtain the current volume's new remaining capacity via df.

* Goto toploop

This algorithm also has pathological cases, and it's almost guaranteed
to leave some empty space on every volume, but it's pretty darn good.
Note that if you're doing lots of files, a shellscript and the calls to
df would be too slow, so you'd need to use Perl or Python or Ruby or
Lua or C, and deduce file sizes and remaining volume capacity with
language appropriate functions.

I think I'll write one of these for backing up my backup tarballs. I
wrote the last one in Ruby, but it was way too complicated. Because I'm
just dealing with maybe 30 huge files, I can do it with a shellscript,
and it would be trivial.

SteveT


On Wed, 4 Nov 2020 11:53:29 -0500
"K.S. Bhaskar via plug" <plug@lists.phillylinux.org> wrote:

> Apologies for the delayed reply. I was busy with election related
> stuff yesterday, and am still trying to make sense of it. Things
> would be so much simpler if the answer were 42, or some such…
> 
> Extending the heuristic to n drives is straightforward. After sorting
> the files by size, put the first n files in the first n drives from
> drive 1 to drive n. Then put the next n files the drives from drive n
> to drive 1 (i.e., in reverse order). Then put the next n files in
> drives 1 through n. And so on.
> 
> Like any simple-minded heuristic, this has pathological cases. For
> example, if you have one really big file that takes up most of one
> drive, and a gazillion small files that together fit onto a second
> drive, this heuristic can produce pessimal (the opposite of optimal)
> results. A modified heuristic is to fill each drive till it is as
> full as the last drive, and add one file to the drive when starting a
> new row (i.e., to drive 1 or drive n). I'm sure that also has
> pathological cases, but I have not thought through the problem enough
> to articulate what those look like.
> 
> Regards
> – Bhaskar
> 
> On Tue, Nov 3, 2020 at 3:03 AM Steve Litt via plug <
> plug@lists.phillylinux.org> wrote:  
> 
> > On Thu, 29 Oct 2020 10:35:06 -0400
> > "K.S. Bhaskar via plug" <plug@lists.phillylinux.org> wrote:
> >  
> > > You can of course extend the boustrophedon heuristic to n drives,
> > > if you have the drives, e.g., largest file on drive 1, next
> > > largest on drive 2, next two on drive 3, next on drive 2, next
> > > two on drive 1, etc. And maybe two drives will suffice if you
> > > compress each file before storing it.  
> >
> > Bhaskar,
> >
> > Long ago I made a backtracking algorithm (Ruby) to best-fit a bunch
> > of files onto a bunch of volumes. Can you spell overkill? I like
> > your idea much better.
> >
> > SteveT
> >
> > Steve Litt
> > Autumn 2020 featured book: Thriving in Tough Times
> > http://www.troubleshooters.com/thrive
> > ___________________________________________________________________________
> > Philadelphia Linux Users Group         --
> > http://www.phillylinux.org
> > Announcements -
> > http://lists.phillylinux.org/mailman/listinfo/plug-announce
> > General Discussion  --
> > http://lists.phillylinux.org/mailman/listinfo/plug
> >  



-- 
SteveT

Steve Litt 
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug