K.S. Bhaskar via plug on 6 Nov 2020 08:11:54 -0800


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: [PLUG] Copying files to multiple volumes


Steve, yes your technique would work. It is simpler and you don't need to know the number of drives in advance. It reminds me of computing circa 1970 when you just wrote a tape (I thought getting 800 bits to the linear inch was terrific!) till the end, then the next tape, and the next, and so on. They were labeled 1 of n, 2 of n, 3 of n, etc. and we had optimum packing!

A related and also interesting problem is backup when you need the ability to delete/update individual files in the backup.

Regards
– Bhaskar


On Thu, Nov 5, 2020 at 6:17 AM Steve Litt via plug <plug@lists.phillylinux.org> wrote:
Hi Bhasker,

I have another idea, and my idea has the advantage that you don't need
to pre-guess the number of volumes necessary.

* Make a list of all the files with their sizes and directories, and
  sort them by filesize descending.

* Make your first volume.

* LABEL toploop

* If the list has a file smaller than the remaining capacity of the
  volume, put that file in the volume and remove that file from the
  list. Else make a new volume and do no copy.

* If write error (file didn't fit after all), make a new volume and do
  no copy.

* Obtain the current volume's new remaining capacity via df.

* Goto toploop

This algorithm also has pathological cases, and it's almost guaranteed
to leave some empty space on every volume, but it's pretty darn good.
Note that if you're doing lots of files, a shellscript and the calls to
df would be too slow, so you'd need to use Perl or Python or Ruby or
Lua or C, and deduce file sizes and remaining volume capacity with
language appropriate functions.

I think I'll write one of these for backing up my backup tarballs. I
wrote the last one in Ruby, but it was way too complicated. Because I'm
just dealing with maybe 30 huge files, I can do it with a shellscript,
and it would be trivial.

SteveT


On Wed, 4 Nov 2020 11:53:29 -0500
"K.S. Bhaskar via plug" <plug@lists.phillylinux.org> wrote:

> Apologies for the delayed reply. I was busy with election related
> stuff yesterday, and am still trying to make sense of it. Things
> would be so much simpler if the answer were 42, or some such…
>
> Extending the heuristic to n drives is straightforward. After sorting
> the files by size, put the first n files in the first n drives from
> drive 1 to drive n. Then put the next n files the drives from drive n
> to drive 1 (i.e., in reverse order). Then put the next n files in
> drives 1 through n. And so on.
>
> Like any simple-minded heuristic, this has pathological cases. For
> example, if you have one really big file that takes up most of one
> drive, and a gazillion small files that together fit onto a second
> drive, this heuristic can produce pessimal (the opposite of optimal)
> results. A modified heuristic is to fill each drive till it is as
> full as the last drive, and add one file to the drive when starting a
> new row (i.e., to drive 1 or drive n). I'm sure that also has
> pathological cases, but I have not thought through the problem enough
> to articulate what those look like.
>
> Regards
> – Bhaskar
>
> On Tue, Nov 3, 2020 at 3:03 AM Steve Litt via plug <
> plug@lists.phillylinux.org> wrote: 
>
> > On Thu, 29 Oct 2020 10:35:06 -0400
> > "K.S. Bhaskar via plug" <plug@lists.phillylinux.org> wrote:
> > 
> > > You can of course extend the boustrophedon heuristic to n drives,
> > > if you have the drives, e.g., largest file on drive 1, next
> > > largest on drive 2, next two on drive 3, next on drive 2, next
> > > two on drive 1, etc. And maybe two drives will suffice if you
> > > compress each file before storing it. 
> >
> > Bhaskar,
> >
> > Long ago I made a backtracking algorithm (Ruby) to best-fit a bunch
> > of files onto a bunch of volumes. Can you spell overkill? I like
> > your idea much better.
> >
> > SteveT
> >
> > Steve Litt
> > Autumn 2020 featured book: Thriving in Tough Times
> > http://www.troubleshooters.com/thrive
> > ___________________________________________________________________________
> > Philadelphia Linux Users Group         --
> > http://www.phillylinux.org
> > Announcements -
> > http://lists.phillylinux.org/mailman/listinfo/plug-announce
> > General Discussion  --
> > http://lists.phillylinux.org/mailman/listinfo/plug
> > 



--
SteveT

Steve Litt
Autumn 2020 featured book: Thriving in Tough Times
http://www.troubleshooters.com/thrive
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug