Rich Kulawiec on 17 Mar 2017 05:37:36 -0700



Re: [PLUG] Avoid Arvixe at all costs!


On Thu, Mar 16, 2017 at 08:31:16PM -0400, Greg Helledy wrote:
> >	- Why do you want to do this?  What is it that you anticipate
> >	breaking so badly that you need to do this more than once a day?
> 
> Maybe I should have written "other than just overnight interval"--I need
> copies from last night, the night before, and last week.
> 
> So that when the user tells me they accidentally deleted the folder
> containing all their current project emails, I can restore it.  Even if they
> don't tell me for a few days.

Gotcha.  This makes perfect sense.  The fastest, easiest solution to this
is probably to use dump(8) with what is often called a "towers of Hanoi"
sequence of dump levels.  (So-called because of the famous mathematical
problem of the same name.)

To explain, briefly:

All versions of dump support levels numbered from 0 to 9.  (The Linux
version supports levels up to 99, but I strongly recommend against using
those above 9 unless you have a very accurate, very extensive understanding
of exactly what you're doing.  I have such an understanding -- some of my
code went into the BSD and Sun versions of dump -- and I don't use these.)

A level 0 dump is referred to as a "full" or "epoch" dump.  It includes
all files (and I'm going to use "files" as shorthand for "files, directories,
and everything else" because it's faster to type) in the target.

A level 1-9 dump is referred to as a "partial" or "incremental" dump.
A level N dump includes all files that have changed since the most recent
dump at level N-1 or lower.
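
To make the inclusion rule concrete, here is a minimal Python sketch. It
assumes a hypothetical in-memory history of (level, time) pairs instead of
dump's real bookkeeping, and it simplifies dump's actual test (which looks at
inode timestamps, including the change time) to a single timestamp per file.
The function names are made up for illustration.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical history of earlier dumps of one filesystem: (level, when it ran).
History = List[Tuple[int, datetime]]


def cutoff_for(level: int, history: History) -> Optional[datetime]:
    """Time of the most recent earlier dump at a lower level (N-1 or below).

    Returns None for a level 0 dump, or when no lower-level dump exists,
    meaning "copy everything".
    """
    if level == 0:
        return None
    earlier = [when for lvl, when in history if lvl < level]
    return max(earlier) if earlier else None


def is_included(mtime: datetime, level: int, history: History) -> bool:
    """Would a file last changed at `mtime` go into a level-`level` dump?"""
    cutoff = cutoff_for(level, history)
    return cutoff is None or mtime > cutoff


if __name__ == "__main__":
    sunday = datetime(2017, 3, 12, 23, 0)   # level 0 ran Sunday night
    monday = datetime(2017, 3, 13, 23, 0)   # level 1 ran Monday night
    history = [(0, sunday), (1, monday)]
    edited = datetime(2017, 3, 13, 9, 30)   # file changed Monday morning
    print(is_included(edited, 2, history))  # Tuesday's level 2: False
    print(is_included(edited, 1, history))  # a level 1 on Wednesday: True
```

The Monday-morning change is already in Monday's level 1, so Tuesday's
level 2 skips it, but a later level 1 picks it up again because its cutoff
is Sunday's level 0.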

Thus if you do a level 0 dump on Sunday, a level 1 on Monday, and a level 2
on Tuesday, then the dump made on Monday includes "everything that changed
since Sunday's dump" and the dump made on Tuesday includes "everything
that changed since Monday's dump".  If you then do a level 1 on Wednesday,
it will include "everything that's changed since Sunday".
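
One consequence worth spelling out: to restore as of a given day, you need
that day's dump plus the chain of dumps it was taken relative to. This rough
Python sketch (function names are hypothetical) walks that chain for the
Sunday-to-Wednesday example, where Wednesday's restore needs only Sunday's
level 0 and Wednesday's level 1.

```python
from typing import List


def base_index(levels: List[int], i: int) -> int:
    """Index of the dump that dump i is relative to, or -1 for a full dump.

    That is the most recent earlier dump whose level is lower than levels[i].
    """
    for j in range(i - 1, -1, -1):
        if levels[j] < levels[i]:
            return j
    return -1


def restore_chain(levels: List[int], i: int) -> List[int]:
    """Dumps needed to rebuild the filesystem as of dump i, oldest first."""
    chain = []
    while i != -1:
        chain.append(i)
        i = base_index(levels, i)
    return chain[::-1]


if __name__ == "__main__":
    days = ["Sun", "Mon", "Tue", "Wed"]
    levels = [0, 1, 2, 1]
    for i, day in enumerate(days):
        print(day, [days[k] for k in restore_chain(levels, i)])
    # Sun ['Sun']
    # Mon ['Sun', 'Mon']
    # Tue ['Sun', 'Mon', 'Tue']
    # Wed ['Sun', 'Wed']
```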

Dump keeps track of when it's been run (provided you use the "u" flag,
which you'll need to if you want to use a scheme like this) in the file
/var/lib/dumpdates (Linux) or /etc/dumpdates (BSD, Solaris).  That's
how it knows what level was done when.
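
For the curious, here is a rough Python sketch of that lookup. It assumes
each dumpdates line is a device, a level, and a ctime-style date, optionally
followed by a UTC offset; check the file on your own system before relying
on this, since the exact format may differ between implementations. The
function names and sample device names are hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Record = Tuple[str, int, datetime]


def parse_dumpdates(text: str) -> List[Record]:
    """Parse dumpdates-style lines into (device, level, when) tuples.

    Assumed format: DEVICE LEVEL Www Mmm DD HH:MM:SS YYYY [+-HHMM]
    Any trailing UTC offset is ignored, so times compare as local times.
    """
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7:
            continue  # skip blank or short lines
        device, level = parts[0], int(parts[1])
        stamp = " ".join(parts[2:7])  # e.g. "Sun Mar 12 23:00:01 2017"
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        records.append((device, level, when))
    return records


def last_lower_dump(records: List[Record], device: str,
                    level: int) -> Optional[datetime]:
    """Cutoff a new level-`level` dump of `device` would use (None = full)."""
    earlier = [when for dev, lvl, when in records
               if dev == device and lvl < level]
    return max(earlier) if earlier else None
```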

Now here's where the art of backups comes in: selecting the incremental
scheme depends on what's going on in your filesystems.  A very simple
approach is to do levels 0-6 matching the day of the week.  For many
use cases, this suffices.  BUT it means that you're doing level 0s
once a week, which means you have to store them, which means that all
those dumps of /usr will be highly redundant -- because it probably
doesn't change often.   So maaaaaybe you might want to do a level 0
once a month, do levels 1-9 on successive days, and then repeat.
Thus: 0-1-2-3-4-5-6-7-8-9-1-2-3-4-etc.  This reduces the total
size of the stored backups while still giving you daily coverage,
but it doesn't come for free: the second level 1 in that sequence
will include everything since the level 0, thus it is likely to 
be larger than the first level 1.
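
Both schemes can be sketched in a few lines of Python. This is an
illustration under assumptions, not a backup tool: the weekly scheme here
puts the level 0 on Sunday (matching the earlier example), the monthly
scheme is the 0-1-...-9-1-... cycle just described, and all function names
are made up. base_index() reports which earlier dump each one is relative
to; the second level 1 (index 10) reaches all the way back to the level 0,
which is why it is likely to be larger.

```python
from datetime import date
from typing import List


def weekday_level(d: date) -> int:
    """Simple weekly scheme: Sunday -> 0, Monday -> 1, ..., Saturday -> 6."""
    return (d.weekday() + 1) % 7  # date.weekday(): Monday == 0 ... Sunday == 6


def monthly_levels(n_days: int) -> List[int]:
    """Monthly scheme: one level 0, then levels 1-9 repeated: 0,1,...,9,1,..."""
    return [0] + [(k % 9) + 1 for k in range(n_days - 1)]


def base_index(levels: List[int], i: int) -> int:
    """Most recent earlier dump with a lower level, or -1 if there is none."""
    for j in range(i - 1, -1, -1):
        if levels[j] < levels[i]:
            return j
    return -1


if __name__ == "__main__":
    levels = monthly_levels(12)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
    for i in (1, 10):
        b = base_index(levels, i)
        print(f"day {i}: level {levels[i]}, relative to day {b}, "
              f"covers {i - b} day(s) of changes")
```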

And so on: optimizing this scheme several different ways simultaneously
requires knowledge of what your data is doing, what your likely use
cases for backups/restores are, and how much backup space you have.
It's well beyond the scope of a brief message like this.  But having
done this for many, many years, I can tell you that it's worth the
learning and effort investment.

---rsk
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug