Rich Freeman on 11 Aug 2013 17:20:06 -0700



[PLUG] Offline Backup Solutions


After attending yesterday's Bacula talk I am thinking about doing
offline backups to an eSATA drive.  I'm not sure if Bacula is actually
the right tool for the job though.

I'd like to define the following classes of jobs:
1. MythTV video - 1-2x/wk backup, no retention of deleted/changed
files (~1TB with high turnover)
2. Unimportant files - daily backup, short retention of
deleted/changed files (~1TB with low turnover)
3. Important files - hourly backup, long retention of deleted/changed
files (~30GB with low turnover)

Some of the important files might come from other hosts running
Windows (which makes something like Bacula more attractive).

I'd like all but #1 to run automatically (optional for #1).  I'd like
my large offline storage to remain, well, offline (not physically
connected).  Automated backups would all go into online storage, and
would be migrated to the offline storage when it is connected.  Online
storage would have a capacity of ~100-200GB tops (i.e., it cannot
store a full backup of anything but the important files).

I'm not sure if any of the out-of-the-box solutions will really handle
this.  For MythTV I'm thinking that just manual rsyncs might be my
best option, as they would be fast and accurate (I can trust
names/mtimes and just want to mirror).  For the unimportant files I'd have to
ensure that all full backups are manual and any automated backups are
incrementals/differentials, since I can only perform the full backups
with the offline storage which I'd need to supervise.

Any suggestions?  How are others handling offline storage?  I could
just manually mirror things but then I lose the security of automated
backups.  I could leave the offline storage online, but then that
makes it vulnerable to many failures that would take out the originals
(even if unmounted when not in use).

I was looking at Bacula and it seems like I could sort-of do this.
I'd define the offline storage for full unimportant backups as a pool
and only manually trigger those, and then have regular
differential/incrementals directed to the online storage area.  I
could then migrate that data to the offline storage from time to time
to keep it from filling up.  The only problem with this is that the
retention periods in Bacula are a bit kludgy - I'd need many pairs of
pools on both physical devices to get all that to work out.  I'm not
sure if Bacula will even enforce retention during a migration (if you
migrate a volume into a pool that is full will it purge existing
volumes to make room for the new ones?).
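For what it's worth, here's a rough sketch of what that two-pool
arrangement might look like in bacula-dir.conf.  This is heavily
abbreviated and the resource names and retention values are made up; a
real Migrate job also needs Client/FileSet/Messages directives:

```
Pool {
  Name = Online-Unimportant
  Pool Type = Backup
  Storage = OnlineDisk
  Next Pool = Offline-Unimportant   # migration target
  Volume Retention = 2 weeks
}
Pool {
  Name = Offline-Unimportant
  Pool Type = Backup
  Storage = EsataDisk
  Volume Retention = 6 months
}
Job {
  Name = Migrate-Unimportant
  Type = Migrate
  Pool = Online-Unimportant
  Selection Type = PoolTime   # migrate volumes older than Migration Time
}
```

Even in this trimmed form you can see the pool-pair proliferation:
every backup class needs its own online/offline pair, which is the
kludge I was complaining about.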

This just seems more complicated than it needs to be.  Surely somebody
must be doing backups using offline disks?  Most of the logic is built
around having a box of tapes and rotating through those, but that is
incredibly expensive these days as tape just hasn't kept pace, and I'm
not going to rotate disks that will end up being 90% empty, or have
the system be doing full backups on multiple-TB of data with any
frequency.

I could just do manual rsyncs/etc, but then if I forget to do it for a
week I am taking a fair bit of risk, and managing retention with rsync
doesn't sound simple.  I could also just leave the drive online but
unmounted.  One advantage of rsync, though, is that recovery is
brain-dead simple.  I don't mind the thought of recovering onto bare
metal from something like tar/dar/etc, but for something like Bacula
the bar is considerably higher.

The important stuff is already being backed up to S3, and I don't
think I'm going to change that.  This is really about faster recovery
in the event of something other than a fire and backing up all the
other junk that doesn't warrant that kind of treatment.  I'm also
contemplating moving to btrfs and I'd really only want to do that if I
had a fairly full set of recent backups at all times.

How are others handling offline backup?  I may just be
over-engineering things.  I could probably script up manual backups
using rsync/sarab/dar fairly easily, and I know those would be easy to
restore.  (sarab is a script that wraps around dar, and dar is like
tar but with indexing, so most operations don't require scanning the
whole archive.)
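The full-plus-incremental pattern those tools implement can even be
done with plain GNU tar via --listed-incremental (-g); dar/sarab add
indexing on top of the same idea.  A sketch with throwaway mktemp
directories standing in for the real data and backup paths:

```shell
#!/bin/sh
set -e
DATA=$(mktemp -d)   # stand-in for the data being backed up
BK=$(mktemp -d)     # stand-in for the backup area
echo one > "$DATA/a"

# Full (level-0) backup: -g records file state in the snapshot file,
# which tar creates since it doesn't exist yet.
tar -cf "$BK/full.tar" -g "$BK/snap" -C "$DATA" .

# Later run: only new/changed files go into the incremental.
echo two > "$DATA/b"
tar -cf "$BK/inc1.tar" -g "$BK/snap" -C "$DATA" .
```

Restoring means extracting the full archive and then each incremental
in order, with -g /dev/null so tar applies (rather than records) the
incremental metadata.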

Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug