Rich Freeman on 17 Mar 2017 09:46:58 -0700



Re: [PLUG] Avoid Arvixe at all costs!


On Fri, Mar 17, 2017 at 11:02 AM, Rich Kulawiec <rsk@gsp.org> wrote:
> On Fri, Mar 17, 2017 at 10:19:26AM -0400, Rich Freeman wrote:
>
> This may seem like overkill to some folks, but the first time that you
> have to invest an 80-hour week into recovering a DB that you could have
> recovered in an hour using your backups of the DB's text dump and
> a tiny bit of shell scripting, it will become obvious why it's not.
>

Agree.  The other part of it is that from a text dump it is pretty
easy to extract even a single record.  If you only have a filesystem
backup, you're going to have to restore the entire DB, spin up a new
instance, extract the records you care about, and then update your
other database.
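
For instance, with a plain-text SQL dump you can fish one table's rows
out of the file and then the single record you need with a couple of
lines of shell.  The table, file, and key names below are made up, but
the shape of it is roughly:

  # Pull one table's rows out of a plain-text pg_dump file, then grab
  # the single record we care about (names are hypothetical).
  awk '/^COPY public\.users /{f=1; next} /^\\\.$/{f=0} f' nightly.sql \
      | grep -P '^42\t'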

>> If you want to back up very frequently, or while the system is in use,
>> you should probably be thinking about COW filesystems.
>
> Yes, that's one approach.  But for databases, it would be better to
> use the DB's own journaling facilities, as those should guarantee
> referential integrity and they should play nice with the DB.
> Cloning the journal output and replicating it elsewhere in real time
> should provide a recovery path even in the event of total system
> loss with no warning.

Certainly.  I was referring to backing up other stuff here.  The COW
filesystems do generally have atomic snapshots, but of course there is
no guarantee that a snapshot is in sync with the output of any
particular process, including a database, while a database backup tool
will ensure the database is in a sane state at the moment the backup
is taken.  A filesystem-level snapshot can be consistent if you stop
the database while making it (which can be for a very short time, but
not zero).
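
Something along these lines would give you that brief window (dataset
and service names are just examples, adjust for your own setup):

  # Brief "stop, atomic snapshot, restart" window; the database is only
  # down for the second or so the snapshot takes.
  systemctl stop postgresql
  zfs snapshot tank/pgdata@nightly-$(date +%F)
  systemctl start postgresql

A text dump via the DB's own tools avoids even that small outage, it
just takes longer to run.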

>
> I've used ZFS extensively in production and it does offer some advantages --
> but yes, it's important to know the caveats.  I don't have enough btrfs
> clue to comment.

So, the main caveats with zfs on linux are:
1.  Zfs on linux is still relatively new, though in my (limited)
experience it has been fairly robust.  On FreeBSD, or especially
Solaris, it is likely to be more robust.
2.  Zfs on linux suffers from license compatibility issues, so you're
going to have to mess with kernel modules and such.  Also, if anything
like / or /usr is on zfs, make sure you have a reasonable rescue disk
on hand in case you can't boot your system, as most canned rescue
disks do not support zfs out of the box on linux.
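
If you do go that route, a couple of quick checks go a long way toward
avoiding surprises after a kernel upgrade (these are just the stock
zfs-on-linux/DKMS commands, nothing exotic):

  modinfo zfs | head -n 3   # module actually built for the running kernel?
  dkms status | grep zfs    # if you installed via DKMS packages
  zpool status              # pools import and show no errors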

Btrfs really just has one caveat:
1.  It will eat your data some day when you'd really prefer that it didn't.

I've been fairly bullish on btrfs in the past.  I'm still hopeful for
the day I can be bullish again, but honestly I feel like it has taken
three steps forward and four steps back lately.  Back when I first
embraced it the sense was that it was "almost there" and I was aiming
to gain experience in the hope of going all in maybe a year down the
road.  That never happened, it has been years, and if anything I trust
it less than I used to.  I never got completely burned because I kept
VERY rigorous backups of anything on btrfs, but I've had two cases
where I ended up doing a full restore and it was a big hassle.  The
most recent time I switched to zfs for that particular filesystem, so
right now I'm running a mix of zfs and btrfs.

I really WANT btrfs to take off, because it has some really useful
features zfs lacks that I miss.  (Don't get me wrong, zfs has some
features btrfs lacks as well, but they are a bit overkill for my
needs.)  Things that I miss are first-class clones (the equivalent of
being able to create 3 zfs clones from a snapshot and then delete the
original dataset they came from), reflinks (single-file snapshots,
which can be made the default operation of the cp command), being able
to remove/shrink devices, being able to add individual disks to a raid
array, and doing raid in mixed environments without leaving lots of
space unused.
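
A couple of those, just to show what I mean (device and path names
here are only examples):

  cp --reflink=always big.img big-clone.img   # instant single-file clone
  btrfs device add /dev/sdd /mnt/data         # grow raid one disk at a time
  btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data
  btrfs device delete /dev/sdb /mnt/data      # shrink/remove a device later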

> However, very robust "mini-backup" systems can be built by using a git
> or RCS or subversion or whatever repository that's replicated elsewhere.
> For critical, individual items -- like "our precious code base that 37
> people have been working on for three years" -- one-off solutions like
> this are a good idea *in addition* to normal backup procedures.  The cost
> of setting them up and running them is small and the payoff, if they're
> needed, is enormous.

Certainly.  If you really do have a code repository, having a script
push/pull it to a remote repository is going to be a very effective
solution.  Git was designed to make this operation very inexpensive.
It is much more efficient for replicating changes than rsync, assuming
you intend to replicate all the intermediate versions.  If you're
replicating a fairly out-of-date tree and you don't care about the
intermediate versions, then rsync could be faster (assuming you're
just duplicating a checked-out tree minus the .git directory).  Git
and btrfs are actually fairly similar in terms of how they determine
what has changed between commits/snapshots.  I suspect zfs is similar,
but I'm less familiar with its on-disk format.
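
The whole "replicate the repo elsewhere" script can be as small as
this (the path and the offsite-backup remote name are placeholders):

  #!/bin/sh
  # Mirror every ref to an offsite remote; git only sends new objects.
  set -e
  cd /srv/git/precious-code.git
  git push --mirror offsite-backup

Run that from cron every few minutes and you have a near-real-time
copy of the repository history.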

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug