Rich Kulawiec on 17 Mar 2017 08:02:48 -0700



Re: [PLUG] Avoid Arvixe at all costs!


On Fri, Mar 17, 2017 at 10:19:26AM -0400, Rich Freeman wrote:
> If you are literally talking about a database I probably wouldn't be
> backing it up with dump, especially not multiple times per day.

Absolutely agreed, that would be a terrible approach.

> For my databases in mysql I have a script that I run before my backups
> which runs mysqldump into a directory that will be picked up by my
> regular backup.  It is going to be far easier to do a partial restore
> this way, and also it is going to be atomic (I don't think dump is).

Also absolutely agreed, and vital.  Dump cannot guarantee the referential
integrity of a running database because, to dump, the database is just
another file (or set of files), and because it takes nonzero time for dump
to copy it into the backup.  So your choices come down to (a) use the DB's
native tools, like mysqldump, to take a consistent snapshot, or (b) stop
the database, back it up, and start it again.  Clearly (b) is not an option
in many cases, nor is it desirable -- tools like mysqldump produce output
in a format that can be understood and manipulated *if necessary*, so
backing up that version of the DB is clearly preferable.
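A pre-backup hook along these lines is all it takes -- a rough sketch,
assuming a local MySQL instance with credentials in ~/.my.cnf; the paths
and filenames are illustrative, not gospel:

    #!/bin/sh
    # Pre-backup hook: dump each database into a directory that the
    # regular backup (dump, rsync, whatever) will pick up.
    set -eu

    DUMPDIR=/var/backups/mysql
    DATE=$(date +%Y%m%d-%H%M)

    for db in $(mysql --batch --skip-column-names -e 'show databases' \
                | grep -Ev '^(information_schema|performance_schema|sys)$')
    do
        # --single-transaction gives a consistent snapshot of InnoDB
        # tables without locking; MyISAM needs --lock-tables instead.
        mysqldump --single-transaction "$db" \
            | gzip > "$DUMPDIR/$db-$DATE.sql.gz"
    done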

This may seem like overkill to some folks, but the first time you have
to invest an 80-hour week into recovering a DB that you could have
recovered in an hour using your backups of the DB's text dump and
a tiny bit of shell scripting, it will become obvious why it's not.
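And that "tiny bit of shell scripting" really is tiny -- something like
this, assuming the dumps were made per-database as gzipped SQL as in the
sketch above (names and paths again illustrative):

    #!/bin/sh
    # Restore one database from its most recent gzipped mysqldump output.
    # Usage: restore-db.sh dbname
    set -eu

    DUMPDIR=/var/backups/mysql
    DB=$1

    # Pick the newest dump for this database.
    LATEST=$(ls -1t "$DUMPDIR/$DB-"*.sql.gz | head -1)

    mysqladmin create "$DB" 2>/dev/null || true  # ignore "already exists"
    gunzip -c "$LATEST" | mysql "$DB"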

> If you want to backup very frequently, or while the system is in use,
> you should probably be thinking about COW filesystems.  

Yes, that's one approach.  But for databases, it would be better to
use the DB's own journaling facilities, as those should guarantee
referential integrity and they should play nice with the DB.
Cloning the journal output and replicating it elsewhere in real time
should provide a recovery path even in the event of total system
loss with no warning.
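With MySQL, for instance, the binary log can be streamed to another box
continuously -- a rough sketch, assuming binary logging is already enabled
on the server and that the hostname, user, and starting log name below are
placeholders:

    #!/bin/sh
    # Continuously pull the server's binary log to a separate machine.
    # --raw writes the binlog files verbatim; --stop-never keeps reading
    # as new events arrive.  Run this from the backup host.
    set -eu

    cd /var/backups/binlogs

    mysqlbinlog --read-from-remote-server --host=db-master.example.com \
        --user=repl --raw --stop-never mysql-bin.000001

Replayed against a restore of the last full dump, those binlogs give you
point-in-time recovery even after total loss of the original box.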

> Both zfs and btrfs can determine what changed in an incremental backup
> without actually having to read all the inodes in the filesystem, unlike
> any ext2+ backup solution.  As I said, they both have some caveats on
> linux so I wouldn't use them lightly in a production setting, but it is
> something to keep an eye on.

I've used ZFS extensively in production and it does offer some advantages --
but yes, it's important to know the caveats.  I don't have enough btrfs
clue to comment.
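For anyone who hasn't tried it, the ZFS incremental-send workflow looks
roughly like this (pool and hostnames are made up, and in production you
would want error handling and snapshot rotation around it):

    #!/bin/sh
    # Take a snapshot and send only the delta since the previous one to
    # a remote pool.  Assumes tank/db exists on both ends and a prior
    # snapshot named "prev" is present on both.
    set -eu

    NOW=$(date +%Y%m%d-%H%M)

    zfs snapshot tank/db@"$NOW"
    zfs send -i tank/db@prev tank/db@"$NOW" | \
        ssh backuphost zfs receive -F tank/db

    # Once the receive succeeds, "$NOW" becomes the baseline for the
    # next incremental.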

> An earlier email mentioned git.  Git is great for some things, but
> keep in mind that you can't delete stuff and it is designed for text
> files.  I wouldn't be using it as a normal backup solution for
> arbitrary data.

Agreed.  Git (or any other revision control system) is highly useful for
text files, as you say, but not a good impedance match for other things.
However, very robust "mini-backup" systems can be built using a git
or RCS or Subversion or whatever repository that's replicated elsewhere.
For critical, individual items -- like "our precious code base that 37
people have been working on for three years" -- one-off solutions like
this are a good idea *in addition* to normal backup procedures.  The cost
of setting them up and running them is small and the payoff, if they're
needed, is enormous.
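A cron-driven mirror push is about as simple as these get -- a sketch,
assuming a bare repository already exists on the far end (the paths and
hostname are placeholders):

    #!/bin/sh
    # Mirror the local repository (all branches, tags, and refs) to a
    # bare repo on another machine.  Run from cron as often as you like.
    set -eu

    cd /srv/repos/precious-codebase.git
    git push --mirror ssh://backuphost/srv/mirrors/precious-codebase.git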

---rsk