Rich Freeman on 25 Jul 2012 05:32:14 -0700

Re: [PLUG] Backup drive filling up

On Wed, Jul 25, 2012 at 7:51 AM, Walt Mankowski <> wrote:
> I think it's got to do something close to a complete read pass.  It
> generally takes about 45 minutes on my box.

That is why I generally don't use rsync for backups.  It does have
benefits if you're running it as a daemon over a network, since it is
economical with network IO, but it does not do much to cut down on
disk IO.  Basically it assumes that files could have been modified
without their mtimes being updated, which actually can happen.
Looking at the man page, it seems you can tell it to ignore times
entirely (-I/--ignore-times) or to compare whole-file checksums
(-c/--checksum), but not to rely on times on one side while skipping
the checksum pass on the other.

Ideally it would let you use checksums on only one side of the
operation and cache them on the other.  There is a real risk that
mtimes aren't reliable in my source data, but an offline backup disk
shouldn't be touched by anything except rsync, so if rsync just kept a
file with a checksum index on the backup disk, it could read that file
instead of recalculating hashes over 500GB of data.
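That idea is simple enough to sketch by hand.  This is a toy version
(the function names and the .md5index filename are made up, and paths
with spaces aren't handled) -- hash the source every run, but read a
saved index for the backup side:

```shell
#!/bin/sh
# Sketch of the cached-checksum idea: hash the source tree on every
# run, but read a saved index for the backup disk instead of
# re-hashing it.  Hypothetical helper, not an rsync feature.

hash_tree() {  # print "md5  ./path" for every file under $1
    ( cd "$1" && find . -type f ! -name .md5index -print0 |
          sort -z | xargs -0 md5sum )
}

changed_files() {  # usage: changed_files SRC DST
    index="$2/.md5index"
    # Build the backup's index once; later runs just read the cache.
    [ -f "$index" ] || hash_tree "$2" > "$index"
    # Lines only on the source side ("<") are new or modified files.
    hash_tree "$1" | diff - "$index" | awk '/^</ { print $3 }'
}
```

The catch, of course, is keeping the index honest: anything that
writes to the backup disk behind the script's back invalidates it.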

If you use Amazon S3 and utilities like s3cmd sync, you get this
automatically: Amazon S3 maintains a hash for everything it stores,
which can be read without retrieving the file (and incurring bandwidth
costs).  s3cmd sync reads files locally to determine their checksums,
compares those to Amazon's, and only uploads the files that don't
match.  Alas, I don't think it supports binary diffs, etc.

While it would probably be harder to read without tools, a
content-hashed backup solution would get around some of this.  I used
to use a Linux backup solution called BackupPC which didn't do
content-hashing per se, but it did make a pass over the backup
directories and replace identical files with hard links to
deduplicate them.  That is dangerous if anything other than the
software modifies the backups (simply reading the directories is
fine), but the backups stored by that software were otherwise
ordinary directory trees.  A safer option would be to convert the
files to reflinks if btrfs were available (these are COW copies that
share blocks until one of them is modified).
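A toy version of that hard-link pass might look like this (an
illustration only -- BackupPC's real pooling is more elaborate, and
paths with spaces aren't handled; on btrfs you'd swap the ln for
cp --reflink=always):

```shell
#!/bin/sh
# Toy dedup pass: replace identical files under $1 with hard links to
# one canonical copy per checksum.  Dangerous in the way described
# above -- editing any one copy now edits them all.
dedup_tree() {
    seen=$(mktemp)   # lines of "checksum path-of-first-copy"
    find "$1" -type f | while read -r f; do
        sum=$(md5sum "$f" | awk '{ print $1 }')
        first=$(awk -v s="$sum" '$1 == s { print $2; exit }' "$seen")
        if [ -z "$first" ]; then
            echo "$sum $f" >> "$seen"      # first copy: remember it
        elif ! [ "$f" -ef "$first" ]; then
            ln -f "$first" "$f"            # duplicate: hard-link it
        fi
    done
    rm -f "$seen"
}
```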

The biggest limitation of anything that stores backups as copied
directory trees is that it is very wasteful of space, as
general-purpose filesystems aren't particularly efficient at storing
files you rarely access.

Philadelphia Linux Users Group