bergman on 1 Nov 2005 17:03:59 -0000



Re: [PLUG] Re: RAID Backup Server



In the message dated: Tue, 01 Nov 2005 08:20:43 EST,
The pithy ruminations from Pat Regan on 
<Re: [PLUG] Re: RAID Backup Server> were:

=> 
=> Tom Diehl wrote:
=> > There is also rdiff-backup. Uses rsync technology, can talk to a remote
=> > site over ssh and keeps diffs going as far back as you have the disk space
=> > to store (if that is what you want). ;)
=> > 
=> 
=> rdiff-backup is a wonderful piece of software, especially if you want to
=> keep lots of small incremental backups around.   I like to use
=> rdiff-backup to backup all my data to a single location.  I keep just
=> about as many increments as will fit, and I use this location as a
=> staging area for generating real backups.

Yep. For some small sites, I'm doing something very similar, using BackupPC 
(http://backuppc.sourceforge.net/info.html). The server side is Perl, and it 
speaks rsync and samba (so clients can be *nix or Windoze boxes). It does some 
nice things with building "composite" backups, where identical files from your 
N different machines are really only stored as a single file.
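A minimal sketch of the pooling idea. It assumes files are identified by a SHA-256 digest of their contents and shared through hard links into a flat pool directory. BackupPC's real pool layout and hashing are different, and `file_digest` and `pool_file` are hypothetical names used only for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def pool_file(src, pool_dir, dest):
    """Back up src to dest, keeping one on-disk copy per unique content.

    The first time a given content is seen it is copied into pool_dir
    under its digest; dest is always a hard link to that pooled copy,
    so identical files from different machines share the same inode.
    """
    digest = file_digest(src)
    pooled = os.path.join(pool_dir, digest)
    if not os.path.exists(pooled):
        shutil.copyfile(src, pooled)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.link(pooled, dest)  # pool and backups must be on one filesystem
    return digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        pool = os.path.join(top, "pool")
        os.makedirs(pool)
        for host in ("host1", "host2"):
            src = os.path.join(top, host + "_hosts")
            with open(src, "wb") as f:
                f.write(b"127.0.0.1 localhost\n")  # identical on both hosts
            pool_file(src, pool, os.path.join(top, "backups", host, "hosts"))
        print(len(os.listdir(pool)), "pooled file(s)")  # one copy, two hosts
```

Hard links keep each machine's backup tree looking like a complete copy while the pool holds each distinct file once, which is why machines with many files in common pool so much better than dissimilar ones.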

For example, at site "A", I've got 4 machines with about 65GB of usable disk
space. There are 17 full backups, and 20 incrementals, going back about 2 
months. The space requirement prior to pooling and compression is about 172GB, 
and the actual space used is less than 1/3 of that. This is actually a bad 
case, as the 4 machines are different architectures, and have almost no files 
in common.

At site "B", there are 11 servers that are more similar. There are 28 full
backups and 58 incrementals in the last month, which would take up almost
exactly 1TB of space, but which really consume less than 1/2 of that.
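The space figures above, worked through. This assumes "1TB" means 1000GB, and every result is an upper bound because the message only says "less than".

```python
# Figures quoted above. "1TB" is taken as 1000GB (an assumption), and each
# fraction is an upper bound because the message says "less than".
SITES = {
    "A": (172, 1 / 3),   # ~172GB before pooling/compression, under 1/3 used
    "B": (1000, 1 / 2),  # ~1TB before pooling/compression, under 1/2 used
}


def used_upper_bound_gb(logical_gb, fraction):
    """Disk actually consumed is below the logical size times the fraction."""
    return logical_gb * fraction


if __name__ == "__main__":
    for name, (logical_gb, fraction) in SITES.items():
        bound = used_upper_bound_gb(logical_gb, fraction)
        print(f"site {name}: under {bound:.1f}GB on disk, "
              f"better than {1 / fraction:.0f}:1 reduction")
    # site A: under 57.3GB on disk, better than 3:1 reduction
    # site B: under 500.0GB on disk, better than 2:1 reduction
```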

Getting that data 
=> 
=> The only disadvantage is the CPU hit from generating the differentials
=> (for anyone who doesn't know, rdiff-backup only stores the pieces of a
=> file that changed instead of entire copies of changed files).
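One way to store "only the pieces that changed" while still letting you pick which version to recover is to keep the newest version whole plus a chain of reverse deltas, each of which rebuilds the previous backup from the next newer one. As I understand it, rdiff-backup is organized along these lines (a current mirror plus reverse increments), but it works on binary data via librsync. This sketch uses line-based diffs from Python's `difflib` as a stand-in, and `ReverseIncrementStore` and the helper functions are hypothetical names.

```python
import difflib


def make_reverse_delta(newer, older):
    """Record how to rebuild `older` from `newer` (both lists of lines).

    Runs shared with `newer` are stored as index ranges; only lines that
    exist solely in `older` are stored literally.
    """
    delta = []
    matcher = difflib.SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": the older lines must be kept verbatim
            delta.append(("add", older[j1:j2]))
        # "delete": lines only in `newer` are simply not copied
    return delta


def apply_reverse_delta(newer, delta):
    """Rebuild the older version from `newer` and one reverse delta."""
    older = []
    for op in delta:
        if op[0] == "copy":
            older.extend(newer[op[1]:op[2]])
        else:
            older.extend(op[1])
    return older


class ReverseIncrementStore:
    """Hypothetical store: newest version whole, older ones as reverse deltas."""

    def __init__(self, first_version):
        self.mirror = list(first_version)
        self.increments = []  # increments[0] rebuilds the backup before the mirror

    def backup(self, new_version):
        new_version = list(new_version)
        # The replaced version survives only as a reverse delta.
        self.increments.insert(0, make_reverse_delta(new_version, self.mirror))
        self.mirror = new_version

    def restore(self, steps_back=0):
        """Return the version from `steps_back` backups ago (0 = newest)."""
        if not 0 <= steps_back <= len(self.increments):
            raise ValueError("no such backup")
        version = self.mirror
        for delta in self.increments[:steps_back]:
            version = apply_reverse_delta(version, delta)
        return list(version)


if __name__ == "__main__":
    store = ReverseIncrementStore(["a", "b", "c"])
    store.backup(["a", "B", "c", "d"])
    print(store.restore(1))  # ['a', 'b', 'c']
```

In this layout the newest backup is always a plain copy, while recovering an older one means applying every delta between it and the mirror, so restore cost grows with how far back you go.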

The rsync algorithm sends only the delta between versions of a file, but the
copy on the backup server is saved as a real file, not as just the changed
pieces. I'm guessing that the CPU hit is similar. Does rdiff require something
(i.e., another invocation of rdiff) to reassemble the file, and can you choose
which version to recover? Does it function like [rcs|sccs|cvs]?
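For anyone curious how rsync finds "only the delta", here is a sketch assuming the scheme from the rsync technical report: the receiver sends a weak rolling checksum plus a strong hash for each fixed-size block of its old copy, and the sender slides a window over the new file, emitting block references where checksums match and literal bytes elsewhere. Real rsync adds much more (block-size selection, its own strong hash, pipelining). MD5 via `hashlib`, the function names, and sending a trailing partial block literally are all simplifying assumptions.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum


def weak_checksum(block):
    """Return the (a, b) pair of an rsync-style weak checksum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def block_signatures(old, size):
    """Receiver side: weak checksum -> [(strong hash, block number)]."""
    table = {}
    for start in range(0, len(old) - size + 1, size):  # full blocks only
        block = old[start:start + size]
        strong = hashlib.md5(block).digest()
        table.setdefault(weak_checksum(block), []).append((strong, start // size))
    return table


def make_delta(new, table, size):
    """Sender side: ("block", n) where a block matches, ("data", bytes) elsewhere."""
    delta, literal = [], bytearray()
    i = 0
    if len(new) >= size:
        a, b = weak_checksum(new[:size])
    while i + size <= len(new):
        match = None
        candidates = table.get((a, b))
        if candidates:  # weak hit: confirm with the strong hash
            strong = hashlib.md5(new[i:i + size]).digest()
            match = next((no for s, no in candidates if s == strong), None)
        if match is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("block", match))
            i += size
            if i + size <= len(new):
                a, b = weak_checksum(new[i:i + size])
        else:
            literal.append(new[i])
            if i + size < len(new):
                a, b = roll(a, b, new[i], new[i + size], size)
            i += 1
    literal += new[i:]
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_rsync_delta(old, delta, size):
    """Receiver side: rebuild the new file from the old copy and the delta."""
    out = bytearray()
    for kind, value in delta:
        out += old[value * size:(value + 1) * size] if kind == "block" else value
    return bytes(out)


if __name__ == "__main__":
    old = b"The quick brown fox jumps over the lazy dog."
    new = b"The quick brown cat jumps over the lazy dog!!"
    delta = make_delta(new, block_signatures(old, 4), 4)
    assert apply_rsync_delta(old, delta, 4) == new
    print(delta)
```

The rolling checksum is what keeps the sender's CPU cost manageable: each one-byte slide updates `a` and `b` in constant time, and the strong hash is computed only when the weak checksum hits.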

The real hit with rsync is in building the catalog of files that have changed, 
prior to sending them to the backup server. The memory use for a large file 
system is quite substantial.
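A rough illustration of why the file-list step is memory-hungry: every path and its metadata are held in memory at once so they can be compared against the previous run. This sketch uses size plus modification time as the change test (similar in spirit to rsync's default quick check); `build_catalog` and `changed_files` are hypothetical names, and deleted files are not reported.

```python
import os
import tempfile


def build_catalog(root):
    """Map each non-directory entry under root to (size, mtime_ns).

    The whole dict lives in memory, so memory use grows with the number
    of files in the tree.
    """
    catalog = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # file vanished during the walk
            catalog[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return catalog


def changed_files(previous, current):
    """Paths that are new, or whose size/mtime differ from the previous run."""
    return sorted(p for p, meta in current.items() if previous.get(p) != meta)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        with open(os.path.join(top, "notes.txt"), "w") as f:
            f.write("v1")
        first = build_catalog(top)
        with open(os.path.join(top, "notes.txt"), "w") as f:
            f.write("version 2")
        print(changed_files(first, build_catalog(top)))  # ['notes.txt']
```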

=> 
=> > The only thing I would like to see is the ability to encrypt the stored
=> > data on the fly but you cannot have everything.

What's your concern here? Are you worried about sniffing the traffic, or about 
the security of the stored data? I assume that you could set things up to talk 
over an ssh or ssl tunnel.

=> > 
=> 
=> There is another related project called Duplicity, and I only played
=> with it a little bit.  Everything is stored encrypted using gpg.  I
=> didn't much care for the way it handled the encryption, though...
=> 
=> Apparently the default is to use symmetric encryption.  I wanted to use
=> asymmetric encryption...  I figured that way the backup process would
=> only need my public key, then I would be able to decrypt it with my
=> private key.
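The asymmetric setup Pat describes can be sketched as a gpg command line: the backup host encrypts to a recipient's public key, so the private key never has to live there. The recipient ID and paths are placeholders, and `gpg_encrypt_command` is a hypothetical helper. In `--batch` mode gpg may refuse a key it does not consider valid; marking the key trusted is outside this sketch.

```python
def gpg_encrypt_command(recipient, src, dst):
    """Build a gpg command that encrypts `src` to `recipient`'s public key.

    Only the public key has to be in the backup host's keyring; the
    private key needed for decryption can be kept somewhere else.
    """
    return [
        "gpg", "--batch", "--yes",   # no prompts; overwrite dst if present
        "--encrypt",                 # public-key (asymmetric) encryption
        "--recipient", recipient,    # whose public key to encrypt to
        "--output", dst,
        src,
    ]


if __name__ == "__main__":
    # "backups@example.org" and both paths are placeholders.
    cmd = gpg_encrypt_command("backups@example.org",
                              "/backup/staging/home.tar",
                              "/backup/offsite/home.tar.gpg")
    print(" ".join(cmd))
    # To run it for real: subprocess.run(cmd, check=True)
```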


OK. Have you considered using an encrypted filesystem? It offers the advantage 
that (once the machine is up and you've entered your key), there's no need for 
more interaction, and it's transparent to the applications. It won't protect 
you from malicious users on the running system (once it's mounted, the data is 
readable like any other file), but it's a possible solution for removable drives 
(pull out the disk, and send it to the Duluth office as an off-site backup), or 
securing data in the event that someone steals your server. It's also nice to 
know that the data's encrypted, so that if a drive dies, or needs to be RMA'd, 
even if you can't erase it, the data is reasonably safe.


Mark

=> 
=> Unfortunately, Duplicity needs access to the encrypted archive.  That means
=> the backup process needs access to the private key...
=> At that point I stopped investigating Duplicity :).
=> 
=> Pat
=> 
=> ___________________________________________________________________________
=> Philadelphia Linux Users Group         --        http://www.phillylinux.org
=> Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
=> General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug



-----
Mark Bergman    Biker, Rock Climber, Unix mechanic, IATSE #1 Stagehand

http://wwwkeys.pgp.net:11371/pks/lookup?op=get&search=bergman%40merctech.com

I want a newsgroup with an infinite S/N ratio! Now taking CFV on:
rec.motorcycles.stagehands.pet-bird-owners.pinballers.unix-supporters
15+ So Far--Want to join? Check out: http://www.panix.com/~bergman
