Pat Regan on 1 Nov 2005 17:39:57 -0000


Re: [PLUG] Re: RAID Backup Server
> Yep. For some small sites, I'm doing something very similar, using BackupPC. 
> The server side is Perl, and it speaks rsync and samba (so clients can be 
> *nix or Windoze boxes). It does some nice things with building "composite" 
> backups, where identical files from your N different machines are really 
> only stored as a single file.

I have often wished rdiff-backup could find identical (or similar) files
between backup sets and store only the differences.  When storing
incremental snapshots, does BackupPC store entire changed files, or
just the blocks that changed?
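The "identical files stored once" trick can be sketched with a content-addressed pool plus hard links. This is a toy illustration of the idea, not BackupPC's actual pool layout, and the function name is made up:

```python
import hashlib
import os

def pool_store(src_path, pool_dir, backup_path):
    """Store a file once per unique content; later identical files become
    hard links to the pooled copy (roughly the idea behind BackupPC's pool)."""
    with open(src_path, "rb") as f:
        data = f.read()
    pooled = os.path.join(pool_dir, hashlib.sha256(data).hexdigest())
    if not os.path.exists(pooled):
        with open(pooled, "wb") as f:   # first time this content is seen
            f.write(data)
    os.link(pooled, backup_path)        # N backed-up copies, one file on disk
    return pooled
```

Back up the same file from ten machines and the pool still holds one copy; only the link count grows.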

> => The only disadvantage is the CPU hit from generating the differentials
> => (for anyone who doesn't know, rdiff-backup only stores the pieces of a
> => file that changed instead of entire copies of changed files).
> The rsync algorithm sends only the delta between versions of a file, but the
> copy on the backup server is saved as a real file, not as just the changed
> pieces. I'm guessing that the CPU hit is similar. Does rdiff require something
> (i.e., another invocation of rdiff) to reassemble the file, and can you choose
> which version to recover? Does it function like [rcs|sccs|cvs]?

I think the CPU hit really comes in because of the way the incrementals
are stored.  With rdiff-backup, the current backup is a full copy.
Anything older is stored as an rdiff against the version one step newer.
So if a file changed every day, and you wanted to restore the oldest
copy, rdiff-backup would have to apply every diff to get back to that point.
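That restore chain can be sketched with a toy reverse-delta scheme over fixed-size blocks. All names here are made up for illustration; rdiff-backup's real deltas come from librsync and are far more compact:

```python
BLOCK = 4  # toy block size; real tools use much larger blocks

def make_delta(newer, older):
    """Record how to turn `newer` back into `older`: the target length
    plus only the blocks that differ."""
    blocks = [(i, older[i:i + BLOCK])
              for i in range(0, len(older), BLOCK)
              if newer[i:i + BLOCK] != older[i:i + BLOCK]]
    return len(older), blocks

def apply_delta(newer, delta):
    """Apply one reverse delta, producing the next-older version."""
    size, blocks = delta
    buf = bytearray(newer[:size].ljust(size, b"\0"))
    for i, block in blocks:
        buf[i:i + len(block)] = block
    return bytes(buf)
```

Restoring the oldest version means applying every delta in order, newest first, which is exactly the cost described above.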

Recovery speed isn't something I have ever had to worry about though.
If I need an entire tree, I need the latest.  If I need something old, I
don't need much :).

> The real hit with rsync is in building the catalog of files that have changed, 
> prior to sending them to the backup server. The memory use for a large file 
> system is quite substantial.

rdiff-backup isn't very memory hungry; it goes one file at a time.  I
should probably qualify my statement that rdiff-backup is CPU
intensive.  On any backup after the first on the LAN, the backup speed
is usually limited by CPU, not network or disk.  As the network speed
drops to the 100k/sec range it becomes network bound.
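For anyone curious where those cycles go: the rsync algorithm slides a weak checksum across every byte offset of the file. A simplified sketch of that rolling checksum (function names are mine; this is modeled on the Adler-32-style sum rsync uses, not its exact code):

```python
M = 1 << 16  # both halves of the checksum are kept mod 2^16

def weak(block):
    """Compute the (a, b) halves of the weak checksum for one window."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, blocklen):
    """Slide the window one byte right in O(1) instead of recomputing
    the whole window in O(blocklen)."""
    a = (a - out_byte + in_byte) % M
    b = (b - blocklen * out_byte + a) % M
    return a, b
```

The O(1) roll is what makes checking every offset feasible at all, but it still touches every byte of every file, which is where the CPU time goes.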

> => 
> => > The only thing I would like to see is the ability to encrypt the stored
> => > data on the fly but you cannot have everything.
> What's your concern here? Are you worried about sniffing the traffic, or about 
> the security of the stored data? I assume that you could set things up to talk 
> over an ssh or ssl tunnel.

In my case, I have been thinking about storing my personal backups on my
virtual server.  Since I don't trust the people who run the machine with
my data, I would want my data encrypted.

I have been mulling over the idea of colocating a machine with tons of
disk to share with some friends for backing up data.  If the machine was
in a colo, I would want the data encrypted, too :).

> OK. Have you considered using an encrypted filesystem? It offers the advantage 
> that (once the machine is up and you've entered your key), there's no need for 
> more interaction, and it's transparent to the applications. It won't protect 
> you from malicious users, but it's a possible solution for removable drives 
> (pull out the disk, and send it to the Duluth office as an off-site backup), or 
> securing data in the event that someone steals your server. It's also nice to 
> know that the data's encrypted, so that if a drive dies, or needs to be RMA'd, 
> even if you can't erase it, the data is reasonably safe.

If the machine is available on the internet, I would rather have the
individual backups encrypted.  Each user could have their own keys, and
they wouldn't have to fully trust each other.
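One way to get that without trusting the colo box at all is to encrypt before upload. A sketch with tar and openssl (assumes openssl 1.1.1+ for -pbkdf2; the passphrase and paths are placeholders, and a real multi-user setup would use per-user public keys, e.g. gpg --encrypt -r <user>):

```shell
# Encrypt a tree before it ever leaves the machine.
SRC=$(mktemp -d)                      # stand-in for the tree being backed up
echo "private data" > "$SRC/notes.txt"

tar -C "$SRC" -czf - . \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:CHANGE_ME \
  > backup.tar.gz.enc

# Restore:
DEST=$(mktemp -d)
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:CHANGE_ME \
  < backup.tar.gz.enc | tar -xzf - -C "$DEST"
cmp "$SRC/notes.txt" "$DEST/notes.txt" && echo "round trip ok"
```

The trade-off is that encrypting the whole stream changes all the ciphertext on every run, which defeats rsync-style deltas; that tension is presumably part of why on-the-fly encryption isn't built in.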



Philadelphia Linux Users Group         --
Announcements -
General Discussion  --