Chuck Peters on 5 Dec 2009 00:38:28 -0800



Re: [PLUG] Self-hosted online backups?




On Fri, Dec 4, 2009 at 3:36 PM, Fred Stluka <fred@bristle.com> wrote:
JP,

If you do choose to roll your own, here's what I did over 25 years
ago, and have evolved since.

To keep it small, do an incremental copy (modified files only)
into an empty local directory, and then copy that directory,
encrypting, compressing, etc., to a remote host.
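
Roughly, that second step can be as simple as the following sketch,
where the staging path, GPG recipient, and remote host are all made
up for illustration:

    # pack and compress the staged copies, encrypt, then push remote
    tar czf - /backups/incr \
        | gpg --encrypt --recipient backup@example.com \
        > /backups/incr.tar.gz.gpg
    scp /backups/incr.tar.gz.gpg remotehost:backups/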

You already seem to have a good handle on how to do the encryption,
compression, remote copy, etc., and several others have already
chimed in with suggestions along those lines.

The part I'm interested in is the incremental copy into an empty
local directory.
You could probably do it with rsync, using its
--compare-dest=DIR option to tell it to compare one fully populated
directory tree with another, but to copy into a 3rd empty tree.
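
Untested, and with made-up paths, that would look something like:

    % rsync -a --compare-dest=/backups/full/ ~/work/ /backups/incr/

Only files that differ from the copies in /backups/full/ get written
into the empty /backups/incr/ tree; unchanged files are skipped.
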
However, I wrote mine long before rsync existed, and don't
necessarily have an old local fully populated tree handy at the
time that I want to do the incremental copy.  Therefore, I
accomplish the incremental backup into a local empty tree via my
own scripts using commands like:

    % find -s . -newer "$timestamp" -print0 | xargs -0 -n 1 -J % xargsabort cprelsafe % "$target_dir"
    % touch "$timestamp"

where:

    xargsabort is a script wrapper for any command that translates
    error return codes to 255 so that xargs will abort on the first
    error.

    cprelsafe is a script to copy a file specified by a relative
    pathname to the same relative pathname in a target directory
    tree, creating the nested directories as needed, and suppressing
    errors that would otherwise occur when passed a directory name
    instead of a file name.
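
In case it helps to picture them, untested sketches of those two
wrappers might look roughly like this (the real scripts differ in
the details):

    #!/bin/sh
    # xargsabort (sketch): run the given command and map any failure
    # to exit status 255, which makes xargs stop immediately
    "$@" || exit 255

    #!/bin/sh
    # cprelsafe (sketch): copy the file named by relative path $1 to
    # the same relative path under the target directory $2, creating
    # intermediate directories as needed; skip directories quietly
    src="$1"; dest="$2"
    [ -d "$src" ] && exit 0
    mkdir -p "$dest/$(dirname "$src")"
    cp -p "$src" "$dest/$(dirname "$src")/"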

I use this for very fast incremental backups.  I run it on a tree
of over 6,000 directories containing over 40,000 files with a total
size of over 5 GB.  It typically copies about 300 MB, since the rest
are unchanged, and typically takes less than 30 seconds.  I usually
run it each time I walk away from the computer for lunch or an
appointment, and at the end of each day. 

Each time, I copy into a new numbered directory on an external USB
drive, and then slip the USB drive into my pocket as I walk out, so
I can easily revert to an old copy of a file.  I also stage the
numbered directories locally, so I can casually compare current files
with old files, or revert to an old version, w/o having to walk into
the other room, or come in the house from my garden office, to get
the USB drive and plug it in.
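
Picking the next number can be as simple as this sketch (the mount
point is made up):

    # pick the next backup number on the USB drive
    last=$(ls /mnt/usb/backups | sort -n | tail -1)
    mkdir "/mnt/usb/backups/$(( ${last:-0} + 1 ))"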

Less often, perhaps once a week or so, I also do backups via rsync
directly into a fully populated backup tree on a USB drive.  This
has the advantage of maintaining a fully populated backup tree, not
a set of sparsely populated numbered trees, but has the disadvantage
of overwriting the previously backed up files in that tree with newer
versions, so a corrupted new file would corrupt its own backup.
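
Roughly, with made-up paths, that kind of run is just:

    % rsync -a ~/work/ /mnt/usb/full-backup/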

Between the two approaches, I have my bases covered pretty well. 
Never lost a file yet, in over 25 years!

If you like, I can send you the scripts.  I have Unix and Windows
versions.

I've been meaning for years to publish them at my Web site and
mail them to my Unix Tips and Windows Tips mailing lists, but
busy, busy, busy...

--Fred
---------------------------------------------------------------------
Fred Stluka -- mailto:fred@bristle.com -- http://bristle.com/~fred/
Bristle Software, Inc -- http://bristle.com -- Glad to be of service!
---------------------------------------------------------------------


JP Vossen wrote:
As is probably the case for a lot of us, I control Linux servers in 
various locations (e.g., my house, my Mom's house).  I want to set up a 
self-hosted online backup service and copy Mom's data to my house and my 
data to her house.  I want the data to be compressed, encrypted (both in 
transit and at rest), have multiple copies/versions 
(daily/weekly/monthly) and to be disk and bandwidth efficient.

Obviously, I could script something using tar, GPG, rsync, and/or other 
tools, but I can't be the only person out there who wants this, and why 
reinvent the wheel?

I've considered rsync.net which sounds really cool, and I was just 
reading about tarsnap.com.  Tarsnap does exactly what I want, except it 
uses a pay-for hosted back-end (AWS).  While neither of them is 
expensive, I'd prefer not to use "the cloud" for various reasons 
including the fact that I'm paranoid, cheap and sometimes a 
control-freak. :-)  I could possibly modify the tarsnap code to work the 
way I want, but that is precluded by the ToS 
(http://www.tarsnap.com/legal.html).  I think the tarsnap setup is 
brilliant on several levels; it's just not what I personally want.

One really simple solution is to just create a local compressed tarball, 
then encrypt that, then rsync it.  But that's crappy because it needs 
2-3x local disk space; depending on how the encryption works, the file 
may change so much that rsync is no use; it does not allow 
space-efficient versions; and there are probably other things I'm forgetting.
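
Roughly, that would be something like the following (names made up),
which makes the extra local copies obvious:

    tar czf /tmp/backup.tar.gz ~/data              # local copy #2
    gpg --symmetric --output /tmp/backup.tar.gz.gpg \
        /tmp/backup.tar.gz                         # local copy #3
    rsync -av /tmp/backup.tar.gz.gpg mom.example.com:backups/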

My data includes ~20G of pictures and that will only grow, and a mix of 
other static and dynamic data including revision control systems, 
documents and DB files.  Actually, I could get up to a bit under 200G if 
I was really sloppy about what I back up.  So the local 2-3x disk space 
and I/O is non-trivial, and even cheap storage and bandwidth would start 
to add up.

If I have to roll my own, I can and will--eventually.  Meanwhile, does 
anyone know of anything that I can self-host without a lot of DIY?

Thanks,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug