Rich Freeman on 25 Apr 2016 16:44:38 -0700



Re: [PLUG] [plug-announce] TOMORROW - Tue, Apr 19, 2016: PLUG North - "Linux Containers" by Jason Plum and Rich Freeman (*6:30 pm* at CoreDial in Blue Bell)


On Mon, Apr 25, 2016 at 5:45 PM, Keith C. Perry
<kperry@daotechnologies.com> wrote:
>
> True, but again, there's always a situation that can blow up a state
> machine.  We've just gotten better at it, and at this point hard
> drives are considered very reliable.  However, we still do backups
> because hard drives do fail.  This has nothing to do with the file
> system, but it does have something to do with the entire process of
> successfully storing and retrieving data.  People aren't going to
> stop doing backups because they are running BTRFS, at least I hope
> not  :D

Sure, and perhaps that is also one of the other appeals of zfs/btrfs:
they're much more efficient to back up.  Both support efficiently
generating a stream containing all the changes between two snapshots,
and re-creating the filesystem from that stream.  You could use that
to drive a replica of the filesystem offsite, or you could just store
those streams in files on tapes/etc and replay them to restore a
backup.
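
For example, the whole round trip on btrfs is only a few commands.
This is just a sketch (the paths and the backup host are made up, and
you need btrfs-progs and root), but zfs snapshot/send/receive follows
the same pattern:

  # take read-only snapshots at two points in time
  btrfs subvolume snapshot -r /data /data/snap1
  btrfs subvolume snapshot -r /data /data/snap2

  # generate a stream of just the changes between the snapshots and
  # replay it into a replica filesystem on another host
  btrfs send -p /data/snap1 /data/snap2 | \
      ssh backuphost btrfs receive /mnt/replica

  # or save the incremental stream to a file for tape/offline storage
  btrfs send -p /data/snap1 /data/snap2 > /backup/incr-1-2.btrfs

  # later, replay it on top of a restored snap1 to recover snap2
  btrfs receive -f /backup/incr-1-2.btrfs /mnt/restore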

Sure, you can generate incremental backups from any filesystem, but
btrfs/zfs allow you to do it in a way that:
1.  Is 100% reliable (that is, no changes will ever be missed).
2.  Does not require reading/diffing/hashing/etc all the data on the
filesystem, or even reading every directory tree on the filesystem.

Software like rsync can do #1, but usually doesn't, because doing so
requires reading all the files to compare checksums (it isn't the
default).  Rsync in its default mode gets close to #2, but it still
requires reading every directory entry, and probably every inode
(though I'm not 100% sure on the latter).
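
To make the tradeoff concrete, compare rsync's two modes (paths and
host here are hypothetical):

  # default quick check: trusts size+mtime, so it can miss a change
  # that leaves both intact, but avoids reading file contents
  rsync -a /data/ backuphost:/replica/

  # --checksum catches every change, but now rsync must read and
  # hash every file on both sides
  rsync -a --checksum /data/ backuphost:/replica/

Either way rsync still has to walk the whole directory tree just to
find out what might have changed.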

zfs/btrfs take advantage of the COW design of the filesystem to
rapidly identify the parts of the filesystem that have diverged and
only back up those, much as git can diff two commits without having
to read every file or even every directory tree.  In the case of
btrfs, if 9 of the 10 nodes referenced by the root of the tree are
identical between two snapshots, then it knows nothing under those
nodes has changed, and at most 10% of the metadata needs to be
examined to find where the differences are.  At each level of the
tree there is again the opportunity to logarithmically eliminate
portions of the search space.  A balanced b-tree can store an
incredible number of records accessible in only a few seeks.
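
The git analogy is easy to see for yourself: in a clone of any large
repository, diffing two tags is nearly instant, because any subtree
whose hash matches on both sides is skipped wholesale.  btrfs exposes
the equivalent pruning through send (the tag names and paths below
are just examples; --no-data omits file contents from the stream):

  # git descends only into trees whose hashes differ
  git diff --stat v4.4 v4.5

  # btrfs walks only the diverged parts of the snapshot metadata
  btrfs send --no-data -p /data/snap1 /data/snap2 > /tmp/meta.stream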

I don't dispute that btrfs is still fairly experimental.  But we're
talking about the Fedora team here as well, and as I said, if
somebody handed them code for an LVM version they'd probably accept
it.

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug