Keith C. Perry on 25 Apr 2016 14:45:24 -0700



Re: [PLUG] [plug-announce] TOMORROW - Tue, Apr 19, 2016: PLUG North - "Linux Containers" by Jason Plum and Rich Freeman (*6:30 pm* at CoreDial in Blue Bell)


"
The whole point of zfs/btrfs is that they DO detect these kinds of
corruptions, and they automatically use the redundant copy of the
data.  Sure, if the same block in n+2 drives gets hit by a cosmic ray
then you'll still lose it, but that is MUCH less likely to happen than
a single flip, or two flips of unrelated blocks.  The whole server
could get hit by an asteroid as well."

I think you might be missing my point that the file system is only one layer of the technology.  Even if we assume that one file system is superior to another, you still have the problem of the hard disk failing.  Whether it's due to cosmic rays or some other random event, it is possible for the physical HD to fail in a way that presents as a silent failure to the file system.

Likewise, you can realize a silent error from a higher level: if you were to write a file with bad data to the file system, that file system, any file system, would never know.  For instance, a topic I'm constantly discussing now is ransomware.  Despite specific methods of active detection for it, the general question remains: how can you determine, in an absolute manner over time, that a file has been encrypted without the user's consent?  Those solutions are out of scope here, but I will say that how you do it has nothing to do with the file system.
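As a rough illustration of what I mean (just a sketch; the paths and the idea of keeping the manifest on separate storage are my own assumptions, not a recommendation for any particular tool), you can detect unexpected changes to files entirely outside of whatever file system is underneath:

    # build a checksum manifest and keep it off the volume being watched
    find /srv/data -type f -print0 | xargs -0 sha256sum > /backup/manifest.sha256

    # later, verify; anything reported as FAILED changed since the manifest was made
    sha256sum --check --quiet /backup/manifest.sha256

The file system happily stores whatever bytes it is handed, encrypted or not, so the verification has to live at the process level like this.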

Where we might be diverging is that I'm more focused on the entire process of writing and retrieving data, whereas you're focused on the file system.  Coming back to the containers, my only argument is that, from a process point of view, picking BTRFS was a bad idea because you're accepting something that, for now, relative to other parts of the process, has a higher failure rate.  Even if someone said (and to be clear, I'm not saying this was the case) that BTRFS was the only thing that could have been used, I would be fine with that, but then it just has to be accepted that ephemeral functionality for containers will not be ready for "production" use until BTRFS is ready for "production" use.  My other point is that it's generally a bad idea to tell someone they have to use this or that unless there is some solid technical reasoning behind it, and BTRFS doesn't have the time out in the field, compared to other file systems, to make that claim.

"Drives already anticipate a certain rate of bit flips and have
built-in ECC to handle this.  The errors that creep through only
happen when the loss exceeds what the ECC can handle."

True, but again, there is always a situation that can blow up a state machine.  We've just gotten better at it, and at this point hard drives are considered very reliable.  However, we still do backups because hard drives do fail.  This has nothing to do with the file system, but it does have something to do with the entire process of successfully storing and retrieving data.  People aren't going to stop doing backups because they are running BTRFS, at least I hope not  :D

"I don't think Oracle would be putting so much money into these
filesystems if they didn't think they offered a practical advantage."

;)

I did a presentation in 1999 about the year 2000 BIOS time bug and made the comment that it was much ado about nothing, for various reasons.  During the Q&A someone said, "well, I don't think <whatever company he used> would be putting money into this if there wasn't something to it."

Again, I'm not saying that when you compare one fs to another you can't make the evaluation that one is better than the other.  That's pretty easy to do.  What I am saying is that that is only one part of the process, so, just like with security, you don't want to be too focused on single components of an entire process.  If the future of file systems is to pull in more of the volume management tasks, then fine; we'll see over time how that fares against what is done today.  I'm personally not convinced that that paradigm is "better" in terms of data durability.  I do see the management benefits, because it is certainly better to use one interface instead of two or three.

"Yup - snapper is great, and there is no reason that you couldn't use
LVM snapshots for ephemeral containers if they are writable.  I'm sure
the nspawn maintainers would accept a patch if you asked them about
it.

I'm not sure how well LVM handles a large number of snapshots - with
something like snapper you can end up with quite a few.  Though, to be
fair I try to avoid having too many with btrfs as well as when it goes
to clean up a large number of them at once I've run into bugs."

That would require a deeper dive than I currently have time to do, but it's definitely an interesting topic.  It occurs to me that I can also just put an LVM-backed container root on whatever fs I want and then mount LV snapshots of it to use with nspawn.  Issue solved  :D
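Roughly what I have in mind (a sketch only; the volume group name, sizes, and paths are made up, and I haven't actually tried this with nspawn yet):

    # carve out an LV and put any fs on it for the container root
    lvcreate --size 10G --name c1 vg0
    mkfs.ext4 /dev/vg0/c1
    mount /dev/vg0/c1 /var/lib/machines/c1
    # ...populate /var/lib/machines/c1 with a root file system...

    # take a writable LVM snapshot and boot the ephemeral copy from it
    lvcreate --snapshot --size 2G --name c1-work /dev/vg0/c1
    mount /dev/vg0/c1-work /mnt/c1-work
    systemd-nspawn -D /mnt/c1-work

Since LVM snapshots are writable, throwing one away after the container exits gives you the ephemeral behavior without depending on BTRFS.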

This would be fun to play with since I haven't had a need to use LVM snapshots.  I run NILFS2 for data volumes that need this.  That fs is purpose-built for continuous checkpointing.  Any checkpoint can then be turned into a snapshot (and back) that will persist and not be automatically purged by the retention policy.  You can have lots of snapshots; it can just take the cleaner daemon a while to do all the garbage collection when you purge.  You do tend to use more space on NILFS2 file systems, depending on the retention policy and how far back the snapshots go.  I've been able to go back months or years to retrieve data without a problem.
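For anyone who hasn't used NILFS2, the workflow looks roughly like this (a sketch; the device and checkpoint number are made up for the example):

    # list the checkpoints on a NILFS2 volume
    lscp /dev/vg0/data

    # promote checkpoint 1234 to a snapshot so the cleaner won't purge it
    chcp ss /dev/vg0/data 1234

    # mount that snapshot read-only to pull old data back out
    mount -t nilfs2 -r -o cp=1234 /dev/vg0/data /mnt/old

    # demote it back to a plain checkpoint when you're done with it
    chcp cp /dev/vg0/data 1234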

I do run NILFS2 on LVs, so it might be interesting to create a snapshot LV of that fs and see if things work as expected.


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
Keith C. Perry, MS E.E. 
Owner, DAO Technologies LLC 
(O) +1.215.525.4165 x2033 
(M) +1.215.432.5167 
www.daotechnologies.com

----- Original Message -----
From: "Rich Freeman" <r-plug@thefreemanclan.net>
To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
Sent: Sunday, April 24, 2016 9:05:12 PM
Subject: Re: [PLUG] [plug-announce] TOMORROW - Tue, Apr 19, 2016: PLUG North - "Linux Containers" by Jason Plum and Rich Freeman (*6:30 pm* at CoreDial in Blue Bell)

On Sun, Apr 24, 2016 at 8:42 PM, Keith C. Perry
<kperry@daotechnologies.com> wrote:
> I understand what a silent corruption is however you have to make a fair comparison.  If cosmic rays can flip bits in hardware or software that goes undetected then all bets are off and you're going to have undetected data corruption no matter how your data is stored.
>
> Storage mechanisms are going to use reliable methods to correct or at least detect bad data but there is still a chance that the n+1 plan is defeated by the n+2 event.  In my experience, its just not that "easy".  Which is to say, it is rare those cosmic rays or random events silently flip a bit and human inspection is the only thing that reveals a problem.

The whole point of zfs/btrfs is that they DO detect these kinds of
corruptions, and they automatically use the redundant copy of the
data.  Sure, if the same block in n+2 drives gets hit by a cosmic ray
then you'll still lose it, but that is MUCH less likely to happen than
a single flip, or two flips of unrelated blocks.  The whole server
could get hit by an asteroid as well.

>
> When you stay within statistical norms, this is just not something you can make a choice of file system on.

There are a lot of people who think that silent corruptions for large
arrays are well within the statistical norms now.

> I don't see why there would be increased concern as storage system get larger.  Densities increases have been slowing in favor of a LVM constructs because there are physical limits to how much data can be stored in standard hard disk form factors.  One of those limiters is going to be how durable the data is.  If we can't reliably retrieve data at a certain density then we will never see those densities.

Of course, but the whole point is that filesystems like zfs/btrfs DO
increase the effective durability of the data, because they increase
your fault tolerance.

Drives already anticipate a certain rate of bit flips and have
built-in ECC to handle this.  The errors that creep through only
happen when the loss exceeds what the ECC can handle.

>
> So, even though BTRFS and other "modern" COW files might have one type of advantage, practically speaking, all points together might not actually yield a detectable net benefit for this single point.

I don't think Oracle would be putting so much money into these
filesystems if they didn't think they offered a practical advantage.

> Coming back around to containers, I still think you nailed it earlier.  It would have been better to have some choices- every piece of tech has its fans so over time we could see how each ephemeral method worked.  Maybe it would have worked as developers conceived, maybe not.  Maybe in a couple of years we'll be talking about a new method all together.
>
> Here's something I just found...  (well, "pacman -Ss btrfs" pointed me in the right direction)
>
> http://snapper.io/overview.html
>
> Apparently someone figured out how to do snapshot management for LVM, BTRFS and EXT4  :D

Yup - snapper is great, and there is no reason that you couldn't use
LVM snapshots for ephemeral containers if they are writable.  I'm sure
the nspawn maintainers would accept a patch if you asked them about
it.

I'm not sure how well LVM handles a large number of snapshots - with
something like snapper you can end up with quite a few.  Though, to be
fair I try to avoid having too many with btrfs as well as when it goes
to clean up a large number of them at once I've run into bugs.

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug