Rich Freeman on 15 Aug 2018 05:42:19 -0700



Re: [PLUG] Virtualization clusters & shared storage


On Tue, Aug 14, 2018 at 5:59 PM Keith C. Perry
<kperry@daotechnologies.com> wrote:
>
>
> My ZFS idea was a throwaway LOL.  I don't know enough about that
> system to make a recommendation other than to 1) not use ZFS and 2)
> bring up another node if you want more redundancy.  Honestly, that
> would be my answer in any case unless you want to archive.

Redundancy inside a cluster isn't the same as a backup.  Redundancy
protects against hardware failure and resulting downtime.  However, it
does not protect against bugs, administration errors, intrusion, and
so on.  Offline backups protect against more failure modes, but at the
cost of downtime during restoration.  There is a place for both, at
least for data of any importance.

>
> ZFS without ECC RAM is NOT as safe as ZFS with ECC RAM.

Sure, but if there is another layer above it that is doing its own
hashing using ECC RAM, then you're still protected.  I'm not talking
about eliminating ECC on the clients.  I'm talking about eliminating
it on the storage nodes, which are a lower layer of the design.

If a random router corrupts my data as it travels over the internet I
don't care, because TCP carries checksums: when a corrupted packet
arrives the checksum fails and a retransmission is requested.  Since
the higher layer contains integrity checks, you don't need to spend a
lot more money on hardware at the lower layer to prevent errors that
the higher layer is already intended to catch.  Now, a bad router is
still going to increase latency as you end up with a ton of
retransmission requests, so sometimes it makes sense to have a
reasonable amount of protection at lower layers, but it isn't
essential for the overall integrity of the complete system.
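
To make that concrete, here is a rough Python sketch of the same
pattern.  The unreliable_send / unreliable_recv / request_retransmit
callables are placeholders standing in for whatever flaky lower layer
you have, not any real network API:

    import hashlib

    def send_with_checksum(payload, unreliable_send):
        # Attach a digest computed before the data enters the flaky layer.
        digest = hashlib.sha256(payload).digest()
        unreliable_send(digest + payload)

    def recv_with_checksum(unreliable_recv, request_retransmit):
        while True:
            frame = unreliable_recv()
            digest, payload = frame[:32], frame[32:]
            if hashlib.sha256(payload).digest() == digest:
                return payload      # lower layers may have been sloppy; we don't care
            request_retransmit()    # corrupted in transit; ask for it again

As long as the endpoints do this, the routers in the middle don't have
to be trusted to get every bit right.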

>
> https://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>
> The most important line in that is this:
>
> "All that stuff about ZFS self-healing goes down the drain if the system isn't using ECC RAM"
>
> It's not ZFS that makes you **safer**, it's ECC RAM.  You can debate the merits of ZFS versus other filesystems, but that is a different conversation.
>
> Actually, after reading that link I definitely would not recommend
> ZFS under a parallel file system without ECC.  The potential for
> making matters worse is non-zero and as you point out, now we have the
> parallel file system correcting errors when it shouldn't have to.

What are you going to use instead of ZFS?  Every other filesystem has
the exact same problem when used with non-ECC RAM.

That article you linked has led to a LOT of misconceptions about ZFS.
ZFS is no worse than any other filesystem when used with non-ECC RAM.

ECC RAM protects against bit flips in RAM.

ZFS protects against bit flips in the drive, controller, or bus.

Either one on its own improves data integrity, and both together
improve it further.

Now, if your distributed filesystem calculates hashes OFF OF THE
STORAGE NODE, and verifies them off the node as well during read-back,
then you're getting the same protection at a higher layer, and you
don't NEED the protection of either ECC RAM or ZFS on the storage
nodes.  It might still be desirable if it reduces how often data has
to be recovered, which improves latency/etc.

If you are only protecting data with hashes that are
generated/verified on the storage node itself, then ECC RAM on the
storage node will improve the integrity of the operation, because now
there is no higher layer.
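
In code, the off-node arrangement looks something like this minimal
sketch.  The node object with put()/get() is a hypothetical object
store interface, not any particular distributed filesystem's API:

    import hashlib

    def client_write(node, key, data):
        # Hash computed on the client, with its own (ECC) RAM, before
        # the data ever touches the storage node.
        expected = hashlib.sha256(data).hexdigest()
        node.put(key, data)
        return expected   # kept by the client or a metadata service

    def client_read(node, key, expected):
        data = node.get(key)
        if hashlib.sha256(data).hexdigest() != expected:
            # The node handed back garbage without reporting an error;
            # fall back to another replica instead of trusting it.
            raise IOError("storage node returned corrupt data for " + key)
        return data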



Here is another analogy that hopefully ZFS/btrfs fans will appreciate:

One of the big potential advantages of distributed filesystems from a
data integrity standpoint is that they make entire hosts redundant.
With RAID you're protected against the failure of a hard disk.  With
distributed filesystems you're protected against failure of an entire
host (CPU/motherboard/power supply/whatever).  Now let's take this a
step further...

In the earlier generation of RAID there was protection against the
total failure of a drive, but not against silent failure.  If you have
a conventional RAID1 and you yank the power on one drive the system
will keep running without any loss of data.  However, if some data
gets modified on-disk (direct disk writes, cosmic rays, whatever), or
the drive controller goes senile, and the drive presents that data to
the RAID controller/software without reporting any errors, then RAID1
is going to cause you problems, because it has no way to detect the
corruption on its own.  At best, during a scrub, conventional RAID1
will tell you that the two copies don't match, and if you're VERY
lucky it might give you a way to try to pick which one is right (have
fun with that).

Since this was undesirable, later-generation filesystems like
ZFS/btrfs implement strong checksums in software that can tell which
copy of the data is right and which is wrong.  This protects against
silent corruption AT THE DRIVE LEVEL.
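
A toy illustration of the difference, with a made-up scrub over two
mirrored copies:

    import hashlib

    def scrub_plain_mirror(copy_a, copy_b):
        # All a conventional RAID1 scrub can say is "they differ" --
        # there is nothing to tell it which copy is the good one.
        if copy_a != copy_b:
            print("mirror mismatch: good luck picking a side")

    def scrub_checksummed_mirror(copy_a, copy_b, expected):
        # With a checksum recorded at a higher layer (ZFS/btrfs style),
        # the scrub can tell which copy is right and repair the other.
        for copy in (copy_a, copy_b):
            if hashlib.sha256(copy).hexdigest() == expected:
                return copy
        raise IOError("both copies are bad; restore from backup")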

I'm just suggesting that distributed filesystems ought to extend this
paradigm one step further.  Today many of them are like conventional
RAID1.  If the drives report problems or if hosts fail entirely they
can recover the data.  However, if a host presents bad data without an
error, many of these distributed filesystems contain the same flaw as
earlier conventional RAID.  What should be done is to use the same
approach of generating/verifying checksums at a higher layer to add
data security.  This costs very little in terms of space or
computation (this is all software-defined as it is).
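
Concretely, that higher-layer scrub could look something like the
sketch below, one level up from the mirror example above.  The
replicas mapping of host -> data is made up for illustration, and the
expected checksum is assumed to have been recorded by the client or
metadata layer at write time:

    import hashlib

    def scrub_replicas(replicas, expected):
        good = None
        bad_hosts = []
        for host, data in replicas.items():
            if hashlib.sha256(data).hexdigest() == expected:
                good = data              # this host returned intact data
            else:
                bad_hosts.append(host)   # this host lied without raising an error
        if good is None:
            raise IOError("no intact replica left; time for the offline backup")
        for host in bad_hosts:
            replicas[host] = good        # rewrite the silently-corrupted replica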

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug