Keith C. Perry on 17 Aug 2018 14:28:04 -0700



Re: [PLUG] Virtualization clusters & shared storage


Rich,

In your scenario you're trying to have things both ways.  On the one hand, you're saying that you would have ECC RAM on the clients to make sure source data is trusted, but then on the storage system you would not use ECC RAM and would rely on some ZFS-ish checksum magic to get to the same confidence level.

That's an asymmetric solution, because doing things in hardware is not the same as doing them in software.  At best, your **expectation** should be that the confidence level is that of the weakest system.  In practice I'm sure it would be higher, but as a rule I would not think that way.

I see I'm not going to convince you on this, so I'm going to drop it and let readers do their own research to convince themselves one way or the other  :D

I am rather concerned about the ZFS fanboyism though, more generally because it doesn't apply to a conversation about parallel file systems.  As I illustrated before, there is other hardware error correction implemented on motherboards today; ECC is an additional step.  That doesn't mean data is not safe without ECC, it means data is safer with ECC.  Despite the engineering points and the recent talk about cosmic rays flipping bits in flight, the reality is still that most people do not use ECC RAM and have very high data fidelity.  People trust these systems with their finances and store memories and other important data on them every day.  This was going on long before all the cloud stuff, too.

ZFS vs. everything-else is like systemd vs. everything-else... it is **another** approach to something, not **the** approach.  I'm trying to avoid that kind of discussion.  If you like ZFS, fine; if you like something else, that is also fine.  Either way you still have to have a complete data management strategy.  My simple test is this: can you rebuild your systems from a complete failure, regardless of what caused it?  You can certainly achieve that WITHOUT ZFS or ECC RAM.  Parallel filesystems on commodity hardware are another tool in the toolbox for long-term data management.

Also, I don't agree that redundancy is never the same as backups.  It depends on how you do things.  People and companies are poor at managing offline backups, to say the least.  You end up with stacks of media (or files), off-line as you say, stored some place, never to be touched and never checked for fidelity and viability before a disaster.  It is far more efficient and practical to have your data system online with versioning, and then have at least one copy off **premises** but also online.  With bandwidth availability being what it is today, it makes more sense to incrementally ingest block changes into an online system that can be tested for data fidelity at any time than to hope your off-line storage has everything you need, when it will also be more out of date.  In a worst-case scenario such as a ransomware attack against our Windows friends, quickly restoring from some version in the past may require testing many versions.  I would not want to have to deal with a tape system or discrete file sets for that.
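
To make that concrete, here is a rough Python sketch of the idea (my own illustration, not any particular product): hash fixed-size blocks, ship only the blocks whose hashes changed to the online copy, and re-hash that copy whenever you want to test fidelity.

    import hashlib

    BLOCK_SIZE = 4096

    def block_hashes(path):
        # One SHA-256 digest per fixed-size block of the file.
        hashes = []
        with open(path, "rb") as f:
            while True:
                block = f.read(BLOCK_SIZE)
                if not block:
                    break
                hashes.append(hashlib.sha256(block).hexdigest())
        return hashes

    def changed_blocks(old, new):
        # Indices of blocks that differ between two versions; only
        # these need to be ingested for the new version.
        changed = [i for i in range(min(len(old), len(new)))
                   if old[i] != new[i]]
        grown = list(range(len(old), len(new)))  # file got longer
        return changed + grown

    def verify(path, recorded_hashes):
        # Fidelity test you can run against the online copy at any time.
        return block_hashes(path) == recorded_hashes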

The average joe might not want to spend money on a tape system or a remote copy, so the next best thing would be to keep multiple copies on premises.  Since parallel filesystems do re-balance, you could keep a set of removable disks from a chunk server, with versioned data and metadata, in your safe as an offline backup.  I haven't worked that out exactly yet, but once I do, it would get me to where my current data protection solution is.  Having dealt with both situations, I much prefer having all my versions online to having to search through discrete data.



~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
Keith C. Perry, MS E.E. 
Managing Member, DAO Technologies LLC 
(O) +1.215.525.4165 x2033 
(M) +1.215.432.5167 
www.daotechnologies.com

----- Original Message -----
From: "Rich Freeman" <r-plug@thefreemanclan.net>
To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
Sent: Wednesday, August 15, 2018 8:42:01 AM
Subject: Re: [PLUG] Virtualization clusters & shared storage

On Tue, Aug 14, 2018 at 5:59 PM Keith C. Perry
<kperry@daotechnologies.com> wrote:
>
>
> My ZFS idea was a throwaway LOL I don't know enough about that
> system to make a recommendation other than to 1) not use ZFS and 2)
> bring up another node if you want more redundancy.  Honestly, that
> would be my answer in any case unless you want to archive.

Redundancy inside a cluster isn't the same as a backup.  Redundancy
protects against hardware failure and resulting downtime.  However, it
does not protect against bugs, administration errors, intrusion, and
so on.  Offline backups protect against more failure modes, but at the
cost of downtime during restoration.  There is a place for both, at
least for data of any importance.

>
> ZFS without ECC RAM is NOT as safe as ZFS with ECC RAM.

Sure, but if there is another layer above it that is doing its own
hashing using ECC RAM, then you're still protected.  I'm not talking
about eliminating ECC on the clients.  I'm talking about eliminating
it on the storage nodes, which are a lower layer of the design.

If a random router corrupts my data as it travels over the internet I
don't care, because TCP/IP contains checksums and when the packet
arrives a retransmission will be requested.  Since the higher layer
contains integrity checks, you don't need to spend a lot more money on
hardware at the lower layer to prevent errors that the higher layer is
already intended to catch.  Now, a bad router is still going to
increase latency as you end up with a ton of retransmission requests,
so sometimes it makes sense to have a reasonable amount of protection
at lower layers, but it isn't essential for the overall integrity of
the complete system.
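
As a toy sketch of that end-to-end argument (flaky_link and the
rates here are invented for the demo): the lower layer is free to
corrupt data, and the checksum at the higher layer turns corruption
into retries, i.e. latency, rather than bad data.

    import hashlib
    import random

    def flaky_link(payload, corruption_rate=0.3):
        # Simulated unreliable lower layer that sometimes flips a byte.
        data = bytearray(payload)
        if data and random.random() < corruption_rate:
            data[random.randrange(len(data))] ^= 0xFF
        return bytes(data)

    def send_reliably(payload, max_tries=10):
        # Higher layer: verify an end-to-end checksum and retransmit
        # until it matches.  Integrity is preserved; the cost is latency.
        digest = hashlib.sha256(payload).digest()
        for attempt in range(1, max_tries + 1):
            received = flaky_link(payload)
            if hashlib.sha256(received).digest() == digest:
                return received, attempt
        raise IOError("link too unreliable")

    data, tries = send_reliably(b"some block of data")
    print("delivered intact after", tries, "attempt(s)")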

>
> https://forums.freenas.org/index.php?threads/ecc-vs-non-ecc-ram-and-zfs.15449/
>
> The most important line in that is this:
>
> "All that stuff about ZFS self-healing goes down the drain if the system isn't using ECC RAM"
>
> It's not ZFS that makes you **safer**, it's ECC RAM.  You can debate the merits of ZFS versus other filesystems, but that is a different conversation.
>
> Actually, after reading that link I definitely would not recommend
> ZFS under a parallel file system without ECC.  The potential for
> making matters worse is non-zero and as you point out, now we have the
> parallel file system correcting errors when it shouldn't have to.

What are you going to use instead of ZFS?  Every other filesystem has
the exact same problem when used with non-ECC RAM.

That article you linked has led to a LOT of misconceptions about ZFS.
ZFS is no worse than any other filesystem when used with non-ECC RAM.

ECC RAM protects against bit flips in RAM.

ZFS protects against bit flips in the drive, controller, or bus.

Either one on its own improves data integrity, and both together
improve it further.

Now, if your distributed filesystem calculates hashes OFF OF THE
STORAGE NODE, and verifies them there as well during read-back, then
you're getting the same protection at a higher layer, and you don't
NEED the protection of either ECC RAM or ZFS on the storage nodes.  It
might still be desirable if it reduces the need to recover data, which
improves latency/etc.
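
A minimal sketch of that in Python, assuming hypothetical
put_block/get_block calls rather than any real filesystem's API: the
hash lives above the storage node, so a silently corrupted replica is
caught on read-back no matter what the node's RAM or disks did.

    import hashlib

    index = {}  # block id -> hash, kept by the client/metadata layer
    store = {}  # stand-in for an untrusted storage node

    def put_block(block_id, data):
        # Hash is computed off the storage node, before the data
        # ever reaches it.
        index[block_id] = hashlib.sha256(data).hexdigest()
        store[block_id] = data

    def get_block(block_id):
        data = store[block_id]
        if hashlib.sha256(data).hexdigest() != index[block_id]:
            # Bad replica detected above the storage layer; a real
            # system would fetch another copy here.
            raise IOError("checksum mismatch on block %r" % block_id)
        return data

    put_block("blk0", b"important bytes")
    store["blk0"] = b"important byteZ"  # simulate silent corruption
    try:
        get_block("blk0")
    except IOError as err:
        print("caught:", err)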

If you are only protecting data with hashes that are
generated/verified on the storage node itself, then ECC RAM on the
storage node will improve the integrity of the operation, because now
there is no higher layer.



Here is another analogy that hopefully ZFS/btrfs fans will appreciate:

One of the big potential advantages of distributed filesystems from a
data integrity standpoint is that they make entire hosts redundant.
With RAID you're protected against the failure of a hard disk.  With
distributed filesystems you're protected against failure of an entire
host (CPU/motherboard/power supply/whatever).  Now let's take this a
step further...

In the earlier generation of RAID there was protection against the
total failure of a drive, but not against silent failure.  If you have
a conventional RAID1 and you yank the power on one drive the system
will keep running without any loss of data.  However, if you instead
modify some data on-disk (direct disk writes, or cosmic rays, or
whatever), or the drive controller goes senile, and the drive presents
data to the RAID controller/software without reporting any errors,
then RAID1 is going to cause you problems, because it has no way to
detect problems on its own.  At best during a scrub conventional RAID1
will tell you that the two copies don't match, and if you're VERY
lucky it might give you a way to try to pick which one is right (have
fun with that).

Since this was undesirable, later-generation filesystems like
ZFS/btrfs implement strong checksums in software that are capable of
telling which copy of the data is right, and which is wrong.  This
protects against silent corruption AT THE DRIVE LEVEL.
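
In code, the difference looks roughly like this (a sketch of the
idea, not ZFS's or btrfs's actual implementation): with a checksum
recorded at write time you can pick the good copy, whereas a plain
mirror scrub can only report that the copies disagree.

    import hashlib

    def pick_good_copy(copies, recorded_checksum):
        # Return the first copy whose hash matches the checksum stored
        # at write time.  Plain RAID1 has no such checksum and can only
        # say "the mirrors differ".
        for data in copies:
            if hashlib.sha256(data).hexdigest() == recorded_checksum:
                return data
        raise IOError("all copies corrupt -- restore from backup")

    good = b"original data"
    checksum = hashlib.sha256(good).hexdigest()  # recorded at write time
    print(pick_good_copy([b"original dat\x00", good], checksum))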

I'm just suggesting that distributed filesystems ought to extend this
paradigm one step further.  Today many of them are like conventional
RAID1.  If the drives report problems or if hosts fail entirely they
can recover the data.  However, if a host presents bad data without an
error, many of these distributed filesystems contain the same flaw as
earlier conventional RAID.  What should be done is to use the same
approach of generating/verifying checksums at a higher layer to add
data security.  This costs very little in terms of space or
computation (this is all software-defined as it is).

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug