Rich Freeman via plug on 24 Apr 2020 18:42:14 -0700


Re: [PLUG] Gluster best practices?

On Fri, Apr 24, 2020 at 9:21 PM Will via plug
<> wrote:
> I'm glad the list picked up on my slacking. Finally people are listening to Keith about LizardFS after.... 3 years?

All of the distributed filesystems should handle this well, but
recently I had an HBA fail on one of my LizardFS nodes.  I was getting
tons of errors on multiple drives, and zfs eventually failed several
pools.  The cluster just marked those chunks as endangered and began
replicating them to nodes that had space.  I ended up just removing
that node and watching my data rebuild.  When I got a new HBA I first
did some testing to make sure the drives were reliable, then put the
node back into the cluster, and the data rebalanced.  While I did end
up with endangered data, the cluster was never offline or
unresponsive.  Chunks set with 3x replication just became undergoal,
dropping to 2x replication.
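
To make the endangered/undergoal distinction concrete, here is a rough
Python sketch of how a chunk's state could be derived from its
replication goal and its surviving copies.  The thresholds (e.g.
"endangered" meaning one copy left when the goal is higher) and the
node names are assumptions for illustration, not LizardFS's actual code:

```python
from collections import Counter


def chunk_state(goal: int, copies: int) -> str:
    """Label one chunk by comparing its live copies to its goal.

    Assumed thresholds, loosely modelled on the categories LizardFS
    reports; the real definitions may differ.
    """
    if copies == 0:
        return "missing"
    if copies == 1 and goal > 1:
        return "endangered"
    if copies < goal:
        return "undergoal"
    if copies > goal:
        return "overgoal"
    return "ok"


def states_after_node_loss(chunks: dict[int, tuple[int, set[str]]],
                           failed: str) -> Counter:
    """Drop every copy held by `failed` and tally the resulting states."""
    return Counter(
        chunk_state(goal, len(nodes - {failed}))
        for goal, nodes in chunks.values()
    )


# Hypothetical layout: three chunks with goal 3, one chunk with goal 2.
chunks = {
    1: (3, {"a", "b", "c"}),
    2: (3, {"a", "b", "d"}),
    3: (3, {"b", "c", "d"}),
    4: (2, {"a", "c"}),
}
print(states_after_node_loss(chunks, failed="a"))
```

Losing node "a" here leaves the two goal-3 chunks it held undergoal at
2 copies, the third goal-3 chunk untouched, and the goal-2 chunk
endangered with a single copy, which is the point where the cluster
should prioritize re-replication.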

If I had been using classic RAID and lost the only HBA on a host, I'd
have just lost the array, possibly with some data corruption if I
wasn't using zfs.  Granted, if you had multiple HBAs in a host and
carefully paired your drives, you could survive something like this with
traditional RAID, especially with zfs.  However, with the distributed
filesystems you have redundancy above the host level, so you can lose
anything on a single host, or an entire host, and the cluster isn't
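
The "redundancy above the host level" point can be sketched in a few
lines of Python.  This assumes a hypothetical round-robin placement
policy (LizardFS's real chunkserver selection is more involved) and
compares it with a RAID-style layout where every copy sits behind one
host:

```python
def place_replicas(chunk_id: int, hosts: list[str], goal: int) -> list[str]:
    """Hypothetical policy: put each copy of a chunk on a different host."""
    if goal > len(hosts):
        raise ValueError("goal exceeds number of hosts")
    start = chunk_id % len(hosts)
    return [hosts[(start + i) % len(hosts)] for i in range(goal)]


def worst_case_after_host_loss(placements: dict[int, list[str]]) -> int:
    """Fewest surviving copies of any chunk, over every single-host failure."""
    hosts = {h for reps in placements.values() for h in reps}
    return min(
        sum(1 for r in reps if r != lost)
        for lost in hosts
        for reps in placements.values()
    )


hosts = ["a", "b", "c", "d"]
spread = {c: place_replicas(c, hosts, goal=3) for c in range(10)}
# RAID-style: three mirrored disks, but all behind host "a" (one HBA).
raid = {c: ["a", "a", "a"] for c in range(10)}

print(worst_case_after_host_loss(spread))  # 2
print(worst_case_after_host_loss(raid))    # 0
```

With copies on distinct hosts, losing any one host (HBA, PSU,
motherboard, or the whole box) costs each chunk at most one copy; with
all copies behind one host, that single failure takes everything.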

The main gotcha with lizardfs is that the cluster uses a single active
master server at any time, so that is a single point of failure.  You
can have other masters shadowing it so that the master's metadata is
replicated, and you can promote any of those to be active.  The next
version of lizardfs will include high-availability features that
automate this.  My setup doesn't need THAT much reliability, so I'm
happy knowing that if my master fails I can just ssh into my shadows,
check the metadata version on each one, promote the one with the
newest metadata, and tweak my DNS so that everything finds it.  I've
already moved my master server around by doing basically exactly this,
granted with nothing mounted and the cluster idle.
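
The failover decision itself is simple enough to sketch.  A minimal
Python version, assuming the metadata version of each shadow has
already been collected somehow (the hostnames and version numbers below
are made up); actually promoting the shadow and updating DNS remain
manual steps, as described above:

```python
def pick_shadow_to_promote(versions: dict[str, int]) -> str:
    """Return the shadow master holding the newest metadata version.

    `versions` maps hostname -> metadata version read from each shadow.
    How that version is read is not shown here (an assumption).
    """
    if not versions:
        raise ValueError("no shadow masters reachable")
    newest = max(versions.values())
    # Shadows at the same version should hold the same metadata; sort so
    # the choice is deterministic rather than dependent on dict order.
    return sorted(h for h, v in versions.items() if v == newest)[0]


# Hypothetical shadows and versions, for illustration only.
shadows = {"shadow-a": 1041, "shadow-b": 1057, "shadow-c": 1057}
print(pick_shadow_to_promote(shadows))  # shadow-b
```

Promoting anything other than the newest shadow would silently roll
back whatever metadata changes the others had already applied.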

And of course, with the exception of the master server, I can easily
upgrade and reboot individual nodes while the cluster is online.  When
a node is down the cluster will start to replicate its data a bit, but
that really isn't a big deal, and it all gets cleaned up in the end.
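
Because each node can be taken down on its own, upgrades can be
scripted as a rolling loop.  A sketch in Python, assuming two
hypothetical hooks you would supply yourself: an upgrade step and a
health check such as "no undergoal chunks".  The master is deliberately
left out of the node list, and the `sleep` parameter exists only to
make the loop testable:

```python
import time
from typing import Callable, Iterable


def rolling_upgrade(nodes: Iterable[str],
                    upgrade: Callable[[str], None],
                    cluster_healthy: Callable[[], bool],
                    poll_seconds: float = 30.0,
                    timeout: float = 3600.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Upgrade chunkservers one at a time, waiting for the cluster to heal.

    `upgrade` (e.g. stop service, install packages, reboot) and
    `cluster_healthy` (e.g. no chunks undergoal) are hypothetical hooks.
    """
    for node in nodes:
        upgrade(node)
        waited = 0.0
        # Don't touch the next node until replication has caught up, so
        # at most one node's worth of copies is ever missing at a time.
        while not cluster_healthy():
            if waited >= timeout:
                raise TimeoutError(
                    f"cluster still unhealthy after upgrading {node}")
            sleep(poll_seconds)
            waited += poll_seconds
```

Gating on health rather than a fixed delay matters because the cluster
starts re-replicating while a node is down; moving on too early could
take a second copy of the same chunk offline.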

Another gotcha is that the FUSE client seems to hog RAM sometimes.
Not the end of the world, but not great either.  The other thing that
would be nice is reflink support.  You can do them with
mfsmakesnapshot at the command line, which is functionally equivalent,
but you can't do a cp --reflink=auto/always to get a COW snapshot of a
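
For reference, on Linux `cp --reflink` works through the FICLONE
ioctl, which a filesystem without reflink support rejects.  A Python
sketch of the auto/always semantics, assuming Linux; `clone_file` is a
hypothetical helper, and in "auto" mode it falls back to a plain byte
copy, which is what you'd end up with on a mount that can't clone:

```python
import fcntl
import os
import shutil
import tempfile

# Linux FICLONE request number, _IOW(0x94, 9, int).  Python 3.12+ exposes
# it as fcntl.FICLONE; older versions need the literal value.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_file(src: str, dst: str, mode: str = "auto") -> bool:
    """Copy src to dst, sharing extents (COW) when the filesystem allows.

    Returns True if a reflink was made, False if it fell back to a full
    copy.  mode="always" re-raises the ioctl error instead of copying.
    """
    if mode not in ("auto", "always"):
        raise ValueError(f"unknown mode: {mode}")
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            if mode == "always":
                raise
        # No extent sharing available here: copy the bytes instead.
        shutil.copyfileobj(fsrc, fdst)
        return False


with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(b"hello chunk")
    print("reflinked" if clone_file(src, dst) else "copied")
```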

Philadelphia Linux Users Group