Rich Freeman on 12 Aug 2018 03:54:48 -0700



Re: [PLUG] Virtualization clusters & shared storage


On Sat, Aug 11, 2018 at 6:22 PM Keith C. Perry
<kperry@daotechnologies.com> wrote:
>
> JP, not to give you more reading material but along the same lines...  https://docs.lizardfs.com/cookbook/hypervisors.html#using-lizardfs-as-shared-storage-for-proxmoxve
>

Keith - have you seen any documentation that compares LizardFS to some
of the newer options like CephFS?  I'm finding it difficult to find
actual comparisons of the various distributed options, and every time
somebody tosses one out I feel like I'll end up having to do a deep
dive and write my own comparison.

Things I'm interested in include:

* Does the implementation protect against memory corruption on storage
nodes that do not use ECC?   (Note, I said storage nodes, NOT client
nodes.)
* Does the implementation protect against on-disk corruption for data
at rest?  (I'm lumping into "implementation" any disk-layer solutions
being used, like ZFS/Bluestore/whatever, as many distributed systems
separate these.)
* Does the implementation support EC/striping/etc so that physical
disk requirements aren't multiples of the usable capacity?
* What does the recommended RAM:storage ratio look like?
* Complexity/etc.
* Options for backup.
* How well does it scale down?  I'm looking at alternatives to
largeish ZFS/NAS home-lab implementations, not to run Google.  Do I
need 12 physical hosts to serve 10TB of data with redundancy to
survive the loss of 1 physical host?  (Rough numbers sketched just
after this list.)
* How efficiently can it scrub/etc for bad on-disk storage?  (One
thing that concerns me about separate storage layers is whether there
is an automated way to actually fix the issues the storage layer
detects, since it can't repair them itself without redundancy at that
layer.)
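
To make the EC-vs-replication and scale-down questions concrete, here
is a rough back-of-the-envelope sketch in plain Python.  It isn't tied
to any particular system; the 10TB figure is the home-lab example
above and the shard counts are just illustrative:

    # Raw-capacity arithmetic for the example above; nothing
    # system-specific here.

    def replication_raw_tb(usable_tb, copies):
        """Raw capacity needed to keep N full replicas."""
        return usable_tb * copies

    def erasure_raw_tb(usable_tb, data_shards, parity_shards):
        """Raw capacity needed under k+m erasure coding."""
        return usable_tb * (data_shards + parity_shards) / data_shards

    usable = 10  # TB of usable data

    # 3x replication: survives 2 lost hosts, needs at least 3 hosts.
    print("3x replication:", replication_raw_tb(usable, 3), "TB raw")
    # 4+2 erasure coding: survives 2 lost hosts, needs at least 6 hosts.
    print("4+2 EC:        ", erasure_raw_tb(usable, 4, 2), "TB raw")
    # 2+1 erasure coding: survives 1 lost host, needs at least 3 hosts.
    print("2+1 EC:        ", erasure_raw_tb(usable, 2, 1), "TB raw")

So 4+2 EC halves the raw capacity relative to 3x replication while
still tolerating two failures, but it needs at least six storage hosts
instead of three, which is exactly the scale-down tension I'm getting
at.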

I listed the first item first for a reason: this info is
frustratingly difficult to find.  Many of the options let you store
data on ZFS or something similar, but they are vague on whether they
actually guarantee that they'll detect corruption while the data is
being handled in RAM on the storage nodes.  I want cheap/disposable
storage nodes, so I'd prefer a system that assumes they're unreliable.
If a known-good hash is computed on the client and preserved
end-to-end, or at least until AFTER another hash is computed and
checked when the disk layer takes over, then it should be safe.
However, if the hash is forgotten once the network transmission check
passes, and the data sits in RAM unprotected until the disk layer
takes over (a gap that might be only 3 lines of code), then that is a
vulnerability.
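
Something like this toy Python sketch is what I have in mind.  It is
purely illustrative; the function names are made up and this is not
how Ceph or LizardFS actually structure their write paths:

    import hashlib

    def client_write(data: bytes):
        # Client computes a known-good hash before the data leaves.
        return data, hashlib.sha256(data).hexdigest()

    def storage_node_weak(data: bytes, client_hash: str):
        # Verify on receipt, then forget the hash.  Bit flips in RAM
        # between here and the disk layer go undetected.
        if hashlib.sha256(data).hexdigest() != client_hash:
            raise IOError("corrupted in transit")
        return data  # client hash discarded -- this is the gap

    def storage_node_safe(data: bytes, client_hash: str):
        # Keep the client hash until the disk layer has computed its
        # own checksum over the same buffer (stand-in for ZFS/Bluestore).
        disk_checksum = hashlib.sha256(data).hexdigest()
        if disk_checksum != client_hash:
            raise IOError("corrupted between network check and disk layer")
        return data, disk_checksum

In the weak version, the window between the network check and the disk
layer's own checksum is exactly where non-ECC RAM on a storage node can
silently flip bits.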

Right now CephFS seems to be the most attractive option for me, with
the caveat that the "FS" part of it is newish, and I'm not sure where
they are with the failure tolerance on the MDS layer.  Ceph for block
storage (and probably also for serving volumes for VMs) sounds like it
is much more mature.

So far you're the only one I've heard advocate LizardFS, which sounds
similar to CephFS, so I'm curious about how they compare.  CephFS also
separates the disk storage layer, though they now offer their own
(Bluestore), which is optimized more for Ceph (they wanted the on-disk
checksumming/etc, but didn't want to implement all the other POSIX/etc
stuff for what is just a block storage back end, which I think makes
sense).  I'm pretty sure you could dump it on ext4, but it is probably
much more common to put it on something like ZFS for the checksums.
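
Tying that back to the scrub bullet above, what I'd want from the
distributed layer is roughly the following shape.  The read_chunk,
replicas_for, fetch_chunk, and rewrite_chunk calls are hypothetical
placeholders, not any real API:

    import hashlib

    def scrub_and_repair(chunk_ids, local_store, cluster):
        # Walk local chunks, verify each against its stored checksum,
        # and pull a good replica over the network when one is bad.
        for cid in chunk_ids:
            data, expected = local_store.read_chunk(cid)       # hypothetical
            if hashlib.sha256(data).hexdigest() == expected:
                continue  # chunk is fine
            # The disk layer can detect this but can't repair it without
            # redundancy of its own, so the fix has to come from a peer.
            for node in cluster.replicas_for(cid):             # hypothetical
                candidate = node.fetch_chunk(cid)              # hypothetical
                if hashlib.sha256(candidate).hexdigest() == expected:
                    local_store.rewrite_chunk(cid, candidate)  # hypothetical
                    break
            else:
                raise IOError("no good replica for chunk %s" % cid)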

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug