Lee H. Marzke on 9 May 2014 08:25:23 -0700


Re: [PLUG] iSCSI storage appliance(s)

The hardware and intended use of this system are major factors.  ZFS isn't really meant for
storage on a single server, but for providing a pool of storage for 10 or more of your

The different terminology ZFS uses tends to confuse people, and it gets worse when some of the
NAS vendors use the terms incorrectly.

For ZFS the top level storage is a pool,  and you can have several pools available.
Typically the OS is in 'syspool' and the data pool is often called 'tank' or 'zones'.

Each pool is composed of one or more vDEVs (which are redundant disk groups).
A vDEV can be a mirror, or a RAIDZ, RAIDZ2 or RAIDZ3 group.  Mirrors are typically 2
disks but can be 3.  RAIDZ is recommended for 3, 5 or 7 disks, RAIDZ2 for 4, 6 or 8
and RAIDZ3 for 5, 7 or 9 disks.  (These are recommendations only, not enforced.)
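
To make the vDEV types concrete, here is a rough Python sketch of how much usable space one vDEV gives and how many disk failures it survives.  The parity counts (1, 2 and 3 for RAIDZ, RAIDZ2 and RAIDZ3) are standard ZFS; the function name, the 2TB disk size and the decision to ignore filesystem overhead are my own assumptions for illustration.

```python
# Sketch only: ignores metadata, padding and reserved space.
PARITY = {"raidz": 1, "raidz2": 2, "raidz3": 3}

def vdev_summary(kind, disks, disk_tb):
    """Return (usable_tb, failures_tolerated) for a single vDEV."""
    if kind == "mirror":
        if disks < 2:
            raise ValueError("a mirror needs at least 2 disks")
        # Every disk in a mirror holds the same data.
        return disk_tb, disks - 1
    parity = PARITY[kind]
    if disks <= parity:
        raise ValueError(f"{kind} needs more than {parity} disks")
    return (disks - parity) * disk_tb, parity

for kind, disks in [("mirror", 2), ("raidz", 3), ("raidz2", 6), ("raidz3", 9)]:
    usable, failures = vdev_summary(kind, disks, disk_tb=2)
    print(f"{kind:7s} {disks} x 2TB: ~{usable} TB usable, "
          f"survives {failures} disk failure(s)")
```

Real pools report somewhat less than these figures, so treat them as upper bounds.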

So you then join vDEVs into a pool and data is striped across the vDEVs.  By itself
neither a vDEV nor a pool is similar to traditional RAID-X, just the combination.  What
makes it even more confusing is that an NFS shared 'volume' in ZFS is called a 'folder' or 'dataset'
because it is just an allocation from the pool with a quota.  This is really neat because
you can change the size of your NFS shares on the fly, up or down, by just changing the
quota.  Typically it's recommended that all disks in the pool are the same size, speed,
etc.  (This isn't enforced, but performance may degrade when you stop following these
recommendations.)

Write performance in ZFS scales with the number of vDEVs (not the number of disks).
So creating one vDEV with 8 disks in RAIDZ2 is going to give about the same pool write performance
as one disk.  Generally you are better off creating a lot of 2-disk mirrors, or
a bunch of 3-disk RAIDZ1 vDEVs.  (For example, instead of one 9-disk RAIDZ3, create 3x
3-disk RAIDZ1 vDEVs, which has 3X the write performance.)
If a disk in a vDEV fails, you can remove and replace it.
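
The 9-disk example can be worked through with a small Python sketch.  It applies the rule of thumb above (pool write performance is roughly the number of vDEVs, in units of one disk); the function name and the 2TB disk size are assumptions of mine, and the result is an estimate, not a benchmark.

```python
def pool_estimate(vdev_count, disks_per_vdev, parity, disk_tb):
    """Rough estimate for a pool built from identical RAIDZ vDEVs."""
    return {
        "disks": vdev_count * disks_per_vdev,
        "usable_tb": vdev_count * (disks_per_vdev - parity) * disk_tb,
        # Rule of thumb: each vDEV writes about as fast as one disk.
        "relative_write": vdev_count,
    }

one_wide = pool_estimate(vdev_count=1, disks_per_vdev=9, parity=3, disk_tb=2)
three_narrow = pool_estimate(vdev_count=3, disks_per_vdev=3, parity=1, disk_tb=2)

print("1 x 9-disk RAIDZ3:", one_wide)
print("3 x 3-disk RAIDZ1:", three_narrow)
```

Both layouts use 9 disks and give about the same usable space, but the second has three times the estimated write performance.  The trade-off is redundancy: the RAIDZ3 vDEV survives any 3 disk failures, while each RAIDZ1 vDEV survives only one.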

Some of the statements about adding disks are very misleading.  You add disks to a pool by
adding additional vDEVs close to the type of the existing vDEVs.

So if you have a pool with 2-disk mirrors, you can add additional 2-disk mirror
vDEVs to a running pool.  If you have 3-disk RAIDZ vDEVs, you can add disks
3 at a time in new vDEVs to the pool.  (So you have to plan ahead on storage growth; you can't
just add a single disk.)  One thing to be careful about is that you can't remove vDEVs from a running
pool without destroying the pool.  (You can replace failed disks inside a vDEV, but
not remove the vDEV from the pool.)  You can however remove spare disks, and log and
cache disks, from a pool.
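
Since growth happens a whole vDEV at a time, it helps to work out up front how many disks a given expansion will cost.  A minimal Python sketch, assuming identical vDEVs and ignoring overhead (the function name and the 2TB disks are made up for the example; a 2-disk mirror is treated as one data disk plus one redundant disk):

```python
import math

def disks_needed(extra_tb, disks_per_vdev, redundant_disks, disk_tb):
    """Disks to buy to add at least extra_tb of usable space in whole vDEVs."""
    usable_per_vdev = (disks_per_vdev - redundant_disks) * disk_tb
    vdevs = math.ceil(extra_tb / usable_per_vdev)
    return vdevs * disks_per_vdev

# Growing a pool by 4 TB of usable space with 2TB disks:
raidz_disks = disks_needed(4, disks_per_vdev=3, redundant_disks=1, disk_tb=2)
mirror_disks = disks_needed(4, disks_per_vdev=2, redundant_disks=1, disk_tb=2)
print("3-disk RAIDZ vDEVs:", raidz_disks, "disks")
print("2-disk mirror vDEVs:", mirror_disks, "disks")
```

The result is always a multiple of the vDEV width, which is the planning constraint described above.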

ZFS was written for Solaris, and while ports to other OSes already work, they are not as stable
as on the native Solaris OS.  To use OpenSolaris or the Illumos fork you can run the
basic OS (OpenIndiana), or run SmartOS.  There is also NexentaStor (commercial), but
they have a community version that is free for up to 18TB of raw disk.  I've used NexentaStor
commercial at a few clients and it works well.  I've also run NexentaStor community for
about a year as a VM and it was better than my physical NAS units.

SmartOS is a free cloud hypervisor OS from Joyent based on Illumos.  It runs from a flash
drive much like ESXi and provides NFS storage out of the Solaris global zone.  It also supports
Solaris Zones and KVM VMs, but I'm not using those.  (Joyent ported KVM from Linux to Illumos
just recently.)

On BSD there are FreeNAS and the NAS4Free fork.
This has a good comparison of NAS4Free and FreeNAS.  There are some reports of NAS4Free not being stable.

As far as performance goes, a lot of people run dd and compare disk read/write rates; however,
this is sequential I/O.  Disks tend to be 100x worse at random 4k I/O writes, so these numbers
tell you nothing about how a bunch of physical servers or VMware loads will perform.  In
addition, some loads such as VMware NFS use synchronous writes (which in POSIX means that
the write must be committed to permanent storage before it's acknowledged).  This means
that NFS performance can be very slow unless you use an SSD for the ZIL (called a SLOG).
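
To see why a dd number says so little, here is the arithmetic as a short Python sketch.  The 100 MB/s sequential rate and the 250 random write IOPS are illustrative figures I picked for one spinning disk, not measurements; real drives vary a lot.

```python
def random_write_mb_s(iops, block_kib=4):
    """Throughput in MiB/s when every write is one small random block."""
    return iops * block_kib / 1024

sequential_mb_s = 100.0   # assumed dd-style sequential rate for one disk
random_iops = 250         # assumed random 4k write IOPS for the same disk

random_mb_s = random_write_mb_s(random_iops)
ratio = sequential_mb_s / random_mb_s

print(f"sequential: {sequential_mb_s:.1f} MB/s")
print(f"random 4k:  {random_mb_s:.2f} MB/s")
print(f"sequential is about {ratio:.0f}x faster")
```

With these assumed figures the gap comes out at roughly two orders of magnitude, which is why IOPS rather than MB/s is the number to look at for VM workloads.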

ZFS has unlimited snapshots, and snapshot replication.  So even binary volumes (iSCSI
volumes) can be replicated efficiently with ZFS to another ZFS unit.
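
The reason this works for opaque iSCSI volumes is that replication sends the blocks that changed between two snapshots, without needing to understand the files inside.  The following is a toy Python model of that idea only; it is not how ZFS stores data, and the function names and block contents are invented for the example.

```python
def incremental_stream(prev_snap, new_snap):
    """Blocks a replica holding prev_snap needs in order to become new_snap."""
    changed = {blk: data for blk, data in new_snap.items()
               if prev_snap.get(blk) != data}
    deleted = [blk for blk in prev_snap if blk not in new_snap]
    return changed, deleted

def apply_stream(replica, changed, deleted):
    """Apply an incremental stream to a copy of the replica's blocks."""
    result = dict(replica)
    result.update(changed)
    for blk in deleted:
        result.pop(blk, None)
    return result

# Two snapshots of a volume, modelled as block number -> contents.
snap1 = {0: b"boot", 1: b"data-a", 2: b"data-b"}
snap2 = {0: b"boot", 1: b"data-A", 2: b"data-b", 3: b"new"}

changed, deleted = incremental_stream(snap1, snap2)
replica = apply_stream(snap1, changed, deleted)
print("blocks sent:", sorted(changed), "of", len(snap2))
```

Only the modified and new blocks cross the wire, and the replica still ends up identical to the second snapshot.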

In my case I require storage for running my VMware lab for training,  customer demos, etc. It
has about a dozen VM's including Active Directory,  Linux web and mail servers.  I also
run my pfSense Firewall as a VM,  and VDP backup as a VM.

That is a lot of random I/O hitting the storage array,  and much of it is synchronous NFS
writes from VMware.

What I'm currently using for my VMware cluster storage NAS is a Dell 2950 with 6 x 2TB disks in
one 6-disk RAIDZ vDEV, and two SSDs: one SSD for the L2ARC cache, and one for the ZFS separate
intent log (SLOG).

My ZFS project is described here on my web page:

Lee Marzke
VMware, Infrastructure consultant


----- Original Message -----
> From: "PaulNM" <plug@paulscrap.com>
> To: plug@lists.phillylinux.org
> Sent: Friday, May 9, 2014 5:05:58 AM
> Subject: Re: [PLUG] iSCSI storage appliance(s)
> On 05/08/2014 01:32 PM, Carl Johnson wrote:
> > Who's familiar with any of the NAS distros out there?
> > FreeNAS/NAS4Free/NAPP-it/Openfiler/Openmediavault.....etc.?
> > 
> What kind of hardware do you plan to use? Roughly how much storage do
> you plan to manage? Is ISCSI the only thing you'd like to do?
> I have more personal experience with FreeNAS/NAS4Free than the others
> (except for the Webmin approach I'll mention later).  Actually, to be
> precise, I've never used NAS4Free. It's a continuation of older versions
> of FreeNAS that I have used, though.
> Openfiler appears to be a dead project.  Their last release is ~3 years
> old and there doesn't appear to be any real work going on.
> Never heard of or used NAPP-it, so can't really comment on it. It
> appears to be opensolaris/openindiana based? The site isn't very clear.
> Never heard of OpenMediaVault (OMV) either, though it looks *really*
> interesting as it's based on Debian.  Not thrilled that they're still
> using Squeeze as a base so close to when security support is ending.
> Yes, I know Squeeze now has long term support, but that's a *very*
> recent change.  Apparently there is a procedure to install OMV on
> Wheezy, though.
> I've done a project where we used a Debian install with Webmin. This
> approach is nice in that there's more flexibility to add other services
> down the road. Also, I have a great deal of experience managing Debian
> machines, so it's more comfortable for me. Webmin makes it easier for
> the less technical people to check up on things and handle simpler
> tasks. (I'll call this the WebMin approach.)
> > What I'd like to have :
> > 1. Flexibility of adding to the total unit capacity with drives of
> > different capacities.
> FreeNAS can handle this fine, it prefers using ZFS pools. (If you're
> familiar with LVM, ZFS is somewhat similar in concept but with more
> features.) NAS4Free and NAPP-it should be the same for the same reasons.
> OMV and the Webmin approach I mentioned are both Linux based.  You can
> easily use LVM, RAID, or some combination of both.
> That said:
> You are aware that RAID/RAIDZ implementations are limited by the
> smallest member of their array/volume, right? No matter what solution
> you end up using, you'll hit this limitation.  There is unRAID, but
> that's not so good redundancy-wise.  ISCSI would be problematic with
> unRAID, and you have to pay if you use more than 3 drives.
> > 2. Fault tolerance of at least one drive failure; two preferred.
> Here's where it gets tricky. ZFS does support setting up a mirror as
> well as a few software raid implementations (RAIDZ1/RAIDZ2/RAIDZ3).
> What it doesn't support is adding drives to an existing RAIDZ set. Not a
> problem if you're starting with all the drives you plan to use, but if
> you ever want to add more drives to the RAIDZ:
> You'll need to backup the data,
> destroy the old RAIDZ,
> create a new RAIDZ consisting of the drives from the old one and any new
> drives,
> restore the backup.
> The other option is to add drives in pairs/triplets and make them
> separate RAIDZ volumes.
> MDADM (Linux RAID) can very easily add drives to existing arrays.
> You'll have to expand any LVM volume and filesystem on it afterwards.
> > 3. Presenting the storage via an iSCSI target.
> Trivial in FreeNAS/NAS4Free. NAPP-it can apparently do this as well. OMV
> has a plugin for this, as does Webmin.
> > 4. Adding and/or replacing disks without taking the ISCSI target offline.
> If the target is a RAIDZ or RAID volume, then yes.
> > 5. Admin/management via a web UI (not nearly as important as the other
> > four, if I have to use the CLI, so be it.)
> All of the examples at the top are geared towards web UI, though many
> also let you use a terminal or ssh in.
> > 
> > Pros/Cons/Suggestions/Thoughts/Tar/Feathers?
> > 
> The problem with ZFS is that it has many great features, but not all
> apply at once. I was looking into it for a major project and got really
> excited reading about all the great support it has for adding drives,
> expanding pools, snapshots, and RAIDZ.  It wasn't until I got into the
> details via a test VM that I found out about RAIDZ volumes not being
> expandable.
> You also need to make sure that whatever OS you use has a version of ZFS
> that supports the feature(s) you want to use. I wouldn't mess with ZFS
> on linux at all.
> Also, ZFS isn't really recommended for 32-bit systems.  You can do it,
> but I really don't advise it if you'll be dealing with large amounts of
> storage.  Especially if combined with low amounts of RAM.
> On the other hand, LVM and Linux RAID are very mature approaches with
> easy to use tools.
> If a web UI is a lower priority for you, it sounds like this system will
> be run by a reasonably technically proficient person. The older I get,
> and the more projects I get under my belt, the less I like the
> all-in-one or "appliance" approaches.
> If you just do a standard install of a distro, you'll get continuous
> security updates and a great deal of flexibility.  The downside is it
> takes a little more know-how to get things setup. The really nice thing
> about Webmin vs some of the other admin interfaces like cpanel/plesk/etc
> is that Webmin doesn't really mess with the installed system or make
> specialized customizations to it.  It's really just a GUI that edits the
> config files for you, while still giving you the option to edit them
> yourself. I'm curious where OMV falls on this spectrum.
> - PaulNM
> ___________________________________________________________________________
> Philadelphia Linux Users Group         --        http://www.phillylinux.org
> Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
> General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

"Between subtle shading and the absence of light lies the nuance of iqlusion..." - Kryptos 

Lee Marzke, lee@marzke.net http://marzke.net/lee/ 
IT Consultant, VMware, VCenter, SAN storage, infrastructure, SW CM 
+1 800-393-5217 office +1 484-348-2230 fax 
+1 610-564-4932 cell sip://8003935217@4aero.com VOIP 
