Lee H. Marzke on 9 May 2014 13:01:36 -0700



Re: [PLUG] iSCSI storage appliance(s)


>I strongly discourage you from using software raid on such a large storage system. 

Software vs. hardware RAID has nothing to do with hot swap or the size of the system.

>How do you plan to hot swap drives? 
With ZFS:

zpool offline tank ada1p1        # take the failing disk offline
( remove and replace the physical device )
zpool replace tank ada1p1        # resilver onto the replacement; use
                                 # 'zpool replace tank <old device> <new device>'
                                 # if the new disk comes up under a different name
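
A couple of follow-up commands make the swap painless ( 'tank' and the device
names here are only example values for a hypothetical pool ):

zpool status tank              # watch resilver progress until the pool is healthy again
zpool add tank spare ada6      # optionally keep a hot spare in the pool
                               # ( automatic takeover by the spare needs a fault
                               #   daemon, e.g. zfsd on FreeBSD )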

Non-ZFS systems need hardware RAID controllers because of the RAID5 write hole,
where writing a stripe of data is not atomic.  The controller must read the stripe,
calculate parity, then write the stripe back.  If you lose power mid-write, the
whole stripe can be lost without hardware RAID.  The battery on a hardware RAID
controller preserves the blocks that were not yet written until the next boot after
the power failure, and then replays that data back onto the disks.

With ZFS, since all writes are atomic operations (due to copy-on-write),
there is no write hole, and nothing is damaged if you lose power during
a write.  Either the transaction is written or it is aborted.  This is
because the original stripe is never modified in place: the entire stripe is
written to a new location on disk, and then the pointers are updated to make
the new location active.

ZFS is designed to handle petabytes or even zettabytes of storage, as its name
(Zettabyte File System) implies, and ZFS is software RAID only.
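
As an illustration, here is roughly what building that software RAID looks like
( pool name and device names are placeholders, adjust for your disks ):

zpool create tank raidz2 da0 da1 da2 da3 da4 da5   # 6-disk double-parity pool, survives 2 failures
zpool status tank                                  # confirm the layout and health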

There is no fsck with ZFS, because the file system works like a database.  Up to
about 5 seconds of writes are batched into a transaction group, and that group
is then written as one atomic operation or aborted.  So the disk is never
inconsistent after a power loss.  ( It may be missing the last transaction group,
i.e. up to roughly 5 seconds of writes, if that group was in flight during the
power failure. )

With ZFS you run a background 'scrub' operation about once a week that checks
all the disk metadata and data checksums and repairs any issues it finds.
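
For example ( pool name, schedule and path are only illustrative; adjust for your OS ):

zpool scrub tank               # start a scrub by hand
zpool status -v tank           # shows scrub progress plus any repaired or unrepairable errors

# or schedule it weekly from /etc/crontab:
0 3 * * 0   root   /sbin/zpool scrub tank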


Lee Marzke
VMware, Infrastructure consultant
http://marzke.net/lee
http://plone.4aero.com/




----- Original Message ----- 

> From: "Carl Johnson" <cjohnson19791979@gmail.com>
> To: "Philadelphia Linux User's Group Discussion List"
> <plug@lists.phillylinux.org>
> Sent: Friday, May 9, 2014 3:30:08 PM
> Subject: Re: [PLUG] iSCSI storage appliance(s)

> I strongly discourage you from using software raid on such a large
> storage system.

> I've read a fair amount on both schools of thought. My take; in the end it's
> a matter of preference as an admin. What's your thoughts on it?

> How do you plan to hot swap drives?

> Kinda' depends on what I settle on as far as software goes. If I go with
> OMV/Debian for example, I'd be using mdadm probably. Is that what you meant?
> Or did you mean physically? If that's the case, I'd imagined it working
> something like.....
> 1. Depending on the state of the drive being removed, I'd issue a command or
> two to remove it.
> 2. Lift the release tab on the tray and pull.
> 3. Unscrew the drive from the tray.
> 4. Screw a new one into the tray.
> 5. Re-insert the tray.
> 6. Issue a couple commands to realize/rebuild the drive into the array.

> On Fri, May 9, 2014 at 3:17 PM, Gavin W. Burris < bug@wharton.upenn.edu >
> wrote:
>
> > Hi, Carl.
> >
> > I don't work for SilMech. I have bought, configured and deployed their
> > hardware in a variety of solutions.
> >
> > I strongly discourage you from using software raid on such a large
> > storage system. How do you plan to hot swap drives?
> >
> > Cheers.
> >
> > On Fri 05/09/14 02:01PM -0400, Carl Johnson wrote:
> > > I've already selected some norco hardware for the smaller sites. chenbro
> > > makes a 50 bay case for the larger sites. i'm trying to stay away from
> > > hardware RAID for flexibility reasons. i will say though..... that looks
> > > like decent stuff. do you work/sell for them?
> > >
> > > On Fri, May 9, 2014 at 11:42 AM, Gavin W. Burris < bug@wharton.upenn.edu >
> > > wrote:
> > >
> > > > Hi, Carl.
> > > >
> > > > How about a simple, big, rack-mount server with lots of disk bays? Take
> > > > a look at something like this:
> > > > http://www.siliconmechanics.com/i50434/4u-storage-server.php
> > > >
> > > > You can get it configured with CentOS and a MegaRAID controller, which
> > > > has a command line and GUI utility. 36 2TB hot-swap drives. Make three
> > > > virtual disks in sets of 12. You could do RAID 6, or even RAID 6 plus a
> > > > hot spare. Keep some cold spares on site. When you get an email
> > > > warning of a failed disk, swap it out. Done. Put XFS on it and you
> > > > have a 50TB NFS server.
> > > >
> > > > Cheers.
> > > >
> > > > On Fri 05/09/14 11:25AM -0400, Carl Johnson wrote:
> > > > > What kind of hardware do you plan to use?
> > > > > C.O.T.S x86 server grade stuff mostly. Need more specifics?
> > > > >
> > > > > Roughly how much storage do you plan to manage? We're going to start
> > > > > with about 20TB. It's tough to plan how much we'll need though, as this
> > > > > is for CCTV DVR's with motion detection. Hence the scalability
> > > > > requirement. I'm told that the pesky insurance company says their system
> > > > > needs to be capable of retaining three years of recordings. So it's
> > > > > tough to gauge how much we'll really need to do this again, because of
> > > > > different camera frame rates/resolutions/how much motion etc. you get
> > > > > the idea.....
> > > > >
> > > > > Is ISCSI the only thing you'd like to do? I had thought about using NFS,
> > > > > but I ended up using iSCSI because I thought it'd fit better. I was
> > > > > trying to avoid layers of abstraction/complexity. Am I wrong?
> > > > >
> > > > > My main reason for not going the ZFS route is what you confirmed. Easy
> > > > > scalability and RAM, both of which potentially change the hardware scope
> > > > > the most and, therefore, the cost.
> > > > >
> > > > > If a web UI is a lower priority for you, it sounds like this system will
> > > > > be run by a reasonably technically proficient person.
> > > > > Systems, probably 15ish in total all said and done. But yeah....Hi, I'm
> > > > > Carl, nice to meet you. ;-). There may be occasion where I'll need to
> > > > > talk someone else thru say, a disk replacement, via phone or something
> > > > > so hence the webUI need.
> > > > >
> > > > > As it is right now, I've got two boxes in a test system. One, the
> > > > > storage box, is running CentOS. The other, the DVR itself, is an Ubuntu
> > > > > 12LTS box. I may try a wash/rinse/repeat on the storage box with OMV
> > > > > though and see if I like it or not. Though honestly, after reading thru
> > > > > your response, I'm probably going to go either the OMV/debian route or
> > > > > ditch the appliance overlay completely and use Centos/SoftRAID/Btrfs.
> > > > >
> > > > > Can you elaborate on the webmin idea? Specifically, what *.wbm's do I
> > > > > need to do all that I'm asking? That may be something else I'll have to
> > > > > test drive too.
> > > > >
> > > > > On Fri, May 9, 2014 at 5:05 AM, PaulNM < plug@paulscrap.com > wrote:
> > > > >
> > > > > > On 05/08/2014 01:32 PM, Carl Johnson wrote:
> > > > > > > Who's familiar with any of the NAS distros out there?
> > > > > > > FreeNAS/NAS4Free/NAPP-it/Openfiler/Openmediavault.....etc.?
> > > > > >
> > > > > > What kind of hardware do you plan to use? Roughly how much storage do
> > > > > > you plan to manage? Is ISCSI the only thing you'd like to do?
> > > > > >
> > > > > > I have more personal experience with FreeNAS/NAS4Free than the others
> > > > > > (except for the Webmin approach I'll mention later). Actually, to be
> > > > > > precise, I've never used NAS4Free. It's a continuation of older
> > > > > > versions of FreeNAS that I have used, though.
> > > > > >
> > > > > > Openfiler appears to be a dead project. Their last release is ~3 years
> > > > > > old and there doesn't appear to be any real work going on.
> > > > > >
> > > > > > Never heard of or used NAPP-it, so can't really comment on it. It
> > > > > > appears to be opensolaris/openindiana based? The site isn't very clear.
> > > > > >
> > > > > > Never heard of OpenMediaVault (OMV) either, though it looks *really*
> > > > > > interesting as it's based on Debian. Not thrilled that they're still
> > > > > > using Squeeze as a base so close to when security support is ending.
> > > > > > Yes, I know Squeeze now has long term support, but that's a *very*
> > > > > > recent change. Apparently there is a procedure to install OMV on
> > > > > > Wheezy, though.
> > > > > >
> > > > > > I've done a project where we used a Debian install with Webmin. This
> > > > > > approach is nice in that there's more flexibility to add other services
> > > > > > down the road. Also, I have a great deal of experience managing Debian
> > > > > > machines, so it's more comfortable for me. Webmin makes it easier for
> > > > > > the less technical people to check up on things and handle simpler
> > > > > > tasks. (I'll call this the WebMin approach.)
> > > > > >
> > > > > > > What I'd like to have:
> > > > > > > 1. Flexibility of adding to the total unit capacity with drives of
> > > > > > > different capacities.
> > > > > >
> > > > > > FreeNAS can handle this fine, it prefers using ZFS pools. (If you're
> > > > > > familiar with LVM, ZFS is somewhat similar in concept but with more
> > > > > > features.) NAS4Free and NAPP-it should be the same for the same reasons.
> > > > > >
> > > > > > OMV and the Webmin approach I mentioned are both linux based. You can
> > > > > > easily use LVM, RAID, or some combination of both.
> > > > > >
> > > > > > That said:
> > > > > >
> > > > > > You are aware that RAID/RAIDZ implementations are limited by the
> > > > > > smallest member of their array/volume, right? No matter what solution
> > > > > > you end up using, you'll hit this limitation. There is unRAID, but
> > > > > > that's not so good redundancy-wise. ISCSI would be problematic with
> > > > > > unRAID, and you have to pay if you use more than 3 drives.
> > > > > >
> > > > > > > 2. Fault tolerance of at least one drive failure; two preferred.
> > > > > >
> > > > > > Here's where it gets tricky. ZFS does support setting up a mirror as
> > > > > > well as a few software raid implementations (RAIDZ1/RAIDZ2/RAIDZ3).
> > > > > >
> > > > > > What it doesn't support is adding drives to an existing RAIDZ set. Not
> > > > > > a problem if you're starting with all the drives you plan to use, but
> > > > > > if you ever want to add more drives to the RAIDZ:
> > > > > > You'll need to backup the data,
> > > > > > destroy the old RAIDZ,
> > > > > > create a new RAIDZ consisting of the drives from the old one and any
> > > > > > new drives,
> > > > > > restore the backup.
> > > > > >
> > > > > > The other option is to add drives in pairs/triplets and make them
> > > > > > separate RAIDZ volumes.
> > > > > >
> > > > > > MDADM (Linux RAID) can very easily add drives to existing arrays.
> > > > > > You'll have to expand any LVM volume and filesystem on it afterwards.
> > > > > >
> > > > > > > 3. Presenting the storage via an iSCSI target.
> > > > > >
> > > > > > Trivial in FreeNAS/NAS4Free. NAPP-it can apparently do this as well.
> > > > > > OMV has a plugin for this, as does Webmin.
> > > > > >
> > > > > > > 4. Adding and/or replacing disks without taking the ISCSI target
> > > > > > > offline.
> > > > > >
> > > > > > If the target is a RAIDZ or RAID volume, then yes.
> > > > > >
> > > > > > > 5. Admin/management via a web UI (not nearly as important as the
> > > > > > > other four, if I have to use the CLI, so be it.)
> > > > > >
> > > > > > All of the examples at the top are geared towards web UI, though many
> > > > > > also let you use a terminal or ssh in.
> > > > > >
> > > > > > > Pros/Cons/Suggestions/Thoughts/Tar/Feathers?
> > > > > >
> > > > > > The problem with ZFS is that it has many great features, but not all
> > > > > > apply at once. I was looking into it for a major project and got really
> > > > > > excited reading about all the great support it has for adding drives,
> > > > > > expanding pools, snapshots, and RAIDZ. It wasn't until I got into the
> > > > > > details via a test VM that I found out about RAIDZ volumes not being
> > > > > > expandable.
> > > > > >
> > > > > > You also need to make sure that whatever OS you use has a version of
> > > > > > ZFS that supports the feature(s) you want to use. I wouldn't mess with
> > > > > > ZFS on linux at all.
> > > > > >
> > > > > > Also, ZFS isn't really recommended for 32-bit systems. You can do it,
> > > > > > but I really don't advise it if you'll be dealing with large amounts of
> > > > > > storage. Especially if combined with low amounts of RAM.
> > > > > >
> > > > > > On the other hand, LVM and Linux RAID are very mature approaches with
> > > > > > easy to use tools.
> > > > > >
> > > > > > If a web UI is a lower priority for you, it sounds like this system
> > > > > > will be run by a reasonably technically proficient person. The older I
> > > > > > get, and the more projects I get under my belt, the less I like the
> > > > > > all-in-one or "appliance" approaches.
> > > > > >
> > > > > > If you just do a standard install of a distro, you'll get continuous
> > > > > > security updates and a great deal of flexibility. The downside is it
> > > > > > takes a little more know-how to get things set up. The really nice
> > > > > > thing about Webmin vs some of the other admin interfaces like
> > > > > > cpanel/plesk/etc is that Webmin doesn't really mess with the installed
> > > > > > system or make specialized customizations to it. It's really just a GUI
> > > > > > that edits the config files for you, while still giving you the option
> > > > > > to edit them yourself. I'm curious where OMV falls on this spectrum.
> > > > > >
> > > > > > - PaulNM
> > > >
> > > > --
> > > > Gavin W. Burris
> > > > Senior Project Leader for Research Computing
> > > > The Wharton School
> > > > University of Pennsylvania
>
> > --
> > Gavin W. Burris
> > Senior Project Leader for Research Computing
> > The Wharton School
> > University of Pennsylvania
>

-- 
"Between subtle shading and the absence of light lies the nuance of iqlusion..." - Kryptos

Lee Marzke, lee@marzke.net  http://marzke.net/lee/
IT Consultant, VMware, VCenter, SAN storage, infrastructure, SW CM


___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug