Gavin W. Burris on 9 May 2014 13:17:29 -0700



Re: [PLUG] iSCSI storage appliance(s)


Hi, Lee.

I was implying that one should confirm that hot-swapping drives under
Linux is actually safe with a particular hardware config sans RAID
controller.

I have considered ZFS for PB-sized installations.  I was not convinced
that the state of ZFS on Linux was worth my reputation.  Who was it that
said the interface between ZFS and Linux was like nailing jello to a
tree?  I did seriously consider Nexenta, though, which is an x86
Solaris.  ZFS on Solaris does make sense to me.

Cheers.

On Fri 05/09/14 09:01PM +0100, Lee H. Marzke wrote:
> >I strongly discourage you from using software raid on such a large storage system. 
> 
> Software vs hardware RAID has nothing to do with hot swap or the size
> of the system.
> 
> >How do you plan to hot swap drives?
> With ZFS:
>     zpool offline tank ada1p1       (then remove and replace the device)
>     zpool replace tank <failed device> ada1p1
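> 
> A fuller sketch of the whole swap, with hypothetical pool and device
> names (ada1p1 stands in for whatever zpool status reports as faulted):
> 
>     zpool status tank             # identify the failed device
>     zpool offline tank ada1p1     # detach it from the pool
>     # ... physically swap the disk ...
>     zpool replace tank ada1p1     # resilver onto the new disk
>     zpool status tank             # watch the resilver progress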
> 
> Non-ZFS systems need HW RAID controllers because of the RAID5 write hole,
> where writing a stripe of data is not atomic.  The controller must read
> the stripe, calculate parity, then write the stripe back.  If you lose
> power, the whole stripe is lost without hardware RAID.  The battery on HW
> RAID controllers preserves the bits that were not written until the next
> boot after the power failure, and then replays the data back onto the disk.
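> 
> A toy illustration of the parity math at stake (one-byte "blocks",
> made-up values; runnable in bash):
> 
>     d1=0xA5; d2=0x3C
>     parity=$(( d1 ^ d2 ))                         # parity = XOR of the data blocks
>     printf 'parity=0x%X\n' "$parity"              # parity=0x99
>     printf 'rebuilt d1=0x%X\n' $(( parity ^ d2 )) # recovers 0xA5
> 
> If power dies after a data block is rewritten but before the parity is,
> that equation no longer holds and a later rebuild reconstructs garbage --
> that is the write hole.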
> 
> With ZFS, since all writes are atomic operations (due to copy-on-write),
> there is no write hole, and nothing is damaged if you lose power during
> a write.  Either the transaction is written or it is aborted.  This is
> because the original stripe is not modified; the entire stripe is written
> again to a new location on disk, and then pointers are updated to make
> the new location active.
> 
> ZFS is designed to handle petabytes or 'zettabytes' of storage, as its
> name implies, and ZFS is software RAID only.
> 
> There is no fsck with ZFS, as the file system works like a database.  Up
> to 5 seconds of writes are put into a transaction group, and that group
> is then written as one atomic operation, or aborted.  So the disk is
> never inconsistent after a power loss.  (But it may be missing one
> 5-second transaction group if that was in process during the power
> failure.)
> 
> With ZFS you run a background 'scrub' operation about once a week that
> checks all the disk metadata and data checksums and repairs any issues.
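> 
> A minimal cron sketch for that (pool name hypothetical):
> 
>     # /etc/crontab -- scrub the pool every Sunday at 03:00
>     0 3 * * 0  root  /sbin/zpool scrub tank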
> 
> 
> Lee Marzke
> VMware , Infrastructure consultant
> http://marzke.net/lee
> http://plone.4aero.com/
> 
> 
> ----- Original Message ----- 
> 
> > From: "Carl Johnson" <cjohnson19791979@gmail.com>
> > To: "Philadelphia Linux User's Group Discussion List"
> > <plug@lists.phillylinux.org>
> > Sent: Friday, May 9, 2014 3:30:08 PM
> > Subject: Re: [PLUG] iSCSI storage appliance(s)
> 
> > I strongly discourage you from using software raid on such a large
> > storage system.
> 
> > I've read a fair amount on both schools of thought. My take: in the end
> > it's a matter of preference as an admin. What are your thoughts on it?
> 
> > How do you plan to hot swap drives?
> 
> > Kinda' depends on what I settle on as far as software goes. If I go with
> > OMV/Debian for example, I'd be using mdadm probably. Is that what you
> > meant? Or did you mean physically? If that's the case, I'd imagine it
> > working something like this:
> > 1. Depending on the state of the drive being removed, I'd issue a command or
> > two to remove it.
> > 2. Lift the release tab on the tray and pull.
> > 3. Unscrew the drive from the tray.
> > 4. Screw a new one into the tray.
> > 5. Re-insert the tray.
> > 6. Issue a couple commands to realize/rebuild the drive into the array.
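> > 
> > With mdadm, steps 1 and 6 might look roughly like this (array and
> > device names are made up):
> > 
> >     mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1   # step 1
> >     # steps 2-5: physically swap the drive
> >     mdadm /dev/md0 --add /dev/sdc1                       # step 6
> >     cat /proc/mdstat                                     # watch the rebuild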
> 
> > On Fri, May 9, 2014 at 3:17 PM, Gavin W. Burris <bug@wharton.upenn.edu>
> > wrote:
> 
> > > Hi, Carl.
> > >
> > > I don't work for SilMech. I have bought, configured and deployed their
> > > hardware in a variety of solutions.
> > >
> > > I strongly discourage you from using software raid on such a large
> > > storage system. How do you plan to hot swap drives?
> > >
> > > Cheers.
> > >
> > > On Fri 05/09/14 02:01PM -0400, Carl Johnson wrote:
> > > > I've already selected some Norco hardware for the smaller sites.
> > > > Chenbro makes a 50-bay case for the larger sites. I'm trying to stay
> > > > away from hardware RAID for flexibility reasons. I will say, though...
> > > > that looks like decent stuff. Do you work/sell for them?
> > > >
> > > > On Fri, May 9, 2014 at 11:42 AM, Gavin W. Burris <bug@wharton.upenn.edu>
> > > > wrote:
> > > > > Hi, Carl.
> > > > >
> > > > > How about a simple, big, rack-mount server with lots of disk bays?
> > > > > Take a look at something like this:
> > > > > http://www.siliconmechanics.com/i50434/4u-storage-server.php
> > > > >
> > > > > You can get it configured with CentOS and a MegaRAID controller,
> > > > > which has a command line and GUI utility. 36 2TB hot-swap drives.
> > > > > Make three virtual disks in sets of 12. You could do RAID 6, or
> > > > > even RAID 6 plus a hot spare. Keep some cold spares on site. When
> > > > > you get an email warning of a failed disk, swap it out. Done. Put
> > > > > XFS on it and you have a 50TB NFS server.
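> > > > >
> > > > > A rough sketch of those last two steps (device, mount point, and
> > > > > client subnet are hypothetical):
> > > > >
> > > > >     mkdir -p /export/storage
> > > > >     mkfs.xfs /dev/sdb                    # one of the RAID virtual disks
> > > > >     mount /dev/sdb /export/storage
> > > > >     echo '/export/storage 10.0.0.0/24(rw)' >> /etc/exports
> > > > >     exportfs -ra                         # publish the NFS export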
> > > > >
> > 
> > > > > Cheers.
> > 
> > > > >
> > 
> > > > >
> > 
> > > > > On Fri 05/09/14 11:25AM -0400, Carl Johnson wrote:
> > 
> > > > > > What kind of hardware do you plan to use?
> > > > > > COTS x86 server-grade stuff mostly. Need more specifics?
> > > > > >
> > > > > > Roughly how much storage do you plan to manage?
> > > > > > We're going to start with about 20TB. It's tough to plan how much
> > > > > > we'll need though, as this is for CCTV DVRs with motion detection,
> > > > > > hence the scalability requirement. I'm told that the pesky
> > > > > > insurance company says their system needs to be capable of
> > > > > > retaining three years of recordings. So it's tough to gauge how
> > > > > > much we'll really need to do this, again because of different
> > > > > > camera frame rates/resolutions/how much motion, etc. You get the
> > > > > > idea...
> > > > > >
> > > > > > Is iSCSI the only thing you'd like to do?
> > > > > > I had thought about using NFS, but I ended up using iSCSI because
> > > > > > I thought it'd fit better. I was trying to avoid layers of
> > > > > > abstraction/complexity. Am I wrong?
> > > > > >
> > > > > > My main reason for not going the ZFS route is what you confirmed:
> > > > > > easy scalability and RAM, both of which potentially change the
> > > > > > hardware scope the most and, therefore, the cost.
> > > > > >
> > > > > > If a web UI is a lower priority for you, it sounds like this
> > > > > > system will be run by a reasonably technically proficient person.
> > > > > > SystemS, probably 15ish in total all said and done. But yeah...
> > > > > > Hi, I'm Carl, nice to meet you. ;-) There may be occasions where
> > > > > > I'll need to talk someone else through, say, a disk replacement
> > > > > > via phone or something, hence the web UI need.
> > > > > >
> > > > > > As it is right now, I've got two boxes in a test system. One, the
> > > > > > storage box, is running CentOS. The other, the DVR itself, is an
> > > > > > Ubuntu 12.04 LTS box. I may try a wash/rinse/repeat on the storage
> > > > > > box with OMV though and see if I like it or not. Though honestly,
> > > > > > after reading through your response, I'm probably going to go
> > > > > > either the OMV/Debian route or ditch the appliance overlay
> > > > > > completely and use CentOS/soft RAID/Btrfs.
> > > > > >
> > > > > > Can you elaborate on the Webmin idea? Specifically, what *.wbm's
> > > > > > do I need to do all that I'm asking? That may be something else
> > > > > > I'll have to test drive too.
> > > > > >
> > > > > > On Fri, May 9, 2014 at 5:05 AM, PaulNM <plug@paulscrap.com> wrote:
> > > > > > > On 05/08/2014 01:32 PM, Carl Johnson wrote:
> > > > > > > > Who's familiar with any of the NAS distros out there?
> > > > > > > > FreeNAS/NAS4Free/NAPP-it/Openfiler/Openmediavault... etc.?
> > > > > > >
> > > > > > > What kind of hardware do you plan to use? Roughly how much
> > > > > > > storage do you plan to manage? Is iSCSI the only thing you'd
> > > > > > > like to do?
> > > > > > >
> > > > > > > I have more personal experience with FreeNAS/NAS4Free than the
> > > > > > > others (except for the Webmin approach I'll mention later).
> > > > > > > Actually, to be precise, I've never used NAS4Free. It's a
> > > > > > > continuation of older versions of FreeNAS that I have used,
> > > > > > > though.
> > > > > > >
> > > > > > > Openfiler appears to be a dead project. Their last release is ~3
> > > > > > > years old and there doesn't appear to be any real work going on.
> > > > > > >
> > > > > > > Never heard of or used NAPP-it, so I can't really comment on it.
> > > > > > > It appears to be OpenSolaris/OpenIndiana based? The site isn't
> > > > > > > very clear.
> > > > > > >
> > > > > > > Never heard of OpenMediaVault (OMV) either, though it looks
> > > > > > > *really* interesting as it's based on Debian. Not thrilled that
> > > > > > > they're still using Squeeze as a base so close to when security
> > > > > > > support is ending. Yes, I know Squeeze now has long-term
> > > > > > > support, but that's a *very* recent change. Apparently there is
> > > > > > > a procedure to install OMV on Wheezy, though.
> > > > > > >
> > > > > > > I've done a project where we used a Debian install with Webmin.
> > > > > > > This approach is nice in that there's more flexibility to add
> > > > > > > other services down the road. Also, I have a great deal of
> > > > > > > experience managing Debian machines, so it's more comfortable
> > > > > > > for me. Webmin makes it easier for the less technical people to
> > > > > > > check up on things and handle simpler tasks. (I'll call this the
> > > > > > > Webmin approach.)
> > > > > > >
> > > > > > > > What I'd like to have:
> > > > > > > > 1. Flexibility of adding to the total unit capacity with
> > > > > > > > drives of different capacities.
> > > > > > >
> > > > > > > FreeNAS can handle this fine; it prefers using ZFS pools. (If
> > > > > > > you're familiar with LVM, ZFS is somewhat similar in concept but
> > > > > > > with more features.) NAS4Free and NAPP-it should be the same for
> > > > > > > the same reasons.
> > > > > > >
> > > > > > > OMV and the Webmin approach I mentioned are both Linux based.
> > > > > > > You can easily use LVM, RAID, or some combination of both.
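> > > > > > >
> > > > > > > For instance, a minimal sketch of LVM layered on an mdadm array
> > > > > > > (all names hypothetical):
> > > > > > >
> > > > > > >     mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]1
> > > > > > >     pvcreate /dev/md0                 # LVM physical volume on the array
> > > > > > >     vgcreate vg_nas /dev/md0
> > > > > > >     lvcreate -L 10T -n iscsi0 vg_nas  # carve out space for the target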
> > > > > > >
> > > > > > > That said:
> > > > > > >
> > > > > > > You are aware that RAID/RAIDZ implementations are limited by the
> > > > > > > smallest member of their array/volume, right? No matter what
> > > > > > > solution you end up using, you'll hit this limitation. There is
> > > > > > > unRAID, but that's not so good redundancy-wise. iSCSI would be
> > > > > > > problematic with unRAID, and you have to pay if you use more
> > > > > > > than 3 drives.
> > > > > > >
> > > > > > > > 2. Fault tolerance of at least one drive failure; two
> > > > > > > > preferred.
> > > > > > >
> > > > > > > Here's where it gets tricky. ZFS does support setting up a
> > > > > > > mirror, as well as a few software RAID implementations
> > > > > > > (RAIDZ1/RAIDZ2/RAIDZ3).
> > > > > > >
> > > > > > > What it doesn't support is adding drives to an existing RAIDZ
> > > > > > > set. Not a problem if you're starting with all the drives you
> > > > > > > plan to use, but if you ever want to add more drives to the
> > > > > > > RAIDZ, you'll need to:
> > > > > > > back up the data,
> > > > > > > destroy the old RAIDZ,
> > > > > > > create a new RAIDZ consisting of the drives from the old one and
> > > > > > > any new drives,
> > > > > > > restore the backup.
> > > > > > >
> > > > > > > The other option is to add drives in pairs/triplets and make
> > > > > > > them separate RAIDZ volumes.
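> > > > > > >
> > > > > > > That second option, with hypothetical names, is one command per
> > > > > > > added set:
> > > > > > >
> > > > > > >     zpool add tank raidz2 da4 da5 da6 da7   # new vdev joins the pool
> > > > > > >
> > > > > > > The pool grows, but each RAIDZ vdev keeps its fixed membership.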
> > > > > > >
> > > > > > > MDADM (Linux RAID) can very easily add drives to existing
> > > > > > > arrays. You'll have to expand any LVM volume and filesystem on
> > > > > > > it afterwards.
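> > > > > > >
> > > > > > > Roughly, that chain is (names hypothetical, continuing the
> > > > > > > LVM-on-mdadm sketch above):
> > > > > > >
> > > > > > >     mdadm /dev/md0 --add /dev/sdh1
> > > > > > >     mdadm --grow /dev/md0 --raid-devices=7   # reshape onto the new disk
> > > > > > >     pvresize /dev/md0                        # grow the LVM physical volume
> > > > > > >     lvextend -l +100%FREE /dev/vg_nas/iscsi0
> > > > > > >     resize2fs /dev/vg_nas/iscsi0             # or xfs_growfs for XFS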
> > > > > > >
> > > > > > > > 3. Presenting the storage via an iSCSI target.
> > > > > > >
> > > > > > > Trivial in FreeNAS/NAS4Free. NAPP-it can apparently do this as
> > > > > > > well. OMV has a plugin for this, as does Webmin.
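> > > > > > >
> > > > > > > If the plugin wraps something like iscsitarget (IET), the
> > > > > > > underlying config is tiny; a hand-written sketch with a
> > > > > > > hypothetical IQN and backing volume:
> > > > > > >
> > > > > > >     # /etc/iet/ietd.conf
> > > > > > >     Target iqn.2014-05.org.example:storage.disk0
> > > > > > >         Lun 0 Path=/dev/vg_nas/iscsi0,Type=blockio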
> > > > > > >
> > > > > > > > 4. Adding and/or replacing disks without taking the iSCSI
> > > > > > > > target offline.
> > > > > > >
> > > > > > > If the target is a RAIDZ or RAID volume, then yes.
> > > > > > >
> > > > > > > > 5. Admin/management via a web UI (not nearly as important as
> > > > > > > > the other four; if I have to use the CLI, so be it.)
> > > > > > >
> > > > > > > All of the examples at the top are geared towards a web UI,
> > > > > > > though many also let you use a terminal or ssh in.
> > > > > > >
> > > > > > > > Pros/Cons/Suggestions/Thoughts/Tar/Feathers?
> > > > > > >
> > > > > > > The problem with ZFS is that it has many great features, but not
> > > > > > > all apply at once. I was looking into it for a major project and
> > > > > > > got really excited reading about all the great support it has
> > > > > > > for adding drives, expanding pools, snapshots, and RAIDZ. It
> > > > > > > wasn't until I got into the details via a test VM that I found
> > > > > > > out about RAIDZ volumes not being expandable.
> > > > > > >
> > > > > > > You also need to make sure that whatever OS you use has a
> > > > > > > version of ZFS that supports the feature(s) you want to use. I
> > > > > > > wouldn't mess with ZFS on Linux at all.
> > > > > > >
> > > > > > > Also, ZFS isn't really recommended for 32-bit systems. You can
> > > > > > > do it, but I really don't advise it if you'll be dealing with
> > > > > > > large amounts of storage, especially if combined with low
> > > > > > > amounts of RAM.
> > > > > > >
> > > > > > > On the other hand, LVM and Linux RAID are very mature approaches
> > > > > > > with easy-to-use tools.
> > > > > > >
> > > > > > > If a web UI is a lower priority for you, it sounds like this
> > > > > > > system will be run by a reasonably technically proficient
> > > > > > > person. The older I get, and the more projects I get under my
> > > > > > > belt, the less I like the all-in-one or "appliance" approaches.
> > > > > > >
> > > > > > > If you just do a standard install of a distro, you'll get
> > > > > > > continuous security updates and a great deal of flexibility. The
> > > > > > > downside is it takes a little more know-how to get things set
> > > > > > > up. The really nice thing about Webmin vs some of the other
> > > > > > > admin interfaces like cpanel/plesk/etc. is that Webmin doesn't
> > > > > > > really mess with the installed system or make specialized
> > > > > > > customizations to it. It's really just a GUI that edits the
> > > > > > > config files for you, while still giving you the option to edit
> > > > > > > them yourself. I'm curious where OMV falls on this spectrum.
> > > > > > >
> > > > > > > - PaulNM
> 
> -- 
> "Between subtle shading and the absence of light lies the nuance of iqlusion..." - Kryptos 
> 
> Lee Marzke, lee@marzke.net http://marzke.net/lee/ 
> IT Consultant, VMware, VCenter, SAN storage, infrastructure, SW CM 

-- 
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug