Gavin W. Burris on 9 May 2014 13:17:29 -0700
Re: [PLUG] iSCSI storage appliance(s)
Hi, Lee.

I was implying that one should confirm that hot-swapping drives under Linux is a good
idea with a particular hardware config sans RAID controller.

I have considered ZFS for PB-sized installations. I was not convinced that the state of
ZFS on Linux was worth my reputation. Who was it that said the interface between ZFS
and Linux was like nailing jello to a tree? I did seriously consider Nexenta, though,
which is an x86 Solaris derivative. ZFS on Solaris does make sense to me.

Cheers.

On Fri 05/09/14 09:01PM +0100, Lee H. Marzke wrote:
> > I strongly discourage you from using software raid on such a large storage system.
>
> Software vs. hardware RAID has nothing to do with hot swap or the size of the system.
>
> > How do you plan to hot swap drives?
>
> With ZFS:
>
>   zpool offline tank ada1p1     ( then remove and replace the device )
>   zpool replace tank <failed device> ada1p1
>
> Non-ZFS systems need HW RAID controllers because of the RAID5 write hole,
> where writing a stripe of data is not atomic. The controller must read the
> stripe, calculate parity, then write the stripe back. If you lose power, the
> whole stripe is lost without hardware RAID. The battery on HW RAID controllers
> preserves the bits that were not written until the next boot after the power
> failure and then replays the data back onto the disk.
>
> With ZFS, since all writes are atomic operations (due to copy-on-write),
> there is no write hole, and nothing is damaged if you lose power during
> a write. You either write the transaction, or it is aborted. This is
> because the original stripe is not modified: the entire stripe is written
> again to a new location on disk, and then pointers are updated to make
> the new location active.
>
> ZFS is designed to handle petabytes or even 'zettabytes' of storage, as its
> name implies, and ZFS is software RAID only.
>
> There is no fsck with ZFS, as the file system works like a database. Transactions
> of up to 5 seconds of writes are put into a transaction group, and that group
> is then written as one atomic operation, or aborted. So the disk is never
> inconsistent after a power loss. (But it may be missing one 5-second transaction
> group if that was in process during the power failure.)
>
> With ZFS you run a background 'scrub' operation about once a week that checks
> all the disk metadata and data checksums and repairs any issues.
>
> Lee Marzke
> VMware, Infrastructure consultant
> http://marzke.net/lee
> http://plone.4aero.com/
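For concreteness, a minimal sketch of the replacement and scrub sequence Lee describes
above, assuming a pool named tank and a failed member da3 (pool and device names are
placeholders; Lee's example used ada1p1):

  # identify the failed or degraded member
  zpool status -v tank

  # take the disk out of service, then physically swap the drive
  zpool offline tank da3

  # resilver onto the replacement (same device path reused here)
  zpool replace tank da3
  zpool status tank            # watch resilver progress

  # the weekly integrity pass Lee mentions, e.g. run from cron
  zpool scrub tank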
>
> ----- Original Message -----
>
> > From: "Carl Johnson" <cjohnson19791979@gmail.com>
> > To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
> > Sent: Friday, May 9, 2014 3:30:08 PM
> > Subject: Re: [PLUG] iSCSI storage appliance(s)
> >
> > > I strongly discourage you from using software raid on such a large
> > > storage system.
> >
> > I've read a fair amount on both schools of thought. My take: in the end it's
> > a matter of preference as an admin. What are your thoughts on it?
> >
> > > How do you plan to hot swap drives?
> >
> > Kinda' depends on what I settle on as far as software goes. If I go with
> > OMV/Debian for example, I'd probably be using mdadm. Is that what you meant?
> > Or did you mean physically? If that's the case, I'd imagined it working
> > something like.....
> > 1. Depending on the state of the drive being removed, I'd issue a command
> >    or two to remove it.
> > 2. Lift the release tab on the tray and pull.
> > 3. Unscrew the drive from the tray.
> > 4. Screw a new one into the tray.
> > 5. Re-insert the tray.
> > 6. Issue a couple commands to realize/rebuild the drive into the array.
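As a rough sketch of what the "couple commands" in steps 1 and 6 usually look like with
mdadm, assuming an array /dev/md0 and a failing member /dev/sdc1 (device names are made
up; check them against /proc/mdstat and your backplane before pulling anything):

  # step 1: mark the member failed and drop it from the array
  mdadm --manage /dev/md0 --fail /dev/sdc1
  mdadm --manage /dev/md0 --remove /dev/sdc1

  # optionally detach the device from the kernel before pulling the tray
  echo 1 > /sys/block/sdc/device/delete

  # steps 2-5: physically swap the drive, partition it to match the old one

  # step 6: re-add the new member and watch the rebuild
  mdadm --manage /dev/md0 --add /dev/sdc1
  cat /proc/mdstat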
> >
> > On Fri, May 9, 2014 at 3:17 PM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
> >
> > > Hi, Carl.
> > >
> > > I don't work for SilMech. I have bought, configured and deployed their
> > > hardware in a variety of solutions.
> > >
> > > I strongly discourage you from using software raid on such a large
> > > storage system. How do you plan to hot swap drives?
> > >
> > > Cheers.
> > >
> > > On Fri 05/09/14 02:01PM -0400, Carl Johnson wrote:
> > > > I've already selected some Norco hardware for the smaller sites. Chenbro
> > > > makes a 50-bay case for the larger sites. I'm trying to stay away from
> > > > hardware RAID for flexibility reasons. I will say, though..... that looks
> > > > like decent stuff. Do you work/sell for them?
> > > >
> > > > On Fri, May 9, 2014 at 11:42 AM, Gavin W. Burris <bug@wharton.upenn.edu> wrote:
> > > >
> > > > > Hi, Carl.
> > > > >
> > > > > How about a simple, big, rack-mount server with lots of disk bays? Take
> > > > > a look at something like this:
> > > > > http://www.siliconmechanics.com/i50434/4u-storage-server.php
> > > > >
> > > > > You can get it configured with CentOS and a MegaRAID controller, which
> > > > > has a command-line and GUI utility. 36 2TB hot-swap drives. Make three
> > > > > virtual disks in sets of 12. You could do RAID 6, or even RAID 6 plus a
> > > > > hot spare. Keep some cold spares on site. When you get an email
> > > > > warning of a failed disk, swap it out. Done. Put XFS on it and you
> > > > > have a 50TB NFS server.
> > > > >
> > > > > Cheers.
> > > > >
> > > > > On Fri 05/09/14 11:25AM -0400, Carl Johnson wrote:
> > > > > > > What kind of hardware do you plan to use?
> > > > > >
> > > > > > C.O.T.S. x86 server-grade stuff mostly. Need more specifics?
> > > > > >
> > > > > > > Roughly how much storage do you plan to manage?
> > > > > >
> > > > > > We're going to start with about 20TB. It's tough to plan how much we'll
> > > > > > need, though, as this is for CCTV DVRs with motion detection. Hence the
> > > > > > scalability requirement. I'm told that the pesky insurance company says
> > > > > > their system needs to be capable of retaining three years of recordings.
> > > > > > So it's tough to gauge how much we'll really need to do this, again
> > > > > > because of different camera frame rates/resolutions/how much motion,
> > > > > > etc. You get the idea.....
> > > > > >
> > > > > > > Is ISCSI the only thing you'd like to do?
> > > > > >
> > > > > > I had thought about using NFS, but I ended up using iSCSI because I
> > > > > > thought it'd fit better. I was trying to avoid layers of
> > > > > > abstraction/complexity. Am I wrong?
> > > > > >
> > > > > > My main reason for not going the ZFS route is what you confirmed: easy
> > > > > > scalability, and RAM, both of which potentially change the hardware
> > > > > > scope the most and, therefore, the cost.
> > > > > >
> > > > > > > If a web UI is a lower priority for you, it sounds like this system
> > > > > > > will be run by a reasonably technically proficient person.
> > > > > >
> > > > > > Systems, probably 15ish in total all said and done. But yeah.... Hi, I'm
> > > > > > Carl, nice to meet you. ;-). There may be occasion where I'll need to
> > > > > > talk someone else through, say, a disk replacement via phone or
> > > > > > something, so hence the web UI need.
> > > > > >
> > > > > > As it is right now, I've got two boxes in a test system. One, the
> > > > > > storage box, is running CentOS. The other, the DVR itself, is an Ubuntu
> > > > > > 12.04 LTS box. I may try a wash/rinse/repeat on the storage box with OMV
> > > > > > though and see if I like it or not. Though honestly, after reading
> > > > > > through your response, I'm probably going to go either the OMV/Debian
> > > > > > route or ditch the appliance overlay completely and use
> > > > > > CentOS/SoftRAID/Btrfs.
> > > > > >
> > > > > > Can you elaborate on the Webmin idea? Specifically, what *.wbm's do I
> > > > > > need to do all that I'm asking? That may be something else I'll have to
> > > > > > test-drive too.
> > > > > >
> > > > > > On Fri, May 9, 2014 at 5:05 AM, PaulNM <plug@paulscrap.com> wrote:
> > > > > >
> > > > > > > On 05/08/2014 01:32 PM, Carl Johnson wrote:
> > > > > > > > Who's familiar with any of the NAS distros out there?
> > > > > > > > FreeNAS/NAS4Free/NAPP-it/Openfiler/Openmediavault.....etc.?
> > > > > > >
> > > > > > > What kind of hardware do you plan to use? Roughly how much storage
> > > > > > > do you plan to manage? Is ISCSI the only thing you'd like to do?
> > > > > > >
> > > > > > > I have more personal experience with FreeNAS/NAS4Free than the others
> > > > > > > (except for the Webmin approach I'll mention later). Actually, to be
> > > > > > > precise, I've never used NAS4Free. It's a continuation of older
> > > > > > > versions of FreeNAS that I have used, though.
> > > > > > >
> > > > > > > Openfiler appears to be a dead project. Their last release is ~3 years
> > > > > > > old and there doesn't appear to be any real work going on.
> > > > > > >
> > > > > > > Never heard of or used NAPP-it, so I can't really comment on it. It
> > > > > > > appears to be OpenSolaris/OpenIndiana based? The site isn't very clear.
> > > > > > >
> > > > > > > Never heard of OpenMediaVault (OMV) either, though it looks *really*
> > > > > > > interesting as it's based on Debian. Not thrilled that they're still
> > > > > > > using Squeeze as a base so close to when security support is ending.
> > > > > > > Yes, I know Squeeze now has long-term support, but that's a *very*
> > > > > > > recent change. Apparently there is a procedure to install OMV on
> > > > > > > Wheezy, though.
> > > > > > >
> > > > > > > I've done a project where we used a Debian install with Webmin. This
> > > > > > > approach is nice in that there's more flexibility to add other
> > > > > > > services down the road. Also, I have a great deal of experience
> > > > > > > managing Debian machines, so it's more comfortable for me. Webmin
> > > > > > > makes it easier for the less technical people to check up on things
> > > > > > > and handle simpler tasks. (I'll call this the Webmin approach.)
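For reference, a sketch of how the Webmin approach was typically bootstrapped on a plain
Debian install around that time; the repository line and key URL follow Webmin's own
published instructions, so check webmin.com if they have since changed:

  # add the Webmin apt repository and its signing key
  echo 'deb http://download.webmin.com/download/repository sarge contrib' \
      >> /etc/apt/sources.list
  wget -qO - http://www.webmin.com/jcameron-key.asc | apt-key add -

  # install, then browse to https://<hostname>:10000/
  apt-get update
  apt-get install webmin

The stock package bundles modules for Linux RAID and LVM; additional modules (the *.wbm
files Carl asks about) can be added from within the Webmin UI itself.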
> > > > > > >
> > > > > > > > What I'd like to have:
> > > > > > > > 1. Flexibility of adding to the total unit capacity with drives of
> > > > > > > >    different capacities.
> > > > > > >
> > > > > > > FreeNAS can handle this fine; it prefers using ZFS pools. (If you're
> > > > > > > familiar with LVM, ZFS is somewhat similar in concept but with more
> > > > > > > features.) NAS4Free and NAPP-it should be the same, for the same
> > > > > > > reasons.
> > > > > > >
> > > > > > > OMV and the Webmin approach I mentioned are both Linux based. You can
> > > > > > > easily use LVM, RAID, or some combination of both.
> > > > > > >
> > > > > > > That said:
> > > > > > >
> > > > > > > You are aware that RAID/RAIDZ implementations are limited by the
> > > > > > > smallest member of their array/volume, right? No matter what solution
> > > > > > > you end up using, you'll hit this limitation. There is unRAID, but
> > > > > > > that's not so good redundancy-wise. ISCSI would be problematic with
> > > > > > > unRAID, and you have to pay if you use more than 3 drives.
> > > > > > >
> > > > > > > > 2. Fault tolerance of at least one drive failure; two preferred.
> > > > > > >
> > > > > > > Here's where it gets tricky. ZFS does support setting up a mirror, as
> > > > > > > well as a few software RAID implementations (RAIDZ1/RAIDZ2/RAIDZ3).
> > > > > > >
> > > > > > > What it doesn't support is adding drives to an existing RAIDZ set. Not
> > > > > > > a problem if you're starting with all the drives you plan to use, but
> > > > > > > if you ever want to add more drives to the RAIDZ:
> > > > > > >   you'll need to back up the data,
> > > > > > >   destroy the old RAIDZ,
> > > > > > >   create a new RAIDZ consisting of the drives from the old one and any
> > > > > > >   new drives,
> > > > > > >   and restore the backup.
> > > > > > >
> > > > > > > The other option is to add drives in pairs/triplets and make them
> > > > > > > separate RAIDZ volumes.
> > > > > > >
> > > > > > > MDADM (Linux RAID) can very easily add drives to existing arrays.
> > > > > > > You'll have to expand any LVM volume and filesystem on it afterwards
> > > > > > > (see the sketch after this list).
> > > > > > >
> > > > > > > > 3. Presenting the storage via an iSCSI target.
> > > > > > >
> > > > > > > Trivial in FreeNAS/NAS4Free. NAPP-it can apparently do this as well.
> > > > > > > OMV has a plugin for this, as does Webmin.
> > > > > > >
> > > > > > > > 4. Adding and/or replacing disks without taking the ISCSI target
> > > > > > > >    offline.
> > > > > > >
> > > > > > > If the target is a RAIDZ or RAID volume, then yes.
> > > > > > >
> > > > > > > > 5. Admin/management via a web UI (not nearly as important as the
> > > > > > > >    other four; if I have to use the CLI, so be it.)
> > > > > > >
> > > > > > > All of the examples at the top are geared towards a web UI, though
> > > > > > > many also let you use a terminal or ssh in.
> > > > > > >
> > > > > > > > Pros/Cons/Suggestions/Thoughts/Tar/Feathers?
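To make the mdadm growth path mentioned above concrete, a hedged sketch of adding a
fifth disk to an existing four-disk array and then passing the new space up through LVM
to the filesystem (array, volume group, and logical volume names are hypothetical):

  # add the new member, then reshape the array to use it
  mdadm --manage /dev/md0 --add /dev/sde1
  mdadm --grow /dev/md0 --raid-devices=5
  cat /proc/mdstat                 # reshape runs in the background

  # once the reshape finishes, grow the LVM layers on top of it
  pvresize /dev/md0
  lvextend -l +100%FREE /dev/vg_storage/lv_iscsi

  # finally grow the filesystem: resize2fs for ext4,
  # or xfs_growfs on the mount point for XFS
  resize2fs /dev/vg_storage/lv_iscsi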
> > > > > > >
> > > > > > > The problem with ZFS is that it has many great features, but not all
> > > > > > > apply at once. I was looking into it for a major project and got
> > > > > > > really excited reading about all the great support it has for adding
> > > > > > > drives, expanding pools, snapshots, and RAIDZ. It wasn't until I got
> > > > > > > into the details via a test VM that I found out about RAIDZ volumes
> > > > > > > not being expandable.
> > > > > > >
> > > > > > > You also need to make sure that whatever OS you use has a version of
> > > > > > > ZFS that supports the feature(s) you want to use. I wouldn't mess with
> > > > > > > ZFS on Linux at all.
> > > > > > >
> > > > > > > Also, ZFS isn't really recommended for 32-bit systems. You can do it,
> > > > > > > but I really don't advise it if you'll be dealing with large amounts
> > > > > > > of storage, especially if combined with low amounts of RAM.
> > > > > > >
> > > > > > > On the other hand, LVM and Linux RAID are very mature approaches with
> > > > > > > easy-to-use tools.
> > > > > > >
> > > > > > > If a web UI is a lower priority for you, it sounds like this system
> > > > > > > will be run by a reasonably technically proficient person. The older I
> > > > > > > get, and the more projects I get under my belt, the less I like the
> > > > > > > all-in-one or "appliance" approaches.
> > > > > > >
> > > > > > > If you just do a standard install of a distro, you'll get continuous
> > > > > > > security updates and a great deal of flexibility. The downside is it
> > > > > > > takes a little more know-how to get things set up. The really nice
> > > > > > > thing about Webmin vs. some of the other admin interfaces like
> > > > > > > cpanel/plesk/etc. is that Webmin doesn't really mess with the
> > > > > > > installed system or make specialized customizations to it. It's
> > > > > > > really just a GUI that edits the config files for you, while still
> > > > > > > giving you the option to edit them yourself. I'm curious where OMV
> > > > > > > falls on this spectrum.
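PaulNM's test-VM finding is easy to reproduce without dedicating real disks, since ZFS
accepts plain files as vdevs. A small sketch (pool name and file paths are made up):

  # create four sparse 1 GB files to stand in for disks
  truncate -s 1G /tmp/d0 /tmp/d1 /tmp/d2 /tmp/d3

  # build a three-disk RAIDZ1 pool from the first three
  zpool create testpool raidz /tmp/d0 /tmp/d1 /tmp/d2

  # attempting to widen the raidz vdev with the fourth file fails;
  # attach only works on mirrors and single-disk vdevs
  zpool attach testpool /tmp/d0 /tmp/d3

  # what does work is adding a whole second raidz vdev later on
  # (the pairs/triplets workaround mentioned above), e.g.:
  #   zpool add testpool raidz /tmp/d3 /tmp/d4 /tmp/d5

  zpool destroy testpool           # clean up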
> > > > > > >
> > > > > > > - PaulNM
>
> --
> "Between subtle shading and the absence of light lies the nuance of iqlusion..." - Kryptos
>
> Lee Marzke, lee@marzke.net  http://marzke.net/lee/
> IT Consultant, VMware, VCenter, SAN storage, infrastructure, SW CM
--
Gavin W. Burris
Senior Project Leader for Research Computing
The Wharton School
University of Pennsylvania
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug