Rich Freeman on 5 Jan 2011 06:51:38 -0800



Re: [PLUG] Software mirror (RAID1)


On Wed, Jan 5, 2011 at 1:49 AM, JP Vossen <jp@jpsdomain.org> wrote:
> I'm going off on a tangent to Rich's RAID5 question.  I don't have a better
> answer for him, and he's already covered my objections to abusing RAID5.

Tangents are always welcome - usually I'm the one starting them, however...

> NOTE: RAID is **not** a backup!!!  RAID & backups are different things and
> using RAID does not allow you not to back up!

Couldn't agree more on this.  I use a script that employs sarab
(itself a script for dar with rotation logic), gpg, and s3cmd to get
encrypted backups of important stuff onto Amazon S3 reduced-redundancy
storage (with multiple copies of the key stashed away in secure
offsite locations).

I do depend on my RAID as a form of backup for less-important data, so
I wouldn't be able to safely use raid1 to the extent that you do.  I
probably will tolerate some loss of redundancy during my data
migrations to save myself the expense of an additional controller
card, but I wouldn't want to routinely degrade my array.  Stuff like
DVR recordings and various reproducible junk isn't worth 10
cents/month/GB to back up (plus transfer costs), but I'd still regret
losing it.

> RAID1 Cons:
>        * Arguably slower than other solutions

I think this one is oversold from what I've read.  It REALLY depends
on your use case.

Striped RAIDs of any kind (including raid5) are really great for
high-bandwidth streaming of large files (either reads or writes).
They're no better/worse than standalone drives for random read seeks,
and they're worse than standalone drives for random writes of small
amounts of data (parity RAID has to read back the rest of the stripe,
or at least the old data and parity, to recompute parity), except for COW
implementations (only ZFS that I'm aware of - btrfs doesn't support
raid5 yet).

RAID1 implementations are generally no better/worse than standalone
drives for high-bandwidth streaming of large files (either reads or
writes).  They're also the same as standalone drives for random writes
of small amounts of data.  They're double the performance of
standalone drives for random read seeks.

So, RAID5 performs the same as or better than RAID1 for all use cases
except random access to small amounts of data (reads, and per the
above, small writes too).  The problem is that random reads of small
amounts of data are probably 95% of what most hard drives end up
doing, and this is why SSDs do so well.

Some might take issue with my assertion that writes on RAID1 are the
same as a standard hard drive, since RAID1 is usually cited as having
a write penalty.  I think this is oversold, but I'm open to argument
here.  It is true that a write must tie up all drives, but that only
deprives you of the ability to do parallel reads on the additional
drives, which is something that standalone drives never had in the
first place.  It also isn't a differential penalty of RAID1 since
RAID5 has to do the same.  The main difference between RAID1 and RAID5
in this regard is that since RAID5 drives inevitably end up doing all
seeks in parallel, the heads of all drives are always in the same
place anyway, so there is no additional seeking for a write.
RAID1 tends to have more independent drive operation due to parallel
reads, and only on writes do they need to sync up.

Note also that the benefits of RAID5 for high-bandwidth reads only apply if:
1.  You really are sustaining high bandwidth.  Simply reading a big
file slowly (most media playback) doesn't get you anything since any
RAID configuration can sustain this.

2.  Your bus/arch/software/cpu/etc can actually handle the bandwidth.
You might have 10 drives striped with 6Gb/s SATA, but I doubt that any
of the busses on your motherboard can really deal with 60Gb/s of data
running around, and unless you have 500GB of RAM you're not going to
be buffering it even if your memory had the bandwidth for it.
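To put rough numbers on point 2, here is a back-of-the-envelope sketch
in Python.  The function names are made up for illustration, and it
assumes every drive saturates its 6Gb/s link (raw line rate, ignoring
encoding overhead), which spinning disks don't come close to doing, so
treat the result as an upper bound.

```python
# Rough arithmetic for the 10-drive striping example.
# Assumption: each drive fills its whole 6 Gb/s SATA link (raw line rate).

def aggregate_gbit_per_s(drives, link_gbit_per_s):
    """Upper bound on combined link bandwidth, in gigabits per second."""
    return drives * link_gbit_per_s


def seconds_to_fill(buffer_gbytes, rate_gbit_per_s):
    """Seconds needed to fill a RAM buffer at the given rate (8 bits/byte)."""
    gbytes_per_s = rate_gbit_per_s / 8
    return buffer_gbytes / gbytes_per_s


total = aggregate_gbit_per_s(10, 6)
print(total, "Gb/s aggregate")
print(round(seconds_to_fill(500, total), 1), "seconds to fill 500GB of RAM")
```

Even with 500GB of RAM, the buffer would fill in about a minute at
that rate, so something downstream has to consume the data just as
fast.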

Note also that clever use of software/implementations/etc can probably
get very good bandwidth out of RAID1 - you just need to seek two
different parts of the same file and read them in parallel.  Also,
while most implementations of RAID1 limit you to a single mirror,
there is no reason you couldn't have 10 mirrors and 10x the read
performance.
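A toy model of that scaling argument, as a sketch only: it assumes
every copy of the data can service one seek at a time, that requests
spread perfectly evenly across copies, and that every seek costs the
same.  The function name and the 10ms seek time are hypothetical, and
real implementations will fall short of this ideal.

```python
import math


def random_read_time_ms(requests, seek_ms, copies):
    """Idealised time to serve random reads spread evenly over all copies."""
    if copies < 1:
        raise ValueError("need at least one copy of the data")
    # Each copy handles its share of the requests one seek at a time.
    return math.ceil(requests / copies) * seek_ms


baseline = random_read_time_ms(1000, 10, 1)
for copies in (1, 2, 10):
    elapsed = random_read_time_ms(1000, 10, copies)
    print(copies, "copies:", elapsed, "ms, speedup", baseline / elapsed)
```

Under those assumptions two copies halve the time for a batch of
random reads, and ten copies cut it by a factor of ten.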

>> Expanding a RAID1 is not practical, but expanding a RAID5 is trivial.
>
> Depending on how you define it, expanding RAID1 is easy, but a bit time
> consuming.  As noted above, you can expand onto bigger hard drives pretty
> easily without any reinstalls.  No, not the same as or quite as easy as
> expanding a RAID5, but not impractical either.

So, my thinking in this regard was that with a working RAID5 adding a
1TB disk gets you 1TB of additional usable space.  With a working
RAID1 you need to add 2TB of disk to get you 1TB of additional usable
space.  You keep paying that N/2 vs N-1 penalty over and over again.
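The N/2 vs N-1 comparison can be sketched like this, assuming
identical disks, two-way mirrors for RAID1, and a single RAID5 array.
The function names are only for illustration.

```python
def raid1_usable_tb(n_disks, disk_tb):
    """Two-way mirrors: N/2 disks' worth of usable space."""
    if n_disks < 2 or n_disks % 2:
        raise ValueError("two-way mirrors need an even number of disks")
    return (n_disks // 2) * disk_tb


def raid5_usable_tb(n_disks, disk_tb):
    """Single parity: N-1 disks' worth of usable space."""
    if n_disks < 3:
        raise ValueError("RAID5 needs at least three disks")
    return (n_disks - 1) * disk_tb


# Growing by 1TB of usable space with 1TB disks:
print("RAID5 4 -> 5 disks:", raid5_usable_tb(4, 1), "->", raid5_usable_tb(5, 1), "TB")
print("RAID1 4 -> 6 disks:", raid1_usable_tb(4, 1), "->", raid1_usable_tb(6, 1), "TB")
```

RAID5 gains 1TB from one extra disk, while RAID1 needs two extra disks
for the same gain.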

On the other hand, that only applies with like-sized disks.  You're
only going to add identical disks if you expand storage not long after
initial setup.

Just look at my case - I had a 120GB drive fail.  Now, I could replace
that with a 120GB drive for $40, and get the same amount of space.
Or, I can spend $120 and get 750GB of additional usable space even
after retiring the failed and two additional old 120GB hard drives
(with a power savings).

Unless you're constantly expanding, new drive replacements will be so
much larger than existing drives that you're going to end up creating
new arrays anyway, which negates the RAID5 advantage here.

That's why I ended up rethinking this and went with
conservative RAID1.  I can convert later if it makes sense, but most
likely if I add another drive it will be a 10TB drive or something
ridiculous like that anyway.  Maybe by then I can just buy 1, format
it with btrfs, copy the data over, and then mirror it across all my
old drives combined.  (Btrfs supports mixed collections of drives -
every file ends up on two independent drives, but if drive sizes are
seriously mismatched you might not be able to use all the space on the
largest drive since total space is limited to the combined total of
the smallest N-1 drives I guess - or something like that.)
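A sketch of that mixed-size estimate, with the caveat that this is the
usual rule of thumb for keeping two copies of everything on different
drives, not btrfs's actual allocator.  The function name and the drive
sizes are hypothetical.  Usable space is capped both by half the raw
total and by the combined size of everything except the largest drive
(since each block on the largest drive needs a partner elsewhere).

```python
def two_copy_usable_gb(sizes_gb):
    """Estimate usable space when every block lives on two different drives."""
    if len(sizes_gb) < 2:
        raise ValueError("need at least two drives to keep two copies")
    total = sum(sizes_gb)
    largest = max(sizes_gb)
    # Cap 1: two copies of everything, so at most half the raw space.
    # Cap 2: the largest drive can only pair with space on the others.
    return min(total / 2, total - largest)


# One huge new drive plus a few small old ones (hypothetical sizes):
print(two_copy_usable_gb([10000, 750, 750, 120]))
# Better-matched drives:
print(two_copy_usable_gb([750, 750, 120, 120]))
```

In the first case most of the 10TB drive goes unused because the old
drives only add up to 1620GB; in the second, the half-of-total cap is
the limit instead.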

Thanks for the insightful post, as always!

Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug