Rich Freeman on 21 Aug 2011 19:24:21 -0700


Re: [PLUG] Degraded RAID

On Sun, Aug 21, 2011 at 10:04 PM, Jeff Bailey wrote:
> How do I know why it got removed, and whether it is actually failed?  Can I
> just try to re-add it, and if it works, great and if it fails, I'll have an
> idea why?  I know which device it is - where do I go from there?

You can certainly try to re-add it.  To know why it failed you
probably need to check your logs.  However, I suspect the relevant
logs go to dmesg, and may or may not make it to syslog.  You can check
the status in /proc/mdstat, and if it doesn't get added be sure to
check dmesg for anything interesting.
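
As a rough sketch of the /proc/mdstat check (the function name
degraded_arrays is made up for illustration, and it assumes the usual
mdstat layout where the status line ends in something like [U_]):

```shell
# Print the names of md arrays that /proc/mdstat reports as degraded.
# Optional first argument: a file to read instead of /proc/mdstat
# (handy for trying the function on a saved copy).
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sda1[0]"
        /^md[^ ]* :/ { name = $1 }
        # The status line ends in something like [2/1] [U_];
        # an underscore in that last field marks a missing member.
        name != "" && /\[[U_]+\]/ {
            if ($NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument reads the live /proc/mdstat.
If it prints an array name, something like "dmesg | tail -n 50" right
after the re-add attempt is a reasonable place to look for the reason.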

I had a motherboard that was a bit flaky with the IDE controller and
sometimes one of my drives wouldn't get detected.  If I ran in
degraded mode it might or might not get automatically re-added on the
next reboot.  Something like that is always a possibility.

Just be sure you're adding the right drive to the array so that you
don't hose something important (it will wipe the drive).  If you
actually lose a drive, devices like /dev/sd[abcd] will get re-ordered,
so /dev/sd# might not be what you think it is.  You could do a "file
-s <device>" to try to get an idea of what is already on the device
first, or use "mdadm --examine <device>" to see how it used to fit into
an array.
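
A minimal sketch of that pre-flight look, wrapping the two commands
above.  The helper names (run_or_print, inspect_device) and the
DRY_RUN switch are my own invention, not anything mdadm provides:

```shell
# Run a command, or just print it when DRY_RUN=1 is set.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show what is currently on a device before handing it to mdadm.
inspect_device() {
    dev=$1
    if [ -z "$dev" ]; then
        echo "usage: inspect_device <device>" >&2
        return 2
    fi
    run_or_print file -s "$dev"          # what filesystem/signature is there now
    run_or_print mdadm --examine "$dev"  # md superblock: which array, which slot
}
```

For example, "DRY_RUN=1 inspect_device /dev/sdX1" only prints the two
commands (the device name is a placeholder).  Once the --examine output
matches the array you expect, the re-add itself would be along the
lines of "mdadm /dev/md0 --re-add /dev/sdX1", again with your own
device names substituted.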

If you re-add the old device to the raid it will probably rebuild
fairly quickly - if the array has a write-intent bitmap, the raid
keeps track of what actually changed and only rebuilds those regions.
If you want to be extra-safe you can
"echo check > /sys/block/md#/md/sync_action" once it is done
rebuilding - that will force the raid to check the parity on every
stripe and will detect any damage to the array (otherwise you'll find
it the first time you read a damaged stripe).  Writing "repair"
instead of "check" will try to fix errors, but the design of Linux
software raid makes that potentially imperfect.  Filesystems like
btrfs that checksum everything provide a higher degree of assurance
that you're overwriting the bad data with good data.
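
The check can be wrapped up roughly like this.  It is only a sketch:
the function names and the SYSFS override are assumptions of mine
(the override exists so the functions can be tried against a scratch
directory), while sync_action and mismatch_cnt are the kernel's own
sysfs files:

```shell
# Ask the kernel to scrub an md array, e.g. "start_check md0".
# SYSFS defaults to /sys and can be pointed elsewhere for a dry run.
start_check() {
    md=$1
    action_file="${SYSFS:-/sys}/block/$md/md/sync_action"
    if [ ! -w "$action_file" ]; then
        echo "cannot write $action_file" >&2
        return 1
    fi
    echo check > "$action_file"
}

# Print the mismatch counter the kernel keeps for the array.
# Read it after the check has finished; non-zero means the scrub
# found stripes that did not agree.
mismatch_count() {
    cat "${SYSFS:-/sys}/block/$1/md/mismatch_cnt"
}
```

Progress shows up in /proc/mdstat while the check runs, so
"start_check md0", wait for it to finish, then "mismatch_count md0"
is the general shape of it (run as root, with your own array name).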

Hope that helps!

Philadelphia Linux Users Group