Rich Freeman via plug on 7 Nov 2022 16:15:38 -0800



Re: [PLUG] Box won't boot after RAID drive swap


On Mon, Nov 7, 2022 at 6:09 PM Keith via plug
<plug@lists.phillylinux.org> wrote:
>
> 1) Has anyone ever run a degraded RAID 1 (i.e. only one disk online)
> that was created with mdadm?  Was that a boot set or data set?

Yes.  Probably data/root.  Not sure I've had it happen to a boot disk
- it might have happened, but I haven't used mdadm in a while.  Using
mdadm for a boot partition is a little tricky, especially with EFI.
It might be easier to just manually sync two FAT32 ESPs these days,
since their contents don't change often.  The issue is that the boot
partition needs to be readable by the bootloader itself, and
bootloaders aren't super-sophisticated.
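
If you go the manual-sync route, a rough sketch (assuming the two
ESPs are mounted at /boot/efi and /boot/efi2 - those mount points are
just examples, not anything standard):

rsync -a --delete /boot/efi/ /boot/efi2/   # mirror the primary ESP onto the second one
efibootmgr -v                              # sanity-check that both disks have boot entries

Re-run the rsync whenever the kernel or bootloader gets updated.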

> 2) Has anyone ever replaced a failed RAID 1 disk with mdadm without
> first removing the bad disk while the system was up? What were your
> steps and is this (or your process) documented somewhere?

Sure.  It is a two-liner.

Let /dev/md0 be a RAID, and let /dev/sdbad be the bad disk in the
array.  Let /dev/sdgood be the new drive.

mdadm /dev/md0 --add /dev/sdgood

Now the new drive is a spare drive in the array.  If the array is
degraded it will probably just start rebuilding immediately.  If
you're replacing the bad drive before it has actually failed
completely and been dropped from the array then nothing will happen
immediately.  If that is the case, issue the next command.

mdadm /dev/md0 --replace /dev/sdbad   #  optionally add --with /dev/sdgood

This will find a spare drive in the array and use it to replace the
bad drive.  First the new drive is added as an additional replica,
and only after it has fully rebuilt does the bad drive get dropped
from the array.  If you get a double failure during that process, any
still-readable data on the bad drive can in theory be recovered, and
any data that had already been replicated would definitely be fine.
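
Either way you can watch what's going on with something like:

cat /proc/mdstat            # overall array state and resync progress
mdadm --detail /dev/md0     # per-device roles (active, spare, replacement, faulty)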

You could also manually increase the redundancy level, and then drop
the bad drive and reduce the redundancy level so that the array is no
longer degraded.  That is basically the same thing, just issued as
separate steps instead of letting --replace handle it.
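
Spelled out, that would look roughly like this (same placeholder
device names as above, with /dev/sdgood already added via --add):

mdadm --grow /dev/md0 --raid-devices=3      # temporarily make it a 3-way mirror
# wait for the resync onto /dev/sdgood to finish, then:
mdadm /dev/md0 --fail /dev/sdbad --remove /dev/sdbad
mdadm --grow /dev/md0 --raid-devices=2      # back to a 2-way mirror, no longer degraded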

If you had added the good drive as a spare before the original
failure, then as soon as the array degraded it would start rebuilding
onto the hot spare.  Instead of a hot spare you could also tell mdadm
to make it a 3-way mirror, in which case the array would still show
as degraded after a failure, but you'd be safe even if a second drive
failed.  You could make it non-degraded again simply by dropping the
replica target back to 2.
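
The difference between the two setups is just one extra command -
assuming a third drive /dev/sdthird (placeholder name):

mdadm /dev/md0 --add /dev/sdthird           # hot spare: sits idle until a drive fails
mdadm --grow /dev/md0 --raid-devices=3      # ...or also make it a third active replica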

> Sorry, I'm suddenly curious about this.

Lol - you don't have to apologize for asking linux nerds to talk about
linux.  :)

I've run mdadm+LVM in the past and it is pretty robust.  The main
weakness is that it doesn't detect silent corruption, and you still
have the traditional RAID write hole.  The silent corruption issue
could be addressed by stacking mdadm on top of dm_integrity.  There
is no solution to the RAID write hole with mdadm.  In theory mdadm
could address it with journaling at the RAID level, but that would
require double writes and be very expensive.  COW filesystems address
the write hole by knowing what space is free and not overwriting data
in place.  Non-COW filesystems can address it via journaling, with
less of a penalty than doing it at the RAID layer.  The problem with
doing it in RAID is that the RAID layer has no idea which blocks are
expendable, so it has to blindly protect them all, and of course
writes get amplified by the striping.
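
For reference, the dm_integrity stacking would look roughly like this
(assuming two partitions /dev/sda1 and /dev/sdb1 - device names are
placeholders, and dm-integrity adds noticeable write overhead of its
own):

integritysetup format /dev/sda1             # lay down the integrity metadata
integritysetup format /dev/sdb1
integritysetup open /dev/sda1 int-a         # expose checksummed devices under /dev/mapper
integritysetup open /dev/sdb1 int-b
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/mapper/int-a /dev/mapper/int-b

The idea is that a read failing its checksum comes back as an I/O
error, which md can then repair from the other mirror.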

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug