Rich Freeman via plug on 7 Nov 2022 13:30:32 -0800



Re: [PLUG] Box won't boot after RAID drive swap


On Mon, Nov 7, 2022 at 11:29 AM Keith via plug
<plug@lists.phillylinux.org> wrote:
>
> if you can't remove a
> drive from RAID 1 and replace it with a new drive then LVM is the clear
> winner.

Of course you can do this.  Nobody would use it otherwise.

If a drive fails during operation, the array becomes degraded and
keeps operating.  If a drive is missing at boot, the default is to
assemble and start the array degraded, but mdadm has an option to
refuse to assemble a degraded array.  There are reasons why you might
want that, but it isn't the default.  Of course a distro could make
it their default by putting that option in its startup scripts.
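
For example, from a rescue shell you can see what md thinks the array
looks like (the array and device names below are just illustrative):

    cat /proc/mdstat
    mdadm --detail /dev/md0

and if memory serves, the assemble-time refusal is the --no-degraded
flag, along the lines of:

    mdadm --assemble --scan --no-degraded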

If you install a new drive, mdadm won't just wipe it out and use it.
You need to add it to the array first.  You can pre-add drives to an
array as spares, in which case if a drive fails a spare will be
selected and the array will immediately begin to rebuild.  You can
also replace a drive in an array while it is still present; in this
mode mdadm adds the new drive as an additional mirror and, once it
has fully rebuilt, automatically removes the drive being replaced.
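
Roughly, assuming /dev/md0 is the array and /dev/sdc1 is the new
disk's partition (names made up for illustration):

    # add the new disk; it rebuilds a degraded array, or sits as a spare
    mdadm /dev/md0 --add /dev/sdc1

    # or replace a still-present member in place (newer mdadm)
    mdadm /dev/md0 --replace /dev/sdb1 --with /dev/sdc1

The --replace form keeps the old disk in the mirror until the new one
is fully synced, then drops the old one automatically.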

Basically it does the right thing in most circumstances.  It's hard
to be certain what is going on here, but it could be that the distro
has overridden the default behavior and is preventing a degraded
array from starting, or the array just isn't finding any drives.
Keep in mind this is a computer that can't even reliably boot into
its firmware, so this is getting beyond the scope of RAID.
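
If it were mine, the first thing I'd check from a rescue environment
is whether mdadm can even see RAID superblocks on the remaining disk,
something like (the partition names here are just a guess):

    mdadm --examine /dev/sda1 /dev/sdb1
    mdadm --assemble --scan --verbose

That at least tells you whether this is an md problem or a
disk-detection problem.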

I'm definitely interested in how the data fares at the end of all
this, though it is worth mentioning that mdadm has no protection
against silent corruption (i.e. changes to data on disk that do not
cause the drive to report a read error - that could be due to bit
flips, or to writes that were corrupted on the way to the disk by
hardware issues).

If you want protection against silent corruption without adopting
zfs/btrfs, check out dm-integrity.  I'm not sure offhand how easily
you can incorporate it into LVM, but it is a device mapper layer that
turns silent corruption into read errors, which triggers the
appropriate recovery in your RAID software as long as the integrity
layer sits below the RAID layer.
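
As a rough sketch of the standalone approach (using the
integritysetup tool from the cryptsetup package; device names are
made up), you format and open each disk through the integrity layer,
then build the RAID on top of the resulting mapper devices:

    integritysetup format /dev/sdb1
    integritysetup open /dev/sdb1 int-sdb1
    integritysetup format /dev/sdc1
    integritysetup open /dev/sdc1 int-sdc1
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        /dev/mapper/int-sdb1 /dev/mapper/int-sdc1

A checksum mismatch on one leg then surfaces as a read error, and md
repairs it from the other mirror.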

None of this is to take a side on LVM RAID vs mdadm RAID - I haven't
looked at the differences closely enough to comment, and I've only
used mdadm for this (plus btrfs/zfs).  I just wanted to say that
normally mdadm handles disk failures as you'd expect from any RAID.

-- 
Rich