Keith via plug on 7 Nov 2022 08:29:55 -0800



Re: [PLUG] Box won't boot after RAID drive swap


On 11/7/22 10:10, Walt Mankowski via plug wrote:
> On Mon, Nov 07, 2022 at 02:18:09AM +0000, LeRoy Cressy via plug wrote:
>> On 11/6/22 18:14, Walt Mankowski via plug wrote:
>>> I tried again this afternoon. I put the old drive back in, checked all
>>> the cables and connections, and turned it on. It booted up just fine!
>>>
>>> So then I shut it down and put the new drive back in. It wouldn't boot
>>> up, because apparently it *really* wants both drives in the array before
>>> it will boot.
>>>
>>> I tried booting into recovery mode. I tried commenting out some
>>> references to the RAID in /etc/fstab. It still wouldn't boot.
>>>
>>> So then I put the old drive back in. My plan was to boot it up,
>>> explicitly tell mdadm to remove the bad drive from the array, then
>>> shut down and do another swap back to the new drive. Now we're back to it
>>> spontaneously shutting down before it finishes booting!
>>
>> I also have a system I built in 2017 on which I built a RAID 1 array.
>> Needless to say, I recently read the mdadm man page. It seems that you
>> cannot just pull a drive and replace it.
>
> That's my conclusion too, and it's really surprising to me. I figured
> a common case mdadm would need to handle would be if one of the drives
> died and never spun up during the boot process. Maybe I'm missing
> something, but for RAID1 I don't see any reason why it wouldn't just
> spin up with the remaining drive.


FYI... I don't use mdadm for my volumes, only LVM.  Maybe there is a difference there, but you are correct about how RAID 1 is supposed to work.  The entire purpose is to provide redundancy so that the system can stay operational until corrective action can be taken.  My external backups are RAID 1s that I use in a degraded form (i.e. only one drive online) 90% of the time.  Every so often I bring out the second drive and let it sync.  I don't even have to do anything for that; once the system detects the drive, it goes to town.
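If you want to watch a resync like that finish, something along these lines works for LVM RAID 1 (just a sketch; the "vg0" and "backup" names are placeholders for your own VG/LV):

    # show sync progress and health for RAID/mirror LVs in the VG
    lvs -a -o name,segtype,copy_percent,lv_health_status vg0

    # or watch one LV until Cpy%Sync reaches 100
    lvs -o +copy_percent vg0/backup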

Also, using just LVM on a boot drive, I've booted from either half of the mirror.  That works as expected too.

The problem you had, I've had as well, and the procedure was to remove the bad drive from the volume group.  The drive doesn't have to be online to do that, and in the case of a disk crash it won't be anyway.  I forget exactly how LVM reports the drive after that, but whatever the case, the volume is still usable.  From there you can go through the steps to add the new drive to the mirror and it will sync.  You can keep using the volume while this is happening.
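Roughly, the sequence looks like this (an untested sketch; "vg0", "root", and the device names are just examples):

    # drop the missing/failed PV from the volume group
    vgreduce --removemissing --force vg0

    # prepare the replacement disk and add it to the VG
    pvcreate /dev/sdb1
    vgextend vg0 /dev/sdb1

    # rebuild the mirror onto the new disk; it syncs in the background
    lvconvert --repair vg0/root /dev/sdb1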

With LVM it is even possible to take an existing drive with data, get a new disk, and form a RAID 1 without losing data.  As I recall, the trick there was to have already run pvcreate on the existing drive so that space was reserved for the LVM headers.  The alternative is to create a degraded mirror on the new disk from the start, copy your data to it, and then add the original disk as the mirror.  A lot of moves, but that works too.
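For the first approach, the conversion itself is short once both disks are in the VG (again a sketch with made-up names, assuming the data already lives on an LV called vg0/data):

    # add the new disk to the existing volume group
    pvcreate /dev/sdb1
    vgextend vg0 /dev/sdb1

    # convert the existing linear LV into a two-way RAID 1, mirroring onto the new disk
    lvconvert --type raid1 -m 1 vg0/data /dev/sdb1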

I've talked about this at PLUG before... using LVM alone vs. mdadm, and at the time the thinking was that there was functionally no difference (mdadm probably still has better low-level tooling).  But if you can't remove a drive from an mdadm RAID 1 and replace it with a new one, then LVM is the clear winner.  I should also mention that this process does need to be done from live boot media, and that might be the difference.  Modern boot media will find and happily use LVs, so you can do the swap.  I don't think that is the case with mdadm, and perhaps that is why you have to have your existing system up, so the config is available.
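For comparison, my understanding of the mdadm side (untested here; the array and partition names are only examples) is that the swap is done against a running array, which is why the system has to be up:

    # mark the old drive failed and pull it from the array
    mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

    # add the replacement; the array resyncs in the background
    mdadm /dev/md0 --add /dev/sdb1

    # if the box refuses to boot with only one member, the array can usually
    # be started degraded from a rescue shell:
    mdadm --assemble --run /dev/md0 /dev/sdb1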


--

~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Keith C. Perry, MS E.E.
Managing Member, DAO Technologies LLC
(O) +1.215.525.4165 x2033
(M) +1.215.432.5167
www.daotechnologies.com

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug