Keith via plug on 7 Nov 2022 15:09:40 -0800



Re: [PLUG] Box won't boot after RAID drive swap


On 11/7/22 17:54, Walt Mankowski via plug wrote:
On Mon, Nov 07, 2022 at 09:56:02PM -0500, Rich Mingin (PLUG) via plug wrote:
On Mon, Nov 7, 2022 at 4:30 PM Rich Freeman via plug
<plug@lists.phillylinux.org> wrote:
Basically it does the right thing in most circumstances.  Hard to be
certain what is going on here, but it could be that the distro has
overridden the behavior and is preventing a degraded array from
mounting, or the array just isn't finding any drives.  Keep in mind
this is a computer that can't even be reliably booted to firmware, so
this is getting beyond the scope of RAID.

Getting ahead of the first issue: don't blame the failure to boot on the
array when the computer is frequently failing to complete basic power-on
tests before turning off again. There is a hardware issue beyond just the
disks. No OS is loaded at that point; if the box is powering off mid-POST,
there is absolutely a hardware problem to identify and resolve before
anything involving md, LVM, etc. comes into play.

Could be a loose cable, could be power supply damage caused by the failing
disk, could be intermittent cosmic ray errors. There's too little data to
guess meaningfully, beyond saying it needs more troubleshooting.

It seems to be both an mdadm issue and a hardware issue. The first
thing I did was remove the old drive and replace it with a new
one. It got well into the boot process but refused to mount the array
with one of the drives missing. It also refused to boot without my
external backup drive plugged in, presumably because I was mounting
/dev/sde as /backup in /etc/fstab. (I really need to check my default
settings if I can ever get this box to boot again!)
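The boot stopping when the external drive behind the /backup entry was unplugged is commonly caused by an /etc/fstab entry for a non-essential filesystem that lacks the `nofail` mount option (documented in mount(8) and honored by systemd): boot then waits on a device that never appears. Below is a minimal Python sketch that scans fstab-format text and flags such entries. The function name and the ESSENTIAL_MOUNTS set are hypothetical illustrations, not part of any existing tool; adjust the set for your own machine.

```python
"""Flag fstab entries that could stall boot if their device is absent.

Sketch only: ESSENTIAL_MOUNTS is a hypothetical set of mount points the
system genuinely needs at boot; adjust it for your own machine.
"""
from typing import List

# Hypothetical: mount points that must be present for the system to boot.
ESSENTIAL_MOUNTS = {"/", "/boot", "/boot/efi", "/usr", "/var"}


def entries_missing_nofail(fstab_text: str) -> List[str]:
    """Return mount points of non-essential entries whose options lack 'nofail'."""
    flagged: List[str] = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # fstab fields: spec, mount point, fs type, options, dump, pass.
        if len(fields) < 4:
            continue  # too short to carry an options field; ignore it
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        # Swap and essential filesystems are intentionally left alone.
        if fs_type == "swap" or mount_point in ESSENTIAL_MOUNTS:
            continue
        if "nofail" not in options:
            flagged.append(mount_point)
    return flagged
```

On a live system, `entries_missing_nofail(open("/etc/fstab").read())` would list candidates such as /backup; whether adding `nofail` is appropriate for each one is still a judgment call.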

This business with it shutting down before it even finishes booting
started after I put the old drive back.

I need to check the hardware cables again, and also dumb stuff like
maybe I'm just not plugging in the power cable all the way. But along
with all these computer problems I've also come down with a nasty
chest cold, and I'm just not feeling up to crawling under my desk
again today.

Walt


Feel better, Walt!!

Based on your findings, one of Rich's posts, and one of Leroy's posts, I now have some questions for the list. After a quick Google search of my own, I'm also not finding any examples of replacing a failed drive in a RAID 1 without removing the drive ***first*** while the system is online.

1) Has anyone ever run a degraded RAID 1 (i.e., only one disk online) that was created with mdadm? Was that a boot set or a data set?

2) Has anyone ever replaced a failed RAID 1 disk with mdadm without first removing the bad disk while the system was up? What were your steps, and is this (or your process) documented somewhere?
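For question 1, one quick way to see whether an mdadm RAID 1 is running on a single disk is the kernel's /proc/mdstat, where a two-disk mirror with one member missing typically shows `[2/1] [U_]` instead of `[2/2] [UU]`. The Python sketch below parses that text. The function names are hypothetical, it assumes the usual mdstat layout, and `mdadm --detail` remains the authoritative view of an array.

```python
"""Report md arrays that /proc/mdstat shows as running degraded.

Sketch only: assumes the usual mdstat layout, where an array line such as
"md0 : active raid1 sda1[0]" is followed by a status line containing
"[2/1] [U_]" (configured devices / working devices, then per-slot state).
"""
import re
from typing import Dict, List, Tuple

ARRAY_RE = re.compile(r"^(md\S*)\s*:")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def md_member_counts(mdstat_text: str) -> Dict[str, Tuple[int, int, str]]:
    """Map array name -> (configured devices, working devices, slot map).

    Inactive arrays have no [n/m] status line, so they are simply absent.
    """
    counts: Dict[str, Tuple[int, int, str]] = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            # Start of a new array stanza.
            current = header.group(1)
            continue
        if current is not None:
            status = STATUS_RE.search(line)
            if status:
                counts[current] = (int(status.group(1)),
                                   int(status.group(2)),
                                   status.group(3))
                current = None  # only the first status line per array matters
    return counts


def degraded_arrays(mdstat_text: str) -> List[str]:
    """Names of arrays with fewer working devices than configured ones."""
    return [name
            for name, (configured, working, _) in md_member_counts(mdstat_text).items()
            if working < configured]
```

On a live system, `degraded_arrays(open("/proc/mdstat").read())` would name any mirror currently running on one disk.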

Sorry, I'm suddenly curious about this.

--
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Keith C. Perry, MS E.E.
Managing Member, DAO Technologies LLC
(O) +1.215.525.4165 x2033
(M) +1.215.432.5167
www.daotechnologies.com

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug