Keith via plug on 7 Nov 2022 15:09:40 -0800
Re: [PLUG] Box won't boot after RAID drive swap
On 11/7/22 17:54, Walt Mankowski via plug wrote:
On Mon, Nov 07, 2022 at 09:56:02PM -0500, Rich Mingin (PLUG) via plug wrote:

On Mon, Nov 7, 2022 at 4:30 PM Rich Freeman via plug <plug@lists.phillylinux.org> wrote:

Basically it does the right thing in most circumstances. Hard to be certain what is going on here, but it could be that the distro has overridden the behavior and is preventing a degraded array from mounting, or the array just isn't finding any drives. Keep in mind this is a computer that can't even be reliably booted to firmware, so this is getting beyond the scope of raid.

Getting ahead of the first issue. Don't blame failure to boot on the array when the computer is frequently failing to complete basic power on tests before turning off again. There is a hardware issue, beyond just the disks. No OS is loaded at that time, if the box is powering off mid-POST, there absolutely is a hardware problem to identify and resolve before anything with md/LVM/etc come into play. Could be a loose cable, could be power supply damage by the failing disk, could be intermittent cosmic ray errors. Too little data to guess meaningfully, beyond needing more troubleshooting.

It seems to be both an mdadm issue and a hardware issue. The first thing I did was remove the old drive and replace it with a new one. It got well into the boot process but refused to mount the array with one of the drives missing. It also refused to boot without my external backup drive plugged in, presumably because I was mounting /dev/sde as /backup in /etc/fstab. (I really need to check my default settings if I can ever get this box to boot again!)

This business with it shutting down before it even finishes booting started after I put the old drive back. I need to check the hardware cables again, and also dumb stuff like maybe I'm just not plugging in the power cable all the way. But along with all these computer problems I've also come down with a nasty chest cold, and I'm just not feeling up to crawling under my desk again today.

Walt
Feel better, Walt!!

Based on your findings, one of Rich's posts, and one of Leroy's posts, I now have some questions for the list. After doing a quick Google search myself, I'm also not finding any examples of replacing a failed drive in a RAID 1 without ***first*** removing the drive while the system is online.
1) Has anyone ever run a degraded RAID 1 (i.e. only one disk online) that was created with mdadm? Was that a boot set or data set?
2) Has anyone ever replaced a failed RAID 1 disk with mdadm without first removing the bad disk while the system was up? What were your steps, and is this (or your process) documented somewhere?
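
For anyone comparing notes on question 1: as far as I understand it, a degraded md array shows up in /proc/mdstat with fewer active members than configured, e.g. "[2/1] [U_]" for a two-disk RAID 1 running on one disk. Below is a minimal sketch, assuming that usual status-line layout, of how one might spot degraded arrays from Python. The function name and the sample text are illustrative only; the sample is made up and is not output from Walt's machine.

```python
import re

# Array header lines start with the md device name, e.g. "md0 : active raid1 ..."
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
# Status fields such as "[2/1] [U_]": 2 members configured, 1 active,
# with "_" marking the missing slot. (Assumed layout of /proc/mdstat.)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: (configured, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            configured = int(status.group(1))
            active = int(status.group(2))
            if active < configured:
                result[current] = (configured, active)
            current = None
    return result


# Hypothetical sample text for illustration only.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdc1[0] sdd1[1]
      976630464 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string. It only reports what the kernel already shows and does not change the array in any way.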
Sorry, I'm suddenly curious about this.

-- 
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Keith C. Perry, MS E.E.
Managing Member, DAO Technologies LLC
(O) +1.215.525.4165 x2033
(M) +1.215.432.5167
www.daotechnologies.com

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug