Walt Mankowski via plug on 6 Nov 2022 15:14:51 -0800



Re: [PLUG] Box won't boot after RAID drive swap


I tried again this afternoon. I put the old drive back in, checked all the cables and connections, and turned it on. It booted up just fine!

So then I shut it down and put the new drive back in. It wouldn't boot, apparently because it really wants both drives in the array before it will come up.

I tried booting into recovery mode. I tried commenting out some references to the RAID in /etc/fstab. It still wouldn't boot.
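For anyone following along, here is a minimal sketch of that kind of fstab edit in Python. It assumes the RAID entries name the array by a /dev/md* device path, which is an assumption; entries that use UUID= or LABEL= would need a different match. The function name and the sample entries are hypothetical, and the code only transforms text rather than touching the real /etc/fstab.

```python
def comment_out_raid_entries(fstab_text, device_prefix="/dev/md"):
    """Return fstab_text with every active entry whose device field
    starts with device_prefix turned into a comment."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[0].startswith(device_prefix):
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical fstab content, for illustration only.
    sample = (
        "/dev/sda1 / ext4 errors=remount-ro 0 1\n"
        "/dev/md0 /data ext4 defaults 0 2\n"
    )
    print(comment_out_raid_entries(sample), end="")
```

Working on a copy of the text keeps the original file intact until the result has been checked by eye.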

So then I put the old drive back in. My plan was to boot it up, explicitly tell mdadm to remove the bad drive from the array, then shut down and swap back to the new drive. Now we're back to it spontaneously shutting down before it finishes booting!
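The mdadm part of that plan can be sketched roughly as follows, in Python. The array name /dev/md0 and member name /dev/sdb1 are hypothetical placeholders, not names from this box, and the /proc/mdstat text is a made-up sample in the usual format. The sketch only builds the commands and parses text; it does not run mdadm.

```python
import re
import shlex

# Made-up /proc/mdstat content: md0 has lost a member, md1 is healthy.
SAMPLE_MDSTAT = """Personalities : [raid1]
md0 : active raid1 sda1[0]
      976630464 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdc1[0] sdd1[1]
      1048576 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def mdadm_remove_commands(array, member):
    """Return the two mdadm invocations that mark `member` as failed
    and then remove it from `array`."""
    return [
        ["mdadm", "--manage", array, "--fail", member],
        ["mdadm", "--manage", array, "--remove", member],
    ]


def degraded_arrays(mdstat_text):
    """Return the names of arrays whose status string, such as [U_],
    contains an underscore (a missing member)."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S*)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if status and current:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in mdadm_remove_commands("/dev/md0", "/dev/sdb1"):
        print(shlex.join(cmd))
    print("degraded:", degraded_arrays(SAMPLE_MDSTAT))
```

Printing the commands first makes it easy to double-check the device names before running anything as root.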

I've checked all the cables and connections multiple times, but nothing is fixing it. I tried booting off memtest86 but it ignored it and booted off the hd again.

It does seem to take longer to shut off if it sits for a while. Maybe one of the fans has failed? Who knows?

At this point I'm thinking I might just bring it to Micro Center tomorrow and ask them how much they'd charge to figure out what's failing. Alternatively, since I built this box in 2017 it might be time to build a new one, but it would be nice to know what I can salvage from this one.

Sigh.

On Sun, Nov 6, 2022, at 11:59 AM, Keith via plug wrote:
On 11/5/22 17:29, Walt Mankowski via plug wrote:
I’m skeptical of your power draw theory, because the spontaneous shutdowns still occurred after I a) unplugged the new drive, and b) put the old drive back. Also the first time I booted up with the new drive it started to boot up, but never completed because it was looking for a drive that wasn’t there.


I wouldn't rule this out yet. The only times I've had delayed malfunctions with little to no user interaction, or early in booting, have been due to bad RAM (which others have mentioned) or the PSU. Bad capacitors and power transistors can exhibit this pattern, and this is what happens after a power event: usually power spikes (an overvoltage condition), but it is possible that the mobo allowed something through for a few milliseconds before things settled down, and that was enough to fry something somewhere in the power path. Once that happens, you're done. The fact that you said this happened right away after replacing the drive actually increases the likelihood of it being a power issue (for me), and tracing the issue is a nightmare because you literally have to have two of everything to isolate it.

If you've reverted to the previous hardware configuration, then what I would do is try to boot from a known-good live CD / USB device with everything on the mobo disconnected (i.e. all drives), all internal cards removed and, if you can use mobo graphics for this test, the GPU removed too. If you still can't get past the BIOS, then you are down to the PSU and mobo (which could be bad RAM).

This is the hard part of the test because you would need to test every RAM module if you can, and then somehow you have to test the PSU and mobo. Which means you need to do swaps or franken-builds to test. A bad GPU can cause this too, so if you can't test without it, you'll never be able to rule it out. This process can be longish but it will give you your answers.

FWIW, this type of failure can also cascade... a PSU, mobo or other issue may yield another PSU, mobo or other issue. It's entirely possible you could find a combination of things... a bad RAM module, bad mobo section or bad PSU section / output group, etc.


On Sat, Nov 5, 2022, at 5:18 PM, brent saner via plug wrote:
This also sounds very much, perhaps more so, like "the new drive I bought has a higher power draw than the drive I replaced and it's tripping the PSU because it's at its limit".

Try removing non-essential components (gfx card, etc.).
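The arithmetic behind this theory can be sketched as a simple power budget. Every wattage below is a hypothetical placeholder, not a measurement from this machine; real figures would come from the component spec sheets and the PSU label. Note that a single total also glosses over per-rail limits and brief peaks such as drive spin-up.

```python
def headroom_watts(psu_rating_w, loads_w):
    """Return PSU rating minus the summed component draw, in watts.
    A result near zero or below it suggests the PSU is at its limit."""
    return psu_rating_w - sum(loads_w.values())


if __name__ == "__main__":
    # Hypothetical placeholder numbers, for illustration only.
    psu_rating_w = 450
    loads_w = {
        "cpu": 125,
        "gpu": 250,
        "motherboard_and_ram": 50,
        "drives": 20,
        "fans": 10,
    }
    print("all components:", headroom_watts(psu_rating_w, loads_w), "W spare")

    # Same budget with the graphics card pulled, as suggested above.
    without_gpu = {name: w for name, w in loads_w.items() if name != "gpu"}
    print("without GPU:", headroom_watts(psu_rating_w, without_gpu), "W spare")
```

If the box stays up with the biggest consumer removed, that points toward a power limit rather than a loose cable.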

The "I knocked something loose" theory doesn't hold much water; a BIOS/UEFI menu should load with just power and a CPU (and sometimes you need at least 1 stick of RAM). You'd probably get some POST alerts, but you absolutely should be able to still get into your BIOS/UEFI menu.

But no, this definitely feels power related. Either what I suggested above or something is shorting somewhere. 

One time I had a very similar issue. Turned out to be a tiny screw *underneath* the board against an exposed pin and making contact with the grounding plate of the case, shorting out the board.

On Sat, Nov 5, 2022, 15:38 Walt Mankowski via plug <plug@lists.phillylinux.org> wrote:
On Sat, Nov 05, 2022 at 03:32:28PM -0400, Soren Harward via plug wrote:
> On Sat, Nov 5, 2022 at 3:23 PM Walt Mankowski via plug <
>
> > Not very easily. The box is just spontaneously shutting down,
> > sometimes before it even gets to the BIOS screen.
>
>
> That sounds very much like a "I knocked something loose that shouldn't be"
> problem. Double-check all power connectors, that CPU and RAM are seated
> properly, and that all data cables are firmly connected.

That's what I thought too. It's certainly plausible, especially
considering I somehow managed to unplug my GPU from its wall socket
just pulling the case out! However, it's a rat's nest of a zillion
cables in there, and since I set it up 6 years ago I don't have any
idea what's what. Nothing *seemed* loose, but I'll check again.

Thanks!

Walt
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

-- 
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Keith C. Perry, MS E.E.
Managing Member, DAO Technologies LLC
(O) +1.215.525.4165 x2033
(M) +1.215.432.5167
www.daotechnologies.com