Rich Freeman via plug on 28 Oct 2022 06:21:41 -0700


Re: [PLUG] RAID1 impending failure questions


On Fri, Oct 28, 2022 at 8:45 AM Walt Mankowski via plug
<plug@lists.phillylinux.org> wrote:
>
> First question -- The two devices in the array are /dev/sdc1 and
> /dev/sdd1. The alert says the questionable drive is sdc1. When I open
> things up, is there any easy way to tell which drive is which? Will
> the serial number be printed on the drive?

Yes, but of course it won't have "sdc" on it.  smartctl will print the
serial number for /dev/sdc, and you can match that against the label
on the drive.

One trick I use is to put a label with the serial number on the side
of each drive next to the cables.  I have a fair number and I don't
want to jostle half a dozen connectors fishing for the drive I want to
pull.
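
For reference, a minimal sketch of pulling the serial number out of
smartctl's output.  It assumes an ATA/SATA drive, where `smartctl -i`
prints a "Serial Number:" line; the helper name serial_from_smartctl is
made up for illustration.

```shell
# Hypothetical helper: read `smartctl -i` output on stdin and print
# only the serial number (assumes the ATA-style "Serial Number:" line).
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Usage, as root (device name taken from the alert):
#   smartctl -i /dev/sdc | serial_from_smartctl
```

`ls -l /dev/disk/by-id/` is another way to see the mapping, since those
symlink names usually embed the model and serial number and point at
sdc, sdd, and so on.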

> Second question -- Let's say I remove the old drive, install the new
> drive and it's sde1. Will the system think it's a RAID1 with one drive
> and just use that until I add sde1 to the array?

Assuming you're talking about mdadm, then yes.  It will boot as a
degraded array.  Obviously I can't make promises but this is how every
other RAID system I've encountered works.  Of course you can set some
up to refuse to start the array in a degraded mode for safety, but the
most typical option is to operate degraded and scream for help, since
the main point of RAID is to avoid downtime.

If you're going to go this route, a cleaner approach would be to fail
and remove the old drive from the array before pulling it, so that it
isn't seen as missing, though obviously you are not redundant until
the new drive has synced.
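
A rough sketch of that fail-and-remove step.  The function only prints
the commands so the device names can be checked before anything runs
as root.  /dev/md0 is an assumed array name (check /proc/mdstat for
the real one), and fail_remove_cmds is a made-up helper name.

```shell
# Print (not run) the commands to mark a member faulty and then drop it
# from the array.  /dev/md0 is an assumption; /dev/sdc1 is from the thread.
fail_remove_cmds() {
    md=$1; old=$2
    printf 'mdadm %s --fail %s\n'   "$md" "$old"
    printf 'mdadm %s --remove %s\n' "$md" "$old"
}

fail_remove_cmds /dev/md0 /dev/sdc1
```

Once the new drive is installed and partitioned, something like
`mdadm /dev/md0 --add /dev/sde1` should start the rebuild.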

> Third question -- As long as everything is working, and assuming I've
> got the slots, power, cables, etc, would it make sense to add the new
> drive as a third drive in the array, let it sync, then remove the old
> drive from the array?

This is definitely the safest option and what I always do unless I'm
tight for interfaces.  I believe mdadm has a --replace option that
will do it as one step, but you could make the array triple-redundant
first and then remove the old one.
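
Both approaches might look roughly like the following.  This is a
sketch, not a tested procedure: the function only prints the commands,
/dev/md0 is an assumed array name, and swap_cmds is a made-up helper
name.  The --replace/--with form needs a reasonably recent mdadm and
kernel.

```shell
# Print (not run) two ways to swap a member without losing redundancy.
# Array name is an assumption; device names are taken from the thread.
swap_cmds() {
    md=$1; old=$2; new=$3
    echo "# Option A: hot replace, old drive stays in service during the copy"
    echo "mdadm $md --add $new"
    echo "mdadm $md --replace $old --with $new"
    echo "#   ...when the copy finishes the old drive is marked faulty:"
    echo "mdadm $md --remove $old"
    echo "# Option B: grow to a three-way mirror, then shrink back"
    echo "mdadm $md --add $new"
    echo "mdadm --grow $md --raid-devices=3"
    echo "#   ...wait for the sync to finish, then:"
    echo "mdadm $md --fail $old --remove $old"
    echo "mdadm --grow $md --raid-devices=2"
}

swap_cmds /dev/md0 /dev/sdc1 /dev/sde1
```

Either way the old drive keeps serving data until the new one is fully
synced, which is the point of doing it this way.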

mdadm does this all online, so the system is usable while this is
going on, and it checkpoints so you can actually shut down or reboot
at any point and it will just resume where it left off.  Of course you
can't actually remove the old drive until the new one has finished
syncing (at least not without ending up degraded).
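
To see how far along a sync is, the progress line in /proc/mdstat can
be picked out with something like this.  It is a small sketch:
sync_progress is a made-up helper name, and the pattern assumes the
usual "recovery = NN.N%" or "resync = NN.N%" wording.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# any rebuild progress it finds, in the form "recovery = NN.N%".
sync_progress() {
    grep -Eo '(recovery|resync) *= *[0-9.]+%'
}

# Usage:
#   sync_progress < /proc/mdstat
#   mdadm --wait /dev/md0    # blocks until the sync finishes (/dev/md0 assumed)
```

It prints nothing when no sync is running.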

I haven't used mdadm in a few years, but most RAID-like software
implementations have similar features.  They degrade if a drive is
lost, and they usually have a way to cleanly replace a drive without
the array becoming degraded.

It sounds like your old drive is well-behaved and just giving some
errors.  If you get a drive that is misbehaving and causing disruption
to other drives due to some kind of interface issue then it is
probably safest to just fail and disconnect it and operate degraded.

-- 
Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug