Keith C. Perry on 20 May 2015 06:07:08 -0700



Re: [PLUG] softraid


Yikes...

I think I know the answer to this question, but do you have images of the drive headers?  I'm assuming that's what got mangled during the "resync".  I haven't used md in a long while, but is there a backup "header", like a backup superblock, that contains the drive's metadata?
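If nothing has been imaged yet, something along these lines (untested, and the glob is just a placeholder for the actual member disks) would at least snapshot whatever metadata is left. Per the output below, the v1.2 superblock sits 8 sectors in from the start of each member, so grabbing the first MiB of each disk covers it:

  mkdir -p /root/md-meta
  for d in /dev/sd[a-z]; do
      # mdadm's decoded view of whatever superblock is still on the disk
      mdadm --examine "$d" > /root/md-meta/${d##*/}.examine 2>&1
      # raw copy of the start of the device, which contains the 4 KiB-offset superblock
      dd if="$d" of=/root/md-meta/${d##*/}.head.img bs=1M count=1
  done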


~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Keith C. Perry, MS E.E.
Owner, DAO Technologies LLC
(O) +1.215.525.4165 x2033
(M) +1.215.432.5167
www.daotechnologies.com


From: "Carl Johnson" <cjohnson19791979@gmail.com>
To: "PLUG" <plug@lists.phillylinux.org>
Sent: Tuesday, May 19, 2015 11:10:44 AM
Subject: [PLUG] softraid

I have, orrrrr rather HAD, a CentOS 6 box with three softraid arrays that I upgraded before my first cup of coffee this morning. It's all downhill from here....

For some reason yum thought it knew more about my RAID setup than I did and stuck a few "NAME=" bits and a bunch of other crap into my /etc/mdadm.conf. This, of course, was done before the new initramfs was auto-built for the new kernel image. Cool! Break the RAID config file, then jam it into the ramdisk. Awesome! So, as expected, after a reboot the kernel panicked, telling me that it couldn't find (among other things) its root FS.
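For anyone who wants to check their own box: on EL6 the initramfs is normally just a gzip'd cpio archive, so something like this (the kernel version is a placeholder) will show what actually got baked into it:

  mkdir /tmp/ird && cd /tmp/ird
  zcat /boot/initramfs-<new-kernel-version>.img | cpio -idmv
  cat etc/mdadm.conf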

Excellent so far, right? It gets better!

I figured "Ok, I'll just fire up sysresccd, chroot what I need, fix the mdadm.conf in the initramfs, "init 6" and all will be right with the world once again. WRONG! For whatever the reason, sysresccd decided to pick just a few random disks out of one of the arrays and, initiate a resync during boot before I had an interactive tty. Nice! Thanks for that. Fantastic!

The next thing I did was stop the resync and fix my mdadm.conf in the initramfs. Reboot. Two of the three arrays came back up assembled and clean. One (RAID6) didn't. Here's what the disks in the affected array look like now.....

[root@SAN ~]# mdadm -E /dev/sdj
/dev/sdj:
   MBR Magic : aa55
Partition[0] :   4294967295 sectors at            1 (type ee)

[root@SAN ~]# mdadm -E /dev/sdn
/dev/sdn:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : cb360579:c72e69f9:a378bc9e:f7498b21
           Name : SAN.iscsi.export:2  (local to host SAN.iscsi.net)
  Creation Time : Tue Aug 12 00:35:45 2014
     Raid Level : raid6
   Raid Devices : 14

 Avail Dev Size : 7813775024 (3725.90 GiB 4000.65 GB)
     Array Size : 46882646016 (44710.78 GiB 48007.83 GB)
  Used Dev Size : 7813774336 (3725.90 GiB 4000.65 GB)
    Data Offset : 249856 sectors
   Super Offset : 8 sectors
   Unused Space : before=249768 sectors, after=12976 sectors
          State : clean
    Device UUID : 34a42a25:97152243:aec7c16e:663d6632

    Update Time : Fri May 15 12:33:05 2015
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : a93512dd - correct
         Events : 28347

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 10
   Array State : AAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)

Yes, before you ask: there are more than two disks out of the fourteen that have borked superblocks like this.
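If anyone wants the same view, a loop like this gives the quick survey (the glob is approximate); members whose superblock got stomped print nothing useful, and the survivors should agree on the Array UUID and event count:

  for d in /dev/sd[a-z]; do
      echo "== $d"
      mdadm --examine "$d" 2>/dev/null | egrep 'Array UUID|Events|Device Role'
  done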

My question now is, what should I do? Do I just lower the flag to half mast, have a moment of silence and start from scratch? Ideas?



___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug