
Re: md(adm) ... Re: Next meeting July 26th 2020, Tomorrow!



Yeah, I'm running late.  

I have achieved caffeination and will be connecting soon.  

It will take me some time to get the system up before I 
can start on the md-raid.  

Thomas

On Sun, Jul 26, 2020 at 3:33 AM Michael Paoli <Michael.Paoli@cal.berkeley.edu> wrote:
> From: "tom r lopes" <tomrlopes@gmail.com>
> Subject: Next meeting July 26th 2020, Tomorrow!
> Date: Sat, 25 Jul 2020 14:00:49 -0700

> 4th Sunday virtual meeting 11 am
>
> meet.jit.si/berkeleylug
>
> (no typo this time :-)
>
> I'm hoping to work on a file server running on a sbc.
> Plan was to work on this last week for the PI meeting but
> I couldn't find the SATA hat for my NanoPi.  Now I have it.
> So I will install Armbian and add two 1TB and combine them
> in md-raid.
>
> Hope to see you there,
>
> Thomas

Let me know if you need any md(adm) assistance.
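For the two-drive setup quoted above, the basic shape would be roughly
this (device names hypothetical):

  # two-member raid1 from the two 1TB drives, then a filesystem atop
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkfs.ext4 /dev/md0
  mdadm --detail --scan >> /etc/mdadm/mdadm.conf    # so it assembles at boot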

I quite recently had need/occasion to snag a copy of (the very top bit):
file (large, about 5GiB) on
filesystem on
LVM LV on VG on PV on
partition on
VM raw format disk image file on
filesystem on
md raid1 on
(pair of) LVM LV on (each their own) VG on PV on
(pair of) partitions (one each) on
2 physical drives on
physical host
and without network (only virtual console) access to the VM.

The topmost bit being a file on a filesystem within a Virtual Machine (VM),
where that VM's drive storage was the aforementioned VM raw format disk
image file, and I needed to snag a copy of that topmost referenced (and
large - ~5GiB) file from within the VM - with no network (only virtual
serial console) access to the VM.  And, "of course", to make it more
interesting, it had to be consistent/recoverable, and conflict with neither
the ongoing use of the VM nor the physical host, and all while the VM
and physical host remained up and running.

So, among other bits, to do that, I took an LV snapshot of the lowest
level LV, which gave a point-in-time snapshot of one of the two md raid1
constituent member devices under the lowest raid1 shown in that stack.
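Roughly like so (all VG/LV names here are hypothetical, purely for
illustration):

  # point-in-time snapshot of the LV backing one raid1 member;
  # size the snapshot's COW space per expected write activity on the origin
  lvcreate --snapshot --name member_snap --size 10G /dev/vg_host0/lv_md_member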
"Of course" that immediately has UUID conflict potential - so wiped that
metadata to eliminate that hazard, then to be able to make use of the
data, took that snapshot, and turned it into an md raid1 device - being
careful to use the same metadata format - notably so it would be same
size of earlier metadata and not stomp on any data that would be within
the md device at the md device level.  Also, to make it the same(ish),
and not complain about missing device, created it as md raid1 ... but
with single member device and configured for just one device.  Once that
was done, had recoverable (point-in-time snapshot from live) filesystem.
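Along these lines (the metadata version shown is just a placeholder - use
whatever the original array used, so the superblock and data offsets line
up and nothing inside the member gets overwritten):

  # wipe the copied md superblock so its UUID can't collide with the original
  mdadm --zero-superblock /dev/vg_host0/member_snap
  # wrap it in a new single-member raid1, matching the original metadata version
  mdadm --create /dev/md100 --level=1 --raid-devices=1 --force \
        --metadata=1.2 /dev/vg_host0/member_snap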
Again to thwart potential conflicts, I changed the UUID of that filesystem,
then mounted it nosuid,nodev.  It needed to be mounted rw, due to some
bits further up the chain needing a teensy bit 'o write to metadata.
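E.g., assuming an ext-family filesystem (if the journal needs recovery,
an e2fsck -f pass may be wanted before the UUID change):

  # fresh UUID so it can't collide with the still-in-use original
  tune2fs -U random /dev/md100
  mkdir -p /mnt/outer
  mount -o rw,nosuid,nodev /dev/md100 /mnt/outer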
Then once that was mounted, losetup and partx -a to get to the
applicable partition within the VM's disk image file on that filesystem.
Was then able to bring (activate) the VG from that PV onto the physical
host (were the UUID and/or VG name conflicting with any on the physical
host, there would've been some other steps needed too).
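Something like this (image path and guest VG name hypothetical;
vgimportclone being the tool had names/UUIDs collided):

  # expose the raw disk image as a block device, plus its partitions
  losetup -f --show /mnt/outer/images/vm-disk.img    # prints e.g. /dev/loop0
  partx -a /dev/loop0
  # find and activate the guest's VG from that PV
  pvscan
  vgchange -ay vg_guest0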
From there, mounted the filesystem provided by that LV ro(,nosuid,nodev)
(but with the device under it again rw - needed, as the filesystem state
was recoverable but not clean).  Was then able to access and copy the
desired file from that filesystem - now seen, via snapshot and some
metadata mucking about, on the physical host, whereas before it was
effectively only accessible on the VM - and all that with the VM and
physical host still up and running throughout.
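Final stretch, names again hypothetical (an ro ext mount will still replay
the journal, hence the writable device underneath):

  mkdir -p /mnt/inner
  mount -o ro,nosuid,nodev /dev/vg_guest0/lv_root /mnt/inner
  cp -p /mnt/inner/path/to/the-large-file /var/tmp/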

Yeah, I didn't design it like that.  That's the way some particular
vendor's "appliance" devices structure things and manage their VMs on
the device.

Had another occasion some while back, to fix rather a mess on quite the
same type of device.  There were two physical hard drives ... lots of RAID-1.
So far so good.  But, no backups ("oops").  And, one of the two hard
drives had failed long ago ("oops"), and not been replaced ("oops").
And now the one hard drive that wasn't totally dead was giving
hard errors - notably unrecoverable read errors on a particular sector
... uh oh.

Well, the vendor and their support, and the appliance were too
stupid(/smart?) to be able to fix/recover that mess.  But I didn't give
up so easily.  I drilled all the way down to isolate exactly
where the failed sector was, and exactly what it was/wasn't being used
by.  Turned out it wasn't holding any data proper, but just
recoverable/rewritable metadata - or allocated but not used data.
So, I did an operation to rewrite that wee bit 'o data.
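In rough outline, something like this (device and sector number
hypothetical, 512-byte logical sectors assumed; hdparm --read-sector /
--write-sector can do much the same per-sector):

  # confirm exactly which LBA won't read
  dd if=/dev/sdb of=/dev/null bs=512 skip=1234567 count=1 iflag=direct
  # once sure nothing irreplaceable lives there, rewrite just that sector;
  # on the write the drive remaps it from its spare pool
  dd if=/dev/zero of=/dev/sdb bs=512 seek=1234567 count=1 oflag=direct conv=fsync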
The drive, being "smart enough" - since that was an unrecoverable read
sector but it then got a write operation - automagically remapped the
sector and wrote it out.  At that point the drive was operational (enough)
again - could read the entire drive with no read errors - and was then able
to successfully mirror to a good replacement for the other failed drive
(before that, all such attempts had failed, notably due to the hard read
error).  Anyway, successfully and fully recovered what the vendor's
appliance and the vendor's support could not recover, where they were
saying it would have to be reinstalled from scratch.  Oh, and also,
after the successful remirroring, got the drive that was having the
sector hard read error replaced too, then remirrored onto that
replacement drive, thus ending up fully recovered onto two newly replaced
good drives.  Not the first time I've recovered RAID-1 where the problems
were only discovered when the 2nd drive started failing, after the 1st
drive had long since totally died and not been replaced.  "Of course" it's
highly preferable to not get into such situations ... have good (and
validated) backups, and replace failed drives in redundant arrays as soon
as feasible - especially before things start to hard fail without
redundancy.
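The remirroring itself being the easy part (hypothetical device names;
partition the replacement to match first):

  # copy the partition layout from the surviving drive to the replacement
  sfdisk -d /dev/sda | sfdisk /dev/sdb
  # add the new member to each degraded raid1 and let it resync
  mdadm /dev/md0 --add /dev/sdb1
  cat /proc/mdstat    # watch the rebuild progress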

--
You received this message because you are subscribed to the Google Groups "BerkeleyLUG" group.
To unsubscribe from this group and stop receiving emails from it, send an email to berkeleylug+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/berkeleylug/20200726033304.13885iwu7mlx9cdc%40webmail.rawbw.com.
