Lee H. Marzke on 8 Dec 2011 21:13:41 -0800

Re: [PLUG] Vmware oops

ESXi is now 100% a custom VMware kernel. The Linux part was removed, and
it only has a small BusyBox module to run basic Linux-type commands. So
I would not expect a Linux recovery tool to do anything but damage.

As I discussed in my recent PLUG ZFS talk, RAID-5 has a known issue called the
'write-hole' \1.  If you lose power to the array, you can wind up with
silent data corruption, where the array still provides correct data but
the parity is scrambled.  Then, when you actually lose a drive and
reconstruct the array, the rebuild silently replaces your data with junk.
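
To make the write-hole concrete, here is a minimal Python sketch of one
stripe on a 4-drive RAID-5 set. It is a toy model, not how any real
controller is implemented: the xor_blocks helper and the 4-byte blocks are
made up for illustration. It shows reads staying correct after an
interrupted write while a later rebuild returns junk.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe on a 4-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Power fails mid-update: the new data block reaches disk,
# but the matching parity update never does (the write-hole).
data[0] = b"ZZZZ"

# Normal reads do not consult parity, so everything still looks fine.
assert data[1] == b"BBBB"

# Later the drive holding data[1] dies and is rebuilt from the
# surviving data blocks plus the stale parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == b"BBBB")  # False: the rebuilt block is not what was stored
```

Note that the block that gets trashed (data[1]) is not even the one that
was being written when the power went out.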

So the moral is:

- Never use RAID-5 without NVRAM backup (especially for random writes on ESX).

- Back up VMs - this is really very easy with VMware VDR backup, now included
  free with entry-level VMware Acceleration Kits.

- Use NVRAM battery-backed arrays (such as NetApp) to minimize issues with
  losing power.

- Use ZFS, with the additional feature of background scrubbing to find and
  fix any silent data corruption.
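
The scrubbing idea in the last item can be sketched roughly as follows.
This is a toy model only, not ZFS's actual code: the two-way mirror, the
use of SHA-256, and the names checksum and scrub are assumptions made for
illustration. The point is that a checksum recorded at write time lets a
scrub tell which copy is bad and repair it from a good one.

```python
import hashlib

def checksum(block: bytes) -> str:
    # SHA-256 is used here only for illustration.
    return hashlib.sha256(block).hexdigest()

def scrub(copies, expected):
    """Verify every copy of every block; rewrite bad copies from a good one.

    copies:   list of per-disk block lists (a simple mirror)
    expected: checksum recorded for each block when it was written
    Returns the number of copies repaired.
    """
    repaired = 0
    for i, want in enumerate(expected):
        good = next((disk[i] for disk in copies if checksum(disk[i]) == want), None)
        if good is None:
            continue  # no intact copy left: detectable but not repairable
        for disk in copies:
            if checksum(disk[i]) != want:
                disk[i] = good
                repaired += 1
    return repaired

blocks = [b"vm-disk-0", b"vm-disk-1"]
expected = [checksum(b) for b in blocks]
mirror = [list(blocks), list(blocks)]

mirror[1][0] = b"garbage!!"      # silent corruption on one side of the mirror
print(scrub(mirror, expected))   # 1
```

Plain RAID-5 parity has no such per-block checksum, so it cannot tell which
member of a stripe is wrong, only that the stripe is inconsistent.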

If you read up more on this problem, you will find that RAID-5 arrays
that lose power are expected to sometimes fail to rebuild when a disk actually fails,
and this may be what happened to your system.  The only things done wrong in that case
are 1) using RAID-5, and 2) not having backups or replicas.  If the sysadmins
didn't have control of that, perhaps the management should leave for not
spending the money on good arrays like the NetApp, or on COW-based filesystems
like NetApp or ZFS (Nexenta), and for not using the free VMware backup software
included with vSphere.


\1  http://blogs.oracle.com/bonwick/entry/raid_z

----- Original Message -----
> From: "jeff" <jeffv@op.net>
> To: "Philadelphia Linux User's Group Discussion List" <plug@lists.phillylinux.org>
> Sent: Thursday, 8 December, 2011 10:37:12 PM
> Subject: [PLUG] Vmware oops
> `Something happened' to RAID array on an ESXi 4 server [RAID5 4
> drives].
> Drives were reinitialized, server boots, ESXi comes up.
> We can find everything except the vmdk we're looking for.
> Recovery software [run from its own OS] finds tons of files that were
> inside the vmdk, but most are trashed.
> Vmware consultants stated that this happens a lot - when storage
> fails,
> vmdk's get hosed.
> Of course there's no backup.
> Do I even try booting with linux and running any of our tools or
> should
> I advise some people to start updating their resumes?  I don't
> believe
> there's anything that runs inside the ESXi shell [ctl-alt-f1], is
> there?
> I love hearing coworkers say, "UGH. This is LINUX.  How do I
> navigate?"
> Thanks,
> --Perplexed in PA
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug