Casey Bralla on 23 Sep 2010 15:55:04 -0700



[PLUG] Updated Info on Disk Error in VM


Based on some suggestions made here (thanks, guys!), I've done some more 
investigation.

To recap my problem, I'm having disk errors crashing Virtual Machines.  This 
has happened using both VirtualBox and VMware systems.  Both host and VMs are 
running Debian Lenny Stable.  RAM is somewhat constrained.  Errors occur 
sporadically, with no discernible pattern.  No errors are apparent in the 
host disk or RAM.

I've tried running a few diagnostics within the VM.  Memtest86 runs flawlessly 
for 12+ hours.  The "inquisitor" diagnostic routine fails about 15% of the 
time when doing the disk R-W tests.  (It fails catastrophically, so it's tough 
to get a good handle on what exactly happened.)

I've noticed some odd error messages in some of the VMs, even when the VMs 
have not crashed.  Typical of these messages is:


Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53580)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 38 65 b7 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53580)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53180)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 5e 35 37 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53180)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53480)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 5e 7d 3f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53480)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53380)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 60 14 0f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53380)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53280)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 65 b3 bf 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53280)
Sep 21 06:25:52 VWeb01 kernel: [155632.916667] mptscsih: ioc0: attempting task abort! (sc=cba53080)
Sep 21 06:25:52 VWeb01 kernel: [155632.917021] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 96 c0 4f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.917436] mptscsih: ioc0: task abort: SUCCESS (sc=cba53080)
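
For anyone who wants to read the CDB lines above: each one is a 10-byte SCSI 
WRITE(10) command, with the opcode (0x2a) first, a 4-byte big-endian block 
address in bytes 2-5, and a 2-byte block count in bytes 7-8.  Here is a rough 
Python sketch that decodes one.  The function name decode_write10 is just 
something made up for illustration, and the 4 KiB figure in the comment 
assumes 512-byte sectors, which the log does not actually state.

```python
import struct

def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Hypothetical helper, for illustration only.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    # Layout: opcode, flags, 4-byte big-endian LBA, group number,
    # 2-byte big-endian transfer length (in blocks), control byte.
    _opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return lba, blocks

# First CDB from the log above.
lba, blocks = decode_write10("2a 00 00 38 65 b7 00 00 08 00")
print(lba, blocks)  # 3696055 8  (8 blocks = 4 KiB if sectors are 512 bytes)
```

All six aborted commands in the sample have a transfer length of 08, so they 
are small writes of the same size scattered across different block addresses.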



Googling this error shows that I'm not unique in having this problem, although 
I found no solution other than reducing the load on the host disk system.  The 
"mptscsih" reference is a kernel module related to the SCSI disk interface.  I 
found references to the problem as far back as 2006.  It seems like it affects 
Debian more than other distros.
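
Since the only known remedy is reducing disk load, it may help to see when 
the aborts cluster.  Below is a rough Python sketch that tallies "attempting 
task abort" lines per host, day, and hour.  The function name aborts_per_hour 
is made up, and the regular expression assumes the syslog line format shown in 
the sample above.  (The sample is stamped 06:25, which is when Debian's stock 
/etc/crontab starts cron.daily, though one sample proves nothing.)

```python
import re
from collections import Counter

# Matches lines such as:
#   Sep 21 06:25:52 VWeb01 kernel: [...] mptscsih: ioc0: attempting task abort! ...
# Only "attempting task" is required, so mail-wrapped lines still match.
ABORT_RE = re.compile(
    r"^(\w{3}\s+\d+) (\d{2}):\d{2}:\d{2} (\S+) kernel:.*attempting task"
)

def aborts_per_hour(lines):
    """Count abort attempts, keyed by (host, day, hour). Illustrative only."""
    counts = Counter()
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            day, hour, host = m.groups()
            counts[(host, day, hour)] += 1
    return counts

sample = [
    "Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53580)",
    "Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53580)",
    "Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53180)",
]
for key, n in sorted(aborts_per_hour(sample).items()):
    print(key, n)
```

To run it against a real log, pass an open file instead of the sample list, 
for example open("/var/log/kern.log") if that is where the guest's kernel 
messages go (an assumption; adjust the path for your setup).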


So here's my theory:

There is some type of basic bug in the SCSI kernel module that is triggered by 
something in generic virtualization code.  This only becomes serious when the 
disk system is taxed.  (Otherwise, the disk errors, retries, and succeeds.)  
If I had a more powerful computer, or fewer VMs, this problem probably would 
not have appeared.


So here are some of the things I think I will try:

1.  Switch kernels in the VM to i386 version instead of the installed i686 
version.
2.  Move the host disk system to a separate physical hard disk, so all the VM 
disks are on a completely separate disk.
3.  See if I can change the emulated disk controller so the VMs use a 
different SCSI driver (not mptscsih).
4.  Reduce the number of VMs by consolidating them wherever possible.



Anybody have any more thoughts or suggestions?


TIA!



-- 

Casey Bralla
Chief Nerd in Residence
The NerdWorld Organisation
http://www.NerdWorld.org
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug