Casey Bralla on 23 Sep 2010 15:55:04 -0700
Based on some suggestions made here (thanks, guys!), I've done some more investigation.

To recap my problem: I'm having disk errors crashing Virtual Machines. This has happened using both VirtualBox and VMWare systems. Both host and VMs are running Debian Lenny Stable. RAM is somewhat constrained. Errors occur sporadically, with no discernible pattern. No errors are apparent in the host disk or RAM.

I've tried running a few diagnostics within the VM. Memtest86 runs flawlessly for 12+ hours. The "inquisitor" diagnostic routine fails about 15% of the time when doing the disk R-W tests. (It fails catastrophically, so it's tough to get a good handle on what exactly happened.)

I've noticed some odd error messages in some of the VMs, even when the VMs have not crashed. Typical of these messages is:

Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53580)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 38 65 b7 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53580)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53180)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 5e 35 37 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53180)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53480)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 5e 7d 3f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53480)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53380)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 60 14 0f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53380)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53280)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 65 b3 bf 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53280)
Sep 21 06:25:52 VWeb01 kernel: [155632.916667] mptscsih: ioc0: attempting task abort! (sc=cba53080)
Sep 21 06:25:52 VWeb01 kernel: [155632.917021] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 96 c0 4f 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.917436] mptscsih: ioc0: task abort: SUCCESS (sc=cba53080)

Googling this error shows that I'm not unique in having this problem, although I found no solution other than reducing the load on the host disk system. The "mptscsih" in these messages is a kernel module related to the SCSI disk interface. I found references to the problem as far back as 2006. It seems to affect Debian more than other distros.

So here's my theory: There is some type of basic bug in the SCSI kernel module that is triggered by something in generic virtualization code. This only becomes serious when the disk system is taxed. (Otherwise, the disk write errors out, is retried, and succeeds.) If I had a more powerful computer, or fewer VMs, this problem probably would not have appeared.

So here are some of the things I think I will try:

1. Switch the kernel in the VM to the i386 version instead of the installed i686 version.
2. Move the host disk system to a separate physical hard disk, so all the VM disks are on a completely separate disk.
3. See if I can change the emulated SATA hardware so the VMs use a different SCSI driver (not mptscsih).
4. Reduce the number of VMs by consolidating them wherever possible.
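For anyone who wants to dig through a longer log, here is a rough sketch of how the Write(10) CDBs above could be decoded and the abort outcomes tallied. It assumes the standard SCSI WRITE(10) layout (opcode 0x2a in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function names and the regular expressions are my own invention for illustration, and the 512-byte block size in the usage note is an assumption, not something the log states.

```python
import re
from collections import Counter

# One abort sequence copied from the log above.
SAMPLE_LOG = """\
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: attempting task abort! (sc=cba53580)
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 38 65 b7 00 00 08 00
Sep 21 06:25:52 VWeb01 kernel: [155632.915005] mptscsih: ioc0: task abort: SUCCESS (sc=cba53580)
"""

# Hypothetical patterns, written to match the message format shown above.
CDB_RE = re.compile(r"CDB: Write\(10\): ((?:[0-9a-f]{2} ?){10})")
ABORT_RE = re.compile(r"mptscsih: \w+: task abort: (\w+)")


def decode_write10(hex_bytes):
    """Return (lba, block_count) for a 10-byte WRITE(10) CDB given as hex text."""
    cdb = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB: %r" % hex_bytes)
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def summarize(log_text):
    """Collect decoded writes and count abort outcomes (e.g. SUCCESS) in log_text."""
    writes = [decode_write10(m.group(1)) for m in CDB_RE.finditer(log_text)]
    outcomes = Counter(m.group(1) for m in ABORT_RE.finditer(log_text))
    return writes, outcomes


if __name__ == "__main__":
    writes, outcomes = summarize(SAMPLE_LOG)
    for lba, blocks in writes:
        print("write at LBA %d, %d blocks" % (lba, blocks))
    print(dict(outcomes))
```

Run against the sample, this reports one write at LBA 3696055 of 8 blocks. If the virtual disk uses 512-byte blocks (an assumption), each aborted request is a 4 KiB write, and the LBAs in the log are scattered across the disk rather than clustered in one spot.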
Anybody have any more thoughts or suggestions? TIA!

--
Casey Bralla
Chief Nerd in Residence
The NerdWorld Organisation
http://www.NerdWorld.org
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug