Richard Freeman on 22 Sep 2010 07:50:21 -0700


Re: [PLUG] Need Troubleshooting Advice: Disk Errors in Virtual Machines

On 09/21/2010 06:04 PM, Casey Bralla wrote:
> Anybody have any thoughts on what might cause these disk errors, or what I 
> might check?

Well, I guess we can start by just listing the various possibilities:

1.  Kernel issue in the guest that causes the panic - perhaps no real
fault in any lower layer.
2.  Error in the VM code that causes an apparent disk error in the guest.
3.  Error in the host software (kernel, other processes, etc) that
interacts with the VM to manifest itself in a disk error in the guest.
4.  Error in the host hardware that causes a disk error in the guest.

Note that host hardware errors could be ANYTHING, since the disk is
virtual.  A disk error in the guest might have nothing to do with a
physical disk error, unless maybe the guest is directly mapped to a
physical disk.

RAM errors of course jump to mind as a possibility, perhaps only under
load or long uptime (temperature/etc).  Power supply problems could also
cause any number of glitches.  The problem could be almost anything.
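
A rough sketch of one way to check whether the host is quietly logging
hardware trouble: save the host's kernel log to a file (for example
with "dmesg > host.log") and filter it for suspicious lines.  The
pattern list and the function names here are my own illustrative
assumptions, not an exhaustive or authoritative set; real kernel
messages vary by driver and kernel version.

```python
import re

# Illustrative patterns only (an assumption); real kernel messages
# vary by driver and kernel version, so extend this list as needed.
HINT_PATTERNS = [
    re.compile(r"machine check", re.IGNORECASE),
    re.compile(r"\bmce\b", re.IGNORECASE),
    re.compile(r"\bEDAC\b"),
    re.compile(r"I/O error", re.IGNORECASE),
]


def find_hardware_hints(lines):
    """Return the log lines that match any of the hint patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in HINT_PATTERNS)
    ]


def scan_log_file(path):
    """Scan a saved kernel log (hypothetical path, e.g. host.log)."""
    with open(path, errors="replace") as fh:
        return find_hardware_hints(fh)
```

An empty result does not prove the hardware is healthy; it only means
the host did not log anything matching these patterns.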

If the host isn't generating any kind of error I tend to doubt that the
issue is host software, but you can't rule that out.  Switching to a
different platform (Xen/etc) would probably address any issue in #2-3
above, and perhaps even mitigate #4 (RAM/resource use patterns will be
different).

These kinds of gremlins can be really hard to track down.  I was having
intermittent problems on my server at home, and thought they were fixed.
They started coming back right around summer, which made me think heat.
This is an older server, and I opened it up and really cleaned out the
more sensitive components with compressed air (heat sinks/etc), turned
on CPU scaling (to reduce heat generation), and haven't had problems
since, even with Chromium builds.
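
If you suspect heat, it can help to watch the temperatures the host
reports.  Below is a minimal sketch that reads the Linux thermal zones
under /sys/class/thermal, where each zone's "temp" file holds
millidegrees Celsius.  Whether your hardware exposes any zones there is
an assumption, and the function name is mine; lm-sensors is the more
complete tool if it is installed.

```python
from pathlib import Path


def read_zone_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for each readable thermal zone."""
    temps = {}
    for temp_file in sorted(Path(base).glob("thermal_zone*/temp")):
        try:
            millidegrees = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            # Some zones refuse reads or return junk; skip them.
            continue
        temps[temp_file.parent.name] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for zone, celsius in read_zone_temps().items():
        print(f"{zone}: {celsius:.1f} C")
```

Running this periodically under load and comparing against the times
of the guest errors would show whether the two line up.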

As far as load/etc goes - that clearly can make bugs in the VM or other
components more apparent.  However, well-written software should not
crash at any level of load - the VM should just be slow.  If the OS
panics when you hit a load of 75, then there is a bug in the OS or at a
lower layer.  Of course, if you can avoid these kinds of bugs, so much
the better.
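
To see whether the panics really track load, one option is to log the
host load average with timestamps and compare against the times the
guests fail.  This is only a sketch; the function name and the output
format are my own assumptions.

```python
import os
import time


def load_sample(now=None):
    """Return one 'date time load1 load5 load15' line for this host."""
    now = time.time() if now is None else now
    load1, load5, load15 = os.getloadavg()  # Unix-only call
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{stamp} {load1:.2f} {load5:.2f} {load15:.2f}"


if __name__ == "__main__":
    # Print one sample per minute; redirect to a file to keep a record.
    while True:
        print(load_sample(), flush=True)
        time.sleep(60)
```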

Good luck with it!

Philadelphia Linux Users Group