Bill Jonas on Fri, 10 Nov 2000 09:46:27 -0500 (EST)



Re: [PLUG] Has my disk gone bad?


On Fri, Nov 10, 2000 at 09:31:00AM -0400, Ed G. wrote:
> My disk has about 110,000 files in it.  Can I safely conclude from 
> two known bad files that the disk is roasted or is this an 
> "acceptable" rate of failure?  How common is it in your experience 
> for small bits of the disk drive to fail, rather than a single 
> catastrophic failure that leaves you unable to boot up?

Hmm...  I would suspect memory first.  The same kind of symptoms plagued
a server at my job one time.  I had installed Mailman and left it running
over the weekend.  When I (remotely) logged in Monday morning, the load was
over 60, consisting of Mailman processes.  I killed the errant processes
and the load fell back to normal levels, but then the number of processes
and the load started climbing back up.  Later that afternoon, 'ps', along
with a few other commands, started failing.  When we tried rebooting to
regain some semblance of control, the box turned out to be unbootable.  It
wasn't the disk: the BIOS reported a low-level problem during its POST,
before the video even initialized.  We're a little uncertain as to the beep
code, but we're fairly sure that it indicated a memory problem somewhere.

Here's what we think happened: A small part of memory went bad, and
Mailman's cron job happened to get loaded into that part of memory, causing
just enough of a change to make the process run away.  This drove up the
load and most likely the temperature, causing other components (or sections
of memory) that were in a marginal state to fail.  The corruption of a few
binaries most likely resulted from those files being cached in a bad part
of memory (I copied the binaries over from another Debian 2.2 box and they
worked fine after that, although my boss suggested that I could simply have
touch'd the files).

I'd try using memtest86 or some other memory tester before you replace the
disk.
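
If you have a known-good copy of a suspect file (say, from another box
running the same release), a rough sketch like the one below can help tell
"the file on disk is really different" from "the file reads differently each
time."  This is only an illustration, not a substitute for a memory tester;
the function names and the choice of SHA-256 are my own assumptions, and
repeated reads may be served from the kernel's cache, so a stable result
does not clear the RAM.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries need not fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_with_reference(suspect_path, reference_path, passes=3):
    """Hash the suspect file `passes` times and compare to a known-good copy.

    Returns a dict with:
      stable  -- True if every pass over the suspect file gave the same digest
      matches -- True if every pass matched the reference file's digest
    """
    reference = file_digest(reference_path)
    digests = [file_digest(suspect_path) for _ in range(passes)]
    return {
        "stable": len(set(digests)) == 1,
        "matches": all(d == reference for d in digests),
    }
```

Calling compare_with_reference() on a suspect binary and its known-good
copy (both paths are whatever applies on your system) gives a quick hint: a
result that is stable but does not match points at the stored copy, while an
unstable result points at something flakier than the file itself.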

-- 
Bill Jonas                | "If you haven't gotten where you're going,
bill@billjonas.com        |  you aren't there yet." --George Carlin
http://www.billjonas.com/ |  http://www.harrybrowne.org/


______________________________________________________________________
Philadelphia Linux Users Group       -      http://www.phillylinux.org
Announcements-http://lists.phillylinux.org/mail/listinfo/plug-announce
General Discussion  -  http://lists.phillylinux.org/mail/listinfo/plug