Bill Jonas on Fri, 10 Nov 2000 09:46:27 -0500 (EST)
On Fri, Nov 10, 2000 at 09:31:00AM -0400, Ed G. wrote:
> My disk has about 110,000 files in it. Can I safely conclude from
> two known bad files that the disk is roasted or is this an
> "acceptable" rate of failure? How common is it in your experience
> for small bits of the disk drive to fail, rather than a single
> catastrophic failure that leaves you unable to boot up?

Hmm... I would suspect memory first. The same kind of symptoms
plagued a server at my job one time. I had installed Mailman and
left it running over the weekend. When I (remotely) logged in Monday
morning, the load was over 60, consisting of Mailman processes. I
killed the errant processes and the load fell back to normal levels,
but then the number of processes and the load started climbing back
up. Later that afternoon, 'ps', along with a few other commands,
started failing. When we tried rebooting to regain some semblance of
control, the box turned out to be unbootable.

It wasn't the disk. It was a low-level problem reported by the BIOS
during its POST, before the video even initialized. We're a little
uncertain as to the beep code, but we're fairly sure that it
indicated a memory problem somewhere.

Here's what we think happened: A small part of memory went bad, and
Mailman's cron job happened to get loaded into that part of memory,
changing it just enough to make the process run away. This drove up
the load and most likely the temperature, causing other components
(or sections of memory) that were in a marginal state to fail. The
corruption of a few binaries most likely resulted from those files
being cached in a bad part of memory. (I copied the binaries over
from another Debian 2.2 box, and they worked fine after that,
although my boss suggested that I could have simply touch'd the
files.)

I'd try memtest86 or some other memory tester before you replace the
disk.
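If you have a second box with the same release installed, one way to
see which binaries really differ is to compare checksums against the
known-good copies. Below is a rough Python sketch of that idea, not
something I ran on the server in question. The function names and the
two directory arguments are made up for illustration, and it assumes
the reference copies are reachable as a local directory (an NFS mount
or a copied-over tree).

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(suspect_dir, reference_dir):
    """List files under suspect_dir that differ from reference_dir.

    Both arguments are hypothetical paths: the tree you distrust and
    a known-good copy of it. Files with no counterpart in the
    reference tree are skipped rather than reported.
    """
    suspect_dir = Path(suspect_dir)
    reference_dir = Path(reference_dir)
    mismatches = []
    for suspect in sorted(suspect_dir.rglob("*")):
        if not suspect.is_file():
            continue
        relative = suspect.relative_to(suspect_dir)
        reference = reference_dir / relative
        if not reference.is_file():
            continue
        if sha256_of(suspect) != sha256_of(reference):
            mismatches.append(str(relative))
    return mismatches


# Usage: python compare.py <suspect_dir> <reference_dir>
if __name__ == "__main__" and len(sys.argv) == 3:
    for name in find_mismatches(sys.argv[1], sys.argv[2]):
        print(name)
```

If the list of mismatches changes from one run to the next, that
would point toward memory rather than the disk, since a bad block on
disk should tend to fail the same way each time. Treat that as a
hint, not proof.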
-- 
Bill Jonas                | "If you haven't gotten where you're going,
bill@billjonas.com        | you aren't there yet." --George Carlin
http://www.billjonas.com/ | http://www.harrybrowne.org/

______________________________________________________________________
Philadelphia Linux Users Group - http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mail/listinfo/plug-announce
General Discussion - http://lists.phillylinux.org/mail/listinfo/plug