Casey Bralla on 31 Dec 2009 15:17:59 -0800


[PLUG] Update: How To Diagnose System Instability?

Thanks to the group for the good suggestions and comments (even Eric, who had 
to duck for suggesting I had worn out my system <grin>).

Anyway, I think I figured it out and thought I'd let the group know what I 
found.

I discovered that the memtest program will let you know which bank of DIMMs is 
giving the error.  In my case, the offending DIMM was one that I had _not_ 
removed.  (You may remember that removing half the DIMMs made the problem 
better, but did not eliminate it.)

So I removed the offending DIMMs and put the other memory back into its 
_original sockets_.  This seemed to work fine.  I had no crashes, and ran 
the memtest program for almost 8 hours with zero errors.

I then tried putting the offending DIMMs back, swapping the "bad" DIMMs for the 
"good" DIMMs and putting them into the sockets where the good memory had been 
running.  Although I didn't run my tests for hours this time, everything 
seemed good, so I threw caution to the wind and put the "good" DIMMs into the 
"bad" sockets that had shown errors previously.

It all worked.  I've been recompiling my entire Gentoo system for about 6 
hours without any problems.

So here is my conclusion:

I had faulty electrical contacts on one of the DIMMs, which got better when I 
mechanically stressed the motherboard by removing the other memory DIMMs.  
(This is why the problem got better when I removed half the RAM.)  Removing 
the memory chips and re-seating them corrected the contact problem.

On Wednesday 30 December 2009 9:57:10 am Casey Bralla wrote:
> Lately my computer has started showing signs of instability.  I get rather
> frequent SegFaults which pop up windows in KDE and (usually) allow me to
> restart applications.  The computer is about 8 months old, and had been
> completely fine up until about a month ago.
>
> At first I thought this was just goofiness with the latest version of KDE,
> but after I removed half my RAM, the stability improved tremendously.
> Also, my BIOS includes the memtest86+ program and it shows errors
> (sometimes).
>
> I use Gentoo, so I am compiling a lot of software.  I've read that
> recompiling the kernel is a great way to identify hardware problems since
> it exercises the whole system so thoroughly. <sigh>
>
> So I'm struggling to identify which component(s) of my system are causing
> the instability.  Here are the things I've tried, with very limited
> success:
>  - Ran memtest86+ (sometimes shows errors, but sometimes can run
>    overnight without any problems)
>  - Removed 2 of the 4 memory chips [I had 8 GBytes, so could spare the
>    RAM] (this helped, but not 100%)
>  - Used the BIOS to restrict my 4-core AMD Phenom to a single core (no
>    improvement)
>  - Used the BIOS to raise the RAM voltage slightly (no apparent effect)
>  - Used the BIOS to slow the DDR-800 chips to DDR-400 speeds (no apparent
>    effect)
>
> Here are the things I still want to try:
>  - Swap the remaining RAM chips into the other sockets
>  - Swap out the power supply
>
> Beyond these tasks, does anybody have any suggestions to help identify the
> cause of the instability?  I would hate to have to replace the
> motherboard.  And how would I "prove" that the motherboard is the problem?
>
> BTW, here are the specs:
> Motherboard: BIOStar TA790GX
> CPU: AMD Phenom II X4 940
> RAM: Crucial "Ballistix Tracer", 2 sets of 2 x 2 GBytes each
> Disk: Western Digital SATA
> Video: Sparkle GeForce 7600 PCI Express
> Power Supply: Raidmax 730 Watt


Casey Bralla

Chief Nerd in Residence
The NerdWorld Organisation
Philadelphia Linux Users Group