Eric on 31 Dec 2009 16:31:17 -0800


Re: [PLUG] Update: How To Diagnose System Instability?


That's good news.

While I was working as an Electrical Engineer for General Electric I
learned that about 90% of the equipment problems we experienced were
directly traceable to connectors. Cable connectors, card edge
connectors, chip sockets, and the like were the bane of our existence. I
suspect that it's worse today because lower-power, higher-impedance
circuits are less able to "punch through" an oxide layer on a
contact.

Maybe this is an excuse to tear down and rebuild our computers every year?



Casey Bralla wrote:
> Thanks to the group for the good suggestions and comments (even Eric, who had 
> to duck for suggesting I had worn out my system <grin>)
> Anyway, I think I figured it out and thought I'd let the group know what I 
> did.
> I discovered that the memtest program will let you know which bank of DIMMs is 
> giving the error.  In my case, the offending DIMM was one that I had _not_ 
> removed.  (You may remember that removing half the DIMMs made the problem 
> better, but did not eliminate it.)
> So I removed the offending DIMMs and put the other memory back into its 
> **original sockets**.  This seemed to work fine.  I had no crashes, and ran 
> the memtest program for almost 8 hours with zero errors.
> I then tried swapping the "bad" DIMMs for the "good" DIMMs, putting the 
> "bad" ones into the sockets where the good memory had been running.  
> Although I didn't run my tests for hours this time, everything seemed good, so 
> I threw caution to the wind and put the "good" DIMMs into the "bad" sockets 
> that had shown errors previously.   
> It all worked.  I've been recompiling my entire Gentoo system for about 6 
> hours without any problems.
> So here is my conclusion:
> I had faulty electrical contacts on one of the DIMMs, which got better when I 
> mechanically stressed the motherboard by removing the other memory DIMMs.  
> (This is why the problem got better when I removed half the RAM.)  Removing 
> the memory chips and re-seating them corrected the contact problem.
> On Wednesday 30 December 2009 9:57:10 am Casey Bralla wrote:
>> Lately my computer has started showing signs of instability.   I get rather
>> frequent SegFaults which pop up windows in KDE and (usually) allow me to
>> restart applications.  The computer is about 8 months old, and had been
>> completely fine up until about a month ago.
>> At first I thought this was just goofiness with the latest version of KDE,
>>  but after I removed half my RAM, the stability improved tremendously.  
>>  Also, my BIOS includes the memtest86+ program and it shows errors
>>  (sometimes).
>> I use Gentoo, so am compiling a lot of software.  I've read that
>>  recompiling the kernel is a great way to identify hardware problems since
>>  it exercises the whole system so thoroughly. <sigh>
>> So I'm struggling to identify which component(s) of my system are causing
>>  the instability.   Here are the things I've tried, with very limited
>>  success:
>>  - run memtest86+.  (Sometimes shows errors, but sometimes can run
>>  overnight without any problems)
>>  - Removed 2 of the 4 DIMMs. [I had 8 GBytes, so could spare the
>>  RAM] (This helped, but did not eliminate the problem)
>>  - Used the BIOS to restrict my 4-core AMD Phenom to a single core (no
>> improvement)
>>  - Used the BIOS to raise the RAM voltage slightly (no apparent effect)
>>  - Used the BIOS to slow the DDR-800 chips to DDR-400 speeds (no apparent
>> effect)
>> Here are the things I still want to try:
>>  - Swap the remaining RAM chips into the other sockets
>>  - Swap out the power supply
>> Beyond these tasks, does anybody have any suggestions to help identify the
>> cause of the instability?   I would hate to have to replace the
>>  motherboard. And how would I "prove" that the motherboard is the problem?
>> BTW, here are the specs:
>> Motherboard: BIOStar TA790GX
>> CPU: AMD Phenom II X4 940
>> RAM: Crucial "Ballistix Tracer" 2 sets of 2@2 GBytes each
>> Disk: Western Digital SATA
>> Video: Sparkle GeForce 7600 PCI Express
>> Power Supply: Raidmax 730 Watt
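
The swap test described in the quoted message (move the suspect DIMMs into known-good sockets, then put known-good DIMMs into the suspect sockets) can be sketched as a small decision procedure. This is only an illustration of the reasoning, not anything memtest86+ does itself; the module labels ("A", "B"), socket labels ("slot1", "slot3"), and the function name are hypothetical, and each pass/fail result is assumed to come from a separate memory-test run.

```python
def classify_failures(observations):
    """Classify each failing (module, socket) pairing from swap-test results.

    observations: iterable of (module, socket, passed) tuples, one per test run.
    Returns a dict mapping each failing (module, socket) pair to a verdict.
    All labels are hypothetical; they are not read from any real tool.
    """
    observations = list(observations)
    passes = {(m, s) for m, s, ok in observations if ok}
    failures = {(m, s) for m, s, ok in observations if not ok}

    verdicts = {}
    for module, socket in failures:
        # Did the same module pass when moved to a different socket?
        module_ok_elsewhere = any(m == module and s != socket for m, s in passes)
        # Did the same socket pass when holding a different module?
        socket_ok_with_other = any(s == socket and m != module for m, s in passes)

        if module_ok_elsewhere and socket_ok_with_other:
            # The fault followed neither part: suspect the contact between them.
            verdicts[(module, socket)] = "contact"
        elif socket_ok_with_other:
            verdicts[(module, socket)] = "module"
        elif module_ok_elsewhere:
            verdicts[(module, socket)] = "socket"
        else:
            verdicts[(module, socket)] = "inconclusive"
    return verdicts


# Roughly the sequence from the thread: "B" failed in slot3, then passed in
# slot1, and "A" passed in slot3.
runs = [
    ("B", "slot3", False),
    ("B", "slot1", True),
    ("A", "slot3", True),
]
print(classify_failures(runs))  # {('B', 'slot3'): 'contact'}
```

With intermittent errors a single passing run is weak evidence, so each pairing should be tested for several hours before its result is trusted.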

#  Eric Lucas
#                "Oh, I have slipped the surly bonds of earth
#                 And danced the skies on laughter-silvered wings...
#                                        -- John Gillespie Magee Jr

Philadelphia Linux Users Group