eric@lucii.org on Sat, 19 Jul 2003 17:56:27 -0400



Re: [PLUG] Mysterious system freeze


                       - Lessons learned - 

Well, thanks to those who replied to this email... I really had a tough
time with this until last week.  The "freezes" were so random and so
rare (about once a week at the _most_) that I was at a loss to pinpoint
a single cause.

Then, last week I went away for 2 days and left the computer off.  When
I returned and powered it up there was a green LED but no activity... no
video, no whirring of disks... zilch.

Disassembly revealed a foul odor from the power supply, which leads me
to believe that it was "going - going - GONE!".  I replaced it on Friday
the 11th of July and so far (knock on wood) there is no sign of the
"freezes".  

Had this not happened, I was preparing (if the problem did not cease) to
replace the memory and the power supply as a SWAG* solution.

Thanks again.  


Eric

* "Scientific Wild Ass Guess"
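
For anyone chasing a similar freeze: the "odd time shift" in the quoted
/var/log/messages excerpt (a syslogd restart stamped 22:03:06 right after a
cron entry stamped 22:40:00) is the kind of thing a short script can flag.
The following is only a rough sketch, not something I ran on sol at the
time. The function name find_time_reversals is made up for illustration,
and it assumes classic syslog timestamps ("Apr 11 22:40:00"), an English
locale for month names, and a year supplied by hand, since syslog lines of
this style do not record one.

```python
from datetime import datetime


def find_time_reversals(lines, year=2003):
    """Return (previous_line, current_line, seconds_back) for each place
    where a syslog timestamp is earlier than the one on the line before."""
    reversals = []
    prev_ts = None
    prev_line = None
    for line in lines:
        line = line.strip()
        # Drop mail quoting such as "> > " so quoted logs can be pasted in.
        while line.startswith(">"):
            line = line[1:].lstrip()
        try:
            # The first 15 characters hold "Mon DD HH:MM:SS".
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if prev_ts is not None and ts < prev_ts:
            seconds_back = int((prev_ts - ts).total_seconds())
            reversals.append((prev_line, line, seconds_back))
        prev_ts, prev_line = ts, line
    return reversals


if __name__ == "__main__":
    sample = [
        "Apr 11 22:30:00 sol /USR/SBIN/CRON[12876]: (root) CMD ( /usr/lib/sa/sa1      )",
        "Apr 11 22:40:00 sol /USR/SBIN/CRON[12938]: (root) CMD ( /usr/lib/sa/sa1      )",
        "Apr 11 22:03:06 sol syslogd 1.4.1: restart.",
        "Apr 11 22:03:09 sol webmin[343]: Webmin starting",
    ]
    for before, after, secs in find_time_reversals(sample):
        print(f"clock went back {secs} s between:\n  {before}\n  {after}")
```

A backwards jump like this usually just marks a reboot where the clock came
up wrong, but it does show roughly where the machine went down.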

On Mon, Apr 14, 2003 at 12:22:01AM -0400, Martin DiViaio wrote:
> 
> I had a similar problem with a server I used to work on. I eventually
> tracked it down to a problem with one of the device modules and SMP
> support. I recompiled the kernel without SMP and the problem went away.
> 
> --
> GPG Fingerprint: C900 18EF 0C36 4EAF A93C  F073 85D4 8B3C F3D8 077B
> 
> 
> On the 13th day of April in the year 2003 you wrote:
> 
> > Date: Sun, 13 Apr 2003 17:13:46 -0400
> > From: "eric@lucii.org" <eric@lucii.org>
> > To: PLUG <plug@lists.phillylinux.org>
> > X-Spam-Status: No, hits=-4.3 required=5.0
> > 	tests=SIGNATURE_LONG_DENSE,SPAM_PHRASE_00_01,
> > 	      TO_LOCALPART_EQ_REAL,USER_AGENT,USER_AGENT_MUTT
> > 	version=2.44
> > Subject: [PLUG] Mysterious system freeze
> > 
> > Occasionally in the past three months, my primary Linux workstation
> > (sol), will, for no apparent reason, stop functioning.  
> > 
> > It would stop responding to commands, not start new logins, xterms, or
> > shells and eventually I would have to press the reset button or power
> > off.  It might do this once every two weeks or so (quite infrequently).
> > 
> > Friday, it went one step further and simply "froze".  No mouse movement,
> > no keyboard input - could not even switch to a VC.   I tried to ssh to the
> > workstation from another computer to look at the logs but there was no
> > response.  It would not even respond to a "ping".
> > 
> > Dmesg appears to only hold current information.  The (I hope) relevant
> > portion of /var/log/messages is at the bottom of this message.  Note 
> > the odd time shift (syslogd restarts 37 minutes BEFORE the previous 
> > crontab entry :-P )
> > 
> > One thing I do notice is that in the reboot process the reiser fsck
> > finds a number of things to correct:
> > 
> > > clm-6006: writing inode 7644 on readonly FS
> > > clm-6006: writing inode 7644 on readonly FS
> > > clm-6006: writing inode 7644 on readonly FS
> > > clm-6006: writing inode 7644 on readonly FS
> > > clm-6006: writing inode 7644 on readonly FS
> > etc. (about 300+ times). 
> > 
> > (This is also visible in the /var/log/messages snippet below.)
> > 
> > I don't know if that is a problem, or the result of the problem.
> > 
> > The system is SuSE 7.3 with some upgrades from YaST Online Update.  It's
> > an 800 MHz Athlon system with 256 MB of RAM.  It runs on a three year old
> > Fujitsu 18 GB SCSI hard drive divided up like this (in relevant part):
> > 
> > [eric@sol eric]$ df -h
> > Filesystem            Size  Used Avail Use% Mounted on
> > /dev/sda6             5.6G  4.4G  1.2G  78% /
> > /dev/sda2              23M  3.9M   17M  18% /boot
> > /dev/sda7             5.5G  3.1G  2.1G  59% /home
> > shmfs                 125M     0  124M   0% /dev/shm
> > 
> > Any help/suggestions/hints are appreciated.
> > 
> > Eric
> > 
> > 
> > --------------- portion of /var/log/messages follows --------------
> > Apr 11 21:41:47 sol PAM-unix2[9477]: session started for user eric, service xdm 
> > Apr 11 21:50:00 sol /USR/SBIN/CRON[12698]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 21:59:00 sol /USR/SBIN/CRON[12710]: (root) CMD ( rm -f /var/spool/cron/lastrun/cron.hourly) 
> > Apr 11 22:00:00 sol /USR/SBIN/CRON[12715]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:10:00 sol /USR/SBIN/CRON[12758]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:20:01 sol /USR/SBIN/CRON[12816]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:30:00 sol /USR/SBIN/CRON[12876]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:40:00 sol /USR/SBIN/CRON[12938]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:03:06 sol syslogd 1.4.1: restart.
> > Apr 11 22:03:09 sol webmin[343]: Webmin starting 
> > Apr 11 22:03:11 sol kernel: klogd 1.4.1, log source = /proc/kmsg started.
> > Apr 11 22:03:11 sol kernel: Inspecting /boot/System.map-2.4.10-4GB
> > Apr 11 22:03:11 sol kernel: Loaded 11709 symbols from /boot/System.map-2.4.10-4GB.
> > Apr 11 22:03:11 sol kernel: Symbols match kernel version 2.4.10.
> > Apr 11 22:03:11 sol kernel: Loaded 439 symbols from 13 modules.
> > Apr 11 22:03:11 sol kernel: g inode 7644 on readonly FS
> > Apr 11 22:03:11 sol kernel: clm-6006: writing inode 7644 on readonly FS
> > Apr 11 22:03:11 sol last message repeated 234 times
> > Apr 11 22:03:11 sol kernel: clm-6005: writing inode 7644 on readonly FS
> > Apr 11 22:03:11 sol kernel: clm-6006: writing inode 7644 on readonly FS
> > Apr 11 22:03:11 sol last message repeated 110 times
> > Apr 11 22:03:11 sol kernel: clm-6005: writing inode 7644 on readonly FS
> > Apr 11 22:03:11 sol kernel: ip_tables: (c)2000 Netfilter core team
> > Apr 11 22:03:11 sol kernel: ip_conntrack (2047 buckets, 16376 max)
> > Apr 11 22:03:11 sol kernel: PCI: Found IRQ 10 for device 00:09.0
> > Apr 11 22:03:11 sol kernel: 3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
> > Apr 11 22:03:11 sol kernel: 00:09.0: 3Com PCI 3c905B Cyclone 100baseTx at 0xdc00. Vers LK1.1.16
> > Apr 11 22:03:11 sol kernel: IPv6 v0.8 for NET4.0
> > Apr 11 22:03:11 sol kernel: IPv6 over IPv4 tunneling driver
> > Apr 11 22:03:16 sol kernel: eth0: no IPv6 routers present
> > Apr 11 22:03:41 sol /usr/sbin/cron[778]: (CRON) STARTUP (fork ok) 
> > Apr 11 22:03:45 sol kernel: isapnp: Scanning for PnP cards...
> > Apr 11 22:03:45 sol kernel: isapnp: Calling quirk for 01:00
> > Apr 11 22:03:45 sol kernel: isapnp: SB audio device quirk - increasing port range
> > Apr 11 22:03:45 sol kernel: isapnp: Card 'Creative ViBRA16X PnP'
> > Apr 11 22:03:45 sol kernel: isapnp: 1 Plug & Play card detected total
> > Apr 11 22:03:50 sol kernel: nvidia: loading NVIDIA Linux x86 NVdriver Kernel Module  1.0-3123  Tue Aug 27 15:56:48 PDT 2002
> > Apr 11 22:03:51 sol kernel: Linux agpgart interface v0.99 (c) Jeff Hartmann
> > Apr 11 22:03:51 sol kernel: agpgart: Maximum main memory to use for agp memory: 203M
> > Apr 11 22:03:51 sol kernel: agpgart: Detected Via Apollo Pro KT133 chipset
> > Apr 11 22:03:51 sol kernel: agpgart: AGP aperture is 64M @ 0xd0000000
> > Apr 11 22:03:51 sol kernel: NVRM: AGPGART: VIA Apollo KT133 chipset
> > Apr 11 22:03:51 sol kernel: NVRM: AGPGART: aperture: 64M @ 0xd0000000
> > Apr 11 22:03:51 sol kernel: NVRM: AGPGART: aperture mapped from 0xd0000000 to 0xd3adf000
> > Apr 11 22:03:51 sol kernel: NVRM: AGPGART: mode 2x
> > Apr 11 22:03:51 sol kernel: NVRM: AGPGART: allocated 16 pages
> > Apr 11 22:03:56 sol kernel: Switching off penguin.
> > Apr 11 22:08:33 sol kdm[906]: Abnormal helper termination, code 1, signal 0
> > Apr 11 22:08:33 sol kdm[906]: fatal IO error 32 (Broken pipe)
> > Apr 11 22:08:33 sol kernel: NVRM: AGPGART: freed 16 pages
> > Apr 11 22:08:33 sol kernel: NVRM: AGPGART: backend released
> > Apr 11 22:08:34 sol kernel: NVRM: AGPGART: VIA Apollo KT133 chipset
> > Apr 11 22:08:34 sol kernel: NVRM: AGPGART: aperture: 64M @ 0xd0000000
> > Apr 11 22:08:34 sol kernel: NVRM: AGPGART: aperture mapped from 0xd0000000 to 0xd3adf000
> > Apr 11 22:08:34 sol kernel: NVRM: AGPGART: mode 2x
> > Apr 11 22:08:34 sol kernel: NVRM: AGPGART: allocated 16 pages
> > Apr 11 22:10:00 sol /USR/SBIN/CRON[1054]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:20:00 sol /USR/SBIN/CRON[1085]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > Apr 11 22:30:00 sol /USR/SBIN/CRON[1093]: (root) CMD ( /usr/lib/sa/sa1      ) 
> > 
> > 
> > 
> 
> _________________________________________________________________________
> Philadelphia Linux Users Group        --       http://www.phillylinux.org
> Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
> General Discussion  --   http://lists.netisland.net/mailman/listinfo/plug
> 
> 

-- 
------------------------------------------------------------------------
#   Eric Lucas 
========================================================================
Today, wanting someone else's money is called "need", wanting to keep
your own money is called "greed", and "compassion" is when politicians
arrange the transfer. 
  -- Joseph Sobran