Jeff Abrahamson on 31 Jul 2005 02:21:08 -0000

There are definitely other problems, but none leaves a trace that I can
find in the logs. Today gnome-terminal died once, and either
gnome-session or my X server died once as well.

The weirdest problem today was an unknown program that apparently leaves
behind a socket that xclock, xterm, gnumeric, and gnome-terminal (but
not mozilla or xload) try to read from on launch. (I ran strace to see
what was going on, and the four programs that failed all failed after
doing an open on a socket and then hanging while trying to read from
the descriptor.)

This is very frustrating, since these crashes leave no trace besides
the disappearing app.

I have 10G of swap and 1G of RAM, so memory ought not to be the problem.

Another very odd thing is that the system feels substantially slower
than another system I use that is technically slower. My slow system
says:

  vendor_id  : GenuineIntel
  cpu family : 15
  model      : 4
  model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
  stepping   : 1
  cpu MHz    : 2989.741
  cache size : 1024 KB

The system that feels substantially faster says:

  vendor_id  : GenuineIntel
  cpu family : 15
  model      : 2
  model name : Intel(R) Pentium(R) 4 CPU 2.66GHz
  stepping   : 7
  cpu MHz    : 2657.778
  cache size : 512 KB

This is all so weird and very frustrating.

-Jeff

On Sat, Jul 30, 2005 at 03:31:12PM -0700, Carlos Konstanski wrote:
> [140 lines, 834 words, 5522 characters] Top characters: e>rtnosi
>
> That's good. Perhaps it will stay up long enough to run mprime for a
> few hours. This will test the CPU and memory. Maybe you fixed one
> problem, but have others as well.
>
> When we got these problems, we were using the same CPU as usual (AMD
> Athlon XP 2600), but a different board. Our normal boards were VIA
> KT600, but these were nForce. Try a different board?
>
> Carlos
>
> On Sat, 30 Jul 2005, Jeff Abrahamson wrote:
>
> > Date: Sat, 30 Jul 2005 18:20:12 -0400
> > From: Jeff Abrahamson <jeff@purple.com>
> > Reply-To: Philadelphia Linux User's Group Discussion List
> >     <plug@lists.phillylinux.org>
> > To: plug@lists.phillylinux.org
> > Subject: Re: [PLUG] APIC errors, weird crashes
> >
> > Specifying "nolapic" my machine does not boot, hanging trying to get
> > an interrupt for hda (the boot drive).
> >
> > Specifying "noapic" seems to work. Some things are clearly better,
> > such as not crashing when the monitor is put to sleep. I've still had
> > a gnome-terminal die since reboot (only a few hours), but that's
> > a better rate than previously, even if still unacceptable.
> >
> > The whole thing is very weird, since in general no entry appears in
> > any log files.
> >
> > -Jeff
> >
> >
> > On Wed, Jul 27, 2005 at 10:12:17PM -0700, Carlos Konstanski wrote:
> >> [92 lines, 549 words, 3556 characters] Top characters: erntosia
> >>
> >> I dealt with this kind of thing before once, where we got 5 new boxes
> >> that were not our usual build. We had to use "nolapic" in the kernel boot
> >> arguments. These were single-processor machines. I forget which CPU
> >> and motherboard they were, but I think the boards were Asus.
> >>
> >> With "nolapic" specified, these machines have been reliable, one even
> >> running as a qa server for a huge tomcat app.
> >>
> >> Carlos
> >>
> >> On Thu, 28 Jul 2005, Jeff Abrahamson wrote:
> >>
> >>> Date: Thu, 28 Jul 2005 01:07:28 -0400
> >>> From: Jeff Abrahamson <jeff@purple.com>
> >>> Reply-To: Philadelphia Linux User's Group Discussion List
> >>>     <plug@lists.phillylinux.org>
> >>> To: PLUG <plug@lists.phillylinux.org>
> >>> Subject: [PLUG] APIC errors, weird crashes
> >>>
> >>> I set up a new machine that has been getting weird crashes. (So far
> >>> gnome terminal, mozilla, emacs21, exim4, X, clock applet, workspace
> >>> applet, xterm, and ogg123 have crashed.)
> >>>
> >>> At first I thought this was APIC related, as I saw a few kernel log
> >>> messages to this effect (see below). But, for the most part, the
> >>> crashes have not been accompanied by anything tell-tale in the logs.
> >>> It happens often enough to be annoying but not so often that it's
> >>> feasible to sit around and watch it crash.
> >>>
> >>> I'm running Debian testing, but no updates have been posted for
> >>> several days. I'm hoping it's not hardware.
> >>>
> >>> Any suggestions what might be going on or what to do?
> >>>
> >>>
> >>> [ The remainder of this message details the APIC kernel error, for
> >>> those who are interested and for posterity. Most can stop reading
> >>> now.
> >>> ]
> >>>
> >>> Here is an example of the APIC kernel error, but this is relatively rare:
> >>>
> >>> jeff@astra:kernel-source-2.6.8 $ dmesg | grep -i apic
> >>> ENABLING IO-APIC IRQs
> >>> init IO_APIC IRQs
> >>> IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> >>> Using local APIC timer interrupts.
> >>> calibrating APIC timer ...
> >>> ACPI: Using IOAPIC for interrupt routing
> >>> number of IO-APIC #2 registers: 24.
> >>> testing the IO APIC.......................
> >>> IO APIC #2......
> >>> ....... : physical APIC id: 02
> >>> ....... : IO APIC version: 0003
> >>> APIC error on CPU0: 00(60)
> >>> <6>APIC error on CPU0: 60(60)
> >>> APIC error on CPU0: 60(60)
> >>> jeff@astra:kernel-source-2.6.8 $ uname -a
> >>> Linux astra 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux
> >>> jeff@astra:kernel-source-2.6.8 $
> >>>
> >>> APIC errors, though, seem like they should only happen on SMP
> >>> machines. (Cf. arch/i386/kernel/apic.c, function
> >>> smp_error_interrupt().) My kernel is not compiled for SMP (see uname,
> >>> above) and I only have one processor.
> >>>
> >>> jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo
> >>> processor : 0
> >>> vendor_id : GenuineIntel
> >>> cpu family : 15
> >>> model : 4
> >>> model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
> >>> [...]
> >>> jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo | grep processor
> >>> processor : 0
> >>> jeff@astra:kernel-source-2.6.8 $
> >>>
> >>> BTW, I found a cool info site here:
> >>>
> >>> http://wiki.linuxquestions.org/wiki/APIC
> >>>
> >>> --
> >>> Jeff
> >>>
> >>> Jeff Abrahamson <http://www.purple.com/jeff/> +1 215/837-2287
> >>> GPG fingerprint: 1A1A BA95 D082 A558 A276 63C6 16BF 8C4C 0D1D AE4B
> >>>
> >
> > --
> > Jeff
> >
> > Jeff Abrahamson <http://www.purple.com/jeff/> +1 215/837-2287
> > GPG fingerprint: 1A1A BA95 D082 A558 A276 63C6 16BF 8C4C 0D1D AE4B
> >

--
 Jeff

 Jeff Abrahamson <http://www.purple.com/jeff/> +1 215/837-2287
 GPG fingerprint: 1A1A BA95 D082 A558 A276 63C6 16BF 8C4C 0D1D AE4B

Attachment: signature.asc

___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug