Carlos Konstanski on 30 Jul 2005 22:31:36 -0000



Re: [PLUG] APIC errors, weird crashes


That's good.  Perhaps it will stay up long enough to run mprime for a
few hours.  This will test the CPU and memory.  Maybe you fixed one
problem, but have others as well.

When we got these problems, we were using the same CPU as usual (AMD
Athlon XP 2600), but a different board.  Our normal boards were VIA
KT600, but these were nForce.  Try a different board?

Carlos

On Sat, 30 Jul 2005, Jeff Abrahamson wrote:

Date: Sat, 30 Jul 2005 18:20:12 -0400
From: Jeff Abrahamson <jeff@purple.com>
Reply-To: Philadelphia Linux User's Group Discussion List
    <plug@lists.phillylinux.org>
To: plug@lists.phillylinux.org
Subject: Re: [PLUG] APIC errors, weird crashes

With "nolapic" specified, my machine does not boot: it hangs while
trying to get an interrupt for hda (the boot drive).

Specifying "noapic" seems to work.  Some things are clearly better,
such as not crashing when the monitor is put to sleep.  I've still had
a gnome-terminal die since reboot (only a few hours), but that's
a better rate than previously, even if still unacceptable.
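
To confirm which of these flags the running kernel actually booted with,
the command line can be read back from /proc/cmdline.  A minimal sketch,
assuming Python is available; the function name apic_boot_flags and the
sample "root=/dev/hda1 ro" string are illustrative, not taken from this
machine:

```python
def apic_boot_flags(cmdline):
    """Report which APIC-disabling flags appear in a kernel command line.

    `cmdline` is the text of /proc/cmdline (space-separated boot arguments).
    """
    tokens = cmdline.split()
    # Match whole tokens so that "noapic" is not confused with "nolapic".
    return {flag: flag in tokens for flag in ("noapic", "nolapic")}


# Illustrative command line (an assumption, not copied from a real box):
sample = "root=/dev/hda1 ro noapic"
flags = apic_boot_flags(sample)

# On a live Linux system one would instead do:
#   with open("/proc/cmdline") as f:
#       flags = apic_boot_flags(f.read())
```

Matching whole tokens rather than substrings matters here because the
two flag names differ by a single letter.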

The whole thing is very weird, since in general no entry appears in
any log files.

-Jeff


On Wed, Jul 27, 2005 at 10:12:17PM -0700, Carlos Konstanski wrote:

I dealt with this kind of thing once before, when we got 5 new boxes
that were not our usual build.  We had to use "nolapic" in the kernel boot
arguments.  These were single-processor machines.  I forget which CPU
and motherboard they were, but I think the boards were Asus.

With "nolapic" specified, these machines have been reliable; one is even
running as a QA server for a huge Tomcat app.

Carlos

On Thu, 28 Jul 2005, Jeff Abrahamson wrote:

Date: Thu, 28 Jul 2005 01:07:28 -0400
From: Jeff Abrahamson <jeff@purple.com>
Reply-To: Philadelphia Linux User's Group Discussion List
    <plug@lists.phillylinux.org>
To: PLUG <plug@lists.phillylinux.org>
Subject: [PLUG] APIC errors, weird crashes

I set up a new machine that has been getting weird crashes.  (So far
gnome terminal, mozilla, emacs21, exim4, X, clock applet, workspace
applet, xterm, and ogg123 have crashed.)

At first I thought this was APIC related, as I saw a few kernel log
messages to this effect (see below).  But, for the most part, the
crashes have not been accompanied by anything tell-tale in the logs.
It happens often enough to be annoying but not so often that it's
feasible to sit around and watch it crash.

I'm running Debian testing, but no updates have been posted for
several days.  I'm hoping it's not hardware.

Any suggestions what might be going on or what to do?


[ The remainder of this message details the APIC kernel error, for those who are interested and for posterity. Most can stop reading now. ]

Here is an example of the APIC kernel error, but this is relatively rare:

   jeff@astra:kernel-source-2.6.8 $ dmesg | grep -i apic
   ENABLING IO-APIC IRQs
   init IO_APIC IRQs
    IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
   Using local APIC timer interrupts.
   calibrating APIC timer ...
   ACPI: Using IOAPIC for interrupt routing
   number of IO-APIC #2 registers: 24.
   testing the IO APIC.......................
   IO APIC #2......
   .......    : physical APIC id: 02
   .......     : IO APIC version: 0003
   APIC error on CPU0: 00(60)
    <6>APIC error on CPU0: 60(60)
   APIC error on CPU0: 60(60)
   jeff@astra:kernel-source-2.6.8 $ uname -a
   Linux astra 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux
   jeff@astra:kernel-source-2.6.8 $
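
For anyone trying to read the two hex values in those "APIC error" lines,
here is a rough decoding sketch.  It rests on two assumptions that should
be checked against your own kernel source: that the two numbers are the
local APIC error status register read before and after it is cleared,
and that the bit meanings are the ones listed in the comment next to
smp_error_interrupt() in arch/i386/kernel/apic.c.  The names ESR_BITS,
decode_esr and parse_apic_errors are made up for this example.

```python
import re

# Bit meanings as assumed from the comment in arch/i386/kernel/apic.c.
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}

# Matches e.g. "APIC error on CPU0: 60(60)", with or without a "<6>" prefix.
LINE_RE = re.compile(r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)")


def decode_esr(value):
    """Return the names of the bits set in an error status value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def parse_apic_errors(dmesg_text):
    """Return (cpu, first, second, decoded names of second) per matching line."""
    results = []
    for m in LINE_RE.finditer(dmesg_text):
        cpu = int(m.group(1))
        first = int(m.group(2), 16)
        second = int(m.group(3), 16)
        results.append((cpu, first, second, decode_esr(second)))
    return results


sample = """APIC error on CPU0: 00(60)
 <6>APIC error on CPU0: 60(60)
APIC error on CPU0: 60(60)
"""
errors = parse_apic_errors(sample)
```

Under those assumptions, 0x60 has bits 5 and 6 set, which would read as
"Send illegal vector" plus "Received illegal vector".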

APIC errors, though, seem like they should only happen on SMP
machines.  (Cf. arch/i386/kernel/apic.c, function
smp_error_interrupt().)  My kernel is not compiled for SMP (see uname,
above) and I only have one processor.

   jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo
   processor       : 0
   vendor_id       : GenuineIntel
   cpu family      : 15
   model           : 4
   model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
   [...]
   jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo | grep processor
   processor       : 0
   jeff@astra:kernel-source-2.6.8 $
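
The same two checks (one processor entry, no SMP in the kernel version
string) can be scripted.  A small sketch, assuming the /proc/cpuinfo and
"uname -a" text formats shown above; the helper names count_processors
and kernel_reports_smp are hypothetical:

```python
def count_processors(cpuinfo_text):
    """Count 'processor : N' entries in the text of /proc/cpuinfo."""
    return sum(
        1
        for line in cpuinfo_text.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


def kernel_reports_smp(uname_text):
    """True if the 'uname -a' output contains a standalone SMP token."""
    return "SMP" in uname_text.split()


cpuinfo = """processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
"""
uname = "Linux astra 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux"

n_cpus = count_processors(cpuinfo)
is_smp = kernel_reports_smp(uname)
```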

BTW, I found a cool info site here:

   http://wiki.linuxquestions.org/wiki/APIC

--
Jeff

Jeff Abrahamson  <http://www.purple.com/jeff/>    +1 215/837-2287
GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B

___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug

