Jeff Abrahamson on 31 Jul 2005 02:21:08 -0000



Re: [PLUG] APIC errors, weird crashes


There definitely are other problems, but none leaves a trace that
I can find in the logs.  Today gnome-terminal died once, and either
gnome-session or my X server died once as well.  The weirdest today
was a socket, apparently left behind by some unknown program, that
xclock, xterm, gnumeric, and gnome-terminal (but not mozilla or
xload) try to read from on launch.  (I ran strace to see what was
going on: the four programs that failed all opened a socket and then
hung while trying to read from the descriptor.)
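
The strace check described above can be partly automated.  What follows
is a minimal Python sketch, assuming strace's default one-call-per-line
output.  The function name find_hung_call and the sample trace
(including the path /tmp/example-socket) are made up for illustration
and are not taken from the failing machine.

```python
import re

# Matches "name(fd" at the start of a traced call, e.g. "read(3, ".
CALL = re.compile(r'^(\w+)\((\d+)')
# Matches the Unix socket path strace prints inside connect().
PATH = re.compile(r'path=@?"([^"]*)"')


def find_hung_call(trace_text):
    """Return (syscall, fd, socket_path) for the last call that never returned."""
    socket_paths = {}  # fd -> Unix socket path seen in connect()
    hung = None
    for line in trace_text.splitlines():
        m = CALL.match(line.strip())
        if not m:
            continue
        name, fd = m.group(1), int(m.group(2))
        if name == "connect":
            p = PATH.search(line)
            if p:
                socket_paths[fd] = p.group(1)
        # A completed call ends in ") = <result>"; a hung one never gets there.
        if ") = " not in line:
            hung = (name, fd, socket_paths.get(fd))
    return hung


# Hypothetical trace, shaped like strace output but not real data.
SAMPLE = '''\
socket(PF_FILE, SOCK_STREAM, 0) = 3
connect(3, {sa_family=AF_FILE, path="/tmp/example-socket"}, 21) = 0
read(3, 
'''

print(find_hung_call(SAMPLE))
```

To try it on a real trace, save one with something like
"strace -o trace.txt xclock" and pass the file's contents to
find_hung_call().  The socket path it reports is the one to look up
when hunting for the program that left the socket behind.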

This is very frustrating, since these crashes leave no trace besides
the disappearing app.  I have 10G of swap and 1G of RAM, so memory
ought not be the problem.

Another very odd thing is that this system feels substantially slower
than another system I use that is slower on paper.

My slow system says:

    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 4
    model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
    stepping        : 1
    cpu MHz         : 2989.741
    cache size      : 1024 KB

The system that feels substantially faster says:

    vendor_id       : GenuineIntel
    cpu family      : 15
    model           : 2
    model name      : Intel(R) Pentium(R) 4 CPU 2.66GHz
    stepping        : 7
    cpu MHz         : 2657.778
    cache size      : 512 KB
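
To put numbers on "slower on paper", the two listings can be compared
directly.  This is a small Python sketch that parses the "key : value"
lines of /proc/cpuinfo; the helper name parse_cpuinfo is an invention
for this example, and only the fields shown above are used.

```python
def parse_cpuinfo(text):
    """Turn 'key : value' lines from /proc/cpuinfo into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Values copied from the two listings in this message.
SLOW_FEELING = """\
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
cpu MHz         : 2989.741
cache size      : 1024 KB
"""

FAST_FEELING = """\
model name      : Intel(R) Pentium(R) 4 CPU 2.66GHz
cpu MHz         : 2657.778
cache size      : 512 KB
"""

slow = parse_cpuinfo(SLOW_FEELING)
fast = parse_cpuinfo(FAST_FEELING)

# Ratio > 1 means the machine that feels slow has the bigger number.
clock_ratio = float(slow["cpu MHz"]) / float(fast["cpu MHz"])
cache_ratio = int(slow["cache size"].split()[0]) / int(fast["cache size"].split()[0])

print(f"clock ratio: {clock_ratio:.3f}, cache ratio: {cache_ratio:.1f}")
```

On these figures the machine that feels slow has a clock roughly 12%
higher and twice the cache, so the raw numbers alone do not explain
the difference.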

This is all so weird and very frustrating.
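
The "APIC error on CPU0: 60(60)" lines quoted further down can be
decoded against the bit list in the comment inside
smp_error_interrupt() in arch/i386/kernel/apic.c.  This is a minimal
Python sketch, assuming that comment's bit meanings and assuming the
two hex values are the error status register as read before and after
the kernel clears it.  The names ESR_BITS and decode_apic_error are
invented here.

```python
import re

# Bit meanings as listed in the kernel source comment (assumed accurate).
ESR_BITS = {
    0: "send checksum error",
    1: "receive checksum error",
    2: "send accept error",
    3: "receive accept error",
    4: "reserved",
    5: "send illegal vector",
    6: "received illegal vector",
    7: "illegal register address",
}

LINE = re.compile(r"APIC error on CPU(\d+): ([0-9a-fA-F]{2})\(([0-9a-fA-F]{2})\)")


def bits_set(value):
    """Names of the error bits set in an 8-bit status value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def decode_apic_error(line):
    """Decode one dmesg line; return None if it is not an APIC error line."""
    m = LINE.search(line)
    if not m:
        return None
    return {
        "cpu": int(m.group(1)),
        "first_read": bits_set(int(m.group(2), 16)),
        "second_read": bits_set(int(m.group(3), 16)),
    }


# The two distinct lines from the dmesg output quoted in this thread.
for text in ("APIC error on CPU0: 00(60)", "<6>APIC error on CPU0: 60(60)"):
    print(text, "->", decode_apic_error(text))
```

Under those assumptions, 0x60 is bits 5 and 6, that is "send illegal
vector" and "received illegal vector".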

-Jeff


On Sat, Jul 30, 2005 at 03:31:12PM -0700, Carlos Konstanski wrote:
> That's good.  Perhaps it will stay up long enough to run mprime for a
> few hours.  This will test the CPU and memory.  Maybe you fixed one
> problem, but have others as well.
> 
> When we got these problems, we were using the same CPU as usual (AMD
> Athlon XP 2600), but a different board.  Our normal boards were VIA
> KT600, but these were nForce.  Try a different board?
> 
> Carlos
> 
> On Sat, 30 Jul 2005, Jeff Abrahamson wrote:
> 
> > Date: Sat, 30 Jul 2005 18:20:12 -0400
> > From: Jeff Abrahamson <jeff@purple.com>
> > Reply-To: Philadelphia Linux User's Group Discussion List
> >     <plug@lists.phillylinux.org>
> > To: plug@lists.phillylinux.org
> > Subject: Re: [PLUG] APIC errors, weird crashes
> > 
> > With "nolapic" specified, my machine does not boot: it hangs trying
> > to get an interrupt for hda (the boot drive).
> >
> > Specifying "noapic" seems to work.  Some things are clearly better,
> > such as not crashing when the monitor is put to sleep.  I've still had
> > a gnome-terminal die since reboot (only a few hours), but that's
> > a better rate than previously, even if still unacceptable.
> >
> > The whole thing is very weird, since in general no entry appears in
> > any log files.
> >
> > -Jeff
> >
> >
> > On Wed, Jul 27, 2005 at 10:12:17PM -0700, Carlos Konstanski wrote:
> >> I dealt with this kind of thing before once, where we got 5 new boxes
> >> that were not our usual build.  We had to use "nolapic" in the kernel boot
> >> arguments.  These were single-processor machines.  I forget which CPU
> >> and motherboard they were, but I think the boards were Asus.
> >>
> >> With "nolapic" specified, these machines have been reliable, one even
> >> running as a qa server for a huge tomcat app.
> >>
> >> Carlos
> >>
> >> On Thu, 28 Jul 2005, Jeff Abrahamson wrote:
> >>
> >>> Date: Thu, 28 Jul 2005 01:07:28 -0400
> >>> From: Jeff Abrahamson <jeff@purple.com>
> >>> Reply-To: Philadelphia Linux User's Group Discussion List
> >>>     <plug@lists.phillylinux.org>
> >>> To: PLUG <plug@lists.phillylinux.org>
> >>> Subject: [PLUG] APIC errors, weird crashes
> >>>
> >>> I set up a new machine that has been getting weird crashes.  (So far
> >>> gnome terminal, mozilla, emacs21, exim4, X, clock applet, workspace
> >>> applet, xterm, and ogg123 have crashed.)
> >>>
> >>> At first I thought this was APIC related, as I saw a few kernel log
> >>> messages to this effect (see below).  But, for the most part, the
> >>> crashes have not been accompanied by anything tell-tale in the logs.
> >>> It happens often enough to be annoying but not so often that it's
> >>> feasible to sit around and watch it crash.
> >>>
> >>> I'm running Debian testing, but no updates have been posted for
> >>> several days.  I'm hoping it's not hardware.
> >>>
> >>> Any suggestions what might be going on or what to do?
> >>>
> >>>
> >>> [ The remainder of this message details the APIC kernel error, for
> >>>  those who are interested and for posterity.  Most can stop reading
> >>>  now.
> >>> ]
> >>>
> >>> Here is an example of the APIC kernel error, but this is relatively rare:
> >>>
> >>>    jeff@astra:kernel-source-2.6.8 $ dmesg | grep -i apic
> >>>    ENABLING IO-APIC IRQs
> >>>    init IO_APIC IRQs
> >>>     IO-APIC (apicid-pin) 2-0, 2-16, 2-17, 2-18, 2-19, 2-20, 2-21, 2-22, 2-23 not connected.
> >>>    Using local APIC timer interrupts.
> >>>    calibrating APIC timer ...
> >>>    ACPI: Using IOAPIC for interrupt routing
> >>>    number of IO-APIC #2 registers: 24.
> >>>    testing the IO APIC.......................
> >>>    IO APIC #2......
> >>>    .......    : physical APIC id: 02
> >>>    .......     : IO APIC version: 0003
> >>>    APIC error on CPU0: 00(60)
> >>>     <6>APIC error on CPU0: 60(60)
> >>>    APIC error on CPU0: 60(60)
> >>>    jeff@astra:kernel-source-2.6.8 $ uname -a
> >>>    Linux astra 2.6.8-2-686 #1 Thu May 19 17:53:30 JST 2005 i686 GNU/Linux
> >>>    jeff@astra:kernel-source-2.6.8 $
> >>>
> >>> APIC errors, though, seem like they should only happen on SMP
> >>> machines.  (Cf. arch/i386/kernel/apic.c, function
> >>> smp_error_interrupt().)  My kernel is not compiled for SMP (see uname,
> >>> above) and I only have one processor.
> >>>
> >>>    jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo
> >>>    processor       : 0
> >>>    vendor_id       : GenuineIntel
> >>>    cpu family      : 15
> >>>    model           : 4
> >>>    model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
> >>>    [...]
> >>>    jeff@astra:kernel-source-2.6.8 $ cat /proc/cpuinfo | grep processor
> >>>    processor       : 0
> >>>    jeff@astra:kernel-source-2.6.8 $
> >>>
> >>> BTW, I found a cool info site here:
> >>>
> >>>    http://wiki.linuxquestions.org/wiki/APIC
> >>>
> >>> --
> >>> Jeff
> >>>
> >>> Jeff Abrahamson  <http://www.purple.com/jeff/>    +1 215/837-2287
> >>> GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B
> >>>
> >> ___________________________________________________________________________
> >> Philadelphia Linux Users Group         --        http://www.phillylinux.org
> >> Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
> >> General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug
> >>
> >
> >
> 

-- 
 Jeff

 Jeff Abrahamson  <http://www.purple.com/jeff/>    +1 215/837-2287
 GPG fingerprint: 1A1A BA95 D082 A558 A276  63C6 16BF 8C4C 0D1D AE4B

