Will Dyson on Tue, 13 Aug 2002 03:06:26 -0400



Re: [PLUG] smp/ext3/nfs/raid == solid lockup on 2.4.18 and 2.4.19


Fred K Ollinger wrote:
I'm getting a solid lockup on linux 2.4.18 and 2.4.19. I just upgraded to
2.4.19 for a short time and got a crash in 24 hours. Now I'm back to
2.4.18.

It is an ext3 error (posted below), as these messages keep getting spewed
to messages right before a crash.

Most of the errors are just the filesystem reporting that it got an I/O error when trying to access the disk (at least the first ones are). There are also some SCSI layer errors. I'd bet that the actual ext3-fs errors (such as ext3_add_entry: bad entry in directory #886488) are caused by the disk returning bad data that confuses the filesystem, especially since these errors don't appear until a few I/O errors have gone by.
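One way to check that ordering on your own logs is to count how many I/O errors appear before the first ext3 complaint. The following is only a rough sketch: the match patterns are my assumptions about how the messages are worded, and the sample lines are placeholders made up for illustration (only the ext3_add_entry line comes from this thread), so adjust both to what your syslog actually contains.

```python
import re

# Assumed patterns; tune them to the wording in your own logs.
IO_ERROR = re.compile(r"I/O error", re.IGNORECASE)
FS_ERROR = re.compile(r"EXT3-fs error|ext3_\w+:", re.IGNORECASE)


def classify(line):
    """Return "fs" for an ext3 complaint, "io" for an I/O error, else None."""
    if FS_ERROR.search(line):
        return "fs"
    if IO_ERROR.search(line):
        return "io"
    return None


def io_errors_before_first_fs_error(lines):
    """Count I/O errors seen before the first ext3 error.

    Returns None when the log contains no ext3 error at all.
    """
    io_count = 0
    for line in lines:
        kind = classify(line)
        if kind == "fs":
            return io_count
        if kind == "io":
            io_count += 1
    return None


# Placeholder lines, not real kernel output.
SAMPLE = [
    "placeholder line: I/O error reported by the block layer",
    "placeholder line: scsi layer complaint",
    "placeholder line: I/O error reported by the block layer",
    "ext3_add_entry: bad entry in directory #886488",
]

if __name__ == "__main__":
    print(io_errors_before_first_fs_error(SAMPLE))
```

A result above zero on every crash is consistent with the disk (or its driver) failing first and ext3 only tripping over the damage afterwards. A result of zero would point the other way.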


These lockups have been happening periodically ever since we got new raid
array. We are using the aacraid driver for this.

If the RAID card (and its driver) was the most recent thing to change, I would naturally suspect it first.


Does anyone have any other idea on how to start trouble-shooting this

You can start by enabling all the debugging-related options in the kernel config (under "kernel hacking"). You should also enable the NMI watchdog (read Documentation/nmi_watchdog.txt), since you are seeing hard lockups. These options will slow the kernel down somewhat, but they should also make crashes happen closer to the error that caused them.
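If you want to confirm that a rebuilt kernel really has the debugging options turned on, a small script can compare the .config against a wish list. This is a sketch only: the option names in WANTED are what I recall from the "kernel hacking" menu of the 2.4 i386 tree and are assumptions to verify against your own source, and the sample config text is invented for illustration. The NMI watchdog is covered in Documentation/nmi_watchdog.txt and is not checked here.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* names set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def missing_options(config_text, wanted):
    """List the wanted options that are not enabled, in the order given."""
    on = enabled_options(config_text)
    return [name for name in wanted if name not in on]


# Assumed option names; check them against your kernel tree.
WANTED = [
    "CONFIG_DEBUG_KERNEL",
    "CONFIG_DEBUG_SLAB",
    "CONFIG_MAGIC_SYSRQ",
    "CONFIG_DEBUG_SPINLOCK",
]

# Invented sample, standing in for the contents of a real .config file.
SAMPLE_CONFIG = """\
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SLAB is not set
CONFIG_MAGIC_SYSRQ=y
"""

if __name__ == "__main__":
    for name in missing_options(SAMPLE_CONFIG, WANTED):
        print("not enabled:", name)
```

To run it against a real tree, read the .config file into a string and pass that in place of SAMPLE_CONFIG.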


Once you have run with all these debugging options on for a while (and experienced the problem a few times), you should hopefully have some helpful oopses (basically backtraces of the kernel) to show what the kernel was doing before it barfed.

You should definitely make sure that you have ksymoops set up correctly so that the oops records will be meaningful.

You should then post on the linux kernel mailing list about your problem.

Oh, one last piece of advice: move up to the 2.4.19 kernel. Bug reports against an old kernel usually just cause people to ask you, "Does that still happen on the most recent kernel?"

--
Will Dyson
"Back off man, I'm a scientist!" -Dr. Peter Venkman

_________________________________________________________________________
Philadelphia Linux Users Group        --       http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion  --   http://lists.netisland.net/mailman/listinfo/plug