Fred K Ollinger on Mon, 12 Aug 2002 21:22:22 -0400



[PLUG] smp/ext3/nfs/raid == solid lockup on 2.4.18 and 2.4.19


I'm getting solid lockups on Linux 2.4.18 and 2.4.19. I upgraded to
2.4.19 briefly and got a crash within 24 hours, so now I'm back on
2.4.18.

It looks like an ext3 error (posted below): these messages get spewed
to the messages log right before each crash.

I'm running a Dell PowerEdge 4300, a dual-CPU (SMP) 500 MHz system. I am
sharing out a RAID 5 array (PowerVault 2115). All my filesystems are ext3.

The most recent solid lockup came in the middle of a backup (dump).

The array is NFS-mounted on 5 clients.

These lockups have been happening periodically ever since we got the new
RAID array. We are using the aacraid driver for it.

Does anyone have any ideas on how to start troubleshooting this?

I have spent some time reading up on all the pertinent error messages, but
I am still coming up short on two points:
1. what caused it
2. how to fix it

---start log and commentary---

Here are some errors and comments:

Here's a link:

http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&threadm=200205141238.11104.kiza%40gmx.net&rnum=2&prev=/groups%3Fq%3D%2Bext3%2B%25222.4%2B18%2B%2522%26hl%3Den%26lr%3Dlang_en%26ie%3DUTF-8%26selm%3D200205141238.11104.kiza%2540gmx.net%26rnum%3D2

[snipped duplicate errors]

Aug 12 11:13:02 wernicke kernel: EXT3-fs error (device sd(8,18)) in
ext3_reserve_inode_write: IO failure

Most of these links are from people with similar problems and no
solutions. I'm just documenting that we are not the only ones seeing this.

http://groups.google.com/groups?q=ext3_reserve_inode_write:&hl=en&lr=lang_en&ie=UTF-8&selm=linux.kernel.Pine.LNX.4.33.0203242328170.2544-100000%40devel.blackstar.nl&rnum=4

Aug 12 11:16:36 wernicke kernel: EXT3-fs error (device sd(8,18)) in
ext3_new_inode: IO failure

http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&threadm=linux.kernel.3D2F3331.376FB6D2%40zip.com.au&rnum=13&prev=/groups%3Fq%3Dext3_new_inode:%26start%3D10%26hl%3Den%26lr%3Dlang_en%26ie%3DUTF-8%26selm%3Dlinux.kernel.3D2F3331.376FB6D2%2540zip.com.au%26rnum%3D13

This one is particularly weird, as it is supposed to mean that we are out
of inodes. We are not: the inode usage on /data is < 3%.
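
For anyone who wants to double-check the inode figure without df, here is
a minimal sketch using Python's os.statvfs. The function names are made up
for illustration; /data is the mount point mentioned above.

```python
import os

def inode_usage_percent(total_inodes, free_inodes):
    """Return the percentage of inodes in use (0.0 if none are reported)."""
    if total_inodes == 0:
        # Some filesystems do not report inode counts at all.
        return 0.0
    return 100.0 * (total_inodes - free_inodes) / total_inodes

def inode_usage_for(path):
    """Inode usage of the filesystem that holds `path`, as a percentage."""
    st = os.statvfs(path)
    return inode_usage_percent(st.f_files, st.f_ffree)

# Example call (not run here, since /data only exists on the server):
#   print("%.1f%% of inodes in use" % inode_usage_for("/data"))
```

A result well under 100% would suggest the ext3_new_inode failure is a
side effect of the I/O errors rather than real inode exhaustion.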

Aug 12 11:16:36 wernicke kernel: EXT3-fs error (device sd(8,18)):
ext3_add_entry: bad entry in directory #886488: rec_len %% 4 != 0 -
offset=0, inode=3889333976,
rec_len=11254, name_len=124

http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&threadm=linux.kernel.20020729123706.GC463%40gzp2.gzp.hu&rnum=1&prev=/groups%3Fq%3D%2522bad%2Bentry%2Bin%2Bdirectory%2522%2B2.4.18%26hl%3Den%26lr%3Dlang_en%26ie%3DUTF-8%26selm%3Dlinux.kernel.20020729123706.GC463%2540gzp2.gzp.hu%26rnum%3D1


Here's where the really nasty errors occur:

Aug 12 11:13:02 wernicke kernel: SCSI disk error : host 5 channel 2 id 0
lun 0 return code = 25040001
Aug 12 11:13:02 wernicke kernel:  I/O error: dev 08:12, sector 588530016
Aug 12 11:13:02 wernicke kernel: EXT3-fs error (device sd(8,18)):
ext3_readdir: directory #36853974 contains a hole at offset 0

http://groups.google.com/groups?hl=en&lr=lang_en&ie=UTF-8&threadm=linux.kernel.20011217161538.GA17099%40spylog.ru&rnum=3&prev=/groups%3Fq%3Dext3_readdir%2Bhole%2B2.4%26hl%3Den%26lr%3Dlang_en%26ie%3DUTF-8%26selm%3Dlinux.kernel.20011217161538.GA17099%2540spylog.ru%26rnum%3D3
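
The ext3 messages say sd(8,18) while the I/O error says dev 08:12. As far
as I can tell these are the same device: the first is major,minor in
decimal, the second in hex. A small sketch of the mapping, assuming the
usual 2.4 layout where SCSI disks on major 8 get 16 minors each (helper
names are hypothetical):

```python
def parse_hex_dev(text):
    """Turn a 'MM:mm' hex pair such as '08:12' into (major, minor) ints."""
    major, minor = text.split(":")
    return int(major, 16), int(minor, 16)

def sd_name(major, minor):
    """Map a SCSI disk major/minor to a name like 'sdb2'.

    Assumes major 8 with 16 minors per disk: the high bits of the minor
    pick the disk, the low 4 bits pick the partition (0 = whole disk).
    """
    if major != 8:
        raise ValueError("only SCSI disk major 8 is handled in this sketch")
    disk = minor >> 4
    partition = minor & 0x0F
    name = "sd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")

print(sd_name(8, 18))                    # from the "sd(8,18)" messages
print(sd_name(*parse_hex_dev("08:12")))  # from the "dev 08:12" message
```

Under that assumption both calls name the second partition of the second
SCSI disk, so every error in this log points at one filesystem.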


Aug 12 11:13:02 wernicke kernel: EXT3-fs error (device sd(8,18)):
ext3_readdir: bad entry in directory #36853974: rec_len %% 4 != 0 -
offset=0, inode=1330206934,
rec_len=51, name_len=1
Aug 12 11:14:32 wernicke kernel: SCSI disk error : host 5 channel 2 id 0
lun 0 return code = 25040001
Aug 12 11:14:32 wernicke kernel:  I/O error: dev 08:12, sector 89147976
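
The return code can be pulled apart into its bytes. This is a rough
sketch only: it assumes the code is printed in hex and packed as
driver, host, message, status bytes from high to low, and the host byte
names are quoted from memory of the 2.4 SCSI headers, so please check
them against your kernel source before relying on this.

```python
# Host byte names as remembered from the 2.4 SCSI headers (an assumption).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
    0x09: "DID_BAD_INTR",
}

def decode_scsi_result(code_hex):
    """Split a hex return code string into its four component bytes."""
    value = int(code_hex, 16)
    host = (value >> 16) & 0xFF
    return {
        "driver": (value >> 24) & 0xFF,
        "host": host,
        "msg": (value >> 8) & 0xFF,
        "status": value & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }

for key, val in sorted(decode_scsi_result("25040001").items()):
    print(key, val)
```

If that layout holds, the host byte here is 0x04, which would mean the
adapter driver itself is rejecting the target rather than the disk
reporting a media error.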

[same errors, a lot]

Aug 12 11:17:22 wernicke kernel: EXT3-fs error (device sd(8,18)):
ext3_readdir: bad entry in directory #36853974: rec_len %% 4 != 0 -
offset=0, inode=1330206934,
rec_len=51, name_len=1
Aug 12 11:17:36 wernicke last message repeated 20 times
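
Since the same errors repeat so often, a tally makes it easier to see
which ext3 functions complain and whether the failing sectors cluster.
A minimal sketch, assuming the log text looks like the excerpts above
(including lines wrapped after the device name); tally_errors is a
hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# The function name may sit on the next line when the message is wrapped,
# so whitespace (including newlines) is allowed before it.
EXT3_RE = re.compile(r"EXT3-fs error \(device sd\(\d+,\d+\)\)(?: in|:)\s+(\w+):")
IO_RE = re.compile(r"I/O error: dev ([0-9a-f]{2}:[0-9a-f]{2}), sector (\d+)")

def tally_errors(log_text):
    """Return (Counter of ext3 function names, list of failing sectors)."""
    functions = Counter(EXT3_RE.findall(log_text))
    sectors = [int(sector) for _dev, sector in IO_RE.findall(log_text)]
    return functions, sectors

# Usage on a saved copy of the log (the file name is an assumption):
#   with open("messages") as f:
#       functions, sectors = tally_errors(f.read())
#   print(functions.most_common(), sorted(sectors))
```

Note that "last message repeated N times" lines are not expanded, so the
counts are a lower bound.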

restarted
--end of log---

Thanks so much for your time (lot of reading to get here). :)

Fred Ollinger (follinge@sas.upenn.edu)
CCN sysadmin


_________________________________________________________________________
Philadelphia Linux Users Group        --       http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion  --   http://lists.netisland.net/mailman/listinfo/plug