Kevin Brosius on Wed, 12 Mar 2003 09:11:05 -0500
Kris Reilly wrote:
>
> Hello All!
>
> I am hoping that someone else has encountered this problem and been able
> to diagnose it effectively.
>
> Under heavy load I have been experiencing a 50% failure rate. The
> problem has appeared in machines configured with both SCSI and IDE
> drives. The test configuration in question is the IDE setup.
>
> We pound the machines with web requests, we generate large logs then we
> crunch them. Crunching is very disk intensive and the drives stop
> responding. Errors that appear in the logs are attached below.
>
> The machines are P4 Xeon 1.2Ghz x 4 with 6GB RAM. The drives that are
> crashing are 120G IDE. They fail as secondary on IDE0 and also as
> primary on IDE1. They experience the same failure using both ext2 and
> ext3. The machines are running RedHat 7.3, kernel version
> 2.4.18-18.7x.bigmem. I have just updated one of the boxes to
> 2.4.18-24.7x, custom compiling the kernel and leaving out any
> unnecessary cruft and am waiting to see when it crashes again.
>
> My next approach is to use hdparm and/or muck with the proc fs though
> the logs seem to suggest that this problem is directly related to
> hardware and not operating system limitations.
>
> Does anyone have any suggestions?
>
> Thanks!
> Kris Reilly
>
> **Disks that are crashing:
>
> http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=703&p_created=1037222838
>
> **Disks crash with this error in the logs:
>
> Message from syslogd@105 at Fri Mar 7 19:10:18 2003 ...
> 105 kernel: Assertion failure in do_get_write_access() at transaction.c:737:
> "(( (jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"
>
> Message from syslogd@103 at Sat Mar 8 06:19:41 2003 ...
> 103 kernel: Assertion failure in do_get_write_access() at transaction.c:737:
> "(( (jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"
>
> **Just before the crash this is what dmesg has:
>
> ))
> [<c0146534>] bread [kernel] 0x24 (0xd58f3d2c))
> [<f881e5a5>] ext3_get_branch [ext3] 0x55 (0xd58f3d50))
> [<f880dd6f>] journal_get_write_access_Rsmp_78dc75e5 [jbd] 0x3f (0xd58f3d68))
> [<f881ed55>] ext3_get_block_handle [ext3] 0x205 (0xd58f3d7c))
> [<f880e241>] journal_dirty_metadata_Rsmp_fb9ecae4 [jbd] 0x61 (0xd58f3de4))
> [<c0146772>] create_buffers [kernel] 0x62 (0xd58f3de8))
> [<f881ee7c>] ext3_get_block [ext3] 0x5c (0xd58f3e0c))
> [<c0146d19>] __block_prepare_write [kernel] 0xe9 (0xd58f3e2c))
> [<f8821555>] ext3_mark_iloc_dirty [ext3] 0x25 (0xd58f3e5c))
> [<f8816310>] .rodata.str1.1 [jbd] 0x30 (0xd58f3e6c))
> [<c0147675>] block_prepare_write [kernel] 0x25 (0xd58f3e80))
> [<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3e94))
> [<f880d39d>] journal_start_Rsmp_171b1921 [jbd] 0x7d (0xd58f3ea0))
> [<f881f3a5>] ext3_prepare_write [ext3] 0xd5 (0xd58f3eb0))
> [<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3ec0))
> [<c01343ed>] generic_file_write [kernel] 0x4ed (0xd58f3ee8))
> [<c0156be4>] fcntl_setlk [kernel] 0x1a4 (0xd58f3f3c))
> [<f881cc32>] ext3_file_write [ext3] 0x22 (0xd58f3f5c))
> [<c01440f6>] sys_write [kernel] 0x96 (0xd58f3f7c))
> [<c0152e9d>] sys_fcntl64 [kernel] 0x8d (0xd58f3fac))
> [<c0108c73>] system_call [kernel] 0x33 (0xd58f3fc0))
>
> Code: 0f 0b e1 02 f0 62 81 f8 83 c4 14 8b 44 24 34 8b 08 b8 00 e0
> end_request: I/O error, dev 03:41 (hdb), sector 67895384
> EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=4243506, block=8486923
> end_request: I/O error, dev 03:41 (hdb), sector 181670032
> end_request: I/O error, dev 03:41 (hdb), sector 181670040
> end_request: I/O error, dev 03:41 (hdb), sector 181670096
> end_request: I/O error, dev 03:41 (hdb), sector 181670128
> end_request: I/O error, dev 03:41 (hdb), sector 0
> EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
> end_request: I/O error, dev 03:41 (hdb), sector 18752
> end_request: I/O error, dev 03:41 (hdb), sector 37224528
> EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326537, block=4653066
> EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326532, block=4653066
> end_request: I/O error, dev 03:41 (hdb), sector 63941840
> end_request: I/O error, dev 03:41 (hdb), sector 176160848
> EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=11010060, block=22020106
> end_request: I/O error, dev 03:41 (hdb), sector 181669984
> end_request: I/O error, dev 03:41 (hdb), sector 181670016
> end_request: I/O error, dev 03:41 (hdb), sector 181670032
> end_request: I/O error, dev 03:41 (hdb), sector 181670040
> end_request: I/O error, dev 03:41 (hdb), sector 181670096
> end_request: I/O error, dev 03:41 (hdb), sector 0
> EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
> end_request: I/O error, dev 03:41 (hdb), sector 181670016
> end_request: I/O error, dev 03:41 (hdb), sector 181670032
> end_request: I/O error, dev 03:41 (hdb), sector 181670040
>
> ... many more of these end_request errors ...

What does the vendor say about the drive failures? Worst case, some
drives aren't rated for 100% usage and you'll need better drives. Best
case, you aren't keeping them cool enough. Do the systems have adequate
circulation & cooling for the drive bays?

--
Kevin Brosius
_________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.netisland.net/mailman/listinfo/plug-announce
General Discussion -- http://lists.netisland.net/mailman/listinfo/plug
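
Before calling the vendor it may help to know whether the end_request errors in the quoted dmesg output keep hitting the same few sectors or are spread across the whole disk. What follows is only a rough sketch of one way to tally them, not anything from the original report: the function name `tally_io_errors` and the `sample` text are made up for illustration, and the pattern assumes each error sits on one unwrapped line in the 2.4-style format shown in the quoted logs.

```python
import re
from collections import Counter

# Matches 2.4-era kernel lines such as:
#   end_request: I/O error, dev 03:41 (hdb), sector 67895384
# Assumption: the line is not broken up by mail wrapping.
END_REQUEST = re.compile(
    r"end_request: I/O error, dev [0-9a-f]{2}:[0-9a-f]{2} "
    r"\((?P<name>\w+)\), sector (?P<sector>\d+)"
)


def tally_io_errors(log_text):
    """Count I/O errors per (device name, sector) in a block of log text."""
    counts = Counter()
    for match in END_REQUEST.finditer(log_text):
        key = (match.group("name"), int(match.group("sector")))
        counts[key] += 1
    return counts


# Hypothetical sample: a handful of lines copied from the quoted log.
sample = """\
end_request: I/O error, dev 03:41 (hdb), sector 67895384
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 0
end_request: I/O error, dev 03:41 (hdb), sector 181670032
"""

if __name__ == "__main__":
    # Print the most frequently failing sectors first.
    for (name, sector), hits in tally_io_errors(sample).most_common():
        print(f"{name} sector {sector}: {hits} error(s)")
```

To use it on a real machine, feed it the saved dmesg or syslog text instead of `sample`. A tally like this only summarizes what the kernel logged; it does not by itself say whether the cause is the drive, the cabling, or heat.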