Kris Reilly on Wed, 12 Mar 2003 09:00:06 -0500
Hello All!

I am hoping that someone else has encountered this problem and been able to diagnose it effectively. Under heavy load I have been experiencing a 50% failure rate. The problem has appeared in machines configured with both SCSI and IDE drives. The test configuration in question is the IDE setup. We pound the machines with web requests, generate large logs, and then crunch them. Crunching is very disk-intensive, and the drives stop responding. Errors that appear in the logs are attached below.

The machines are P4 Xeon 1.2GHz x 4 with 6GB RAM. The drives that are crashing are 120GB IDE. They fail as secondary on IDE0 and also as primary on IDE1. They experience the same failure using both ext2 and ext3. The machines are running RedHat 7.3, kernel version 2.4.18-18.7x.bigmem.

I have just updated one of the boxes to 2.4.18-24.7x, custom compiling the kernel and leaving out any unnecessary cruft, and am waiting to see when it crashes again. My next approach is to use hdparm and/or muck with the proc fs, though the logs seem to suggest that this problem is directly related to hardware and not to operating system limitations.

Does anyone have any suggestions?

Thanks!
Kris Reilly

**Disks that are crashing:
http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=703&p_created=1037222838

**Disks crash with this error in the logs:

Message from syslogd@105 at Fri Mar 7 19:10:18 2003 ...
105 kernel: Assertion failure in do_get_write_access() at transaction.c:737: "(( (jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

Message from syslogd@103 at Sat Mar 8 06:19:41 2003 ...
103 kernel: Assertion failure in do_get_write_access() at transaction.c:737: "(( (jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

**Just before the crash this is what dmesg has:

))
[<c0146534>] bread [kernel] 0x24 (0xd58f3d2c))
[<f881e5a5>] ext3_get_branch [ext3] 0x55 (0xd58f3d50))
[<f880dd6f>] journal_get_write_access_Rsmp_78dc75e5 [jbd] 0x3f (0xd58f3d68))
[<f881ed55>] ext3_get_block_handle [ext3] 0x205 (0xd58f3d7c))
[<f880e241>] journal_dirty_metadata_Rsmp_fb9ecae4 [jbd] 0x61 (0xd58f3de4))
[<c0146772>] create_buffers [kernel] 0x62 (0xd58f3de8))
[<f881ee7c>] ext3_get_block [ext3] 0x5c (0xd58f3e0c))
[<c0146d19>] __block_prepare_write [kernel] 0xe9 (0xd58f3e2c))
[<f8821555>] ext3_mark_iloc_dirty [ext3] 0x25 (0xd58f3e5c))
[<f8816310>] .rodata.str1.1 [jbd] 0x30 (0xd58f3e6c))
[<c0147675>] block_prepare_write [kernel] 0x25 (0xd58f3e80))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3e94))
[<f880d39d>] journal_start_Rsmp_171b1921 [jbd] 0x7d (0xd58f3ea0))
[<f881f3a5>] ext3_prepare_write [ext3] 0xd5 (0xd58f3eb0))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3ec0))
[<c01343ed>] generic_file_write [kernel] 0x4ed (0xd58f3ee8))
[<c0156be4>] fcntl_setlk [kernel] 0x1a4 (0xd58f3f3c))
[<f881cc32>] ext3_file_write [ext3] 0x22 (0xd58f3f5c))
[<c01440f6>] sys_write [kernel] 0x96 (0xd58f3f7c))
[<c0152e9d>] sys_fcntl64 [kernel] 0x8d (0xd58f3fac))
[<c0108c73>] system_call [kernel] 0x33 (0xd58f3fc0))

Code: 0f 0b e1 02 f0 62 81 f8 83 c4 14 8b 44 24 34 8b 08 b8 00 e0
end_request: I/O error, dev 03:41 (hdb), sector 67895384
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=4243506, block=8486923
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 181670128
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev 03:41 (hdb), sector 18752
end_request: I/O error, dev 03:41 (hdb), sector 37224528
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326537, block=4653066
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326532, block=4653066
end_request: I/O error, dev 03:41 (hdb), sector 63941840
end_request: I/O error, dev 03:41 (hdb), sector 176160848
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=11010060, block=22020106
end_request: I/O error, dev 03:41 (hdb), sector 181669984
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO failure
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
... many more of these end_request errors ...

--
Kris Reilly <kar@ramblingredneck.com>

Attachment:
signature.asc
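
For anyone cross-checking the dmesg output above, here is a minimal Python sketch that pulls the failing sector numbers and ext3 block numbers out of the log and checks whether they refer to the same places on disk. It assumes 512-byte sectors and a 4096-byte ext3 block size (eight sectors per block). That block size is an assumption, not something stated in the post, though the figures quoted in the log are consistent with it. The function names are made up for illustration, and the sample lines are copied from the log.

```python
import re

# A few lines copied from the dmesg output in the post.
LOG = """\
end_request: I/O error, dev 03:41 (hdb), sector 67895384
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=4243506, block=8486923
end_request: I/O error, dev 03:41 (hdb), sector 37224528
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=2326537, block=4653066
end_request: I/O error, dev 03:41 (hdb), sector 176160848
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read inode block - inode=11010060, block=22020106
"""

SECTOR_RE = re.compile(
    r"end_request: I/O error, dev ([0-9a-f]{2}):([0-9a-f]{2}) \((\w+)\), sector (\d+)"
)
BLOCK_RE = re.compile(r"unable to read inode block - inode=(\d+), block=(\d+)")

# ASSUMPTION: 4096-byte filesystem blocks on 512-byte sectors.
SECTORS_PER_BLOCK = 4096 // 512


def decode_dev(major_hex, minor_hex):
    """Turn the hex 'dev MM:mm' pair from end_request into decimal (major, minor)."""
    return int(major_hex, 16), int(minor_hex, 16)


def failed_sectors(log):
    """Sector numbers reported by end_request, in log order."""
    return [int(m.group(4)) for m in SECTOR_RE.finditer(log)]


def failed_blocks(log):
    """Block numbers that ext3_get_inode_loc could not read, in log order."""
    return [int(m.group(2)) for m in BLOCK_RE.finditer(log)]


def blocks_matching_sectors(log):
    """Map each failed ext3 block to True if block * 8 is also a failed sector."""
    sectors = set(failed_sectors(log))
    return {b: (b * SECTORS_PER_BLOCK) in sectors for b in failed_blocks(log)}


if __name__ == "__main__":
    first = SECTOR_RE.search(LOG)
    print("device (major, minor):", decode_dev(first.group(1), first.group(2)))
    for block, matched in blocks_matching_sectors(LOG).items():
        print(block, "->", block * SECTORS_PER_BLOCK, "matches" if matched else "no match")
```

On these sample lines every ext3 block number, multiplied by eight, equals a sector from an end_request error (for example 8486923 x 8 = 67895384). Under the stated assumption, that suggests the EXT3-fs errors are a consequence of the block-layer I/O errors rather than a separate filesystem problem. The hex pair 03:41 decodes to major 3, minor 65, the same device that ext3 reports as ide0(3,65); under the conventional IDE numbering this is the first partition on hdb.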