Kris Reilly on Wed, 12 Mar 2003 09:00:06 -0500



[PLUG] Hard Drives Crashing


Hello All!

I am hoping that someone else has encountered this problem and been able
to diagnose it effectively.

Under heavy load I have been seeing a 50% drive failure rate.  The
problem has appeared in machines configured with both SCSI and IDE
drives.  The test configuration described here is the IDE setup.

We pound the machines with web requests, generate large logs, and then
crunch them.  Crunching is very disk-intensive, and the drives stop
responding.  The errors that appear in the logs are included below.

The machines are P4 Xeon 1.2GHz x 4 with 6GB RAM.  The drives that are
crashing are 120GB IDE.  They fail as secondary on IDE0 and also as
primary on IDE1.  They experience the same failure using both ext2 and
ext3.  The machines are running Red Hat 7.3, kernel version
2.4.18-18.7x.bigmem.  I have just updated one of the boxes to
2.4.18-24.7x, custom-compiling the kernel and leaving out any
unnecessary cruft, and am waiting to see when it crashes again.

My next approach is to use hdparm and/or muck with the proc fs, though
the logs seem to suggest that this problem lies in the hardware rather
than in any operating system limitation.

Does anyone have any suggestions?  

Thanks!
Kris Reilly      

**Disks that are crashing:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=703&p_created=1037222838

**Disks crash with this error in the logs:

Message from syslogd@105 at Fri Mar  7 19:10:18 2003 ...
105 kernel: Assertion failure in do_get_write_access() at
transaction.c:737:
"((
(jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

Message from syslogd@103 at Sat Mar  8 06:19:41 2003 ...
103 kernel: Assertion failure in do_get_write_access() at
transaction.c:737:
"((
(jh2bh(jh))->b_state & (1UL << BH_Uptodate)) != 0)"

**Just before the crash this is what dmesg has:

))
[<c0146534>] bread [kernel] 0x24 (0xd58f3d2c))
[<f881e5a5>] ext3_get_branch [ext3] 0x55 (0xd58f3d50))
[<f880dd6f>] journal_get_write_access_Rsmp_78dc75e5 [jbd] 0x3f
(0xd58f3d68))
[<f881ed55>] ext3_get_block_handle [ext3] 0x205 (0xd58f3d7c))
[<f880e241>] journal_dirty_metadata_Rsmp_fb9ecae4 [jbd] 0x61
(0xd58f3de4))
[<c0146772>] create_buffers [kernel] 0x62 (0xd58f3de8))
[<f881ee7c>] ext3_get_block [ext3] 0x5c (0xd58f3e0c))
[<c0146d19>] __block_prepare_write [kernel] 0xe9 (0xd58f3e2c))
[<f8821555>] ext3_mark_iloc_dirty [ext3] 0x25 (0xd58f3e5c))
[<f8816310>] .rodata.str1.1 [jbd] 0x30 (0xd58f3e6c))
[<c0147675>] block_prepare_write [kernel] 0x25 (0xd58f3e80))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3e94))
[<f880d39d>] journal_start_Rsmp_171b1921 [jbd] 0x7d (0xd58f3ea0))
[<f881f3a5>] ext3_prepare_write [ext3] 0xd5 (0xd58f3eb0))
[<f881ee20>] ext3_get_block [ext3] 0x0 (0xd58f3ec0))
[<c01343ed>] generic_file_write [kernel] 0x4ed (0xd58f3ee8))
[<c0156be4>] fcntl_setlk [kernel] 0x1a4 (0xd58f3f3c))
[<f881cc32>] ext3_file_write [ext3] 0x22 (0xd58f3f5c))
[<c01440f6>] sys_write [kernel] 0x96 (0xd58f3f7c))
[<c0152e9d>] sys_fcntl64 [kernel] 0x8d (0xd58f3fac))
[<c0108c73>] system_call [kernel] 0x33 (0xd58f3fc0))


Code: 0f 0b e1 02 f0 62 81 f8 83 c4 14 8b 44 24 34 8b 08 b8 00 e0 
 end_request: I/O error, dev 03:41 (hdb), sector 67895384
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read
inode block - inode=4243506, block=8486923
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 181670128
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO
failure
end_request: I/O error, dev 03:41 (hdb), sector 18752
end_request: I/O error, dev 03:41 (hdb), sector 37224528
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read
inode block - inode=2326537, block=4653066
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read
inode block - inode=2326532, block=4653066
end_request: I/O error, dev 03:41 (hdb), sector 63941840
end_request: I/O error, dev 03:41 (hdb), sector 176160848
EXT3-fs error (device ide0(3,65)): ext3_get_inode_loc: unable to read
inode block - inode=11010060, block=22020106
end_request: I/O error, dev 03:41 (hdb), sector 181669984
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040
end_request: I/O error, dev 03:41 (hdb), sector 181670096
end_request: I/O error, dev 03:41 (hdb), sector 0
EXT3-fs error (device ide0(3,65)) in ext3_reserve_inode_write: IO
failure
end_request: I/O error, dev 03:41 (hdb), sector 181670016
end_request: I/O error, dev 03:41 (hdb), sector 181670032
end_request: I/O error, dev 03:41 (hdb), sector 181670040

... many more of these end_request errors ...
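
To make the flood of end_request lines above easier to read, here is a
minimal Python sketch that tallies them per sector and decodes the
device numbers.  The function names are made up for illustration.  The
4096-byte filesystem block size and 512-byte sector size are
assumptions; they are consistent with the log, since block 8486923 x 8
gives sector 67895384, but I have not confirmed them on the machines.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   end_request: I/O error, dev 03:41 (hdb), sector 181670032
END_REQUEST = re.compile(
    r"end_request: I/O error, dev ([0-9a-fA-F]{2}):([0-9a-fA-F]{2}) "
    r"\((\w+)\), sector (\d+)"
)


def decode_dev(major_hex, minor_hex):
    """Convert the hex 'major:minor' pair from the log to decimal."""
    return int(major_hex, 16), int(minor_hex, 16)


def block_to_sector(block, block_size=4096, sector_size=512):
    """Map a filesystem block number to its first sector.

    block_size and sector_size are assumed values, not read from the disk.
    """
    return block * (block_size // sector_size)


def summarize(lines):
    """Count I/O errors keyed by (major, minor, drive name, sector)."""
    counts = Counter()
    for line in lines:
        match = END_REQUEST.search(line)
        if match:
            major, minor = decode_dev(match.group(1), match.group(2))
            key = (major, minor, match.group(3), int(match.group(4)))
            counts[key] += 1
    return counts
```

Calling summarize() on the saved dmesg text, for example
summarize(open("dmesg.txt")) with a hypothetical file name, gives one
count per failing sector.  decode_dev("03", "41") yields (3, 65), which
matches the ide0(3,65) in the EXT3-fs lines, and block_to_sector() on
the three blocks named in the ext3_get_inode_loc errors lands on
sectors 67895384, 37224528 and 176160848, all of which also appear in
the end_request lines.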
-- 
Kris Reilly <kar@ramblingredneck.com>
