LeRoy Cressy on 30 Aug 2005 15:24:00 -0000



Re: [PLUG] ext3 and fsck


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Jeff Abrahamson wrote:
> On Tue, Aug 30, 2005 at 08:00:47AM -0400, Edmund Goppelt wrote:
> 
>>1. Install smartmontools and do a short self-test.  If the drive is
>>truly hosed, the drive will almost certainly fail spectacularly.
> 
> 
> Cool tool.  Thanks for the pointer.
> 
> I'm not too sure how to read the results.  It tells me there were
> errors, which I know, but I don't know if this is "spectacular
> failure," pending failure, or normal range.  Any help?
> 
> I've attached the output of "smartctl -a /dev/hdb".
> 

I looked at the results below, and it seems that your kernel is using
DMA: every command in the error log is a READ DMA EXT.  I have run into
problems with DMA support in the 2.6 kernel tree.

My mother's system had some problems with it.
Here is the help text from the kernel config:

Use PCI DMA by default when available (IDEDMA_PCI_AUTO)

Prior to kernel version 2.1.112, Linux used to automatically use
DMA for IDE drives and chipsets which support it. Due to concerns
about a couple of cases where buggy hardware may have caused damage,
the default is now to NOT use DMA automatically. To revert to the
previous behaviour, say Y to this question.

If you suspect your hardware is at all flakey, say N here.
Do NOT email the IDE kernel people regarding this issue!

It is normally safe to answer Y to this question unless your
motherboard uses a VIA VP2 chipset, in which case you should say N.

So, in addition to the other advice given on this thread:
	test the drive in another box
	check the health and temperature of the drive
	check the IDE chipset

I would turn off IDEDMA_PCI_AUTO in my kernel.
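
Before rebuilding, it may help to confirm how the running kernel was
actually configured.  The following is only a minimal Python sketch: it
parses the text of a kernel config file and reports the state of one
option.  The function name kconfig_state is made up for illustration,
and the symbol CONFIG_IDEDMA_PCI_AUTO is assumed from the option name
above plus the usual CONFIG_ prefix.  Where the config file lives
(often /boot/config-<version>) varies by distribution and is also an
assumption.

```python
import re


def kconfig_state(config_text, symbol):
    """Report how CONFIG_<symbol> appears in kernel config text.

    Returns the configured value (for example "y" or "m"), "not set"
    if the option is explicitly disabled, or "absent" if the option
    is not mentioned at all.
    """
    name = re.escape(symbol)
    # Enabled options look like:   CONFIG_FOO=y
    enabled = re.search(r"^CONFIG_%s=(.*)$" % name, config_text, re.M)
    if enabled:
        return enabled.group(1)
    # Disabled options look like:  # CONFIG_FOO is not set
    if re.search(r"^# CONFIG_%s is not set$" % name, config_text, re.M):
        return "not set"
    return "absent"


# Hypothetical config fragment, for illustration only.
sample = (
    "CONFIG_BLK_DEV_IDEDMA_PCI=y\n"
    "CONFIG_IDEDMA_PCI_AUTO=y\n"
    "# CONFIG_IDEDMA_ONLYDISK is not set\n"
)
print(kconfig_state(sample, "IDEDMA_PCI_AUTO"))
```

To use it on a real system, read the config file into a string and pass
it in place of the sample text.  If the result is "y", the kernel turns
DMA on by default wherever the chipset driver supports it.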

> 
> 
> ------------------------------------------------------------------------
> 
> smartctl version 5.32 Copyright (C) 2002-4 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Device Model:     Maxtor 5A300J0
> Serial Number:    A824F1LE
> Firmware Version: RAM51VV0
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   7
> ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
> Local Time is:    Tue Aug 30 08:23:12 2005 EDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Offline data collection status:  (0x00)	Offline data collection activity
> 					was never started.
> 					Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0)	The previous self-test routine completed
> 					without error or no self-test has ever 
> 					been run.
> Total time to complete Offline 
> data collection: 		 (  30) seconds.
> Offline data collection
> capabilities: 			 (0x5b) SMART execute Offline immediate.
> 					Auto Offline data collection on/off support.
> 					Suspend Offline collection upon new
> 					command.
> 					Offline surface scan supported.
> 					Self-test supported.
> 					No Conveyance Self-test supported.
> 					Selective Self-test supported.
> SMART capabilities:            (0x0003)	Saves SMART data before entering
> 					power-saving mode.
> 					Supports SMART auto save timer.
> Error logging capability:        (0x01)	Error logging supported.
> 					No General Purpose Logging support.
> Short self-test routine 
> recommended polling time: 	 (   2) minutes.
> Extended self-test routine
> recommended polling time: 	 ( 158) minutes.
> 
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   3 Spin_Up_Time            0x0027   252   252   063    Pre-fail  Always       -       3679
>   4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       18
>   5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1
>   6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
>   7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
>   8 Seek_Time_Performance   0x0027   252   246   187    Pre-fail  Always       -       49139
>   9 Power_On_Hours          0x0032   251   251   000    Old_age   Always       -       58836
>  10 Spin_Retry_Count        0x002b   252   252   157    Pre-fail  Always       -       0
>  11 Calibration_Retry_Count 0x002b   252   252   223    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       8
> 192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
> 193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
> 194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       44
> 195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       28633
> 196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
> 197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
> 198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
> 200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
> 201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       33
> 202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
> 203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       14
> 204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
> 205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
> 207 Spin_High_Current       0x002a   252   252   000    Old_age   Always       -       0
> 208 Spin_Buzz               0x002a   252   252   000    Old_age   Always       -       0
> 209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
>  99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
> 100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
> 101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
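
On how to read the table above: smartctl treats an attribute as failed
when its normalized VALUE is less than or equal to THRESH.  The sketch
below applies that rule to a few rows copied from this output and also
pulls out the raw counts of two attributes that are commonly watched.
The helper names and the choice of watched attributes are assumptions
made for illustration, and raw values are vendor specific, so treat the
result as a rough reading rather than a verdict on the drive.

```python
# Attributes commonly watched for bad-sector trouble (an assumption,
# not something smartctl itself singles out).
WATCH = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}


def parse_attribute(line):
    """Parse one row of the `smartctl -a` attribute table into a dict."""
    f = line.split()
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalized current value
        "worst": int(f[4]),   # lowest normalized value seen so far
        "thresh": int(f[5]),  # failure threshold
        "type": f[6],         # Pre-fail or Old_age
        "raw": f[9],          # vendor-specific raw value, kept as text
    }


def summarize(lines):
    """Return (names of failed attributes, raw values of watched ones)."""
    attrs = [parse_attribute(l) for l in lines if l.strip()]
    failing = [a["name"] for a in attrs if a["value"] <= a["thresh"]]
    watched = {a["name"]: a["raw"] for a in attrs if a["name"] in WATCH}
    return failing, watched


# Rows copied from the output above, without the "> " quote prefix.
rows = [
    "  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       1",
    "  9 Power_On_Hours          0x0032   251   251   000    Old_age   Always       -       58836",
    "197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1",
]
failing, watched = summarize(rows)
print(failing)
print(watched)
```

For these rows no attribute is at or below its threshold, which agrees
with the overall PASSED result.  The raw counts show one reallocated
sector and one pending sector.
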
> 
> SMART Error Log Version: 1
> Warning: ATA error count 22 inconsistent with error log pointer 5
> 
> ATA Error Count: 22 (device log contains only the most recent five errors)
> 	CR = Command Register [HEX]
> 	FR = Features Register [HEX]
> 	SC = Sector Count Register [HEX]
> 	SN = Sector Number Register [HEX]
> 	CL = Cylinder Low Register [HEX]
> 	CH = Cylinder High Register [HEX]
> 	DH = Device/Head Register [HEX]
> 	DC = Device Command Register [HEX]
> 	ER = Error register [HEX]
> 	ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> 
> Error 22 occurred at disk power-on lifetime: 909 hours (37 days + 21 hours)
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 01 4f 00 68 f0  Error: UNC 1 sectors at LBA = 0x0068004f = 6815823
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   25 00 02 4f 00 68 f0 08  14d+05:40:47.888  READ DMA EXT
>   25 00 02 4f 00 68 f0 08  14d+05:40:46.512  READ DMA EXT
>   25 00 4e 51 00 68 f0 08  14d+05:40:46.464  READ DMA EXT
>   25 00 50 4f 00 68 f0 08  14d+05:41:50.624  READ DMA EXT
>   25 00 98 17 00 88 f0 08  14d+05:41:50.624  READ DMA EXT
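
The register dump in the error entry above can be decoded by hand.  The
sketch below is an illustration under stated assumptions: it uses the
28-bit LBA layout (low four bits of DH, then CH, CL, SN) and treats bit
6 of the error register as the UNC (uncorrectable data) flag.  The
function names are made up here.  It reproduces the LBA that smartctl
printed for Error 22.

```python
def decode_lba28(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def parse_register_row(row):
    """Parse an 'ER ST SC SN CL CH DH' row from the smartctl error log."""
    # Only the first seven whitespace-separated tokens are hex registers;
    # anything after them (smartctl's own summary) is ignored.
    er, st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in row.split()[:7])
    return {
        "uncorrectable": bool(er & 0x40),  # UNC bit of the error register
        "sector_count": sc,
        "lba": decode_lba28(sn, cl, ch, dh),
    }


# Register row copied from Error 22 above, without the "> " prefix.
row = "40 51 01 4f 00 68 f0  Error: UNC 1 sectors at LBA = 0x0068004f = 6815823"
info = parse_register_row(row)
print(info["uncorrectable"], info["sector_count"], hex(info["lba"]), info["lba"])
```

All five logged errors decode to the same LBA, 6815823, so the drive
keeps failing to read the same spot rather than failing all over the
disk.
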
> 
> Error 21 occurred at disk power-on lifetime: 909 hours (37 days + 21 hours)
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 01 4f 00 68 f0  Error: UNC 1 sectors at LBA = 0x0068004f = 6815823
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   25 00 02 4f 00 68 f0 08  14d+05:40:46.512  READ DMA EXT
>   25 00 4e 51 00 68 f0 08  14d+05:40:46.464  READ DMA EXT
>   25 00 50 4f 00 68 f0 08  14d+05:41:50.624  READ DMA EXT
>   25 00 98 17 00 88 f0 08  14d+05:41:50.624  READ DMA EXT
>   25 00 00 67 2a 64 f0 08  14d+05:41:50.608  READ DMA EXT
> 
> Error 20 occurred at disk power-on lifetime: 909 hours (37 days + 21 hours)
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 4f 4f 00 68 f0  Error: UNC 79 sectors at LBA = 0x0068004f = 6815823
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   25 00 50 4f 00 68 f0 08  14d+05:41:50.624  READ DMA EXT
>   25 00 98 17 00 88 f0 08  14d+05:41:50.624  READ DMA EXT
>   25 00 00 67 2a 64 f0 08  14d+05:41:50.608  READ DMA EXT
>   25 00 00 6f 10 64 f0 08  14d+05:41:50.608  READ DMA EXT
>   25 00 00 6f 0f 64 f0 08  14d+05:41:50.608  READ DMA EXT
> 
> Error 19 occurred at disk power-on lifetime: 909 hours (37 days + 21 hours)
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 07 4f 00 68 f0  Error: UNC 7 sectors at LBA = 0x0068004f = 6815823
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   25 00 08 4f 00 68 f0 08  14d+05:28:14.832  READ DMA EXT
>   25 00 08 4f 10 58 f0 08  14d+05:28:14.832  READ DMA EXT
>   25 00 08 4f 10 44 f0 08  14d+05:28:14.816  READ DMA EXT
>   25 00 08 4f 10 30 f0 08  14d+05:28:14.816  READ DMA EXT
>   25 00 08 4f 10 34 f0 08  14d+05:28:14.800  READ DMA EXT
> 
> Error 18 occurred at disk power-on lifetime: 909 hours (37 days + 21 hours)
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 51 07 4f 00 68 f0  Error: UNC 7 sectors at LBA = 0x0068004f = 6815823
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   25 00 08 4f 00 68 f0 08  14d+05:25:34.464  READ DMA EXT
>   25 00 08 4f 00 64 f0 08  14d+05:25:34.464  READ DMA EXT
>   25 00 08 4f 00 58 f0 08  14d+05:25:34.464  READ DMA EXT
>   25 00 08 4f 00 54 f0 08  14d+05:25:34.464  READ DMA EXT
>   25 00 08 4f 00 50 f0 08  14d+05:25:34.448  READ DMA EXT
> 
> SMART Self-test log structure revision number 1
> No self-tests have been logged.  [To run self-tests, use: smartctl -t]
> 
> 
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>   After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
> 
> 
> 
> ------------------------------------------------------------------------
> 
> ___________________________________________________________________________
> Philadelphia Linux Users Group         --        http://www.phillylinux.org
> Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
> General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug


- --
 Rev. LeRoy D. Cressy  mailto:leroy@lrcressy.com   /\_/\
                       http://lrcressy.com        ( o.o )
                       Phone:  215-535-4037        > ^ <
                       FAX:    215-535-4285

gpg fingerprint:  62DE 6CAB CEE1 B1B3 359A  81D8 3FEF E6DA 8501 AFEA

For info on enigmail:    http://lrcressy.com/linux/mozilla.pdf
For info on gpg:         http://www.gnupg.org/

Jesus saith unto him, I am the way, the truth, and the life:
no man cometh unto the Father, but by me. (John 14:6)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Debian - http://enigmail.mozdev.org

iQIVAwUBQxR5tnlsxrSGsIsqAQq8AA//ZVZxviFZ+jeLYB5W1JSpbEc4QauwMcMm
74wlbAQ8kq6dFafPVfL/X1cJTkkLJCtVqfq+EiPSbOPLyg9DiJe2hg5VHPCtoQqD
uV6tRTAtAOLmPEqVh5QmeVJqj33zrSVuBIWmv7HJaU88YkjHk++F/VNwhv+XWLG5
j0kSWMX9th2T7RuVLhgW3iLHW1CFoWBDbMg7fx35zQQFh1gQRC1Xw4aHJDlrasf+
PZpNwQTcfuZXZBvRgZVzjpNV5u3FAZiLUyA7stE2FfEFdKLPTv0ymSUodQgMePVe
cHEdG8ssvn8EXMo94QJgWXEHR39JJjQDKDFfwaBOOWjL8wfjBquVicVVV9Lce33h
Pt6sKLD9bNMrVoIepZNj/bov9QHLkT0dqs3terroITgXt4x/F26WXhTfsu79g+s5
sk61UuqhyQ4wgczfPCHBiNmvkmhnk1nAJ2uMYN0K3E3v7KdsduAaem4XG2nLc9UF
SMtjhJv5Xt6n6Jdc4sPHn7sjyu93WkFaRbN2BoRzN2MO3IVkMUXYXt4lbznzJsBz
yLtZbZCYNYgCNTgQ/49/U9gjAEUIayotlw4UgAVtqvla2MNCPYPRbX+6WR21g7mb
P99NK9cmO02PZYd7SO5j+ncizUVNhElZnh9/rcT0gMDDYuYsFUJfrBpiQEX1r2p8
Cia3Zs4JajU=
=I4n3
-----END PGP SIGNATURE-----