JP Vossen on 20 May 2009 22:20:44 -0700
* For some value of "fixed"...

I just brought up a server in a remote co-lo, and was getting:

	May 20 21:02:18 host smartd[6707]: Device: /dev/sda, 13 Offline uncorrectable sectors
	May 20 21:32:18 host smartd[6707]: Device: /dev/sda, 13 Offline uncorrectable sectors

That drive is an older one pulled from some other server: a Maxtor DiamondMax Plus 9 6Y160M0 160G. It is disk0 of a software mirror set that's sitting under LVM. So I have a mirror (which is NOT NOT NOT a backup, but that's a different story) and can probably keep running if it dies, but... Dead hard drives are a PITA.

When I Google for that error I get stuff like:
	http://smartmontools.sourceforge.net/badblockhowto.html

Unfortunately, nothing I found in Google told me what that error actually *means*. The closest was http://en.wikipedia.org/wiki/S.M.A.R.T which said:

	ID   Hex  Attribute name              Better  Description
	[...]
	198  C6   Uncorrectable Sector Count  \/      The total number of
	          (or Off-Line Scan Uncorrectable Sector Count – Fujitsu)[15]
	     uncorrectable errors when reading/writing a sector. A rise in
	     the value of this attribute indicates defects of the disk
	     surface and/or problems in the mechanical subsystem.

Unfortunately, that's not *quite* what I see. I am getting "198 Offline_Uncorrectable" on Maxtor. Sigh. BTW the other drive is fine as far as I can tell from SMART.

So just for fun, I zeroed free space on both partitions. From what I read (mostly in badblockhowto.html), an attempt to write to a bad block will cause the drive to Do Something About the error, and maybe fix* it. According to df, / has 142G available but only 7G used (5.6G is a single VM), and /boot has 236M avail with 48M used. So the odds favor that if anything is really wrong, either it doesn't have data in it or it's in the VM that I'll be re-rsyncing anyway.
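Before zeroing anything, it may be worth capturing the current raw value of attribute 198, so there is a baseline to compare against afterwards. Here is a minimal sketch of both steps as bash functions. The function names (smart_raw_value, zero_free_space), the bs=1M block size, and the optional size cap are my assumptions for illustration, not anything taken from the HOWTO. It also assumes GNU dd and the usual ten-column `smartctl -A` attribute table.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Print the RAW_VALUE column (10th field) for one SMART attribute ID.
# Reads `smartctl -A` style output on stdin, so it can be tried on
# saved output without touching a real drive.
smart_raw_value() {
    local id="$1"
    awk -v id="$id" '$1 == id { print $10 }'
}

# Write zeros into a scratch file under DIR, flush, then delete it.
# With no second argument dd runs until the filesystem is full
# (it is expected to stop with "No space left on device").
# An optional second argument caps the write in MiB for a dry run.
zero_free_space() {
    local dir="$1" limit_mib="${2:-}"
    local target="$dir/zero.$$"
    if [ -n "$limit_mib" ]; then
        dd if=/dev/zero of="$target" bs=1M count="$limit_mib" 2>/dev/null
    else
        dd if=/dev/zero of="$target" bs=1M 2>/dev/null
    fi
    sync              # push the zeros out to the disk before deleting
    rm -f "$target"
}

# Possible usage, as root (left commented out on purpose):
#   before=$(smartctl -A /dev/sda | smart_raw_value 198)
#   zero_free_space /boot
#   zero_free_space /
#   smartctl -t offline /dev/sda    # then wait for the test to finish
#   after=$(smartctl -A /dev/sda | smart_raw_value 198)
#   echo "Offline_Uncorrectable: $before -> $after"
```

Reading the attribute table from stdin keeps the parsing testable against pasted output. The size cap lets you check the write-and-delete plumbing on a scratch directory before filling a real filesystem.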
# dd if=/dev/zero of=/boot/zero; rm -f /boot/zero

# dd if=/dev/zero of=/zero; rm -f /zero
dd: writing to `/zero': No space left on device
282258265+0 records in
282258264+0 records out
144516231168 bytes (145 GB) copied, 3992.15 s, 36.2 MB/s

(Funny story, after the big dd ended, I was wondering why I showed 0 free space. Turns out fcheck was running an md5sum on 145G of /zero, so while I *had* deleted it, since md5sum had the file handle open it wasn't "gone" yet. Once I killed the md5sum I got my space back.)

Also, I understand that 145 GB written as reported by dd is larger than the 142G reported free by df. I hate drive space math in different bases...

Anyway, short story long, that fixed it after I told smart to re-test:

# smartctl -t offline /dev/sda
## Wait 300+ seconds

# smartctl -A /dev/sda | egrep '^198|^ID'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
198 Offline_Uncorrectable   0x0008   253   240   000    Old_age   Offline      -       0

If I'd been really smart I'd have re-run the test (smartctl -t offline /dev/sda) before the "fix" just in case... But that didn't occur to me until just now.

Hopefully this is useful for someone,
JP
----------------------------|:::======|-------------------------------
JP Vossen, CISSP            |:::======|      http://bashcookbook.com/
My Account, My Opinions     |=========|      http://www.jpsdomain.org/
----------------------------|=========|-------------------------------
"Microsoft Tax" = the additional hardware & yearly fees for the add-on
software required to protect Windows from its own poorly designed and
implemented self, while the overhead incidentally flattens Moore's Law.
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug