brent timothy saner on 14 Jun 2019 07:45:42 -0700


Re: [PLUG] Is My Disk Toast?

On 6/14/19 9:10 AM, Louis K wrote:
> I've got an HDD that started throwing errors to dmesg (below). It
> happened on about 8 different sectors recently, and happens repeatedly.
> I've been googling around and learning about SMART, but the diagnostic
> seems (also below) to indicate the disk is good (Reallocated_Sector_Ct
> and Reallocated_Event_Count have raw values 0).
> Is this disk dying, or can I run a tool to reallocate the bad sectors? 
> print_req_error: I/O error, dev sdc, sector 5911268904 flags 0
> ata3: EH complete
> ata3.00: exception Emask 0x0 SAct 0x8 SErr 0x0 action 0x0
> ata3.00: irq_stat 0x40000008
> ata3.00: failed command: READ FPDMA QUEUED
> ata3.00: cmd 60/08:18:f0:d8:56/00:00:60:01:00/40 tag 3 ncq dma 4096 in
>          res 41/40:00:f0:d8:56/00:00:60:01:00/00 Emask 0x409 (media error) <F>
> ata3.00: status: { DRDY ERR }
> ata3.00: error: { UNC }
> ata3.00: configured for UDMA/133
> sd 2:0:0:0: [sdc] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> sd 2:0:0:0: [sdc] tag#3 Sense Key : Medium Error [current]
> sd 2:0:0:0: [sdc] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed

this can be any of:

- bad disk
- bad cable between motherboard and disk
- bad connector on motherboard
- bad PSU connection (yes.) - undervoltage can cause READ FPDMA QUEUED
  commands to fail.

the first is hard to confirm until you've ruled out the other three.

start with testing the voltage on the PSU connector since that's the
easiest to isolate. grab your multimeter and:

pinout for molex:
pinout for SATA power:
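
as a rough sanity check on the multimeter readings, here's a small shell
sketch that compares a measured voltage against its nominal rail. it
assumes the usual ATX tolerance of +/-5% on the +12V, +5V and +3.3V
rails; the function names (rail_ok, check_rail) and the sample readings
are made up for illustration.

```shell
#!/bin/sh
# rail_ok NOMINAL MEASURED
# Succeeds (exit 0) if MEASURED is within +/-5% of NOMINAL.
# The 5% figure is an assumption based on the usual ATX tolerance.
rail_ok() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        lo = n * 0.95
        hi = n * 1.05
        exit !((m + 0) >= lo && (m + 0) <= hi)
    }'
}

# check_rail NOMINAL MEASURED
# Prints a one-line verdict for a single reading.
check_rail() {
    if rail_ok "$1" "$2"; then
        echo "$1 V rail: $2 V ok"
    else
        echo "$1 V rail: $2 V OUT OF RANGE"
    fi
}

# Sample readings, invented for illustration; substitute your own.
check_rail 12 11.2
check_rail 5 5.08
```

a reading that sags only while the disk is spinning up or under load is
still suspect, so it's worth measuring with the disk busy, not just idle.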

if that passes, try the same data cable/same disk on a different port on
the motherboard (or an entirely different machine) and run:

badblocks /dev/<disk>
smartctl -t long -d sat /dev/<disk>

on the disk (where <disk> is sda, sdb, whatever). once the long test has
finished, you can use smartctl -l selftest to get its results.
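
spelled out, that sequence might look something like the sketch below.
the wrapper functions (run, start_checks, show_results) and the DRY_RUN
switch are hypothetical helpers for illustration; the -s/-v flags on
badblocks and the -l selftest, -A and -H reports are my additions on top
of the two commands above. run it as root against the right device.

```shell
#!/bin/sh
# run CMD [ARGS...]
# With DRY_RUN=1, only print the command; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# start_checks DEVICE
# Read-only surface scan, then kick off the drive's long self-test.
start_checks() {
    run badblocks -sv "$1"              # read-only by default; -s progress, -v verbose
    run smartctl -d sat -t long "$1"    # starts the test and returns immediately
}

# show_results DEVICE
# Read back the self-test log, the attribute table and the health verdict.
show_results() {
    run smartctl -d sat -l selftest "$1"
    run smartctl -d sat -A "$1"
    run smartctl -d sat -H "$1"
}

# Example (not executed here):
#   start_checks /dev/sdc
#   ...wait for the long test to finish...
#   show_results /dev/sdc
```

note that smartctl -t long only starts the test; it runs on the drive
itself and smartctl prints an estimated completion time, so wait until
then before reading the results.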

if THAT passes *with no DMESG errors*, replace the cable and use the
original motherboard port. repeat those two commands, re-check SMART
status, etc.
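
for the "no DMESG errors" part, a quick filter over the kernel log saves
scrolling. this is only a sketch: count_disk_errors is a made-up helper,
and its patterns are taken from the error lines quoted at the top of this
message, so other failure modes may log differently.

```shell
#!/bin/sh
# count_disk_errors DEV
# Reads kernel log text on stdin and prints the number of lines that look
# like the errors quoted in this thread for DEV (e.g. sdc).
# Caveat: "failed command:" lines name the ATA port (ata3.00), not the
# disk, so they are counted for any port.
count_disk_errors() {
    grep -c -E "I/O error, dev $1,|\[$1\] .*(FAILED Result|Medium Error|Unrecovered read error)|failed command:"
}

# Typical use after a test run (reading the kernel log may need root):
#   dmesg | count_disk_errors sdc
```

a count of 0 after a full badblocks pass and a completed long self-test
is what "passes" means in the steps here.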

if THAT passes *with no DMESG errors*, it was the cable. if it STILL
spits out DMESG errors, it's either the port or the disk. repeat the
test on another port with a new cable. if that fails, it's probably the
disk itself. replace the disk.

if the new disk fails with the same error, you're looking at a kernel
bug. not unheard of, but pretty rare with I/O-related stuff.
