Rich Freeman via plug on 13 Jul 2021 07:13:15 -0700
[PLUG] New Hard Drive Testing Practices
Figured I'd start a conversation even if the topic is a little trivial.

I'm curious what others do when they obtain a new hard drive (or a used one for that matter - I'm curious if you do anything differently). Do you do any kind of testing on drives before putting them into service, assuming you can spare a few days/etc? Do you take any kind of measures to mitigate the potential of early failures in lieu of testing?

I have a new drive that will probably arrive tomorrow and am going to be taking another drive out of service that has an uncorrectable sector (or 8 of them depending on how you count). The disk is in a RAID and it looks like the sector isn't even in use right now, but I usually try to replace disks in this condition.

That actually creates a bit of a risk management question. What is the relative benefit of doing testing before putting a new disk into service, knowing that it comes at the cost of delaying removal of a potentially-failing disk from service? Both the risk and benefit are probably pretty low in this case.

Some options I can think of:

1. SMART short test (a no-brainer really, but largely ignores the surface)

2. SMART long test

3. badblocks destructive test (probably the most extensive practical option, but this can take a number of days for a 14TB drive). This also has the advantage of probably detecting an SMR drive that managed to sneak through, though this drive isn't supposed to be one.

4. Add the drive to the RAID without removing the old drive, and then do a scrub, and remove the old drive after it passes. This is effectively a one-pass random data destructive write test that only costs the additional time of a read pass, as the write pass was going to happen anyway. However, it would probably only test the in-use regions of the disk surface.
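As a rough back-of-the-envelope check on the "number of days" figure in option 3, here is a minimal Python sketch. It assumes a sustained average throughput of 200 MB/s, which is a made-up illustrative number and not a measurement of any particular drive, and it relies on the fact that badblocks in write mode (-w) defaults to four test patterns, each written across the whole disk and then read back. The function names are hypothetical.

```python
TB = 10**12  # drive vendors quote decimal terabytes


def pass_hours(capacity_bytes, mb_per_s):
    """Hours for one full sequential pass (read or write) over the disk."""
    return capacity_bytes / (mb_per_s * 10**6) / 3600


def badblocks_hours(capacity_bytes, mb_per_s, patterns=4):
    """Hours for a write-mode badblocks run.

    Each pattern costs one full write pass plus one full read pass.
    """
    return 2 * patterns * pass_hours(capacity_bytes, mb_per_s)


capacity = 14 * TB
throughput = 200  # MB/s; ASSUMED average, real drives vary across the platter

one_pass = pass_hours(capacity, throughput)
full = badblocks_hours(capacity, throughput)

print(f"one full pass: {one_pass:.1f} h")                          # 19.4 h
print(f"badblocks -w:  {full:.1f} h ({full / 24:.1f} days)")       # 155.6 h (6.5 days)
```

Under that assumption a default run is eight full passes, or about six and a half days, while the extra cost of option 4 is at most roughly one read pass (under a day, and less if the pool is not full). Passing patterns=1 models a single-pattern run such as badblocks -w -t random, which comes to about 39 hours at the same assumed rate.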
With option 4, I'm using ZFS so silent errors won't be an issue, but use care with anything that doesn't detect silent errors, since if there is a failure you're going to have to deal with it manually somehow (hopefully you can pick the version where 2/3 disks agree). This also only works for RAID options that support adding an extra redundant disk temporarily.

Figured this might stir up some interesting discussion.

--
Rich
___________________________________________________________________________
Philadelphia Linux Users Group -- http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion -- http://lists.phillylinux.org/mailman/listinfo/plug