Mark Bergman via plug on 13 Jul 2021 08:09:03 -0700



Re: [PLUG] New Hard Drive Testing Practices


In the message dated: Tue, 13 Jul 2021 10:41:51 -0400,
The pithy ruminations from Rich Freeman via plug on 
[Re: [PLUG] New Hard Drive Testing Practices] were:
=> On Tue, Jul 13, 2021 at 10:20 AM Casey Bralla via plug
=> <plug@lists.phillylinux.org> wrote:
=> >
=> > Interesting question I never thought about.  For me, without a "mission
=> > critical" application, I just swap in the drive, format it, copy data to
=> > it, and go.   Modern drives are so reliable now, I assume any defects
=> > are caught in final testing at the manufacturer, or will show up almost
=> > immediately when formatted. However, I also assume the big drives you're

Yep. Failures are almost entirely on a bathtub-shaped curve: drives fail "immediately" (infant mortality) or late in
their service life (wear-out), with almost nothing in between.
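
For illustration only: the bathtub shape can be sketched as a sum of hazards -- a decreasing Weibull term for
infant mortality, a small constant for random mid-life failures, and an increasing Weibull term for wear-out.
The parameters below are invented for the sketch, not fit to any real drive population.

def weibull_hazard(t, shape, scale):
    # Weibull hazard rate: h(t) = (shape/scale) * (t/scale)**(shape - 1)
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t_years):
    infant = weibull_hazard(t_years, shape=0.5, scale=100.0)  # high early, falls off fast
    random_floor = 0.01                                       # flat mid-life hazard
    wearout = weibull_hazard(t_years, shape=5.0, scale=8.0)   # climbs sharply in year 6+
    return infant + random_floor + wearout

for year in (0.1, 1, 3, 5, 7, 9):
    print(f"year {year:>4}: ~{bathtub_hazard(year):.3f} failures/drive/year")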

=> 
=> I'm not sure the big drives are really any less reliable (unless using
=> a formula that accounts for the fact that a total drive failure
=> impacts more data).  The biggest practical issue I've seen with large
=> drives is that it takes a good day or more to transfer their entire
=> contents (that is at peak sequential transfer speed - obviously any
=> kind of seeking or inefficiency significantly slows that).  So, when
=> you're scrubbing, replacing, backing-up, restoring, etc - the downtime
=> can be significant (or the time at risk for online operations).  This
=> is why there has been a move towards tolerance for two disk failures
=> with the increase in drive sizes - your opportunity for a double
=> failure goes up as the time to replace a drive increases.

Yes.
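
To put rough numbers on the rebuild-window risk Rich describes, here's a minimal sketch. The drive size,
throughput, array width, and per-drive failure rate are all assumptions made up for the example, not figures
from this thread.

import math

# All of these numbers are assumptions for the sketch, not from this thread:
capacity_tb = 16          # hypothetical drive size
seq_mb_per_s = 200        # optimistic sustained sequential throughput
surviving_drives = 7      # e.g. an 8-drive array rebuilding one member
afr = 0.02                # assumed 2% annualized failure rate per drive

# Time to read or write the whole drive at peak sequential speed.
rebuild_hours = capacity_tb * 1e12 / (seq_mb_per_s * 1e6) / 3600
print(f"full-drive transfer at peak speed: ~{rebuild_hours:.0f} hours")

# Chance that at least one surviving drive fails during that window,
# treating failures as independent with a constant hazard rate.
hazard_per_year = -math.log(1 - afr)
window_years = rebuild_hours / (24 * 365)
p_second = 1 - math.exp(-surviving_drives * hazard_per_year * window_years)
print(f"chance of a second failure during the window: ~{p_second:.2%}")

In practice the window is longer than that -- rebuilds rarely run at peak sequential speed while the array is
still serving I/O -- and unrecoverable read errors during the rebuild add to the risk, which is part of why
dual-parity layouts became common as drive sizes grew.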

=> 
=> This is also why RAID or a similar solution matters a lot for uptime
=> when you have a lot of data.  Even if you have full backups, and even

Absolutely.

=> without downtime.  I don't have a TON of storage at home but at this
=> point drive replacements seem to happen 1-2x/yr regularly.

Drive quality, air conditioning, power quality, ventilation, and vibration all matter.

At $WORK, from a small sample -- about 100 drives -- we see about 2~3 failures per year on consumer-class desktop PC hard drives (rising in year 6+).

We used to see about 5~10% failures on external USB hard drives, from a similarly sized sample. I suspect power supplies more than physical damage (i.e., the drive
being knocked off the desk while it's spinning). That failure rate has gone down, largely due to higher-quality drives and
SSDs.

On enterprise-class spinning disks in the datacenter, out of ~300 drives, we see about a 0.15% annual failure rate -- about 1 hard drive failure
every other year.

For each category, those observations are over ~10 years.
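
Converting those counts into annualized failure rates is just arithmetic on the approximate figures above:

# Annualized failure rate (AFR) from the rough counts quoted above.
def afr(failures_per_year, drives):
    return failures_per_year / drives

print(f"consumer desktop drives: {afr(2.5, 100):.1%} per year")   # 2~3 failures / ~100 drives
print(f"enterprise datacenter  : {afr(0.5, 300):.2%} per year")   # 1 every 2 yrs / ~300 drives

The enterprise figure works out to roughly 0.17% per year, in the same ballpark as the ~0.15% above.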

However, my numbers aren't even accurate enough to be called anecdotes. For actual data, see:

	https://www.backblaze.com/blog/backblaze-hard-drive-stats-q1-2020/

Mark

=> 
=> -- 
=> Rich
___________________________________________________________________________
Philadelphia Linux Users Group         --        http://www.phillylinux.org
Announcements - http://lists.phillylinux.org/mailman/listinfo/plug-announce
General Discussion  --   http://lists.phillylinux.org/mailman/listinfo/plug