My hard disks are starting to fail

I have about 280 TB of plots kept on internal hard disks and external hard disks in enclosures.

After 4 months they started to fail; about 3-4 HDDs have failed so far, mostly the internal ones.

Anyone know why? I suspect vibration from the stacked HDDs, or a bad PSU. I'm using a cheap OEM fully modular PSU to connect some of my drives, with the HDDs daisy-chained.

Any tips on how to stack HDDs safely?

Or is it better to put them inside a PC case?

Mostly it's the click of death; I can hear the clicking sound.

What are your temps like?
I use CrystalDiskInfo to monitor those.

My money would be on either overheating or a bad power supply.
If the disks are packed close together without forced cooling, a fan from afar might not do the trick.

Personally, I would never use a cheap power supply for anything, let alone expensive hardware. It's just not the part you want to save money on.
Another source of power issues might be the grid itself. This is the reason some people use a UPS or AVR (automatic voltage regulator) to regulate the incoming power.

HDDs always vibrate to some extent; what really kills them fast is sudden shocks while running.
So if you have a wobbly floor or shelf/desk, that might also cause problems just from someone walking past.


There is a device designed for this very application:



What is the brand of your drives? It looks like more and more people are starting to have issues with Seagate USB drives (whether shucked or not). I am also not happy with the SMART data from those Seagate drives (compared to WD). Those drives carry a 2-year warranty and are meant for mostly idle / suspended use, whereas we keep them running 24/7.

Thanks for those pictures. I think your temps should be in a good range, as you have that big fan in front and what looks like smaller fans behind those drives. Would be nice if you could confirm, though.

They said it was mostly the internals failing; still, it would be good to get a brand and model.

Oddly, the one I have failing is a Seagate Exos. They shipped from the States as they were cheaper, and were really badly packaged. I'm not 100% sure, but I suspect the failing one is the drive that arrived with big dents in it.

All modern drives, when powered off, park their heads in a rather safe zone. Also, if you check the G forces they can sustain when powered off, those are quite high on recent models. So I would also be concerned about a box being beaten badly, but the specs say it should not be an issue.

I also read some time ago that HDs may be prone to damage when air-shipped. Those drives are basically sealed, and new ones have helium inside. When they are high in the air, having potentially taken a beating before that, they may start leaking the gas they contain and start sucking in dirty air once landed. I am not sure whether this is still an issue, though.

On the other hand, I purchased 8TB Seagate and WD drives about 2-3 months ago, and they have virtually the same number of hours/power cycles. However, the SMART data for the WD is pristine, whereas the Seagate has already reported plenty of errors.
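To make that kind of WD-vs-Seagate comparison systematic, you can triage the raw SMART attribute values (e.g., as read from `smartctl -A`). This is only a sketch: the attribute names are common conventions, and the zero thresholds are my own conservative assumption, not any vendor's official failure criteria.

```python
# Minimal SMART triage sketch: flag the raw attributes that most
# reliably predict failure. Thresholds of 0 are deliberately strict.
CRITICAL = {
    "Reallocated_Sector_Ct": 0,   # any reallocation is worth watching
    "Current_Pending_Sector": 0,  # sectors waiting to be remapped
    "Offline_Uncorrectable": 0,
    "UDMA_CRC_Error_Count": 0,    # usually a cable/connector problem
}

def triage(attrs):
    """Return (attribute, value) pairs that exceed their threshold."""
    return [(name, attrs.get(name, 0))
            for name, limit in CRITICAL.items()
            if attrs.get(name, 0) > limit]

# Example: a drive with pending sectors and CRC errors (bad cable?).
suspect = {"Reallocated_Sector_Ct": 0,
           "Current_Pending_Sector": 8,
           "UDMA_CRC_Error_Count": 24}
print(triage(suspect))
# -> [('Current_Pending_Sector', 8), ('UDMA_CRC_Error_Count', 24)]
```

Note that a rising CRC error count usually points at the cabling or daisy-chained power rather than the drive itself, which is relevant to this thread.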

I have never believed in quality from Seagate. As I remember, I read tons of reports where Seagate drives were always the bottom feeders. I have basically never had issues with WD or Hitachi/IBM, and in the one or two cases I did, the RMA process was smooth. I am really concerned that we may be pushing those Seagate drives past their limits.

I partially agree. The part I don't agree with is that you can do virtually everything you mentioned, but if a HD is just a POS, it really doesn't matter; it will just fail two days later anyway.

Maybe we should start asking the brand-name question every time we see similar issues. It may be that we are guinea pigs pushing those (cheap) drives to their limits (I know, Exos drives should be high-endurance, and they have a 5-year warranty, so maybe those are a bit better).

I just remembered that we had a case here where some Seagate HDDs needed 5 power wires instead of 4:

I use Hard Disk Sentinel; all green, below 45 °C.


Yes … all temps are in the green in Sentinel.

All the HDDs are Seagate.


I'm really not sure then.
Maybe worth looking at the connectors, as Mugen0815 said, but other than that it's hard to say.
Just try to return them ASAP.

I am not sure whether that is right.

A HD draws power from the 12 V and 5 V lines. The 3.3 V pin was repurposed (the Power Disable feature in the SATA 3.3 spec) to enable staged power-on only. HOWEVER, no consumer-grade computer does staged power-on, so those 5-wire power connectors are more or less optional. You can find staggered spin-up only on add-on SAS/SATA HBA/RAID controllers, and even those use the data connection to control staging, as that feature preceded the 3.3 V addition by years and doesn't require modifying power cables (e.g., on older servers, etc.).

As far as I know, that 3.3 V line is only used by the proprietary USB/SATA controllers in enclosures made by the original HD manufacturers. The purpose is to discourage people from shucking drives and using them as internal ones.

So, is that 3.3 V line needed? Maybe for some HDs. However, it should have nothing to do with drive performance, as it is only relevant during the power-on phase.

This video was just posted (I didn't see it before I wrote the above). It gives a better explanation of that 3.3 V line. By the way, Janis has really good videos on his channel.

In general, that is what Seagate wants you to believe, as they want to keep up their quota of rejected RMAs. You need an example, I guess.

There is an FTC ruling from 2018 that says that when you purchase a USB drive, you can shuck it, and that doesn't void any warranty. However, when you ask for an RMA for your Seagate drive, the first question is whether you shucked it, and on that basis your RMA is rejected. Once you quote that FTC ruling, there is confusion on the other end of the phone line, a quick chat with the supervisor, and poof, your RMA request is good.

Another example? With the NVMe market heating up, and Samsung basically beating everyone in consumer-grade components, suddenly all manufacturers claimed the same TBW rates as Samsung. The reason was very simple: virtually no normal installation came anywhere close to hitting even a small fraction of those TBW levels. When we started farming, Crucial was the first to change their warranty to say that farming Chia voids it, and to state that the change applied retroactively. So much for their overblown TBW rates.

My previous (two or three) experiences with WD were that I sent them a SMART report and, no questions asked, the drives were replaced. Of course, I did once cook a couple of drives when I built my first RAID1 array; they were really yellow and the labels smelled burned. I didn't ask for an RMA that time, though :slight_smile:

So, what you stated there is correct, but engineering language doesn't recognize adjectives :slight_smile: Therefore, we may disagree on what "sudden" or "shock" means.

The only time a HD is sensitive to shocks is when it reads or writes. As soon as that operation is done, the heads go to a safe zone that is more or less immune to both the "sudden" and the "shock." So for us (farmers), our drives have heads over the platters (reading) when: 1. partials are found, and 2. those partials need some more digging. Let's say #1 takes 5 sec (the time allowed to return a partial) and #2 takes 30 sec (the time allowed to process a plot). Take the number of partials you find per day, divide by the number of drives, apply those 5/30 sec figures, and you get roughly how many seconds per day your drive is exposed to sudden shocks (my bet is about 5 sec/day on average). If you are a member of Flexpool, you do maybe 5-10x more partials than on any other pool. The fact that those drives are in the ACTIVE/IDLE state 24/7 really doesn't matter, as the heads are parked the whole time.
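That back-of-envelope estimate can be written down explicitly. The 5 s / 30 s budgets and the partial counts below are the assumptions from the paragraph above, not measured values:

```python
# Rough estimate of how many seconds per day a farming drive has its
# heads over the platters (and is thus shock-sensitive).

def head_active_seconds_per_day(partials_per_day, drives,
                                lookup_s=5, full_proof_s=30,
                                full_proof_fraction=0.0):
    """Seconds per day, per drive, with heads active.

    partials_per_day:    partials the whole farm returns per day
    drives:              number of drives the lookups are spread over
    lookup_s:            budget for the initial partial lookup (~5 s)
    full_proof_s:        budget when a plot needs deeper digging (~30 s)
    full_proof_fraction: fraction of lookups that need the deep pass
    """
    per_drive = partials_per_day / drives
    return per_drive * (lookup_s + full_proof_fraction * full_proof_s)

# Example: 30 partials/day spread over 30 drives, lookups only.
print(head_active_seconds_per_day(30, 30))  # -> 5.0
```

Even with a Flexpool-style 10x partial rate and some deep lookups, the exposure stays well under a minute per day per drive, which supports the "heads are parked almost all the time" point.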

Vibrations, on the other hand, are mostly a problem for RAID arrays (not JBOD), as every write spans all drives, so each drive's heads can move out of sync, etc. Our farms don't run RAID, and when proofs are found, they are usually on one drive only. So I would really not worry about that.

My take is that unless you put those drives on your grandpa's subwoofer (remember, he is partially deaf, but the "sudden" part is still missing there), or carry them on your motorcycle while dragging a very long power cable, they should be fine.

Sure, we should do whatever is in our power to improve things (e.g., the drives in that picture would maybe benefit from a rubber mat, so a hammer hit on that table would be less sudden). But we should not get paranoid.

It just smells more and more like Seagate has issues, as we are pushing the envelope beyond what they assumed would be a consumer workload (well, then what about the Exos drives?).

Update:

Changed the PSU … it made the HDDs more stable, with less clicking sound.

Fairly normal, I guess … it could be that your PSU couldn't deliver the power your farm required, even though on paper it should have.

The higher the certification of a PSU, the better.


Keep in mind what the efficiency rating actually measures: even an 80 Plus Titanium unit is only about 90% efficient at full load, so a PSU delivering its rated 650 W to the components pulls roughly 720 W from the wall, with the difference lost as heat. More importantly, a cheap unit that can't actually sustain its label rating will sag under load, and the HDDs will choke even if you calculated the total draw correctly.
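The PSU sizing can be sanity-checked with a little arithmetic. The figures below are illustrative assumptions (roughly 90% efficiency at full load for an 80 Plus Titanium unit on 115 V mains, and typical 3.5" HDD draw), not datasheet values for any specific model:

```python
# Back-of-envelope PSU sizing check for a disk farm.

def wall_draw_w(dc_load_w, efficiency=0.90):
    """AC power drawn from the wall for a given DC load."""
    return dc_load_w / efficiency

def farm_dc_load_w(n_drives, idle_w=5.0, spinup_w=25.0, spinning_up=None):
    """DC load in watts. Worst case is all drives spinning up at once,
    since consumer hardware has no staggered spin-up."""
    if spinning_up is None:
        spinning_up = n_drives  # simultaneous power-on
    idle = (n_drives - spinning_up) * idle_w
    return idle + spinning_up * spinup_w

load = farm_dc_load_w(20)  # 20 drives, all spinning up at once
print(load, round(wall_draw_w(load), 1))  # -> 500.0 555.6
```

The spin-up surge is the case that kills daisy-chained setups: each drive briefly pulls several times its idle power on the 12 V rail, all through the same cheap cable.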


I didn't think about that earlier. Looking at the Seagate Exos X18 datasheet, there is a row called "Shock, Operating 2ms (Read/Write) (Gs)" and all those drives list 50 there (i.e., 50 G while under a read/write operation; it is 200 G when non-operating). 50 G is quite a bit of force. Although it may be that constant vibration is more damaging to such a drive than one single drop onto a carpet.

So, yes, better safe than sorry; do take good care of those drives. However, vibration may well be the least problematic issue for our drives.

Still, good catch with that PSU. I also agree that the PSU is not the place to save money, as troubleshooting one is usually tricky.

Have you disabled all the power-saving options for your HDDs? I have heard rumors that drives start to fail when they power up and spin down all the time.