Ironwolf and Exos Hard Drive failures - Action Group

Thanks for the info. I won’t contact him just yet, but will in due course. Indeed, regardless of the outcome of this, I think I’ll have him see these emails.

There may not be misinformation, but the problem is that you really aren’t supplying any information for us to know what you’re talking about.

No drive failures this end (approaching 1 PiB)

After watching storage_jm’s video on drive power use with Chia (Competitive Chia Farming - Deep dive into HDD power and TCO - YouTube), it seems hard to believe that Chia plot files have any effect on the drives themselves. It’s a little long, but it does show that Chia use is virtually nothing on a scale of 0 to something.

If you buy more drives, why not change model or, even better, manufacturer? Since your setup with Seagate IronWolf seems to have a high failure rate, why not have a go with something else? That will indicate whether the drives are the problem, as you imagine, or whether your setup is causing drives in general to fail. Anything is possible, even with the best thought-out setup.

How are these drives being powered?

If you are worried about the manufacturer rejecting your warranty claim because they claim their HDDs are not suited to software XYZ, then you could:

  1. Encrypt the data on the HDD. This will of course make it impossible to recover any data, and I would not bother with it, at least for Chia plots, because of #2.

  2. Capture a full SMART report before shipping the drive for RMA (unless the drive is completely dead). Full/extended SMART tests should be run on HDDs used in production workloads at least once a month (plus I suggest running the short test at least once a week); this way you will have SMART stats over time.

#2 is what I do: I run a long/extended test on every HDD I purchase before I start using it for any purpose. A full/extended SMART test does not guarantee that the drive will not fail (I’ve had a drive fail within 24h after the test), but it will give you much more confidence than just slapping a new drive into a system and starting to copy data to it. Plus, SMART info gives you valuable stats (e.g. power-on hours, start/stop counts, etc.) which can easily be checked against the HDD specs. That is all that really matters, regardless of which XYZ software is used to access or store the data, as the manufacturer will not list “OK for use with XYZ software” on their spec sheet. I’m sure they review the SMART info when they get the HDD in their RMA center to confirm whether actual usage exceeded the specs. Manufacturer-provided HDD check tools do not show you all this info, which is why you need other tools to see the SMART data; my favorite on Windows & Linux is GSmartControl, which is free & open source.
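For anyone who would rather script the capture than click through a GUI, here is a minimal Python sketch of the routine described above. It assumes smartmontools (`smartctl`) is installed and that the script runs with enough privileges to query the drive. The device path, the output folder and the helper names are placeholders of mine, not part of any manufacturer tool. `-x` asks smartctl for the full SMART and device report, and `-t short` / `-t long` start the self-tests.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def smart_report_command(device: str) -> list[str]:
    """Build the smartctl call that prints the full SMART/device report (-x)."""
    return ["smartctl", "-x", device]


def smart_test_command(device: str, kind: str = "long") -> list[str]:
    """Build the smartctl call that starts a self-test ('short' or 'long')."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    return ["smartctl", "-t", kind, device]


def report_path(out_dir: Path, device: str, when: datetime) -> Path:
    """Timestamped file name, so reports accumulate into a history over time."""
    name = device.strip("/").replace("/", "_")
    return out_dir / f"{name}_{when:%Y%m%dT%H%M%SZ}.txt"


def capture_report(device: str, out_dir: Path) -> Path:
    """Run smartctl and save its output (needs smartctl plus root/admin rights)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # smartctl sets non-zero exit bits for warnings too, so check=True is avoided.
    result = subprocess.run(
        smart_report_command(device), capture_output=True, text=True
    )
    path = report_path(out_dir, device, datetime.now(timezone.utc))
    path.write_text(result.stdout)
    return path
```

Something like `capture_report("/dev/sda", Path("smart_reports"))` run from a weekly or monthly scheduled job would give you the SMART history to attach to an RMA. Adjust the device name for your own system.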

Just to add 2c more, I do have a number of Exos X16 14TB drives in my 24/7 NAS and they have been running OK for months now, with the SMART test passing every time, so knock on wood. These Exos enterprise drives are a bit loud compared with consumer-grade NAS drives like IronWolfs (which are just overpriced in my opinion), and they do have a distinct clicking noise when the head moves during normal operation, which is different from the continuous clicking you get when a drive is failing.

On the issue of recovery, there are tools you can use to check the drive (if it’s still running) and recover data, but unfortunately they are not something a regular end user could use. I have the know-how to do my own data recovery if the drive is still spinning, meaning there is no fatal mechanical issue or problem with the controller itself. Honestly, I never rely on the manufacturer’s “Data Recovery Service”, as I think it is just a marketing gimmick and they will always put some kind of “…if possible” clause in the fine print to get out of it, since recovering data from a drive that is completely dead due to a fatal mechanical or controller failure takes way more effort than recovering data from bad clusters.

Also, I’m not sure whether Chia plots have a CRC check to tell you if a plot is damaged, so unless you created an MD5/hash of each file in the first place, using restored plots may be risky. If someone knows whether plots have a CRC/hash check, and how that can be checked apart from the standard CLI chia plots check (which I don’t think does that), do let me know.
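Independent of whatever the plot format does internally, the "hash each file in the first place" idea can be sketched in a few lines of Python. This is only an illustration: SHA-256 is swapped in for MD5, the `*.plot` pattern and the function names are my own assumptions, and it says nothing about what chia plots check verifies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large plot never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(plot_dir: Path) -> dict[str, str]:
    """Map each *.plot file name in plot_dir to its SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(plot_dir.glob("*.plot"))}


def changed_files(plot_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Names whose file is missing or whose hash no longer matches the manifest."""
    bad = []
    for name, expected in manifest.items():
        path = plot_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(name)
    return bad
```

Build the manifest once right after plotting, keep it somewhere other than the plot drive, and run `changed_files` against restored plots. An empty list means every file still matches its original hash.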

Actually, that is not exactly true. Several NVMe manufacturers were changing their warranties to explicitly target Chia. Some (one/two) wanted to change those warranties retroactively.

Crucial was potentially the biggest offender here - Crucial Caves: Says Chia Cryptomining Voids SSD Warranty, Then Retracts Post | Tom's Hardware. Although it looks like they backed off in the end.

@Jacek LOL, I should have put that one exception for NVMes in my original reply, as I knew someone would mention this as soon as I pressed “Reply”.

Yes, some manufacturers did change their warranty periods or specs after the Chia release to limit warranty claims, without explicitly saying it was because of Chia, but I did not mention it as this is more about plotting and not farming, which seems to be what OP is referring to? Honestly, I think PNY may be an even bigger offender here, as they HEAVILY slashed their specs (MTBF specifically) on XLR8 NVMes. This is why I kept the original packaging for the drives, which has the original specs I PURCHASED, so if they do fail and they try to reject my warranty claim I will have a valid legal case :wink:

EDIT: Correction on the PNY XLR8, it was the TBW on CS3030 that they slashed from:
250GB: 380, 500GB: 800, 1TB: 1665, 2TB: 3115, 4TB: 6820
before Chia to:
250GB: 170, 500GB: 170, 1TB: 360, 2TB: 660, 4TB: 6070
some time between 5/9/2021 and 6/4/2021 (hmm, that timing looks shockingly familiar) according to the Wayback Machine :wink:
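To put numbers on how deep the cut was, here is a small Python sketch that works out the percentage drop per capacity. The TBW figures are copied from the post above exactly as quoted (not independently verified), and the function names are just illustrative.

```python
# TBW figures exactly as quoted in the post above (not independently verified).
TBW_BEFORE = {"250GB": 380, "500GB": 800, "1TB": 1665, "2TB": 3115, "4TB": 6820}
TBW_AFTER = {"250GB": 170, "500GB": 170, "1TB": 360, "2TB": 660, "4TB": 6070}


def percent_cut(before: float, after: float) -> float:
    """Reduction from before to after, as a percentage of before."""
    return (before - after) / before * 100.0


def tbw_cuts() -> dict[str, float]:
    """Percentage cut per capacity, rounded to one decimal place."""
    return {
        size: round(percent_cut(TBW_BEFORE[size], TBW_AFTER[size]), 1)
        for size in TBW_BEFORE
    }


if __name__ == "__main__":
    for size, cut in tbw_cuts().items():
        print(f"{size}: {TBW_BEFORE[size]} -> {TBW_AFTER[size]} ({cut}% lower)")
```

On the quoted figures, the cut comes out at roughly 55% for the 250GB model, roughly 78-79% for 500GB through 2TB, and about 11% for the 4TB model.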

Yup, PNY was slashing TBW on their NVMes, as they introduced a Chia dedicated SSD for over $1k for a 2TB. I assume that their customers were thrilled, and were breaking doors to get those in bunches :wink:

Also, if I recall it right, Chia (most likely JM) was shilling for them at that time as well.

That decision must have been from some executive bean counter. And I suspect that they course-corrected after a higher ranking executive lambasted them for their moronic decision.

Chia has likely resulted in Crucial having record NVMe sales, and then some executive puts out an announcement to kill future sales.

Someone at Crucial woke up, as the article states:
There are two parts to our current SSD warranties: calendar time (either 3 or 5 years) and/or total bytes written (up to 1200 Terabytes depending on the capacity), whichever comes first.

So why in the heck would Crucial or any other manufacturer care how much you beat on the drive? If you exceed the TBW value, then you kiss the warranty goodbye.
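The "whichever comes first" rule is easy to express in code. A minimal Python sketch, assuming you know the purchase date and can read total terabytes written from the drive's SMART data; the function name and the example numbers are mine and are not taken from any warranty document.

```python
from datetime import date


def warranty_active(purchased: date, today: date, years: int,
                    tb_written: float, tbw_limit: float) -> bool:
    """True only while BOTH the calendar term and the TBW limit are unspent."""
    try:
        end = purchased.replace(year=purchased.year + years)
    except ValueError:
        # Purchased on 29 Feb and the end year is not a leap year.
        end = purchased.replace(year=purchased.year + years, day=28)
    return today <= end and tb_written <= tbw_limit
```

For example, `warranty_active(date(2021, 5, 1), date(2022, 5, 1), 5, 1500, 1200)` is `False`: the drive is only one year into a five-year term, but it has already passed a 1200 TB limit, so how the bytes got written makes no difference.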

Besides, these NVMe drives keep plowing, well past their TBW rating. I doubt that Chia processing is causing many warranty claims. If an NVMe drive fails while plotting, then that NVMe drive was probably defective, and Chia revealed the defect all the sooner.

Yes, that TBW and/or calendar time in general are the norm everywhere (e.g., with cars - 5 years / 50k miles). Yet still that argument is being used on Wikipedia to bash Chia, and they (Chia) do nothing to get it corrected. Plenty of other Chia bashing articles are using that source. It really boggles my mind, why Chia didn’t produce a counter article to state the obvious.

I’m still digesting this. Chia farming bad on these drives? Am I living on Uranus? Call me confused. This makes zero sense. None. Fails logic test. Let me digest more and add more commentary.

Well, you get what you pay for when it comes to customer service. That’s a big statement for a non-engineer to make. I’m sure they (the one that told you that) personally haven’t extensively lab-tested Chia plots on hundreds or more drives.

I’d wager that IF they really are getting disproportionate numbers of failures from Chia farmers, the cause is NOT Chia plotting / harvesting directly, but probably that many farmers aren’t operating the drives under the best or recommended conditions.

Or maybe, Chia farmers are more dishonest than the average person and are filing more dubious warranty claims. Point is, I wouldn’t take that one statement from a front-line Seagate support rep to have any gravity or even be accurate. In court they’ll say that person had no authority or evidence to back up that statement and will require more evidence on your part to support the claim.

If they replace the drives, I think what you’re asking for is plotting services from them, which they won’t do, but it’s a very minor cost. I don’t see any reason to harass them for that when they do their part; data loss in Chia farming is part of the cost of doing business (and we don’t even need the data that much, since we can make new plots).

This whole thread is weird.

Like keeping the drives too HOT!!! :japanese_ogre: :japanese_ogre:

I get the impression that this is really common; you just have to look at pictures of people’s setups to realize how many drives are getting toasty while farming Chia. I’m a bit wary of second-hand drives for this reason.

Just throwing in my 2 cents to say that all my internal drives are Seagate Exos and none have failed over a year later. Every now and then I’ll check the SMART data on a few of them and they’re all healthy. A chunk of my farm is also in Seagate 2.5 inch externals, those aren’t even rated for 24/7 operation and are also fine over a year later.

It could be just luck, but I suspect a proper case with good airflow and using a quality power supply both play a large part.

How did you determine that your Seagate 2.5 external drives are not rated for 24/7 operation?

Backblaze have noted a 4.79% 14TB Iron Wolf failure rate vs a 1.03% Exos failure rate (Backblaze Drive Stats for 2021).

I have a mix of Exos, Iron Wolf (only one) and Barracuda drives. I’ve only had a single failure in over 12 months and that was a 4TB Barracuda a week ago

If you’re running your drives in those plastic Seagate enclosures, they can get very hot and are more likely to fail early (Hard Drive Temperature—Does It Matter?).
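If you want to keep an eye on temperature rather than guess, a rough Python sketch follows. It assumes smartctl's JSON mode (`smartctl -j -A <device>`) and, as I understand that layout, a `temperature.current` field in degrees Celsius. The 50 °C limit is only a placeholder, so check your drive's datasheet for the real operating range, and note that the sample JSON is hand-written, not real drive output.

```python
import json


def current_temp_c(smartctl_json: str):
    """Return temperature.current from smartctl JSON text, or None if absent."""
    data = json.loads(smartctl_json)
    return data.get("temperature", {}).get("current")


def too_hot(smartctl_json: str, limit_c: int = 50) -> bool:
    """True if a temperature is reported and it exceeds limit_c (placeholder)."""
    temp = current_temp_c(smartctl_json)
    return temp is not None and temp > limit_c


# Hand-written sample for illustration, NOT real drive output.
SAMPLE = '{"device": {"name": "/dev/sda"}, "temperature": {"current": 57}}'
```

Feeding the real smartctl output for each drive through `too_hot` on a schedule is an easy way to spot an enclosure that is cooking its drive before it fails.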

That’s not the Ironwolf though, st14000nm0138 is a 14TB Exos model

Ahh right, it’s an X14 Exos. I don’t have any of those, all of mine are X16 ST14000NM001G