Troubleshooting failure to farm with lots of plots, due to harvester 30s timeout

I’m using Windows 10; is NFS an option? If so I can do that! My options are:

  • SMB/CIFS
  • AFP
  • NFS

Right now I have all other protocols disabled except SMB/CIFS. Ah, looks like you can use NFS; I’m willing to give it a shot.
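For what it’s worth, here’s a minimal sketch of mounting an NFS export on Windows (PowerShell), assuming the “Client for NFS” optional feature is enabled (it’s only available on some Windows 10 editions) and that the NAS actually exports the share over NFS. The IP, export path, and drive letter are placeholders:

    # Map an NFS export to drive P: with anonymous (AUTH_SYS) access
    mount -o anon \\192.168.1.50\volume1\plots P:
    # Running mount with no arguments lists current NFS mounts and their options
    mount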

Good idea; unfortunately it doesn’t correlate. None of the individual tests I did on each of the NASes above stood out as particularly bad, but I only tested one file per server.

In the meantime I just moved to 15 folders per NAS (~55 k32 plots per folder), just to rule that out as a factor. I know for absolute certain that one giant folder was a huge performance problem.
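For anyone following along: each of those folders also has to be registered with the harvester, or it won’t be scanned. A sketch, with made-up drive letters and folder names:

    # Register each plot folder with the harvester (repeat per folder)
    chia plots add -d P:\plots01
    chia plots add -d P:\plots02
    # Confirm which directories the harvester will scan
    chia plots show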

I just tested -n 100 on a single plot and got 43 seconds. This is locally connected via USB, not using RAID. However, this is not an apples-to-apples comparison with actual harvesting: “chia plots check -n 100” assumes that you have a winning proof every time, and it’s doing 100 different checks.
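If anyone wants to reproduce that test, the check can be pointed at a single plot; -g matches any substring of a plot’s path or filename (the filename fragment below is made up):

    # Run 100 challenges against every plot whose path matches the -g substring
    chia plots check -n 100 -g plot-k32-2021-05-01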

I opened a bug on this. It’s so demoralizing to not be able to farm at all because my storage network is… too slow, I guess :(

I don’t believe they can change the 30 seconds. That’s the wait window that the TimeLord has to receive responses from the network. You’ll need to figure out where the bottleneck is.

Yeah, I’m pretty screwed. I realize that… I need to transfer like 300 TB to different storage mechanisms…

In the meantime I’m close to giving up on Chia, because… gosh… what can I do? I don’t see anything obviously wrong based on the above benchmarks. Do you?

I can’t personally help, but I would be really surprised if there weren’t more people running NAS setups who have gone through something similar. Also, I realize you’re already beyond this helping you, but have you tried Chia Dog?

Time to pull out the manual. What NAS enclosures are these?

Edit: Disregard! I found it.

Terramaster F5-422 units, all in RAID 0 stripe with 5 x 18 TB drives, using Zyxel 10 Gbps / 2.5 Gbps / 1 Gbps switches. Latest firmware/OS on all.

Are you using BTRFS on them? If so, I wonder if it has anything to do with copy-on-write (CoW)?

I have nothing definitive, but I’ve been grepping logs for a few hours and I feel like I have two systems that would have been under similar load, but one had a lot more really slow (>10 s) attempts (~200 of 10,000) than the other (~2 of 5,000).

One is CentOS 7 with XFS filesystems. The other is Ubuntu 20.04 with ext4 filesystems. Hmmm.
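In case it’s useful, this is roughly the kind of grep I’ve been running. It assumes the default log location and the log level set to INFO (chia configure --set-log-level INFO), and the 5-second threshold is arbitrary:

    # Print harvester attempts that took longer than 5 seconds
    grep "eligible for farming" ~/.chia/mainnet/log/debug.log \
      | awk -F 'Time: ' '{ split($2, t, " "); if (t[1] + 0 > 5.0) print }'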

No. I go with the oldest, most stable choice when it comes to storage… ext4 all the way.

First of all, don’t give up. You have so many plots that once you fix whatever is going on, you’ll still be in a good spot to catch back up.

Second, I know I shared this before, but snag one of these and migrate a few drives to it and see if that can help: Yottamaster Aluminum Alloy 4 Bay 2.5/3.5 Inch Type C External Hard Drive Enclosure USB3.1 Gen1, Mac Style, Designed for Personal Storage at Home & Office [PS400C3]

If it does, you can migrate everything to these, get rid of the NAS units, and probably end up with a bunch of money back.

One challenge you will face is that you can’t move those RAIDed hard drives without data loss, so you’re going to have to deal with some amount of transfer time.

My finished plots from that system ended up on a 6 TB disk with ext4 and a 4 TB disk with XFS, so it wouldn’t be purely filesystem-based. My other one with no issues is all ext4. I swear, though, the one machine feels like it’s slow on attempts.

I have really old (>5 years) hardware that’s worse than almost anyone else’s you see on the forum, so it doesn’t make sense that yours can’t keep up. I wonder if a lot more people are actually having similar issues and just don’t realize it because the UI doesn’t even hint that something might be wrong.

I was having similar issues a few days ago and was disappointed the UI didn’t flag it. I was the person who posted asking about having an 84 s attempt in my logs. My attempt times are way better now, and the only things that have really changed for me are that my disks are filling up and my systems are more idle than before.

Are those NUCs and NASes all in independent sets (1 NUC, 1 NAS), or is everything tied together? If it’s all one big setup, I’d consider splitting off a single set to test a simpler configuration before doing anything drastic like moving plots around.

I 100% guarantee you that this is the case.

It’d be more encouraging if I could identify a specific “problem” device, but per my tests above… I really can’t. I’ll try a plot test of, say, 5 random plots per device next and see if patterns emerge, something like the sketch below.
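Rough sketch of that test, assuming it runs on a Linux box with the NAS shares mounted at made-up paths like these; it samples 5 random plots per device and challenge-checks each one:

    # Sample 5 random plots per NAS mount and run a short check on each
    for d in /mnt/nas01 /mnt/nas02 /mnt/nas03; do
      find "$d" -name '*.plot' | shuf -n 5 | while read -r p; do
        echo "== $p"
        chia plots check -n 30 -g "$p"   # -g does a substring match, so a full path works
      done
    done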

Did some quick research and it didn’t come back good. I saw some posts for the F5-421/422 complaining that it was slow, or that it would drop network speed to zero and then pick up again. Most of the threads ended with the enclosures getting replaced by other brands. Some people contacted customer support; the first question CS asked was what drives were in the enclosure and whether they were on the “recommended” list.

One post did say that NFS helped. Another said that when they turned on caching, it started acting slow and intermittent. That could explain why you had those invalid plots, if they got corrupted on the way over.

Also, if you decide to begin moving plots, don’t RAID the drives that hold the plots. JBOD mode is recommended because you won’t lose as much if a drive goes down.

You’d figure all of these are Linux underneath, so kind of the same thing in the end. It’s hard to see how there are these big conceptual differences when everyone’s running the same operating systems and code.

Yeah, I looked at JBOD, but JBOD doesn’t protect you from single-disk failure, and I wanted speed of ingestion. I’ll try NFS first and see if that helps. But this 30-second thing is kind of poisonous long term for the project, especially the way it’s a silent killer.

What if you split the harvesting responsibilities across multiple harvesters, even if they are Windows boxes? It could be that checking 10+ plots that pass the filter takes longer than if you had 4 harvesters checking 3 plots each, using something like this: Farming on many machines · Chia-Network/chia-blockchain Wiki · GitHub
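From what I can tell, the wiki boils down to a few commands on each extra harvester box; the IP and path below are placeholders, and the main farmer’s ca/ folder has to be copied over first:

    # One-time: generate this machine's certs, signed by the main farmer's CA
    chia init -c /path/to/copied/farmer/ca
    # Point this harvester at the farmer machine instead of a local farmer
    chia configure --set-farmer-peer 192.168.1.50:8447
    # Run (or restart) just the harvester service
    chia start harvester -r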

Could you run the harvester in a container on the Terramaster?

I could run multiple harvesters if you think that would help with speed of proofs. Would that really help?

The page you linked to is a little complicated for me, to be honest…

Kiwi or storage_jm or someone in support would know more, but I’m just going off the wiki. It might have been mentioned in this YouTube video too.

It also makes your overall farm quicker and more efficient when replying to challenges.

And the farming expert YouTube video.

I bet it could be simplified if you just run full node mode, but turn off UPnP and do manual port forwarding. UPnP is the main reason they suggest one node per home. Hopefully someone in the support channel would have advice. People turn off full node and just harvest so they can run on Raspberry Pi 4s with 12-24 USB drives.
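If I remember right, turning UPnP off is a one-liner; the manual forwarding (TCP 8444, the default full node port) you’d then set up on the router yourself:

    # Stop the node from trying to manage the router via UPnP
    chia configure --enable-upnp false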
