Troubleshooting failure to farm with lots of plots, due to harvester 30s timeout

Correct, the ship has sailed by that point. Now, if you see “Found 1 proofs” and the time is more than 30s, that’s when a chance was lost.

Edit: I should read previous comments. Haha.

OK! Since “one giant folder” per NAS was clearly a problem, and moving to 5 folders per NAS helped a lot… I figured what the hell… let’s double down and go to 10 folders per NAS. That’s about 82 plots per folder (these are 5 × 18TB = 90TB) on each NAS.

Kind of annoying from a data entry standpoint but compared to NO FARM WINS EVER, a small price to pay? My initial tests looked good, so I switched them all over and the results are…

(all lines start with {timestamp} harvester chia.harvester.harvester: INFO which I’ve removed for readability.)

8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.74567 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.92976 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.01540 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.11716 s. Total 3922 plots
6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots
9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots
11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 50.81979 s. Total 3922 plots
6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 54.92095 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 42.29693 s. Total 3922 plots
8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 29.67779 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 18.59361 s. Total 3922 plots
8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 6.10934 s. Total 3922 plots
10 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.17564 s. Total 3922 plots
13 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.82808 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.82815 s. Total 3922 plots
12 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.56252 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.26593 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.42098 s. Total 3922 plots
3 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.36354 s. Total 3922 plots

Well, definitely better. But I still have quite a few going outside the hard 30 second interval… need to think about this a bit more :worried:
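For anyone who wants to scan a whole log for these rather than eyeballing it, here is a minimal Python sketch. It assumes the line layout shown above; the regex, the `slow_lookups` name, and the 30.0 threshold constant are my own illustration, not anything from the Chia codebase.

```python
import re

# Pattern for the harvester lines quoted above (layout assumed from this thread).
LINE_RE = re.compile(
    r"(?P<eligible>\d+) plots were eligible for farming \S+ "
    r"Found (?P<proofs>\d+) proofs\. Time: (?P<secs>\d+\.\d+) s\. "
    r"Total (?P<total>\d+) plots"
)

LIMIT_S = 30.0  # the 30-second window discussed in this thread


def slow_lookups(lines, limit=LIMIT_S):
    """Return (seconds, eligible, proofs) for every lookup slower than `limit`."""
    slow = []
    for line in lines:
        m = LINE_RE.search(line)  # search() so a timestamp prefix is tolerated
        if m is None:
            continue
        secs = float(m.group("secs"))
        if secs > limit:
            slow.append((secs, int(m.group("eligible")), int(m.group("proofs"))))
    return slow


# Three lines copied from the log excerpt above.
sample = [
    "6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots",
    "9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots",
    "11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots",
]
for secs, eligible, proofs in slow_lookups(sample):
    print(f"{secs:.2f}s with {eligible} eligible plots, {proofs} proofs")
```

Lowering `limit` (say to 10) gives a feel for how many lookups are merely slow as opposed to lost.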

Those times are still not good. I’m using USB and almost always take <1s. Is there any way you can connect your disks to your harvester directly? Even if it’s only temporary while you figure out what’s going on with your NAS?

What times do you get when you run

chia plots check -n 100 -g {filename}

on a locally attached drive via USB? Even over the network to a locally attached disk with no NAS overhead, and a pretty fast link (2.5gbps on both ends), I get 0:47 - 1:00, versus about 2x that to the NAS storage.

The NAS devices are raid-0 stripes of 5 drives, so they should offer reasonable performance!

Are those NASes / machines all idle?

I have a machine that was just about to copy a finished plot, so I did a chia plots check -n 5 just before the copy and during the copy. Feel free to double check my math.

Before:

2021-04-24T17:16:50.241  chia.plotting.plot_tools         : INFO     Loaded a total of 13 plots of size 1.28675157350699 TiB, in 0.7198820114135742 seconds
2021-04-24T17:16:50.241  chia.plotting.check_plots        : INFO     
2021-04-24T17:16:50.241  chia.plotting.check_plots        : INFO     Starting to test each plot with 5 challenges each
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     Summary
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     Found 13 valid plots, total size 1.28675 TiB
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     13 plots of size 32
  • 17:17:31 - 17:16:50 = 41s
  • 13*5 = 65 tests
  • 41s / 65 tests = 0.631s per test

During:

2021-04-24T17:18:59.269  chia.plotting.plot_tools         : INFO     Loaded a total of 13 plots of size 1.28675157350699 TiB, in 0.7343041896820068 seconds
2021-04-24T17:18:59.269  chia.plotting.check_plots        : INFO     
2021-04-24T17:18:59.269  chia.plotting.check_plots        : INFO     Starting to test each plot with 5 challenges each
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     Summary
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     Found 13 valid plots, total size 1.28675 TiB
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     13 plots of size 32
  • 17:39:18 - 17:18:59 = 20m19s = 1219s
  • 13*5 = 65 tests
  • 1219s / 65 tests = 18.75s per test
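Since the math is open to double checking, here is a small Python sketch that recomputes both figures straight from the log timestamps. The `seconds_per_test` helper is just an illustrative name; the timestamps and the 13 plots × 5 challenges come from the two runs above.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"  # timestamp layout used in the check_plots log lines


def seconds_per_test(start, end, plots, challenges):
    """Return (elapsed seconds, seconds per challenge) between two log timestamps."""
    elapsed = (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    return elapsed, elapsed / (plots * challenges)


# "Starting to test" timestamp and "Summary" timestamp from each run.
before = seconds_per_test("2021-04-24T17:16:50.241", "2021-04-24T17:17:31.386", 13, 5)
during = seconds_per_test("2021-04-24T17:18:59.269", "2021-04-24T17:39:18.350", 13, 5)

print(f"before copy: {before[0]:.1f}s total, {before[1]:.2f}s per test")
print(f"during copy: {during[0]:.1f}s total, {during[1]:.2f}s per test")
```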

So IMHO it would be worth keeping track of if / when you’re copying full plots onto your NASes to see if that correlates with the slow times you see in the logs. That also might explain the randomness there seems to be; some idle NASes are fast while any being loaded down are slow. That’s just a guess, but it seems plausible to me.

What protocol are you using to connect to the NAS? I believe NFS has the quickest seek time? It’s been a while.

I’m using Windows 10; is NFS an option? If so I can do that! My options are

  • SMB/CIFS
  • AFP
  • NFS

Right now I have all other protocols disabled except SMB/CIFS. Ah, looks like you can use NFS; I’m willing to give it a shot.

Good idea; unfortunately it doesn’t correlate. None of the individual tests I did on each of the NASes above stood out as particularly bad, but I only tested 1 file per server.

In the meantime I just moved to 15 folders per NAS, so that’s ~55 k32 plots per folder. Just to rule that out as a factor. I know for absolute certain that one giant folder was a huge performance problem.

I just tested -n 100 on a single plot and got 43 seconds. This is locally connected via USB, not using RAID. However, this is not an apples to apples comparison with actual harvesting. “chia plots check -n 100” assumes that you have a winning proof every time and it’s doing 100 different checks.

I opened a bug on this. It’s so demoralizing to not be able to farm at all based on my storage network being… too slow, I guess :frowning:

I don’t believe they can change the 30 seconds. That’s the wait window that the TimeLord has to receive responses from the network. You’ll need to figure out where the bottleneck is.

Yeah, I’m pretty screwed. I realize that… I need to transfer like 300tb to different storage mechanisms…

In the meantime I’m close to giving up on Chia, because… gosh… what can I do? I don’t see anything obviously wrong based on the above benchmarks? Do you?

I can’t personally help, but I would be really surprised if there weren’t more people running NAS that have gone through something similar. Also, I realize you’re already beyond this helping you, but have you tried Chia Dog?

Time to pull out the manual. What NAS enclosures are these?

Edit: Disregard! I found it.

Terramaster F5-422 all in raid 0 stripe with 5 x 18tb drives, using Zyxel 10gbps / 2.5gbps / 1gbps switches. Latest firmware / OS on all.

Are you using BTRFS on them? If so, I wonder if it has anything to do with CoW?

I have nothing definitive, but I’ve been grepping logs for a few hours and I feel like I have 2 systems that would have been under similar load, but 1 had a lot more (~200/10000) really slow (>10s) attempts than the other (~2/5000).

One is CentOS 7 with XFS filesystems. The other is Ubuntu 20.04 with ext4 filesystems. Hmmm.

No. I go with the oldest, most stable choice when it comes to storage… ext4 all the way.

First of all, don’t give up. You have so many plots that once you fix whatever is going on, you’ll still be in a good spot to catch back up.

Second, I know I shared this before, but snag one of these and migrate a few drives to it and see if that can help: Yottamaster Aluminum Alloy 4 Bay 2.5/3.5 Inch Type C External Hard Drive Enclosure USB3.1 Gen1,Mac Style Designed for Personal Storage at Home&Office- [PS400C3]

If it does, you can migrate everything to these and get rid of the nas and probably end up with a bunch of money back.

One challenge you will face is that you can’t move those RAIDed hard drives without data loss, so you’re going to have to deal with some amount of transfer time.

My finished plots from that system ended up on a 6TB disk with ext4 and a 4TB disk with XFS, so it wouldn’t be purely filesystem based. My other one with no issues is all ext4. I swear though, the one machine feels like it’s slow on attempts.

I have really old (>5 years) hardware that’s worse than almost everyone you see on the forum, so it doesn’t make sense that yours can’t keep up. I wonder if a lot more people are actually having similar issues and just don’t realize because the UI doesn’t even hint that something might be wrong.

I was having similar issues a few days ago and was disappointed the UI didn’t flag it. I was the person that posted asking about having an 84s attempt in my logs. My attempt times are way better and the only thing that’s really changed for me is that my disks are filling up and my systems are more idle than before.

Are those NUCs and NASes all in independent sets (1 NUC, 1 NAS) or is everything tied together? If it’s all one big setup, I’d consider splitting off a single set to test a simpler setup before doing anything drastic like moving plots around.

I 100% guarantee you that this is the case.

It’d be more encouraging if I could identify a specific “problem” device, but per my tests above… I really can’t. I’ll try doing a plot test of say, 5 random plots per device next and see if patterns emerge.
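A rough Python sketch of how that sampling could be scripted: pick up to 5 random `.plot` files per directory and build the matching `chia plots check -n ... -g ...` command for each, in the same form used earlier in the thread. The function names are hypothetical, and the demo at the bottom uses empty placeholder files in a temp directory, not real plots. Nothing is executed; the commands are only printed.

```python
import random
import tempfile
from pathlib import Path


def pick_plots(plot_dirs, per_dir=5, seed=None):
    """Pick up to `per_dir` random .plot files from each directory."""
    rng = random.Random(seed)
    picks = {}
    for d in plot_dirs:
        plots = sorted(Path(d).glob("*.plot"))
        picks[str(d)] = rng.sample(plots, min(per_dir, len(plots)))
    return picks


def check_commands(picks, challenges=100):
    """Build one `chia plots check` argument list per selected plot."""
    return [
        ["chia", "plots", "check", "-n", str(challenges), "-g", p.name]
        for plots in picks.values()
        for p in plots
    ]


# Demo with empty placeholder files (hypothetical names, not real plots).
with tempfile.TemporaryDirectory() as tmp:
    for i in range(8):
        (Path(tmp) / f"plot-k32-example-{i}.plot").touch()
    demo_picks = pick_plots([tmp], per_dir=5, seed=1)
    demo_cmds = check_commands(demo_picks)

for cmd in demo_cmds:
    print(" ".join(cmd))
```

Timing each command per directory and comparing the totals should show whether one NAS or folder stands out.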
