Troubleshooting failure to farm with lots of plots, due to harvester 30s timeout

I am not sure if this matters or not, but still trying to troubleshoot why I’ve had an 8/9 hour estimated time to win, but no wins for more than 5+ days (last win was Sunday, today is Saturday… so mon, tue, wed, thu, fri 5 × 24 = 120 hours with no XCH wins).

I’ve cleaned up a ton of legit plot errors in my debug.log (such a great tool; dupes, bad plots, etc). I set log level to info and a mod on the keybase support channel helped me out. He said the problem is here, where it says only a few plots are eligible for farming out of a large number:

2021-04-24T10:03:36.709 harvester chia.harvester.harvester: INFO     9 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1194.37002 s. Total 3853 plots
2021-04-24T10:03:36.709 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1169.39020 s. Total 3853 plots
2021-04-24T10:03:36.709 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1142.46883 s. Total 3853 plots
2021-04-24T10:03:36.725 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1130.84370 s. Total 3853 plots
2021-04-24T10:03:36.726 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1078.00102 s. Total 3853 plots

11 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 1130.84370 s. Total 3853 plots

:warning: the time taken is 1131s (18 minutes), he said the time should be more like 5 seconds.

Same thing later in the logs:

2021-04-24T09:50:55.225 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 534.89036 s. Total 3853 plots
2021-04-24T09:50:55.225 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 521.45308 s. Total 3853 plots
2021-04-24T09:50:55.225 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 506.62494 s. Total 3853 plots
2021-04-24T09:50:55.225 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 492.23428 s. Total 3853 plots
2021-04-24T09:50:55.225 harvester chia.harvester.harvester: INFO     2 plots were eligible for farming fd550b7cc8... Found 0 proofs. Time: 477.46826 s. Total 3853 plots

Only a few plots eligible for farming, and time taken is 500 seconds!

He said to start removing folders from the list and see if that helps. I started by removing names from shares, I had some in there like this

//name-of-machine/d
//name-of-machine/e

I changed to

//192.168.1.12/d
//192.168.1.12/e

edit: opened a bug about this, as god knows how many people are running rigs that will never win a single coin due to this silent failure mode:

8 Likes

I just asked in Keybase if anyone might know what’s up, let’s see what they say/if they can hop over here and help

2 Likes

yeah thanks – I set my debug.log to INFO level, restarted the client (it synced SUPER fast with the port forwarded) and waited 15 minutes to generate a fair bit of logs, then zipped it and sent it to the mod in the Chia Keybase support channel (with his permission).

Whatever I find out I will document here for sure.

4 Likes

Hmm, I removed all the NASes and am looking only at the AMD machine on the network. Definitely much faster, here’s me filtering for eligible for farming, note the times:

2021-04-24T11:02:09.069 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming 18783a598b... Found 0 proofs. Time: 0.40634 s. Total 479 plots
2021-04-24T11:02:18.397 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 0.96869 s. Total 479 plots
2021-04-24T11:02:30.413 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 1.25008 s. Total 479 plots
2021-04-24T11:02:39.990 harvester chia.harvester.harvester: INFO     2 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 3.17188 s. Total 479 plots
2021-04-24T11:02:48.256 harvester chia.harvester.harvester: INFO     2 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 2.25008 s. Total 479 plots
2021-04-24T11:02:55.428 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 0.10987 s. Total 479 plots
2021-04-24T11:03:06.866 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 0.17189 s. Total 479 plots
2021-04-24T11:03:14.162 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 0.25000 s. Total 479 plots

Times ranging from 3.17s to 0.10s… the keybase support channel mod said we wanted to see times under 5 seconds. That’s much better… I think I will split the NAS up into subfolders (previously it was a LOT of plot files in a single folder) and add them back on and see.

1 plots were eligible for farming e5b34f6472... Found 0 proofs. Time: 0.17189 s. Total 479 plots

Also the “eligible” count isn’t important, the total is what matters, and especially the time… we want to see under 5 seconds, 15 seconds is cutting it close.

6 Likes

Hmm, so it was that a dir scan took to long? What’s the recommended number of plots per dir? So far I’ve put them all in a single folder as well, nuts that this would hurt your chances!

1 Like

I’d imagine this is very dependent on OS and filesystem, network storage generally taking longer to scan than local storage. In general (not specific to Chia), latency can get pretty awful with the latter.

1 Like

What kind of NASes do you have? I would check the SMART stats for every disk in every NAS. I’ve seen QNAPs that refuse to evict disks with 50k bad sectors and the whole unit grinds to a halt. Checking SMART is also going to be super fast compared to adding / removing NASes and trying to grok the effect that has from the Chia logs.

Logging into the GUI to check SMART on each NAS will also help IMO. If you have identical NASes you might be able to notice a performance difference just from logging in which would help narrow it down too.

I’m interested to hear what it is when you figure it out. I’d put my bets on having a bad disk in the first NAS you filled up.

Good luck!

3 Likes

Another super low effort thing you can do (if you’re not already) that can run while you explore other options is to ping your NASes continuously from your main machine.

ping [ip address] -t

That will help surface any super obvious network problems for about 60s of setup. Hit CTRL-C after an hour and look at the aggregate stats.

Ping statistics for [ip address]:
    Packets: Sent = 9, Received = 9, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 3ms, Average = 2ms
Control-C
1 Like

Yeah. Splitting up the files into subfolder more or less fixed my problem, at least it is no longer timing out at 1194 seconds! Example log file entries:

2021-04-24T11:34:03.394 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 1.61972 s. Total 3882 plots
2021-04-24T11:34:07.122 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 1.35923 s. Total 3882 plots
2021-04-24T11:34:39.006 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 23.23485 s. Total 3882 plots
2021-04-24T11:34:51.613 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 24.16673 s. Total 3882 plots
2021-04-24T11:35:04.239 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 25.23309 s. Total 3882 plots
2021-04-24T11:35:30.203 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 38.57434 s. Total 3882 plots

Breaking that time out… I grabbed the last 650 harvester times and slapped them in a spreadsheet, got average 87s, min 0.8s, max 279s, median 65s. So it’s still too slow. I will note that in the Chia 1.0.0 release notes they said:

Weight proof request timeout was increased to 180 seconds.

However, “weight proof is when you ask another node to prove to you that they have a higher block, before you sync up to that block. Harvester 30 seconds is set in stone in consensus. No way to change that now.”

I need to identify which network device is the slow one, so he said to run

chia plots check -n 100 -g {filename}

on a file from each NAS to determine if one of them has performance issues, so here goes. I picked a random file on each individual NAS and executed that command; below is the output. The 3 lines I will highlight are:

  • found the plot
  • loaded / started to test the plot
  • proof result for the plot

(load times, I have been told, are irrelevant so I’ve removed those… what matters is the time it takes for the -n 100 check)

NAS A (192.168.1.222)

2021-04-24T14:04:22.928  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.222\public\chia-plots-3\plot-k32-2021-04-02-14-42-0bed5fdf88d808ef6bec5883eba6340e6e26dd59359190c968f71b44af2ce895.plot of size 32
2021-04-24T14:04:22.944  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09897997698135441 TiB, in 1.1403071880340576 seconds
2021-04-24T14:06:19.598  chia.plotting.check_plots        : INFO        Proofs 113 / 100, 1.13

14:04:22 → 14:06:19 = 1:57

NAS B (192.168.1.97)

2021-04-24T14:08:59.865  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.97\public\chia-plots-3\plot-k32-2021-04-13-13-46-88ff07fd3b7c1c954ff2cf5213bf8b720211c0982ad230cf419beef71d4d2250.plot of size 32
2021-04-24T14:08:59.881  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09899792094620352 TiB, in 4.687199354171753 seconds
2021-04-24T14:10:36.520  chia.plotting.check_plots        : INFO        Proofs 98 / 100, 0.98

14:08:59 → 14:10:36 = 1:37

NAS C (192.168.1.114)

2021-04-24T14:11:48.006  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.114\public\chia-plots-3\plot-k32-2021-04-03-21-43-fc9dcc050b5f9e44c00302b682e98870997687d85b42f11d3d663e43718f79cd.plot of size 32
2021-04-24T14:11:48.006  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09896503558775294 TiB, in 2.671677350997925 seconds
2021-04-24T14:14:09.267  chia.plotting.check_plots        : INFO        Proofs 107 / 100, 1.07

14:11:48 → 14:14:09 = 2:21

NAS D (192.168.1.223)

2021-04-24T14:15:06.287  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.223\public\chia-plots-3\plot-k32-2021-03-28-00-35-62c642e68df8e144946224f189a844f18c03d12ba5fffbcc160c14bde4c43b2f.plot of size 32
2021-04-24T14:15:06.303  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09902664939090755 TiB, in 1.6250286102294922 seconds
2021-04-24T14:17:19.467  chia.plotting.check_plots        : INFO        Proofs 113 / 100, 1.13

14:15:06 → 14:17:19 = 2:13

NAS E (.126)

2021-04-24T14:18:20.975  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.126\public\chia-plots-2\plot-k32-2021-04-11-11-28-f258f57480c978e5025d026485775cced27a47c64ff305b42ec9a1c07de7f326.plot of size 32
2021-04-24T14:18:20.990  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09895394780960487 TiB, in 5.437449932098389 seconds
2021-04-24T14:19:37.305  chia.plotting.check_plots        : INFO        Proofs 92 / 100, 0.92

14:18:21 → 14:19:37 = 1:16

AMD machine (JBOD, 192.168.1.16)

2021-04-24T14:20:27.943  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.16\chia-final-z\plot-k32-2021-04-24-02-50-a3a9d0b52561ddbde4914da7eeb4aad8a6f226709d30e96ecaea5f78bf119231.plot of size 32
2021-04-24T14:20:27.943  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09897387777891709 TiB, in 6.140592098236084 seconds
2021-04-24T14:21:15.157  chia.plotting.check_plots        : INFO        Proofs 81 / 100, 0.81

14:20:28 → 14:21:15 = 0:47

I’m… really not sure what to do with this information? Oh and @ryan I checked pings to these IPs and they were all 1ms.

1 Like

I measured a few different network drive shares to that AMD machine as well:

K drive share

2021-04-24T14:30:31.647  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.16\k\plot-k32-2021-04-18-00-33-66c0f93f6b2818f3cf31fb382c5720646c9740f8581fc8eeab467f2e3e865605.plot of size 32
2021-04-24T14:30:31.662  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.0990069069493984 TiB, in 5.953089952468872 seconds
2021-04-24T14:31:25.306  chia.plotting.check_plots        : INFO        Proofs 92 / 100, 0.92

14:30:32 → 14:31:25 = 0:53

L drive share

2021-04-24T14:32:38.084  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.16\l\plot-k32-2021-04-12-10-24-4d9589690512ac9550fc86b02d072a11e692150d04793a69c3d7d4e2067ccb79.plot of size 32
2021-04-24T14:32:38.084  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09899746607334237 TiB, in 4.546555280685425 seconds
2021-04-24T14:33:30.834  chia.plotting.check_plots        : INFO        Proofs 86 / 100, 0.86

14:32:38 → 14:33:31 = 0:53

M drive share

2021-04-24T14:34:09.475  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.16\m\plot-k32-2021-04-13-07-02-715d2cfb525e015187b8c301d6fff369a30f5d32da4b323709026739d4072082.plot of size 32
2021-04-24T14:34:09.475  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09898239240828843 TiB, in 0.796903133392334 seconds
2021-04-24T14:35:04.452  chia.plotting.check_plots        : INFO        Proofs 95 / 100, 0.95

14:34:09 → 14:35:04 = 0:55

N drive share

2021-04-24T14:37:28.740  chia.plotting.plot_tools         : INFO     Found plot \\192.168.1.16\n\plot-k32-2021-04-18-17-16-42b304d9c3f59aae72147a78b68f62c33bf26637259e1bff1303b786e75d0392.plot of size 32
2021-04-24T14:37:28.740  chia.plotting.plot_tools         : INFO     Loaded a total of 1 plots of size 0.09900080765874009 TiB, in 5.5424230098724365 seconds
2021-04-24T14:38:29.265  chia.plotting.check_plots        : INFO        Proofs 104 / 100, 1.04

14:37:29 → 14:38:29 = 1:00

You could try running one harvester per NAS and see if one’s times stand out.

2 Likes

Hmmmm… Seeing these times makes me like my model of 1 full node and every other machine being a remote harvester of its direct attached disks. With this, I am seeing times like 0.008 seconds for 234 plots.

I’ll try adding some connected to a network share and see what happens.

1 Like

OK so per Keybase support… you have to

  • get the challenge
  • build the block
  • return the proof

all within 30 seconds? So when I see lines like

2021-04-24T11:42:42.774 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 56.44280 s. Total 3882 plots
2021-04-24T11:42:42.775 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 30.93245 s. Total 3882 plots
2021-04-24T11:42:42.776 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 857f51b97c... Found 0 proofs. Time: 77.73637 s. Total 3882 plots

And when the time is over 30s, I get no chance of winning? :sob:

2 Likes

If you have all the logs try searching for “Found 1 proofs”. Since you haven’t won any blocks in several days it would be obvious if you had recently found proofs with slow times. I’d really like to know the answer too.

And you’ve chased down all the low hanging fruit I can think of off the top of my head. It’s really tough when it’s not working under normal conditions, but when you test it everything looks good :frowning:

2 Likes

Correct, the ship has sailed by that point. Now if you see “found 1 proofs” and it’s more than 30s. That’s when a chance was lost.

Edit: I should read previous comments. Haha.

1 Like

OK! since “one giant folder” per NAS was clearly a problem, and moving to 5 folders per NAS helped a lot… I figured what the hell… let’s double down and go to 10 folders per NAS. That’s about 82 plots per folder (these are 5 × 18tb = 90tb) on each NAS.

Kind of annoying from a data entry standpoint but compared to NO FARM WINS EVER, a small price to pay? My initial tests looked good, so I switched them all over and the results are…

(all lines start with {timestamp} harvester chia.harvester.harvester: INFO which I’ve removed for readability.)

8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.74567 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.92976 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.01540 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.11716 s. Total 3922 plots
6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots
9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots
11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 50.81979 s. Total 3922 plots
6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 54.92095 s. Total 3922 plots
7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 42.29693 s. Total 3922 plots
8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 29.67779 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 18.59361 s. Total 3922 plots
8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 6.10934 s. Total 3922 plots
10 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.17564 s. Total 3922 plots
13 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.82808 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.82815 s. Total 3922 plots
12 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.56252 s. Total 3922 plots
5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.26593 s. Total 3922 plots
4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.42098 s. Total 3922 plots
3 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.36354 s. Total 3922 plots

well, definitely better. But I still have quite a few going outside the hard 30 second interval… need to think about this a bit more :worried:

Those times are still not good. I’m using USB and almost always take <1s. Is there any way you can connect your disks to your harvester directly? Even if it’s only temporary while you figure out what’s going on with your NAS?

2 Likes

What times do you get when you run

chia plots check -n 100 -g {filename}

on a locally attached drive via USB? Even over the network to a locally attached disk with no NAS overhead, and a pretty fast (2.5gpbs on both ends), I get 0:47 - 1:00, versus about 2x to the NAS storage.

The NAS devices are raid-0 stripes of 5 drives, so they should offer reasonable performance!

1 Like

Are those NASes / machines all idle?

I have a machine that was just about to copy a finished plot, so I did a chia plots check -n 5 just before the copy and during the copy. Feel free to double check my math.

Before:

2021-04-24T17:16:50.241  chia.plotting.plot_tools         : INFO     Loaded a total of 13 plots of size 1.28675157350699 TiB, in 0.7198820114135742 seconds
2021-04-24T17:16:50.241  chia.plotting.check_plots        : INFO     
2021-04-24T17:16:50.241  chia.plotting.check_plots        : INFO     Starting to test each plot with 5 challenges each
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     Summary
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     Found 13 valid plots, total size 1.28675 TiB
2021-04-24T17:17:31.386  chia.plotting.check_plots        : INFO     13 plots of size 32
  • 17:17:31 - 17:16:50 = 41s
  • 13*5 = 65 tests
  • 41s / 65 plots = .631s per test

During:

2021-04-24T17:18:59.269  chia.plotting.plot_tools         : INFO     Loaded a total of 13 plots of size 1.28675157350699 TiB, in 0.7343041896820068 seconds
2021-04-24T17:18:59.269  chia.plotting.check_plots        : INFO     
2021-04-24T17:18:59.269  chia.plotting.check_plots        : INFO     Starting to test each plot with 5 challenges each
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     Summary
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     Found 13 valid plots, total size 1.28675 TiB
2021-04-24T17:39:18.350  chia.plotting.check_plots        : INFO     13 plots of size 32
  • 17:39:18 - 17:18:59 = 10m19s = 619s
  • 13*5 = 65 tests
  • 619s / 65 plots = 9.52s per test

So IMHO it would be worth keeping track of if / when you’re copying full plots onto your NASes to see if that correlates with the slow times you see in the logs. That also might explain the randomness there seems to be; some idle NASes are fast while any being loaded down are slow. That’s just a guess, but it seems plausible to me.

2 Likes

What protocol are you using to connect to the NAS? I believe NFS has the quickest seek time? Its been a while.

1 Like