Harvesting is hanging up (plots passed filter)

HououinK · June 29, 2021, 1:41pm

I have a problem. Lately, for some reason, the Chia app stops passing plots through the filter. It keeps syncing normally, but after a few hours of farming no plots pass the filter any more. I have to restart the app, and then plots pass the filter again.

I have almost 7000 plots and more than 60 HDDs.

Do you know what could be causing this problem?

Yes, I’m synced. Yes, nodes that I’m connected to are synced too. I’m connected to the internet via fiber connection and my farm is connected via ethernet cable, not wifi.

I tried checking logs, but I don’t even know what I need to search for in these logs.
I’m using only 1 machine for farming and disks are connected directly to this machine (not shared via SMB).

When it works “Latest Block Challenges” are going sequentially (without skips).

I’m using a Windows 10 Pro machine to farm (I use Ubuntu only for my plotters).
It has 32GB RAM and Ryzen 4600, so it’s definitely not about low-end hardware.

ChiaMax · June 29, 2021, 2:07pm

The plot filter works as follows:

plot filter bits = sha256(plot_id + challenge_hash + signage_point)
if the plot filter bits start with 9 zeros then the plot passes the filter.
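
A minimal Python sketch of the check described above, assuming the three inputs are raw byte strings concatenated in the order given and that “9 zeros” means nine leading zero bits of the digest. The function names are made up for illustration and are not Chia’s API.

```python
import hashlib

ZERO_BITS = 9  # "9 zeros" from the post; assumed to mean leading zero bits


def plot_filter_bits(plot_id: bytes, challenge_hash: bytes, signage_point: bytes) -> bytes:
    """sha256 over the three inputs, concatenated in the order given above."""
    return hashlib.sha256(plot_id + challenge_hash + signage_point).digest()


def starts_with_zero_bits(digest: bytes, n: int = ZERO_BITS) -> bool:
    """True if the first n bits of the digest are all zero."""
    total_bits = len(digest) * 8
    return int.from_bytes(digest, "big") >> (total_bits - n) == 0


def passes_filter(plot_id: bytes, challenge_hash: bytes, signage_point: bytes) -> bool:
    """A plot passes when its filter bits start with ZERO_BITS zero bits."""
    return starts_with_zero_bits(plot_filter_bits(plot_id, challenge_hash, signage_point))
```

Nine leading zero bits means a plot passes with probability 1/512 per signage point, so a farm of 6779 plots (the size reported in this thread) should see roughly 13 eligible plots each time.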

if plots stop passing filter then one or more of the 3 inputs can’t be accessed.

plot_id ← do the plots become unreadable? I/O issue?

challenge_hash + signage_point ← these come from the blockchain, you mention “When it works “Latest Block Challenges” are going sequentially (without skips).”
does this mean that when the plots stop passing the filter, the Block Challenges are not coming in anymore?

HououinK · June 29, 2021, 2:30pm

Hmm. It just stops updating “Last Attempted Proof” and “Latest Block Challenges”. Those sections hang completely (I mean there are no updates in them; the rest of the app works normally).
I checked my HDDs many times and there’s no problem with writing and access time at all. chia plots check is listing and testing all plots without any trouble.

ChiaMax · June 29, 2021, 2:34pm

sounds like the issue is around you not getting any new Block Challenges (one of the required inputs for the plot filter)

check firewall / router / proxy and Internet provider rules around the number of requests allowed on a non-standard port… there may be DDoS protection in there that kicks in after x requests and starts blocking your connection to the blockchain

HououinK · June 29, 2021, 2:38pm

Nope, I’m connected to many nodes with synced blockchain. My internet provider doesn’t have any limits.
Yes, I’m behind NAT, but it doesn’t change anything. The app is syncing the blockchain without any problem too (if this were the case, I’d have many problems with the other request-heavy apps I’m using). I’ve been farming since April and I didn’t have these problems before. It started (probably) after I updated to 1.1.7 about 2 weeks ago.

The logs tell me nothing either (I changed the log level to INFO). I just can’t see anything unusual around the time when “Last Attempted Proof” stopped.

If it were a connection error, it would eventually start getting new challenges again after some time. But it doesn’t, until I restart the app.

I’m not sure if “Latest Block Challenges” keeps going up when “Last Attempted Proof” hangs.
I’ll check when it hangs again.

ChiaMax · June 29, 2021, 2:50pm

ok, your post title was focused around the plot filter. and it’s now narrowed down to you not getting new Block Challenges.

from this point it’s very hard to debug what is going on… as you said it won’t be in the logs, other than they will show you are missing on expected signage points + block challenges by omission (as they are supposed to come in regularly on fixed timed intervals).
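
Since missing signage points only show up by omission, one way to spot them is to look for unusually long gaps between consecutive “Finished signage point” lines in the debug log. A rough Python sketch, assuming full_node lines start with a timestamp like `2021-06-29T20:22:22.727`; the 30-second threshold and the function name are arbitrary choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches full_node lines such as:
# 2021-06-29T20:22:22.727 full_node ...: INFO  Finished signage point 35/64: ...
SP_LINE = re.compile(r"^(\S+) full_node .*Finished signage point (\d+)/64")


def signage_point_gaps(lines, max_gap=timedelta(seconds=30)):
    """Yield (previous_time, current_time) for consecutive signage point
    lines that are further apart than max_gap."""
    previous = None
    for line in lines:
        match = SP_LINE.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if previous is not None and current - previous > max_gap:
            yield previous, current
        previous = current
```

Passing an open log file (it iterates line by line) to `signage_point_gaps` lists every stretch where signage points stopped arriving; an empty result suggests the full node kept receiving them.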

HououinK · June 29, 2021, 2:58pm

The only warning messages I’m getting in the log files are about 1 bad plot that is unreadable due to power loss (probably a logical bad sector), but I’ve had this almost from the very beginning and it didn’t affect the blocks I farmed before, or farming generally.

To sum it up:
Last Attempted Proof - this section definitely hangs up regularly (and based on HDD activity, it is really not working)

Latest Block Challenges - this I’m not sure about. It probably keeps cycling from 0 to 63 all the time. I’ll check and report back when it hangs again.

chia plots check - reports every valid plot no matter when I try running this command

blockchain/sync/nodes - is always synced and I’m connected to other nodes all the time. Even when it hangs, the app is still showing “farming” and “synced”.

response time is ranging from 0.5 to 2.5s
(
16 plots were eligible for farming 447094f968… Found 0 proofs. Time: 0.96882 s. Total 6779 plots
10 plots were eligible for farming 447094f968… Found 0 proofs. Time: 0.76567 s. Total 6779 plots
16 plots were eligible for farming 447094f968… Found 0 proofs. Time: 1.25683 s. Total 6779 plots
22 plots were eligible for farming 874eba554f… Found 0 proofs. Time: 1.05054 s. Total 6779 plots
etc
)
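
To check response times over a whole log rather than a handful of lines, the eligibility lines can be summarised with a short script. This is a sketch that assumes the message layout shown in the lines above; `summarize` is a hypothetical helper, not part of Chia.

```python
import re

# Layout taken from the lines quoted above.
ELIGIBLE = re.compile(
    r"(\d+) plots were eligible for farming .*?Found (\d+) proofs\. "
    r"Time: ([\d.]+) s\. Total (\d+) plots"
)


def summarize(lines):
    """Return (lookup_count, slowest_seconds, mean_eligible_fraction),
    or None if no eligibility line was found."""
    times, fractions = [], []
    for line in lines:
        match = ELIGIBLE.search(line)
        if match:
            eligible, _proofs, seconds, total = match.groups()
            times.append(float(seconds))
            fractions.append(int(eligible) / int(total))
    if not times:
        return None
    return len(times), max(times), sum(fractions) / len(fractions)
```

For the four lines above, the slowest lookup is 1.25683 s and the mean eligible fraction is 16/6779, about 0.24%, close to the 1/512 (about 0.2%) that a 9-bit filter predicts.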

ChiaMax · June 29, 2021, 3:03pm

The number 0 to 63 indicates the signage point index… there are 64 of them and then the index starts at 0 again… this is normal.

Try to find when this stops, like you said… if these keep coming in OK…
Then we’re back to the drawing board.

here’s the info on what signage points are… how the plot filter works etc.
https://manuals.plus/chia/chia-network-consensus-explained

ChiaMax · June 29, 2021, 3:19pm

That’s the number of plots that passed the filter.

Before a Proof is found, a few more steps are needed (see the manual I linked above)

in short, the next steps:

proof of space challenge = sha256(plot filter bits)
if this challenge is in table 7, then the linked pair of two x-values from table 1 is found
hashing these x-values gives the quality string, a 256-bit random value, which determines whether the proof is good.
Proof of Space (a collection of 64 x-values that have a certain mathematical relationship): if the challenge is in table 7, then the linked pairs of x-values from each of tables 1-7 are found, resulting in 64 x-values

HououinK · June 29, 2021, 3:22pm

Thanks, but I know how it works (at least enough to know what these numbers mean).
I think that you didn’t understand what I mean.
I pasted it just to prove that response time is alright too and I’m almost sure that it’s not an IO problem.

HououinK · June 29, 2021, 7:51pm

Okay, it hung up again (at 8:22:05 PM; now it’s 9:47 PM).
I mean the Last Attempted Proof section.

Latest Block Challenges is not affected by this; it’s still going.

This is what log file says:

2021-06-29T20:22:14.652 daemon __main__                   : INFO     Websocket exception. Closing websocket with chia_harvester code = 1006 (connection closed abnormally [internal]), no reason Traceback (most recent call last):
  File "websockets\protocol.py", line 827, in transfer_data
  File "websockets\protocol.py", line 895, in read_message
  File "websockets\protocol.py", line 971, in read_data_frame
  File "websockets\protocol.py", line 1051, in read_frame
  File "websockets\framing.py", line 105, in read
  File "asyncio\streams.py", line 679, in readexactly
  File "asyncio\streams.py", line 473, in _wait_for_data
  File "asyncio\selector_events.py", line 814, in _read_ready__data_received
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "chia\daemon\server.py", line 172, in safe_handle
  File "websockets\protocol.py", line 439, in __aiter__
  File "websockets\protocol.py", line 509, in recv
  File "websockets\protocol.py", line 803, in ensure_open
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason

2021-06-29T20:22:17.965 full_node chia.full_node.full_node: INFO     Added unfinished_block 8c7772e27744b7953109555dbbed36e2277b0b5bf20d4224e6c2e4a9f5335647, not farmed by us, SP: 34 farmer response time: 5.126590013504028, Pool pk xch1f0ryxk6qn096hefcwrdwpuph2hm24w69jnzezhkfswk0z2jar7aq5zzpfj, validation time: 0.13141226768493652, cost: 126224940, percent full: 1.147%
2021-06-29T20:22:22.727 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 35/64: CC: 4976e27f96f03645c614a9cb7fb525e89d88c83c7dd49884a1332e553296fc93 RC: c1f32203b6d77449db4fce39c6ad8e83f9e3ff5826ab5876d3943175c4ed3e21 
2021-06-29T20:22:33.095 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 36/64: CC: 927e058389856de3e9852772f8aa38a9fdb711fef258b3ee8b91d3864d723b02 RC: 290952c98a1592b5fb4a702ff8678dbd67f012bfb38f849cccafe15864279a47 
2021-06-29T20:22:39.134 full_node chia.full_node.full_node: INFO     🌱 Updated peak to height 501865, weight 292944880, hh 8b394fcf81c93087f6fe82459f6ec31f9ce34a86efb22e9a9070aadd58b1ab84, forked at 501864, rh: 23b3b40752ec2d558cfbf5f25ffecd9fcd63857b5d22e53ae60985d4cf3e9975, total iters: 1670487664642, overflow: False, deficit: 6, difficulty: 2176, sub slot iters: 122159104, Generator size: 3719, Generator ref list size: 1
2021-06-29T20:22:39.143 full_node chia.full_node.mempool_manager: INFO     Size of mempool: 0 spends, cost: 0 minimum fee to get in: 0
2021-06-29T20:22:39.150 full_node chia.full_node.full_node: INFO     Block validation time: 0.4887700080871582, cost: 126224940, percent full: 1.147%
2021-06-29T20:22:39.403 full_node chia.full_node.mempool_manager: INFO     It took 0.005004405975341797 to pre validate transaction
2021-06-29T20:22:39.414 full_node chia.full_node.mempool_manager: INFO     add_spendbundle took 0.009008169174194336 seconds, cost 10820854 (0.098%)
2021-06-29T20:22:39.521 wallet chia.wallet.wallet_state_manager: INFO     Coins removed [] at height: 501865

If I understand correctly, it has something to do with the network adapter.
But I’m connected via RDP all the time and it didn’t disconnect me.

What can I do about this? Shouldn’t it reconnect automatically after a very minor connection interruption?

ChiaMax · June 29, 2021, 8:23pm

ok, you found the issue.
the farmer stays connected to the blockchain and gets new signage points OK.
the harvester is disconnected from the farmer and does not get new signage points → plots can’t pass filter on the Harvester as they require the challenge_hash + signage point as input.

how to fix this… replacing the NIC / updating firmware / updating drivers may be a good start…

dctech · June 30, 2021, 1:00pm

I have experienced harvester lockups, but in my case they were always related to underlying storage going bad. Let me back up a bit: I don’t use the GUI, as I farm on multiple machines, so I simply use the CLI (even though the farmer is on Windows) and tail the debug log to monitor the system. When lockups occurred, it was always when the harvester had an issue reading or accessing plots in one of the folders added for farming. I farm on many USB HDDs, as well as NFS and SMB shares, and have had issues with one NFS share on a WD DL4100, which eventually had a board failure. When I experience a harvester lockup, the first thing I do is try to isolate which folder/mount is having issues. As soon as I unmount that path, the harvester resumes normal operation on its own while trying to catch up, which results in many attempts with 0 plots matching the filter. Your problem may be different, but I just want to share my experience.
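
One way to isolate a misbehaving folder or mount is to time a directory listing and a small read in each plot directory. The following Python sketch assumes plots are files ending in `.plot` directly inside each directory; `probe_plot_dir` is a hypothetical helper, not a Chia command.

```python
import time
from pathlib import Path


def probe_plot_dir(directory, read_bytes=4096):
    """Time a directory listing plus a small read from the first *.plot file.

    Returns (plot_count, elapsed_seconds). A hung mount will block here,
    which is itself the signal; run each probe in a separate process if
    a timeout is needed.
    """
    start = time.perf_counter()
    plots = sorted(Path(directory).glob("*.plot"))
    if plots:
        with open(plots[0], "rb") as handle:
            handle.read(read_bytes)
    return len(plots), time.perf_counter() - start
```

Running it over every folder added for farming and sorting by elapsed time should make a slow or hanging path stand out.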

HououinK · June 30, 2021, 2:37pm

Can’t the app just restart the farming service after a disconnect occurs? I mean, it’s probably an extremely minor interruption. I tried updating drivers, but it hung up again anyway.
Event Viewer is not showing any warnings about the network adapter either.
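
Until the cause is found, an external watchdog is one possible workaround: check when the harvester last logged an eligibility line and restart it if that was too long ago. The sketch below makes several assumptions: that harvester lines in the debug log carry the same timestamp prefix as the full_node lines, that a 5-minute limit is reasonable, and that `chia start harvester -r` is the restart command on your install (confirm with `chia start --help`).

```python
import re
import subprocess
from datetime import datetime, timedelta

# Assumed layout: "<timestamp> harvester ...: INFO  N plots were eligible ..."
STAMP = re.compile(
    r"^(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d+) harvester .*plots were eligible"
)


def last_harvest_time(lines):
    """Timestamp of the most recent harvester eligibility line, or None."""
    latest = None
    for line in lines:
        match = STAMP.match(line)
        if match:
            latest = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
    return latest


def is_stalled(lines, now, limit=timedelta(minutes=5)):
    """True if no eligibility line was logged within `limit` before `now`."""
    latest = last_harvest_time(lines)
    return latest is None or now - latest > limit


def restart_harvester():
    # Assumed command; verify it before relying on it.
    subprocess.run(["chia", "start", "harvester", "-r"], check=True)
```

A scheduled task could read the tail of the log every few minutes, call `is_stalled` with the current time, and call `restart_harvester` only when it returns True. This treats the symptom, not the cause.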

spencerbachus · June 30, 2021, 2:46pm

Having the same issue. Mine will eventually start passing plots but an hour in the future? Lol
Reinstalled multiple times. Now putting it on ubuntu

ChiaMax · June 30, 2021, 3:18pm

I’d have to dive into the Chia code to see what kind of error handling is in place when the harvester is disconnected from the farmer, whether there is a retry, etc.

HououinK · June 30, 2021, 3:31pm

I bought a new 1 Gbit Intel PCIe network adapter. I hope it will help, because I don’t have time to check my farm non-stop.

HououinK · July 2, 2021, 10:41pm

Hi again.
I replaced the network adapter (or rather added a new one) and it hung up anyway.

HououinK · July 5, 2021, 6:17pm

Is there anything I can do?

ChiaMax · July 5, 2021, 7:39pm

Really hard to say from the other end of a forum.

Is the harvester starved of resources when it disconnects from the Farmer?
Are you also plotting on the Harvester or Farmer?