Harvester not participating in challenges

Hello all, since transactions went live this week it looks like my Chia farm has been having issues participating in challenges.

I’m using Chiadog to monitor the logs from my node and I’m frequently greeted with notifications like this:

This seems to happen several times a day, and usually it resolves on its own after 10-15 minutes. When the system is in this state, if I check the logs I see things like this:

2021-05-07T13:44:17.152 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7 95.150.222.126
2021-05-07T13:44:17.153 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 95.150.222.126 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7
2021-05-07T13:44:17.244 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301 51.154.15.44
2021-05-07T13:44:17.289 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 51.154.15.44 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301
2021-05-07T13:44:17.290 full_node full_node_server        : INFO     <- respond_signage_point from peer 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7 95.150.222.126
2021-05-07T13:44:17.306 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.306 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d
2021-05-07T13:44:17.307 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37 12.216.126.78
2021-05-07T13:44:17.309 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 12.216.126.78 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37
2021-05-07T13:44:17.381 full_node full_node_server        : INFO     <- respond_signage_point from peer 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37 12.216.126.78
2021-05-07T13:44:17.382 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.383 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d
2021-05-07T13:44:17.412 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 9ce7cf9783a80f5f80a9a3ed49990b57336e777a6d56c594747de2414345d225 13.66.209.137
2021-05-07T13:44:17.413 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 13.66.209.137 9ce7cf9783a80f5f80a9a3ed49990b57336e777a6d56c594747de2414345d225
2021-05-07T13:44:17.425 full_node full_node_server        : INFO     <- respond_signage_point from peer 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301 51.154.15.44
2021-05-07T13:44:17.427 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.427 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d

The node appears to be busy receiving VDFs and transactions from peers, but the “Don't have rc hash” log message looks like a problem.
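To check whether the same signage point keeps getting rejected (in the excerpt above, point 61 is reported three times with the same RC challenge), a rough Python sketch like this could tally the "not added" lines. The regular expression is written against the log lines quoted in this post only; the function name `tally_not_added` is made up for illustration, and other Chia versions may word the message differently.

```python
import re
from collections import Counter

# Pattern based only on the "Signage point N not added" lines quoted above.
NOT_ADDED = re.compile(
    r"Signage point (?P<idx>\d+) not added, "
    r"CC challenge: (?P<cc>[0-9a-f]+), RC challenge: (?P<rc>[0-9a-f]+)"
)


def tally_not_added(lines):
    """Count 'not added' messages per (signage point index, RC challenge)."""
    counts = Counter()
    for line in lines:
        match = NOT_ADDED.search(line)
        if match:
            counts[(int(match.group("idx")), match.group("rc"))] += 1
    return counts
```

Feeding it the lines of debug.log (for example `tally_not_added(open(path, encoding="utf-8"))`, with `path` pointing at your own log) gives a quick view of which signage points were cached instead of added, and how often.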

When the system is in a good state, I see logs like this:

2021-05-07T14:53:00.431 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 51/64: b361871f56c658923c3a464a17d20299267e3dac3e355b14b9a47ba8ee4e4646 
2021-05-07T14:53:07.188 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 52/64: 905cabbd799ed409e48fd689ffc56fb685e8a156ffe113d441e3e70362e87d12 
2021-05-07T14:53:15.515 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 53/64: a3191693f3589b6a15ff4c47f230fd0a8fcfb6a44f9bcbf5d91fb358e13ee691 
2021-05-07T14:53:24.971 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 54/64: a1ae62049c3c66aa2ef85bf7fba20b9c7c077b5260a3c6ab85fb40ccd8f90ea6 
2021-05-07T14:53:35.035 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 55/64: 06a9208d22e5f313d0ccd4220e38b6b753e3c30e3552eec37c216b1e8039c44f 
2021-05-07T14:53:41.206 full_node chia.full_node.full_node: INFO     🌱 Updated peak to height 245646, weight 9117292, hh 593ae0c29001564933aa68083205c195bbfa734ae9d6d88456c59f452c166f73, forked at 245645, rh: 85860e82929d83fbb9ac1b81af7b2b2f6841bf476ab1ad0dde55e0203189af8d, total iters: 795527654862, overflow: False, deficit: 0, difficulty: 182, sub slot iters: 110624768, Generator size: No tx, Generator ref list size: No tx
2021-05-07T14:53:41.895 wallet chia.wallet.wallet_blockchain: INFO     💰 Updated wallet peak to height 245646, weight 9117292, 
2021-05-07T14:53:45.231 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 56/64: a39a2e89a520b9f8e2a7b5e6cf8657f5585542ccd6aee2b9d238dbb8ae89e9c1 
2021-05-07T14:53:52.436 full_node chia.full_node.full_node: INFO     🌱 Updated peak to height 245647, weight 9117474, hh 301dcf4ccdc1989fd79162a51115aac0593cfe1f401f83cb6c48ea88d54d89b5, forked at 245646, rh: c6404499e3ba041241ce350e1385c9a72827dc8052d3743b2ca073e49416465d, total iters: 795529714011, overflow: False, deficit: 0, difficulty: 182, sub slot iters: 110624768, Generator size: No tx, Generator ref list size: No tx

None of those logs are present when Chiadog is reporting problems.
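One way to spot these quiet periods without waiting for a Chiadog alert is to look for long silences between "Finished signage point" lines. The sketch below is only an illustration: the pattern is based on the healthy log lines quoted above, the 30-second threshold is an arbitrary assumption (the healthy excerpt shows roughly 7 to 10 seconds between points), and `find_gaps` is a made-up helper name.

```python
import re
from datetime import datetime

# Pattern based on the healthy "Finished signage point N/64" lines quoted above.
FINISHED = re.compile(r"^(?P<ts>\S+) full_node .*Finished signage point (?P<idx>\d+)/64")

# Timestamp layout as it appears in the quoted debug.log lines.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def find_gaps(lines, max_gap_seconds=30.0):
    """Return (previous_ts, next_ts, gap_seconds) for every silence longer
    than max_gap_seconds between consecutive finished signage points."""
    gaps = []
    previous = None
    for line in lines:
        match = FINISHED.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group("ts"), TS_FORMAT)
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta > max_gap_seconds:
                gaps.append((previous, current, delta))
        previous = current
    return gaps
```

Comparing the reported gaps against the times of plot transfers (or anything else running on the machine) may help narrow down what the node was competing with.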

I’ve tried to do everything possible to rule out networking issues. I’m able to successfully open a connection to port 8444 on my external IP from an outside network. When I’m in a “failed” state, running chia show -s -c shows that I’m fully synced, and I typically have over 50 connections to other full nodes. My system seems otherwise pretty happy, with low CPU usage and plenty of RAM.
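For anyone who wants to repeat the port check from a machine outside their own network, a minimal Python sketch follows. It only confirms that a TCP connection to port 8444 can be opened; it says nothing about whether the node is keeping up with signage points. The helper name `can_connect` and the 5-second timeout are assumptions for illustration.

```python
import socket


def can_connect(host, port=8444, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Usage would be something like `can_connect("203.0.113.10")`, where 203.0.113.10 is a documentation placeholder to be replaced with your own external IP.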

I’m running a full node using the latest Chia version (1.1.4 at time of posting) on a Synology DS1520+ using the official Chia Docker container. The container is set up to use host networking, port 8444 is being forwarded, and UPnP is disabled in my config.yaml.

My full debug.log from the last time I encountered this issue can be found here:

Any help would be greatly appreciated.

Welcome to the Forums!

There could be a lot of things happening here.

  • Are you transferring plots over the network? It could be that you get these errors because the network gets saturated with the file transfer.
  • Hopefully you’re not using any RAID either on those drives and are just running them as JBOD.
  • Are you using the Synology for anything else? Media server? Media Transcoder?

Thanks for your response.

  • Are you transferring plots over the network? It could be that you get these errors because the network gets saturated with the file transfer.

I am plotting on another machine, and the plots are transferred over the network when complete. I’ve been doing this since mainnet launch and only started having this issue in the past week. I don’t think the failures correlate with plot transfers, but I’ll keep a closer eye on this.

  • Hopefully you’re not using any RAID either on those drives and are just running them as JBOD.

No RAID, all individual volumes.

  • Are you using the Synology for anything else? Media server? Media Transcoder?

Nothing resource intensive, just Chia, Chiadog, and Home Assistant.

I’ve been monitoring my NAS closely and so far each batch of missed signage points does correlate with a plot file transfer.

I’m going to disable the plot file transfer and continue monitoring to see if the issue resolves. If that does resolve it, I should be able to figure out a new plot transfer mechanism that hopefully doesn’t saturate the link.

It is odd that I’m just seeing this now. I’ve had the same setup for quite a while with the same monitoring tools in place, and this didn’t crop up until this week. Maybe it’s a change in Chia 1.1.3/1.1.4, or maybe running a full node just needs a lot more resources with the explosive growth of the netspace :man_shrugging:

I’ve run into a similar issue where Chiadog says my harvester appears to be offline. I haven’t started digging into it, but your post looks like the same issue.

Everything runs on the same machine. It’s not correlated with rsyncing files to the final drive. Port 8444 is forwarded (UDP/TCP) to the machine.

Resource-wise I’m close to maxed out but with some headroom: 14/16 GB free, and all 16 threads regularly over 80% but not at 100% for long periods. The OS is on NVMe, and the plot drive is 4x NVMe in RAID 0 with no other I/O to the plot drive(s).

Also interested in knowing what can be done to troubleshoot this.

@marram are you on Windows by any chance? There is a known issue where Windows log rotation breaks Chiadog monitoring.

Nope. Ubuntu 20.04. Good to know that anyway.

I think the chia tool itself should have a test tool where you can point at your IP and it checks it all works.

Since the issue correlated with plot transfers, I had assumed that it was network saturation. To test that hypothesis I tried (unsuccessfully) turning on QoS on the NAS to prioritize Chia node traffic (on port 8444) and de-prioritize file sharing traffic. I’m not sure of the reasoning behind this, but according to the Synology docs only outbound file sharing traffic can be limited (SMB, CIFS, AFP, NFS, etc).

I transitioned to using rsync with the --bwlimit option to transfer the plots while limiting the bandwidth usage. Even transferring at 20 Mbps my Chia node would stop participating in challenges, so this had to be more than just a network issue.
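As a rough illustration of the rsync approach: `--bwlimit` without a suffix is interpreted in units of 1024 bytes per second, so a limit quoted in megabits has to be converted first. The sketch below assumes "20 Mbps" means 20 megabits per second; the helper names and the example paths are hypothetical, and this is not the exact command used above.

```python
def mbps_to_bwlimit_kib(megabits_per_second):
    """Convert megabits per second to whole KiB per second (1024-byte units)."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return int(bytes_per_second // 1024)


def build_rsync_command(source, destination, megabits_per_second):
    """Build an rsync argument list with a bandwidth cap.

    --partial keeps partially transferred files so an interrupted plot
    copy can resume instead of starting over.
    """
    limit = mbps_to_bwlimit_kib(megabits_per_second)
    return ["rsync", "--partial", f"--bwlimit={limit}", source, destination]
```

For example, `build_rsync_command("/mnt/plots/example.plot", "nas:/volume1/plots/", 20)` produces a command capped at 2441 KiB/s, which could then be passed to `subprocess.run`. Both paths are placeholders.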

It’s at this point that I took another look at my Docker container settings and noticed the CPU priority was only set to medium (the default). I had previously set this to high, but I think that setting got blown away when I updated to 1.1.4.

So with the CPU priority now set to max, and the plot transfer speed limited to 20 Mbps, surely things will work smoothly…

Nope.

Shortly after starting an agonizingly slow plot transfer my node once again stopped participating in challenges. The system isn’t even all that busy while this is happening, here is a screenshot of the resource utilization:

I understand this is most likely a Synology or Docker problem rather than a Chia issue, but it’s driving me crazy. At this point I don’t know what else to try. I suppose another option is to run the full node on another machine, but I was really hoping to set this up to be self-contained and not require additional hardware. Another option would be to plot to a USB drive and physically connect it to the NAS to transfer plots over. I’m worried that option is labor intensive and might not even solve the problem.

Thanks for sharing this info. This is really strange. I wonder if the NAS puts file transfers above all else. Also, could it possibly be a TCP vs UDP issue? Grasping at straws here.

Hi, have you had any success, and do you have any tutorials for getting Chia to work on a Synology using Docker?

@Tigerraiders aside from the issues I’ve outlined in this post, I would say I’ve had some success :grinning_face_with_smiling_eyes:

I mainly followed the official Chia Docker documentation found here.

I couldn’t figure out a way to pull Docker images from GitHub in the official Synology Docker UI (seems like it only pulls from Docker Hub) so you have to do some stuff via the command line.

  • Install Docker from the Synology Package Center
  • Enable SSH on your NAS
  • SSH into your NAS
  • sudo su to switch to root user
  • docker pull ghcr.io/chia-network/chia:latest

Now the image is downloaded so you should be able to create a container using that image from the Docker Synology UI. Alternatively you can continue setting it up via the command line using the instructions from the official documentation. You’re going to have to configure some volumes to point it to your plots, your Chia config.yaml, and optionally your keyring as described in the official docs.

I think Gene has mentioned over on Keybase that they would like to have an official Chia app on the Synology Package Center, but it’s probably not a high priority for the team right now.

If you have additional questions or issues setting it up, a new topic would probably be appropriate.

Good luck!

I haven’t tried this, but I think this is a better way of doing Chia + Docker + NAS: Run Ubuntu in Docker and then install Chia in Ubuntu. I’m going to try it soon.

That’s essentially what the official Chia Docker container I was referencing does:

The image is based on Ubuntu and it installs Chia just like you would normally.

An interesting data point… after the chaos this morning *cough* I decided to transfer a bunch of plots while the network was down. I patched my node with the hotfix and got things running pretty quickly, but I still had a bunch of plots transferring (at full speed, no bandwidth limit) and I noticed that I did not experience the missed challenges that I did previously during the plot transfer.

I noticed the number of full nodes I’m connected to is much lower than before, 10 now vs >50 before. This might be due to the low percentage of nodes running the hotfix?

I don’t know if it’s just related to number of node connections, or potentially there is a lower rate of transactions after the stuck chain this morning.

FWIW I’m also seeing this in my chiadog

I’m also copying plots over the network. I’m on 10 Gbps, but it seems I need to upgrade to 40 Gbps.

Out of interest, how do you know if your farm is responding to challenges or not?

I’m also seeing this and working on resolutions. I agree that I see the most unhealthy signage point logs when transferring plots over the same link that the farmer/harvester use to the NAS.

I have a Synology NAS and am planning on using a RPi 4 as a farmer/harvester.

The NAS has multiple Ethernet ports, so I plan on giving the farmer/harvester its own Ethernet connection, separate from the plotter LAN where plots are transferred.

Also following this as a possible bug here: [BUG] My Raspberry Pi 4 4GB currently misses / doesn't finish plenty of signage points in a row · Issue #1796 · Chia-Network/chia-blockchain · GitHub

I am also experiencing this issue (we had a small discussion here and I wasn’t the only one experiencing this). I’m not sure how long this has been going on, but probably not always, since I’ve managed to win some coins in the past. There also seem to be quite a few different bug reports on GitHub, Reddit, etc. from even a few months ago.

The Chiadog bug on Windows is a great tip. However, although I am running everything on Windows, the stopping is real. I’ve gone through all the options I could imagine, and I’ve also started to wonder whether this has something to do with new plots joining the farm after completion.

Are you guys using only one machine or multiple harvesters? I’ve just realized that the problem might not have occurred before I tested harvesting with multiple machines using the official guidelines.

Does anyone know whether, when a user sets up a harvester on another machine, copies the ca folder, and configures the IP, this is all a one-way street? That is, are all the changes only on the harvester machine, or does this also automatically change something on the main system? I’ve noticed that this second harvester shows up in the GUI, but now I just want to “erase” it completely to be sure that it doesn’t cause any issues. I’ve already disconnected the harvesting PC from the internet and stopped the harvester from the command line. But does the main machine, for example, still try to look for this second harvester?

I’m only running on a single machine. Full node, harvester, wallet, etc are all running on my NAS.
