Harvester not participating in challenges

Hello all, since transactions went live this week, it looks like my Chia farm has been having issues participating in challenges.

I’m using Chiadog to monitor the logs from my node and I’m frequently greeted with notifications like this:

This seems to happen several times a day, and usually it resolves on its own after 10-15 minutes. When the system is in this state, if I check the logs I see things like this:

2021-05-07T13:44:17.152 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7 95.150.222.126
2021-05-07T13:44:17.153 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 95.150.222.126 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7
2021-05-07T13:44:17.244 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301 51.154.15.44
2021-05-07T13:44:17.289 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 51.154.15.44 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301
2021-05-07T13:44:17.290 full_node full_node_server        : INFO     <- respond_signage_point from peer 658d626ac357647e3f259cbf2ad4d58c1aef08150c28d4f72291c328f72558f7 95.150.222.126
2021-05-07T13:44:17.306 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.306 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d
2021-05-07T13:44:17.307 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37 12.216.126.78
2021-05-07T13:44:17.309 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 12.216.126.78 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37
2021-05-07T13:44:17.381 full_node full_node_server        : INFO     <- respond_signage_point from peer 21d643e48422e144bc7a4fb5e064712902e242be46e9cf80ecb8ffd558474b37 12.216.126.78
2021-05-07T13:44:17.382 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.383 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d
2021-05-07T13:44:17.412 full_node full_node_server        : INFO     <- new_signage_point_or_end_of_sub_slot from peer 9ce7cf9783a80f5f80a9a3ed49990b57336e777a6d56c594747de2414345d225 13.66.209.137
2021-05-07T13:44:17.413 full_node full_node_server        : INFO     -> request_signage_point_or_end_of_sub_slot to peer 13.66.209.137 9ce7cf9783a80f5f80a9a3ed49990b57336e777a6d56c594747de2414345d225
2021-05-07T13:44:17.425 full_node full_node_server        : INFO     <- respond_signage_point from peer 026a47378ad20d84e0227bc4758ae99866f7cb3f97f66432a30dbc630d028301 51.154.15.44
2021-05-07T13:44:17.427 full_node chia.full_node.full_node_store: INFO     Don't have rc hash e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d. caching signage point 61.
2021-05-07T13:44:17.427 full_node chia.full_node.full_node: INFO     Signage point 61 not added, CC challenge: dc27d4269717dd2fe4e03ba961348670f188ae692806ccb06708843f7039ecf0, RC challenge: e144cef9daa8b046ef6b48d4701c2cfb361536c081126f29d1c45d5b8081196d

The node appears to be busy receiving VDFs and transactions from peers, but the repeated “Don’t have rc hash” lines look like a problem.
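As a rough sanity check, those two log lines can be tallied straight out of debug.log (the helper names and the log path in the usage comment are just examples):

```shell
# Tally missed vs. finished signage points in a Chia debug.log.
# The two substrings are taken verbatim from the log excerpts above.
count_missed() {
  grep -c "Don't have rc hash" "$1"
}
count_finished() {
  grep -c "Finished signage point" "$1"
}
# Usage: count_missed ~/.chia/mainnet/log/debug.log
```

A high missed count with a near-zero finished count during an episode matches what Chiadog is reporting.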

When the system is in a good state, I see logs like this:

2021-05-07T14:53:00.431 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 51/64: b361871f56c658923c3a464a17d20299267e3dac3e355b14b9a47ba8ee4e4646 
2021-05-07T14:53:07.188 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 52/64: 905cabbd799ed409e48fd689ffc56fb685e8a156ffe113d441e3e70362e87d12 
2021-05-07T14:53:15.515 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 53/64: a3191693f3589b6a15ff4c47f230fd0a8fcfb6a44f9bcbf5d91fb358e13ee691 
2021-05-07T14:53:24.971 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 54/64: a1ae62049c3c66aa2ef85bf7fba20b9c7c077b5260a3c6ab85fb40ccd8f90ea6 
2021-05-07T14:53:35.035 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 55/64: 06a9208d22e5f313d0ccd4220e38b6b753e3c30e3552eec37c216b1e8039c44f 
2021-05-07T14:53:41.206 full_node chia.full_node.full_node: INFO     🌱 Updated peak to height 245646, weight 9117292, hh 593ae0c29001564933aa68083205c195bbfa734ae9d6d88456c59f452c166f73, forked at 245645, rh: 85860e82929d83fbb9ac1b81af7b2b2f6841bf476ab1ad0dde55e0203189af8d, total iters: 795527654862, overflow: False, deficit: 0, difficulty: 182, sub slot iters: 110624768, Generator size: No tx, Generator ref list size: No tx
2021-05-07T14:53:41.895 wallet chia.wallet.wallet_blockchain: INFO     💰 Updated wallet peak to height 245646, weight 9117292, 
2021-05-07T14:53:45.231 full_node chia.full_node.full_node: INFO     ⏲️  Finished signage point 56/64: a39a2e89a520b9f8e2a7b5e6cf8657f5585542ccd6aee2b9d238dbb8ae89e9c1 
2021-05-07T14:53:52.436 full_node chia.full_node.full_node: INFO     🌱 Updated peak to height 245647, weight 9117474, hh 301dcf4ccdc1989fd79162a51115aac0593cfe1f401f83cb6c48ea88d54d89b5, forked at 245646, rh: c6404499e3ba041241ce350e1385c9a72827dc8052d3743b2ca073e49416465d, total iters: 795529714011, overflow: False, deficit: 0, difficulty: 182, sub slot iters: 110624768, Generator size: No tx, Generator ref list size: No tx

None of those logs are present when Chiadog is reporting problems.

I’ve tried to do everything possible to rule out networking issues. I’m able to successfully open a connection to port 8444 on my external IP from an outside network. When I’m in a “failed” state, running chia show -s -c shows that I’m fully synced, and I typically have over 50 connections to other full nodes. My system otherwise seems pretty happy: low CPU usage and plenty of free RAM.
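For reference, the reachability check I’m doing amounts to something like this (the host is a placeholder for your external IP, and you need to run it from outside your LAN):

```shell
# Try to open a TCP connection to host:port using bash's /dev/tcp device;
# succeeds only if the connection is accepted within 5 seconds.
port_open() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}
# Usage: port_open your.external.ip 8444 && echo "port 8444 reachable"
```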

I’m running a full node on the latest Chia version (1.1.4 at time of posting) on a Synology DS1520+ using the official Chia Docker container. The container is set up to use host networking, port 8444 is forwarded, and UPnP is disabled in my config.yaml.

My full debug.log from the last time I encountered this issue can be found here:

Any help would be greatly appreciated.

Welcome to the Forums!

There could be a lot of things happening here.

  • Are you transferring plots over the network? It could be that you get these errors because the network gets saturated with the file transfer.
  • Hopefully you’re not using any RAID either on those drives and are just running them as JBOD.
  • Are you using the Synology for anything else? Media server? Media Transcoder?
Thanks for your response.

  • Are you transferring plots over the network? It could be that you get these errors because the network gets saturated with the file transfer.

I am plotting on another machine, and the plots are transferred over the network when complete. I’ve been doing this since mainnet launch and only started having this issue in the past week. I don’t think the failures correlate with plot transfers, but I’ll keep a closer eye on it.

  • Hopefully you’re not using any RAID either on those drives and are just running them as JBOD.

No RAID, all individual volumes.

  • Are you using the Synology for anything else? Media server? Media Transcoder?

Nothing resource intensive, just Chia, Chiadog, and Home Assistant.

I’ve been monitoring my NAS closely and so far each batch of missed signage points does correlate with a plot file transfer.

I’m going to disable the plot file transfer and continue monitoring to see if the issue resolves. If that does resolve it, I should be able to figure out a new plot transfer mechanism that hopefully doesn’t saturate the link.

It is odd that I’m just seeing this now; I’ve had the same setup for quite a while with the same monitoring tools in place, and this didn’t crop up until this week. Maybe a change in Chia 1.1.3/1.1.4, or maybe running a full node just needs a lot more resources with the explosive growth of the netspace :man_shrugging:

I’ve run into a similar issue where Chiadog says my harvester appears to be offline. I haven’t started digging into it yet, but your post looks like the same issue.

Everything is on the same machine. It’s not correlated with rsyncing files to the final drive. Port 8444 is forwarded (UDP/TCP) to the machine.

Resource-wise I’m close to maxed out but with some headroom: 14/16 GB free, all 16 threads regularly over 80% but not at 100% for long periods. OS on NVMe, plot drive is 4x NVMe in RAID 0, and no other I/O to the plot drive(s).

Also interested in knowing what can be done to troubleshoot this.

@marram are you on Windows by any chance? There is a known issue where Windows log rotation breaks Chiadog monitoring.

Nope, Ubuntu 20.04. Good to know anyway.

I think the chia tool itself should have a test mode where you can point it at your IP and it checks that everything works.

Since the issue was correlating with plot transfers, I had assumed the cause was network saturation. To test that hypothesis, I tried (unsuccessfully) to turn on QoS on the NAS to prioritize Chia node traffic (on port 8444) and de-prioritize file-sharing traffic. I’m not sure of the reasoning behind this, but according to Synology’s docs, only outbound file-sharing traffic (SMB, CIFS, AFP, NFS, etc.) can be limited.

I transitioned to using rsync with the --bwlimit option to transfer plots while capping the bandwidth usage. Even transferring at just 20 Mbps, my Chia node would stop participating in challenges, so this had to be more than just a network issue.
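For anyone wanting to replicate this, the unit conversion is the only fiddly part, since rsync’s --bwlimit is in units of 1024 bytes per second (the host and paths in the comment are placeholders):

```shell
# Convert a target rate in Mbit/s to rsync's --bwlimit units (KiB/s).
mbps_to_kib() {
  # 1 Mbit/s = 125000 bytes/s; divide by 1024 to get KiB/s
  echo $(( $1 * 125000 / 1024 ))
}
LIMIT=$(mbps_to_kib 20)   # 20 Mbit/s is roughly 2441 KiB/s

# Placeholder source/destination; adjust to your plotter and NAS:
# rsync --progress --bwlimit="$LIMIT" /plots/plot-k32-xxxx.plot user@nas:/volume1/plots/
```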

At this point I took another look at my Docker container settings and noticed the CPU priority was only set to medium (the default). I had previously set it to high, but I think that setting got blown away when I updated to 1.1.4.

So now, with the CPU priority set to max and the plot transfer speed limited to 20 Mbps, surely things will work smoothly…

Nope.

Shortly after starting an agonizingly slow plot transfer my node once again stopped participating in challenges. The system isn’t even all that busy while this is happening, here is a screenshot of the resource utilization:

I understand this is most likely not a Chia issue but a Synology or Docker problem; either way, it’s driving me crazy. At this point I don’t know what else to try. I suppose one option is to run the full node on another machine, but I was really hoping to keep this setup self-contained and not require additional hardware. Another option would be to plot to a USB drive and physically connect it to the NAS to transfer plots. I’m worried that option is labor-intensive and might not even solve the problem.

Thanks for sharing this info. This is really strange. I wonder if the NAS puts file transfers above all else. Also, could it possibly be a TCP vs UDP issue? Grasping at straws here.

Hi, have you had any success, and do you have any tutorials for getting Chia to work on a Synology using Docker?

@Tigerraiders aside from the issues I’ve outlined in this post, I would say I’ve had some success :grinning_face_with_smiling_eyes:

I mainly followed the official Chia Docker documentation found here.

I couldn’t figure out a way to pull images from GitHub’s container registry in the official Synology Docker UI (it seems to only pull from Docker Hub), so you have to do some of this via the command line.

  • Install Docker from the Synology Package Center
  • Enable SSH on your NAS
  • SSH into your NAS
  • sudo su to switch to root user
  • docker pull ghcr.io/chia-network/chia:latest

Now that the image is downloaded, you should be able to create a container from it in the Synology Docker UI. Alternatively, you can continue setting it up via the command line using the instructions from the official documentation. You’ll need to configure some volumes to point the container at your plots, your Chia config.yaml, and optionally your keyring, as described in the official docs.
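As a rough sketch of that last step (the host-side paths here are examples for a typical Synology volume layout, and the in-container mount points are assumptions — check the official image docs for the exact flags):

```shell
# Example host-side paths; adjust to your NAS volume layout.
PLOTS_DIR=/volume1/plots        # where the finished plot files live
CHIA_ROOT=/volume1/chia-config  # holds config.yaml, keys, database

# Assumed invocation; verify the mount targets against the official docs:
# docker run -d --name chia --network host \
#   -v "$PLOTS_DIR":/plots \
#   -v "$CHIA_ROOT":/root/.chia \
#   ghcr.io/chia-network/chia:latest
```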

I think Gene has mentioned over on Keybase that they would like to have an official Chia app on the Synology Package Center, but it’s probably not a high priority for the team right now.

If you have additional questions or issues setting it up, a new topic would probably be appropriate.

Good luck!

I haven’t tried this, but I think this is a better way of doing Chia + Docker + NAS: Run Ubuntu in Docker and then install Chia in Ubuntu. I’m going to try it soon.

That’s essentially what the official Chia Docker container I was referencing does:

The image is based on Ubuntu and it installs Chia just like you would normally.

An interesting data point… after the chaos this morning *cough* I decided to transfer a bunch of plots while the network was down. I patched my node with the hotfix and got things running pretty quickly, but I still had a bunch of plots transferring (at full speed, no bandwidth limit), and I noticed that I did not experience the missed challenges I had previously seen during plot transfers.

I noticed the number of full nodes I’m connected to is much lower than before: 10 now vs. >50 before. This might be due to the low percentage of nodes running the hotfix?

I don’t know if it’s just related to the number of node connections, or whether there’s a lower rate of transactions after the stuck chain this morning.

FWIW, I’m also seeing this in my Chiadog.

I’m also copying plots over the network. I’m on 10 Gbps, but it seems I need to upgrade to 40 Gbps.

Out of interest how do you know if your farm is responding to challenges or not?

I’m also seeing this and working on a resolution. I agree that I see the most unhealthy signage point logs when transferring plots over the same link that the farmer/harvester uses to reach the NAS.
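To the question above about telling whether your farm is responding: one low-tech check (assuming the default log location and INFO log level) is to grep debug.log for the harvester’s per-challenge line. The substring below is what my harvester logs, so treat it as an assumption if your version differs:

```shell
# Count harvester challenge responses in a debug.log; a steadily growing
# count means the farm is answering challenges.
challenge_responses() {
  grep -c "plots were eligible for farming" "$1"
}
# Usage: challenge_responses ~/.chia/mainnet/log/debug.log
```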

I have a Synology NAS and am planning on using a RPi 4 as a farmer/harvester.

The NAS has multiple Ethernet ports, so I plan on giving the farmer/harvester its own Ethernet connection, separate from the plotter LAN where plots are transferred.

Also following this as a possible bug here: [BUG] My Raspberry Pi 4 4GB currently misses / doesn't finish plenty of signage points in a row · Issue #1796 · Chia-Network/chia-blockchain · GitHub

I am also experiencing this issue (we had a small discussion here, and I wasn’t the only one experiencing it). I’m not sure how long this has been going on, but probably not always, since I’ve managed to win some coins in the past. There also seem to be quite a few different bug reports on GitHub, Reddit, etc., some from even a few months ago.

The Chiadog bug on Windows is a great tip. However, although I am running everything on Windows, the stopping is real. I’ve gone through all the options I could imagine, and I’ve also started to wonder whether this has something to do with new plots joining the farm after completion.

Are you using only one machine or multiple harvesters? I’ve just realized that the problem might not have occurred before I tested harvesting with multiple machines following the official guidelines.

Does anyone know: when a user sets up a harvester on another machine, copies over the ca folder, and configures the IP, is this all a one-way street, with the changes living only on the harvester machine, or does it also make some changes automatically on the main system? I’ve noticed that this second harvester shows up in the GUI, and now I just want to “erase” it completely to be sure it isn’t causing any issues. I’ve already disconnected the harvesting PC from the internet and stopped the harvester from the command line, but does the main machine still try to look for this second harvester?

I’m only running a single machine. The full node, harvester, wallet, etc. all run on my NAS.
