I have a really suspicious delay on almost all (94%~98%) of my partials. This started recently, some time in December 2023; before that, all partials were on time. I’ve been farming since August 2021 and pool-farming since the introduction of NFT pooling.
Here are two recent partials, found on one plot in ~2 seconds. What’s boggling me is that ~130 seconds later there’s an error logged.
Timeline summary:
####-##-##T##:10:07.304 1 plots were eligible for farming 78b17ff893... Found 2 proofs. Time: 2.07469 s. Total 34 plots
####-##-##T##:10:07.320 Submitting partial
####-##-##T##:10:07.341 Submitting partial
####-##-##T##:12:18.633 Received partial in 133.4970691204071
####-##-##T##:12:18.635 Received partial in 133.4970691204071
HH:10:07 and HH:12:18 are approximately 131s apart.
2024-02-19T21:10:07.304 harvester chia.harvester.harvester: INFO 1 plots were eligible for farming 78b17ff893... Found 2 proofs. Time: 2.07469 s. Total 34 plots
2024-02-19T21:10:07.320 farmer chia.farmer.farmer : INFO Submitting partial for e048ee3796fbd43830cf31ec7bcb4e1e6976acfe10157ccf93df55509d4d010f to https://pool.openchia.io
2024-02-19T21:10:07.341 farmer chia.farmer.farmer : INFO Submitting partial for e048ee3796fbd43830cf31ec7bcb4e1e6976acfe10157ccf93df55509d4d010f to https://pool.openchia.io
--
2024-02-19T21:12:18.633 farmer chia.farmer.farmer : INFO Pool response: {'error_code': 2, 'error_message': 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2024-02-19T21:12:18.634 farmer chia.farmer.farmer : ERROR Error in pooling: (2, 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')
2024-02-19T21:12:18.635 farmer chia.farmer.farmer : INFO Pool response: {'error_code': 2, 'error_message': 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2024-02-19T21:12:18.635 farmer chia.farmer.farmer : ERROR Error in pooling: (2, 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')
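To check whether the ~131 s gap holds for more than these two partials, the round-trip time can be pulled straight out of the farmer log. Below is a minimal Python sketch, not anything shipped with chia. The function name `partial_round_trips` is made up for illustration, and it assumes every "Pool response" line answers the oldest outstanding "Submitting partial" line, which may not hold if the farmer logs other pool responses in between.

```python
import re
from datetime import datetime

# Timestamp at the start of each log line, e.g. 2024-02-19T21:10:07.320
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})")


def partial_round_trips(lines):
    """Pair each 'Submitting partial' line with the next 'Pool response'
    line (first in, first out) and return the elapsed seconds per pair.

    Assumption: responses arrive in the same order as submissions.
    """
    pending = []  # submit timestamps still waiting for a response
    delays = []
    for line in lines:
        match = TS.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        stamp = datetime.fromisoformat(match.group(1))
        if "Submitting partial" in line:
            pending.append(stamp)
        elif "Pool response" in line and pending:
            delays.append((stamp - pending.pop(0)).total_seconds())
    return delays


# Shortened copies of the four relevant log lines from this post.
SAMPLE = [
    "2024-02-19T21:10:07.320 farmer chia.farmer.farmer : INFO Submitting partial for e048ee37...",
    "2024-02-19T21:10:07.341 farmer chia.farmer.farmer : INFO Submitting partial for e048ee37...",
    "2024-02-19T21:12:18.633 farmer chia.farmer.farmer : INFO Pool response: {'error_code': 2}",
    "2024-02-19T21:12:18.635 farmer chia.farmer.farmer : INFO Pool response: {'error_code': 2}",
]

print([round(d, 3) for d in partial_round_trips(SAMPLE)])  # [131.313, 131.294]
```

To run it over a whole log, pass an open file instead of `SAMPLE`, for example `partial_round_trips(open(path))`, where `path` points at the farmer's debug.log (wherever that lives on the host).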
Usual details:
- chia version 2.1.4 installed from the .deb.
- openchia.io pool
- Ubuntu 23.04
- i5-3450 with 32GiB DDR3 RAM
- / and /home (and ~/.chia/ etc) are on PCIe Gen3 NVMe
- 34 k=32 (~3.42TiB) uncompressed plots are on a 7.28TiB BTRFS volume on 4x 2TB WDC WD20EZBX-00A (data,single; metadata,RAID1), mounted defaults,ro (previously defaults,noatime) to eliminate useless writes.
- Services running on this host: chia-daemon, chia-full-node, chia-wallet, chia-farmer, chia-harvester, all started with the provided systemd service files.
- Network topology: host <=> gigabit switch <=> gigabit switch <=> opnsense F/W <=> fibre ONT (300Mb/100Mb) CGNAT at ISP. IPv4 and IPv6.
- System clock is NTP sync’d via systemd-timesyncd with a ~20ms root distance.
Prior research:
I’ve spent a while trying to find problems with the system and filesystem and daemons but to no avail. I’ve found several leads:
chiaforum: /t/low-pool-earnings-and-lots-of-error-in-pooling-partial-was-received-too-late/19971 suggested that the disks may be entering sleep. I monitored them with hdparm and all disks were active/idle even when the delay occurred. I also used hdparm -s 0 to disable sleep. No improvement.
chiaforum: /t/tidying-up-the-farm-late-and-inconsistent-partials/17901/8 involved plotting and Wi-Fi, and implied that I could have a bottleneck at the HDDs. These HDDs are dedicated to chia, but I thought it possible that activity spikes from Jackett/Sonarr/Radarr etc. could be the problem, so I disabled those other services (which use another pair of similar HDDs) to isolate the chia services. No improvement.
chiaforum: /t/no-partials-in-debug-log-no-pool-plots-in-gui/11697 implies that there could just be junk in the ~/.chia/mainnet/ directory. This actually prompted me to re-sync from the latest DB snapshot just to make sure my DB wasn’t damaged somehow. No improvement.
reddit: /r/chia/comments/os2azm/help_i_always_get_invalid_partial_for_my_plots/ directs the poster to check the lookup times. The example log above is actually slightly slower than typical; lookups tend to take 0.4s to 1.6s. I also started to look at the HDDs to see if any of them were showing signs of age. They are old, and hdparm -Tt did show some being quite slow. I previously had data,RAID1 and reduced that to data,single to prepare to re-balance and remove one of the slowish HDDs. The plot files could be fragmented, but the harvester seems to be fast enough; e.g. 1 plots were eligible for farming c7d79c54f2... Found 3 proofs. Time: 2.27235 s. Total 34 plots
Here’s an example of the NFT’s performance: OpenChia.io - Chia Farming Pool
TL;DR: something, somewhere is introducing a 130-ish second delay between my harvesters and the pool.
The 129~133 seconds is suspiciously close to a two-minute time-out plus an actual 9~13 second exchange.
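The arithmetic behind that hunch, as a small Python sketch. The 120 s figure is only my guess at a timeout somewhere in the path (nothing in the logs confirms it), the 25 s limit is the one quoted in the pool's error message, and the function names are made up for illustration.

```python
ASSUMED_TIMEOUT_S = 120.0  # hypothetical two-minute timeout; not confirmed anywhere
POOL_LIMIT_S = 25.0        # "Response must happen in less than 25 seconds"


def residual_after_timeout(reported_delay_s, timeout_s=ASSUMED_TIMEOUT_S):
    """Time left over once the assumed timeout is subtracted."""
    return reported_delay_s - timeout_s


def would_be_on_time(reported_delay_s, timeout_s=ASSUMED_TIMEOUT_S):
    """True if the partial would beat the pool limit without the assumed timeout."""
    return residual_after_timeout(reported_delay_s, timeout_s) < POOL_LIMIT_S


reported = 133.4970691204071  # from the pool's error message above
print(round(residual_after_timeout(reported), 3))  # 13.497
print(would_be_on_time(reported))                  # True
```

If the guess is right, the residual of roughly 9 to 13 seconds would sit comfortably inside the pool's 25 second limit, which is why a fixed two-minute stall looks more likely to me than slow lookups.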
Where do I go from here? What else could I try to diagnose?