Almost all partials are always 130-ish seconds late

nickbreen · February 19, 2024, 10:08am

I have a really suspicious delay on almost all (94%~98%) of my partials. This started recently, sometime in December 2023; before that, all partials were on time. I’ve been pool-farming since the introduction of NFT pooling, and farming since August 2021 before that.

Here are two recent partials, found on one plot in ~2 seconds. What’s boggling me is that ~130 seconds later there’s an error logged.

Timeline summary:

####-##-##T##:10:07.304 1 plots were eligible for farming 78b17ff893... Found 2 proofs. Time: 2.07469 s. Total 34 plots
####-##-##T##:10:07.320 Submitting partial
####-##-##T##:10:07.341 Submitting partial 
####-##-##T##:12:18.633 Received partial in 133.4970691204071 
####-##-##T##:12:18.635 Received partial in 133.4970691204071

HH:10:07 and HH:12:18 are approximately 131s apart.
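
As a quick check of that arithmetic, here is a minimal sketch that subtracts the first "Submitting partial" timestamp from the first "Pool response" timestamp in the log excerpt. The variable names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the log excerpt in this post.
submitted = datetime.fromisoformat("2024-02-19T21:10:07.320")
responded = datetime.fromisoformat("2024-02-19T21:12:18.633")

# Wall-clock gap between handing the partial to the pool and getting a reply.
gap_seconds = (responded - submitted).total_seconds()
print(f"{gap_seconds:.3f} s between submission and pool response")
```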

2024-02-19T21:10:07.304 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming 78b17ff893... Found 2 proofs. Time: 2.07469 s. Total 34 plots
2024-02-19T21:10:07.320 farmer chia.farmer.farmer         : INFO     Submitting partial for e048ee3796fbd43830cf31ec7bcb4e1e6976acfe10157ccf93df55509d4d010f to https://pool.openchia.io
2024-02-19T21:10:07.341 farmer chia.farmer.farmer         : INFO     Submitting partial for e048ee3796fbd43830cf31ec7bcb4e1e6976acfe10157ccf93df55509d4d010f to https://pool.openchia.io
--
2024-02-19T21:12:18.633 farmer chia.farmer.farmer         : INFO     Pool response: {'error_code': 2, 'error_message': 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2024-02-19T21:12:18.634 farmer chia.farmer.farmer         : ERROR    Error in pooling: (2, 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')
2024-02-19T21:12:18.635 farmer chia.farmer.farmer         : INFO     Pool response: {'error_code': 2, 'error_message': 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue'}
2024-02-19T21:12:18.635 farmer chia.farmer.farmer         : ERROR    Error in pooling: (2, 'Received partial in 133.4970691204071. Make sure your proof of space lookups are fast, and network connectivity is good.Response must happen in less than 25 seconds. NAS or network farming can be an issue')
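
To see how often this happens without eyeballing the log, something like the sketch below could pull the numbers out of debug.log. It assumes the line layout shown above; the function name late_partials is hypothetical, and pairing each pool response with the most recent "Submitting partial" line is a simplification that will mislabel overlapping submissions.

```python
import re
from datetime import datetime

# Leading ISO timestamp, as seen on every line of the excerpt above.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})")
# The delay the pool reports inside its error message.
REPORTED = re.compile(r"Received partial in (\d+(?:\.\d+)?)")

def late_partials(lines):
    """Yield (response_time, seconds_since_last_submit, pool_reported_delay)."""
    last_submit = None
    for line in lines:
        stamp = TIMESTAMP.match(line)
        if not stamp:
            continue
        when = datetime.fromisoformat(stamp.group(1))
        if "Submitting partial" in line:
            last_submit = when
        elif "Pool response" in line and last_submit is not None:
            reported = REPORTED.search(line)
            if reported:
                waited = (when - last_submit).total_seconds()
                yield when, waited, float(reported.group(1))

# Two lines copied from the excerpt above.
SAMPLE = [
    "2024-02-19T21:10:07.320 farmer chia.farmer.farmer         : INFO     "
    "Submitting partial for e048ee3796fbd43830cf31ec7bcb4e1e6976acfe10157ccf93df55509d4d010f "
    "to https://pool.openchia.io",
    "2024-02-19T21:12:18.633 farmer chia.farmer.farmer         : INFO     "
    "Pool response: {'error_code': 2, 'error_message': 'Received partial in 133.4970691204071. "
    "Make sure your proof of space lookups are fast, and network connectivity is good."
    "Response must happen in less than 25 seconds. NAS or network farming can be an issue'}",
]

for when, waited, reported in late_partials(SAMPLE):
    print(f"{when.isoformat()} waited {waited:.3f} s, pool reported {reported:.1f} s")
```

For a real run, pass an open file handle for the log instead of SAMPLE. A local wait that closely tracks the pool's reported delay would suggest the time is being lost between the farmer and the pool rather than in the harvester lookup.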

Usual details:

chia version 2.1.4 installed from the .deb.
openchia.io pool
Ubuntu 23.04
i5-3450 with 32GiB DDR3 RAM
/ and /home (and ~/.chia/ etc) are on PCIe Gen3 NVMe
34 k=32 (~3.42TiB) uncompressed plots are on a 7.28TiB BTRFS volume on 4x 2TB WDC WD20EZBX-00A - data,single metadata,RAID1 mounted defaults,ro (previously with defaults,noatime) to eliminate useless writes.
services running on this host: chia-daemon, chia-full-node, chia-wallet, chia-farmer, chia-harvester all started with the provided systemd service files.
Network topology: host <=> gigabit switch <=> gigabit switch <=> opnsense F/W <=> fibre ONT (300Mb/100Mb) CGNAT at ISP. IPv4 and IPv6.
System clock is NTP sync’d via systemd-timesyncd with a ~20ms root distance.

Prior research:

I’ve spent a while trying to find problems with the system and filesystem and daemons but to no avail. I’ve found several leads:

chiaforum: /t/low-pool-earnings-and-lots-of-error-in-pooling-partial-was-received-too-late/19971 suggested that the disks may be entering sleep. I monitored them with hdparm and all disks were active/idle even when the delay occurred. I also used hdparm -s 0 to disable sleep. No improvement.

chiaforum: /t/tidying-up-the-farm-late-and-inconsistent-partials/17901/8 involved plotting and wifi, and implied that I could have a bottleneck at the HDDs. Although these HDDs are dedicated, I disabled some other services using another pair of similar HDDs (I thought it possible that activity spikes from Jackett/Sonarr/Radarr etc. could be the problem) to isolate the chia services. No improvement.

chiaforum: /t/no-partials-in-debug-log-no-pool-plots-in-gui/11697 implies that there could just be junk in the ~/.chia/mainnet/ directory. This actually prompted me to re-sync from the latest DB snapshot just to make sure my DB wasn’t damaged somehow. No improvement.

reddit: /r/chia/comments/os2azm/help_i_always_get_invalid_partial_for_my_plots/ directs the poster to check the lookup times. The example log above is actually slightly slower than typical; they tend to be from 0.4s to 1.6s. I also started to look at the HDDs to see if any of them were showing signs of age. They are old, and hdparm -Tt did show some being quite slow. I previously had data,RAID1 and reduced that to data,single to prepare to re-balance and remove one of the slowish HDDs. The plot files could be fragmented, but the harvester seems to be fast enough; e.g. 1 plots were eligible for farming c7d79c54f2... Found 3 proofs. Time: 2.27235 s. Total 34 plots

Here’s an example of the NFT’s performance: OpenChia.io - Chia Farming Pool

TL;DR: something, somewhere is introducing a 130-ish second delay between my harvesters and the pool.

The 129~133 seconds is suspiciously close to a two-minute time-out plus an actual 9~13 second exchange.

Where do I go from here? What else could I try to diagnose?

Voodoo · February 19, 2024, 1:26pm

So lookup times seem to be fine; it’s just the partials that are a problem.

Considering all you have checked already, I would suggest trying another pool to see if that solves it.
At least that way you can verify if the problem is in your system or not.

nickbreen · February 22, 2024, 10:11am

It seems that pool.openchia.io was resolving to AAAA (IPv6) addresses but not responding via IPv6.

The client would apparently arbitrarily flip between IPv4 and IPv6.
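
For anyone wanting to check the same thing, a rough sketch like the one below tries a plain TCP connect to every address a host resolves to, separately for IPv4 and IPv6. The function name probe, the port 443 and the 5 second timeout are my assumptions here, not anything chia itself uses.

```python
import socket

def probe(host, port=443, timeout=5.0):
    """Try a TCP connect to each resolved address, one address family at a time."""
    results = []
    for family, label in ((socket.AF_INET, "IPv4"), (socket.AF_INET6, "IPv6")):
        try:
            infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror as exc:
            # No A (or AAAA) record, or the family is unavailable on this host.
            results.append((label, None, f"no address: {exc}"))
            continue
        for _family, socktype, proto, _canon, sockaddr in infos:
            try:
                with socket.socket(family, socktype, proto) as sock:
                    sock.settimeout(timeout)
                    sock.connect(sockaddr)
                outcome = "connected"
            except OSError as exc:
                # Covers timeouts, refusals and unreachable networks alike.
                outcome = f"failed: {exc}"
            results.append((label, sockaddr[0], outcome))
    return results
```

Calling probe("pool.openchia.io") and printing each tuple gives one line per address. IPv4 rows that connect alongside IPv6 rows that time out would match the symptom described here.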

nickbreen · February 22, 2024, 7:27pm

Oops, it’s my fault.

I wrecked my firewall’s IPv6 rules somehow.

All IPv6 traffic just dropped.

tcpdump showed a pile of unacknowledged SYN packets during the periods of late partials.
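
A possible explanation for the 130-ish figure, assuming Linux defaults (net.ipv4.tcp_syn_retries = 6 and a 1 second initial retransmission timeout that doubles after each unanswered SYN): a connect() whose SYNs are silently dropped only gives up after about 127 seconds. The sketch below just does that sum; syn_timeout is a made-up helper name, and I have not confirmed that this is exactly what the farmer's HTTP client was waiting on.

```python
def syn_timeout(retries=6, initial_rto=1.0):
    """Seconds until connect() gives up when no SYN is ever answered."""
    # The initial SYN plus `retries` retransmissions each wait one RTO,
    # and the RTO doubles after every unanswered attempt.
    return sum(initial_rto * 2**attempt for attempt in range(retries + 1))

print(syn_timeout())  # 127.0
```

That would leave only a few seconds of the reported 129~133 for the attempt that finally succeeds, presumably over IPv4.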

SBG1967 · February 26, 2024, 8:58am

Hi @nickbreen

I’ve just seen your post and I have been seeing the same ~100 sec delay on partials since early Jan 24. Like you, I’ve been checking my disk array. I’ve rolled back versions of chia, madmax farmer and my Nvidia driver, and tried a different pool. All to no avail!!!

I will now check my network settings and firewall and see if I’ve got the same issue!

SBG1967 · February 26, 2024, 11:23am

** UPDATE **

After another look in the logs, it seems I had lost time sync (my PC was 2 min behind).

After fixing that, all is well!!!

Garethrn · February 27, 2024, 4:22pm

Strange!!! I just came home from work to see all partials invalid / late and all around 130 secs ish. Did a quick reboot of PC / Farmer and all good again now!

Garethrn · March 4, 2024, 3:02pm

Just monitoring my farm from work and I can see all partials are invalid / late again (all around 130 secs ish again). What is the relevance of 130 secs? Why do these lates always come in at 130 secs?

Garethrn · March 4, 2024, 3:30pm

What exactly did you do to rectify this, as it keeps happening to me since changing my ISP? The only way I can get around this is to disable my IPv6, as I don’t know what settings to change for the better. The problem with this method is that it halves my download speed (tested with speed checker). If anyone else is seeing or has seen this problem, please could you let me know how you solved it.

SBG1967 · March 5, 2024, 7:37am

Can you just disable IPv6 on your farmer?