Harvester not participating in challenges

Yup, same here. I had previously tested multiple harvesters out of curiosity and in preparation for scaling my system.

I’ve tried literally everything and haven’t figured this issue out yet. Right now my hunch is that running 3 plots simultaneously (3 × 4 GB) on 16 GB of RAM might be causing memory issues. On paper there should still be 4 GB left for everything else, but that might not hold in reality. At the moment I’m plotting only 2 in parallel, and memory usage is often around 70% with all the other stuff (Chrome etc.) running on the side. I’m waiting with curiosity to see whether some plotting phase creates a spike that could cause the harvester to shut down. At least the harvester now seems to stop less frequently…
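
If it helps anyone else chasing this, here’s a minimal sketch of how I’m watching for those spikes, assuming Python 3 with the third-party psutil package installed (the 10-second interval is just a placeholder):

```python
# Minimal memory logger: sample overall RAM usage so peaks can be
# lined up against plotting phases. Requires `pip install psutil`.
import time
import psutil

INTERVAL_SECONDS = 10  # sampling interval (placeholder value)

while True:  # stop with Ctrl+C
    mem = psutil.virtual_memory()
    used_gib = (mem.total - mem.available) / 2**30
    total_gib = mem.total / 2**30
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
          f"used={mem.percent:.0f}% ({used_gib:.1f}/{total_gib:.1f} GiB)")
    time.sleep(INTERVAL_SECONDS)
```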

Otherwise, I’m starting to run out of ideas on this one.

So far so good! I decreased plotting on my combined full node / plotting / farming PC, and it seems like either the memory or the CPU is enjoying the change, since I haven’t had any issues for the past 12 hours.

Too bad this might mean that if I wanted to keep plotting on my PC, I’d need to build a separate farming setup, which isn’t very practical given the dozen external HDDs that are working nicely with this PC.

I’ve also got a machine that’s experiencing this problem; I’ll try running it without any plotting and see if it stays up longer.

Can this be solved with dual NICs, one for Chia and one for file transfers (e.g. new plots coming in)?

Good suggestion! Unfortunately I tried it today and it made no difference. Transferring the plots on a secondary network interface still resulted in my node de-syncing during the transfer.

Isn’t it well known by now that Chia and NASes do not play particularly well together? And the problem is entirely I/O based.

My poor TerraMaster NASes had gigabit ethernet support (so a reasonably beefy CPU) and were set up as RAID 0, and their proof lookup times (by that I mean the actual disk I/O to check whether a plot meets the challenge) were consistently 4x those of the simple JBOD drive connections. :frowning:

It does seem like there’s fundamental overhead in NAS architectures, and Chia proof checks are sensitive to this… :confused:

Yeah, it seems like there are strange issues with NASes for sure.

I saw your thread about the issues you were having, and I think I’m running into something different. My NAS is completely stable while I’m not transferring plot files around. My average proof lookup time is 0.7 seconds over the last 24 hours.

I think my Chia Docker container is getting resource starved somehow while transferring files. It seems like file transfers are heavily prioritized by Synology to the detriment of other processes running on the NAS. I was shocked that throttling the file transfer speed down to almost nothing yielded the same results.

I think the next step would be to contact Synology support to see if they have any input. Perhaps there is a way to configure the NAS so that it doesn’t de-prioritize Docker.

I’ll be honest, I’m not very motivated to investigate further since I have filled the storage attached to this NAS and things are working smoothly now that I don’t have to move these plot files around.

If anyone else gets to the bottom of this I’d be interested to hear the solution, in case I ever decide to re-plot for pooling.

Ah, so just to confirm, you only have issues when files are being transferred to/from the NAS? For me, the 4x proof lookup speed penalty of “plot file on NAS” was always present, compared to “plot file on JBOD attached via USB”.

It was a workable delay, nowhere near the 5-second warning limit, but it was noticeable because it was easily 4x more than the JBOD time. It wasn’t a small difference! Here are the last harvester times from 9 days ago:

| lookup time (s) | intel / nas | htpc | amd | datacenter JBOD |
|---|---|---|---|---|
| avg | 3.59 | 1.59 | 0.48 | 0.61 |
| median | 3.51 | 0.12 | 0.13 | 0.47 |

See what I mean? That’s the last time I looked… I stopped looking once I removed the NASes from my environment.

Correct, I only have issues when I’m transferring files. When the NAS is “idle” my lookup times seem like they’re in line with your JBOD numbers.

I only recently stopped plotting to this NAS, so I’ll collect some more data on average lookup times and report back.
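
For the stats, I’ll probably just pull the times out of the harvester log; something like this minimal sketch, assuming the standard `Time: <seconds> s.` lines in Chia’s debug.log (the regex and log path may need adjusting for your setup):

```python
# Rough sketch: average/median proof lookup times from a Chia debug.log.
# Assumes harvester lines containing "Time: <seconds> s"; adjust the
# regex and LOG_PATH for your own setup.
import re
import statistics

LOG_PATH = "debug.log"  # placeholder path
pattern = re.compile(r"Time: ([0-9.]+) s")

times = []
with open(LOG_PATH) as log:
    for line in log:
        match = pattern.search(line)
        if match:
            times.append(float(match.group(1)))

if times:
    print(f"samples: {len(times)}")
    print(f"avg:     {statistics.mean(times):.2f} s")
    print(f"median:  {statistics.median(times):.2f} s")
    print(f"max:     {max(times):.2f} s")
else:
    print("no lookup times found")
```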

Overall, my expected time to win has been roughly in line with what I’m actually winning (with a dry spell here or there), so things are working.
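
For anyone curious how that expected-time-to-win figure falls out: it’s just your share of total netspace multiplied by the average block interval (Chia targets 4,608 blocks per day, so roughly one block every 18.75 seconds). A back-of-the-envelope sketch, with made-up farm and netspace sizes:

```python
# Back-of-the-envelope expected time to win.
# Chia targets 4608 blocks per day, i.e. one block every 18.75 seconds.
SECONDS_PER_BLOCK = 24 * 60 * 60 / 4608  # 18.75 s

farm_tib = 100                 # example farm size (made up)
netspace_tib = 30 * 2**20      # example netspace: 30 EiB expressed in TiB

etw_seconds = (netspace_tib / farm_tib) * SECONDS_PER_BLOCK
print(f"expected time to win: {etw_seconds / 86400:.1f} days")
```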

Thanks so much for this.

Are you using SSD cache on your Synology? Is it recommended or required?

What kind of file structure do you use for the volume you created? Is it RAIDed? Or should I just read the documentation and stop asking easily answered questions? My new hard drives just arrived and I’d love to get cracking (when I can find time!)

I’m not using SSD caching and I don’t think it would be particularly useful for the Chia use case.

For a file structure, I have each drive as a separate volume with no RAID. The user interface will encourage you to use RAID and warn about potential data loss, but do not use RAID; it is a waste of farming space for Chia.

Just FYI, I found that transferring plots between my Windows plotter and Windows farmer was causing a similar issue: challenges coming in delayed or sporadically. ChiaDog alerted me to this. I even set a throttle on the robocopy of the plots and the issue still remained.

I’m wondering if you need one network for moving plots around and a separate one for farming, like @WolfGT is doing. I’ve added a wireless adapter to the farmer so I can test this later.
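
In the meantime, in case robocopy’s built-in throttling isn’t doing what I expect, I may also try a dead-simple chunked copy that sleeps between reads to cap the transfer rate. A rough sketch, where the paths and the 20 MiB/s cap are placeholders:

```python
# Naive bandwidth-limited file copy: read in chunks and sleep so the
# average rate stays near TARGET_BYTES_PER_SEC. Paths and the cap are
# placeholders; this is just the idea, not a robocopy replacement.
import time

SRC = "D:/plots/plot-k32-example.plot"   # placeholder source path
DST = "Z:/farm/plot-k32-example.plot"    # placeholder destination path
CHUNK = 4 * 2**20                        # 4 MiB per read
TARGET_BYTES_PER_SEC = 20 * 2**20        # ~20 MiB/s cap (made up)

with open(SRC, "rb") as src, open(DST, "wb") as dst:
    start = time.monotonic()
    copied = 0
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # If we're ahead of the target rate, sleep off the difference.
        expected = copied / TARGET_BYTES_PER_SEC
        elapsed = time.monotonic() - start
        if expected > elapsed:
            time.sleep(expected - elapsed)
```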

RAID is useful when you cannot afford to lose data. Plot files are not like that, since you can recreate them. Why waste a disk on RAID to protect something when that disk could be holding plots instead? RAID also only tolerates degradation up to a point: go one disk beyond what the array can absorb (1, 2, or 3 disks depending on the level, unlikely but still a risk, and more protection means more committed disks) and you lose everything. With independent plot disks, if you lose half of them you still have the other half; the setup is inherently tolerant of losses. The only good reasons for RAID are to increase bandwidth so you can copy from the plotter to the destination faster, or if you cannot replace the plots.
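
To put rough numbers on that trade-off, here is a tiny worked example; the drive count and size are made up for illustration:

```python
# Worked example of the RAID-vs-JBOD trade-off for plot storage.
# Drive count and size are made up; the point is capacity vs blast radius.
drives = 8
tib_per_drive = 10

jbod_capacity = drives * tib_per_drive          # all 80 TiB farmable
raid5_capacity = (drives - 1) * tib_per_drive   # 70 TiB; one disk of parity

print(f"JBOD:  {jbod_capacity} TiB farmable; one dead drive loses "
      f"{tib_per_drive} TiB of plots and the rest keep farming")
print(f"RAID5: {raid5_capacity} TiB farmable; survives one failure, "
      f"but a second failure during rebuild loses all {raid5_capacity} TiB")
```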