Virtual Harvesters test

This is a test of how well a dedicated harvester per NAS works when running in a VM. I'm not talking about a remote harvester loaded on each NAS; all of them run on the same computer along with the full node. Spoiler alert: so far it works great. First, a quick look at my hardware and some history.

Hardware
2 dedicated plotting systems
2 Synology DS1817 NAS
1 Netgear ReadyNAS 424
1 desktop running the full node and Hyper-V

History
When I originally started plotting I had two NAS units online and each plotter was plotting to its own NAS, but I had the full node farming both of them. It was fine in the beginning, but as I approached 150 or so plots, the challenge times were already getting out of hand. I then split my network in two, one for plotting and one for farming, which helped a lot by keeping the plot-writing traffic out of the way. But even then, the times continued to climb. So at that point I loaded Docker on the ReadyNAS and split the harvesting between the full node and the Docker remote harvester. All went great, so that was the plan for the future: load a remote harvester on each NAS. Well, problem. Without doing some fancy version downgrades of the Synology DSM software, my DS1817s can't run Docker or any VM (I'm still going to test the downgrade option just in case this set of tests doesn't work out).

Test 1 (Virtual Machines)
So that led me to this. I wanted to know if the original problem was the number of plots being harvested over a network connection, or the fact that it was trying to harvest over the network to more than one NAS at a time. So, on the full node machine, I set up Hyper-V and loaded an Ubuntu server, then set up the Chia remote harvester and Chiadog on it. I pointed it at the ReadyNAS, which now has 291 plots on it, and it works great. The full node on that same machine is harvesting Synology1, which has 185 plots. That is way more than the original setup was harvesting across two NAS's, and it is running perfectly.

Test 1 Result
My first suspicion confirmed. A single harvester/farmer has a really hard time harvesting two separate NAS devices. But two harvesters (even on the same machine) can do it no problem.

Since then, I have set up two more VMs exactly like the first, moving harvesting off the full node altogether. Now I have an Ubuntu server running a remote harvester and Chiadog for each of my NAS's, and it's running well. The full node just serves farming information to those harvesters and keeps the node/wallet in sync. (Again, to be clear, all of this is on one computer.) Now for the final test.

Test 2 (VMs, multiple folders per harvester)
Each of the remote harvesters is currently only farming one folder on its NAS, and it's running well. But I am currently copying plots from one NAS to another so I can reformat it, so once those are copied, I will be farming another two folders on one harvester. Will the original problem reappear? Ultimately, each of the Synology's will have 8 folders (one per drive), so this test will decide if I need to do the software dance and load remote harvesters on each NAS.


So far so good. I filled up the small ReadyNAS, so now both plotters are temporarily plotting to a single NAS in two folders. So the new virtual harvester is monitoring 3 folders on that NAS, and there is no difference in search times: currently around .03 seconds per search with 55 plots over 3 folders.

When I look back at my Chiadog daily reports from when it was set up the original way (monitoring two NAS's with one harvester), the average jumped to over .10 seconds when I started monitoring the second NAS. As the number of plots grew, that average grew quickly. Before it even hit 100 plots, I was seeing a .3 second average, over twenty 5-second+ searches per day, and at least one 15-second+ search per day. The current setup has never hit even one 5-second search. Time will tell how it holds up, but so far so good.
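In case anyone wants to pull the same numbers straight out of the harvester log instead of relying on Chiadog or a spreadsheet, a rough Python sketch like this should do it. It assumes the default ~/.chia/mainnet/log/debug.log location and that your log_level is set to INFO, otherwise the "plots were eligible" lines never get written.

```python
import re
from pathlib import Path

# Assumed default log location; adjust if your Chia root lives elsewhere.
LOG = Path.home() / ".chia" / "mainnet" / "log" / "debug.log"

# Matches the harvester's "... plots were eligible ... Time: 0.12345 s ..." lines.
pattern = re.compile(r"plots were eligible.*?Time: ([\d.]+) s")

times = [float(m.group(1)) for m in pattern.finditer(LOG.read_text(errors="ignore"))]

if times:
    print(f"lookups:  {len(times)}")
    print(f"average:  {sum(times) / len(times):.3f} s")
    print(f"over 5s:  {sum(t > 5 for t in times)}")
    print(f"over 15s: {sum(t > 15 for t in times)}")
```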

One thought in the back of my mind: is this improvement from the latest version of Chia (1.1.6)? I never tried going back to harvesting everything from one system after the new version.

This is working great. I now have 3 harvesters running in virtual machines and each of them is harvesting one NAS. My most recent step is that one of the harvesters is now monitoring 4 folders on a NAS that contain a total of 281 plots. The average search time is about .2 seconds, way better than what I was getting when trying to monitor multiple NAS's from one harvester. And this proves the issue was not network traffic or system resources, because these harvesters are running on the same machine. Just separating the NAS's over multiple harvesters solved the issue.

That is an interesting result. Any speculation on why?

My thought is that when a single harvester is trying to monitor multiple NAS's, it has to open the connection to the first NAS, perform the search, then close that connection, open a connection to the next NAS, and perform the search on that one. I think that multiple-connection process is what kills it. But I don't know for sure.

A new observation that proves something that has been mentioned before. Multiple folders are better than one large folder. Currently I have the perfect example of this. One harvester is monitoring a NAS that has one folder containing 291 plots. It averages between .3 and .5 seconds per search. A different harvester is monitoring a NAS that has 292 plots but they are spread out over 4 folders. It averages in the .2 second range consistently. So, number of plots is almost identical but performance is very different just because of the folder structure.


Would you mind trying the exact opposite now? Using the same NASes (NASii?), on the first, split the big folder into 4 and then on the other, combine the 4 folders into one? We should see the times flip. This would be a true scientific test using the same hardware to put this theory to bed once and for all! :sunglasses:

That’s easy, it is the same hardware already.


Ooooo even better - imagine if you didn’t see the times change. Now you’d have a real mystery!

Moving files between folders should be almost instant…come on, do it for the science! :rofl:

I actually can't. One of the NAS's is set up with separate volumes, 1 volume per hard drive, so there are 8 separate folders/volumes that cannot be joined. The other is still set up as one big volume. I could separate that one into folders and see if the speed changes, but I can't swap them.

Ah gotcha. No worries - I’m actually running a similar structure. Lots of volumes mounted as folders. Seems to be the optimal setup for sure.

I found 2 more things, specifically for Synology:

  • There are many reports of IPv6 not playing nice with them and leading to over 50% transfer speed reduction. Make sure to disable it.

  • At least on my system, WSD was not running by default, which means discovery falls back to the nasty SMB1 even when SMB2/3 is used for the actual SMB/CIFS transfers; forcing NetBIOS over IPv4 was only a mitigation for that. Make sure to enable it.


I had already disabled IPv6 while troubleshooting a different issue (described below). But I hadn’t enabled WSD so I did that.

An absolutely huge issue with the DS1817 is that if you try to use eth0 or eth1 on a 1GbE network, it will work for about an hour and then go nuts and basically become unusable. I fought this for a couple of days and finally opened a case with Synology. They said it is a known issue with the chip being used, and that to get it to work properly a boot-up script needs to be added. Under Task Scheduler, create a "Boot-up" event and add the following to the Task Settings.

[Screenshot 2021-05-28 111128: the boot-up script provided by Synology support]

Once that was in place and the NAS rebooted, it worked perfectly. That allowed me to run multiple plotters to the same NAS and also have a dedicated connection for the harvester.


So you are basically disabling flow control on the Synology side, trusting all other devices will back off to avoid total network trashing?

I guess so. But it is connected to a nice Ubiquiti switch so it can handle the flow control (at least that is my assumption). But I do have a new issue that has appeared overnight.

You can see in the image that about every 2 minutes and 15 seconds (around 135 seconds) it takes an additional 10 seconds to do the search. It is doing it like clockwork. No idea why. Anyone have any ideas?

I had one of those, but had to retire it for my intended purpose. It turned out that once you had enough of its smarts enabled (QoS, filtering, etc.), its 1Gbps got bogged down to about 100Mbps.

OK, I looked back through the logs and everything was fine at 292 plots and 4 folders, then it started doing this (10-second delay every two minutes). But nothing changed. No new plots, no new folders, nothing. It started at around 7:55pm Friday night, over 27 hours after I added the big chunk of plots. So it isn't the number of plots causing it, and it's not the number of folders causing it. Looking in the logs, it appears to be happening when the harvester searches for new plots.

Here is the first instance of the issue.

And here is what it looked like just 2 minutes earlier.

It is happening every time the harvester does a search for new plots. But I don’t know why. I’m going to reboot the NAS once I have an opening between plot copies.
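If it helps anyone chasing something similar, here's a quick variation on the sketch above that just prints the timestamp of every slow lookup, so a clockwork pattern like this ~135 second one jumps right out. Same assumptions as before about the log location and line format.

```python
import re
from pathlib import Path

LOG = Path.home() / ".chia" / "mainnet" / "log" / "debug.log"
THRESHOLD = 5.0  # seconds; anything over this counts as a slow lookup

# Harvester log lines start with a timestamp like "2021-05-28T19:55:01.123".
line_re = re.compile(r"^(\S+)\s.*plots were eligible.*?Time: ([\d.]+) s")

for line in LOG.read_text(errors="ignore").splitlines():
    m = line_re.match(line)
    if m and float(m.group(2)) > THRESHOLD:
        print(f"{m.group(1)}  {float(m.group(2)):.1f} s")
```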

Are you 100% sure there are no indexing/encryption/security (AV) services at all running on these volumes/connections from either end?

https://www.synology.com/en-us/knowledgebase/DSM/tutorial/File_Sharing/What_can_I_do_when_the_file_transfer_via_Windows_SMB_CIFS_is_slow

Thank you for the tips. I went through all of the recommendations in there but most of them I already had set. And no, there is no encryption/indexing or AV running on those folders.

But I did find the issue. I set up a remote mounted folder so I could migrate plots from another NAS. That migration finished yesterday, but I left the folder mounted, and it is mounted under one of the volumes that gets scanned for plots. Well, Friday night, I turned off that other NAS. So the mounted folder is still there but cannot connect to its remote folder. So, delay.

Brought the other NAS back online so I could unmount the folder. Then deleted it. Problem gone. :partying_face:
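For anyone who wants to catch a dead mount like that before it drags the lookups down, a quick-and-dirty Python check along these lines works. The paths are hypothetical, so swap in whatever folders your harvester actually scans; each directory is probed in its own process so a hung mount can't stall the whole script.

```python
import os
from multiprocessing import Process

# Hypothetical mount points; replace with the folders your harvester scans.
PLOT_DIRS = ["/mnt/nas1/plots", "/mnt/nas1/plots2", "/mnt/migration"]

def probe(path):
    # A dead SMB/NFS mount usually hangs here instead of returning quickly.
    os.listdir(path)

if __name__ == "__main__":
    for d in PLOT_DIRS:
        p = Process(target=probe, args=(d,), daemon=True)
        p.start()
        p.join(timeout=2)
        if p.is_alive():
            p.terminate()
            print(f"{d}: NOT RESPONDING - stale mount?")
        elif p.exitcode != 0:
            print(f"{d}: error (exit code {p.exitcode})")
        else:
            print(f"{d}: OK")
```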


I love a good sleuthing story with a happy ending :grinning:

One of my farms has 291 plots that were all in one folder. I am about to migrate those plots off of that device so I can reformat it, so I ran a quick test first. I took a screenshot of about 10 minutes' worth of "plots were eligible" entries and put all of the times in a spreadsheet, then looked at the average. It was .36720 seconds with all the plots in one folder.

Then I created 3 folders and split the plots up between them, let it run for about an hour, and took another sample. This time the average was .34627 seconds. Quicker, but not by much. So are smaller folders really that much better than one large one? This test really doesn't tell me that is the case.
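For what it's worth, the split itself only takes seconds. Something like this Python sketch (hypothetical paths, adjust to taste) spreads the .plot files round-robin into new subfolders on the same volume, so the moves are just renames; you then add the new folders with `chia plots add -d` and remove the old one from the config.

```python
from pathlib import Path

# Hypothetical source folder; keeping the new folders on the same volume
# means each move is an instant rename rather than a copy.
SRC = Path("/mnt/nas2/plots")
N = 3  # number of subfolders to split into

subdirs = [SRC.parent / f"plots{i + 1}" for i in range(N)]
for d in subdirs:
    d.mkdir(exist_ok=True)

for i, plot in enumerate(sorted(SRC.glob("*.plot"))):
    dest = subdirs[i % N] / plot.name
    plot.rename(dest)
    print(f"{plot.name} -> {dest.parent.name}")
```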