Harvester falsely identifying duplicate plots

Jacek · September 22, 2021, 4:15am

In the old (pre v1.2.6 or 1.2.7) rescan timeout was controlled by

plot_loading_frequency_seconds: 120 // 120 seconds

With that new change, during the update, they don’t read the value you may have in your config file, but use a new section (that is not inserted in the old config.files):

plots_refresh_parameter
  interval_seconds = 120
  retry_invalid_seconds = 1200
  batch_size = 300
  batch_sleep_milliseconds = 1

Looks like you have a big load on your harvester, so they recommend to increase (by a lot, e.g., 10x) that “interval_seconds” field.

Since for you (or me) that section is not in config file, it is hard to guess what to do, unless you are pointed to it.

From what you described, you are hitting either some OS limit, or Chia code is confused with that number of folders, and goes bonkers.

By the way, it used to be that when that timeout happened, there were INFO entries in the log file about which folders are being scanned, and plots added. I don’t see that “being scanned” lines there anymore (maybe after updating to v1.2.6), and have not added new plots recently. You may want to disable/enable one folder, and check whether log file will have some interesting info.

Although, there may be one more possibility. If you have a lot of folders/plots (don’t know what “a lot” means), it may be that the scanning process is just too slow, and the code is starting to chase its own tail (e.g., getting new data, when the old is not being processed, and then being confused about missing that new data, …).

How many drives you have sitting on that harvester?

Update
Here is the link that talks about those new entries. It looks like it is still under development, i.e., there is no clear understanding how to handle a big amount of folders/plots, as such you may be affected by that bug.

And as that link indicates, that change was added in v1.2.5, so exactly when you started having those problems - new, badly tested code.

So, add that section, make that entry be like 30-60 minutes or so, and see whether it helps. If you are still adding new plots, maybe it is better to wait that 30-60 minutes to pick those plots up, than have the software start barfing again.

By the way, that rescan is needed only when you are adding new plots to the harvesters. Otherwise, I think you may even disable it, if there is such possibility (on startup, that harvester will rescan your folders, whether you have that timeout or not - at least i think so). So, setting it to that 30-60 minutes is really not that big of a deal.

Also, I don’t understand why that section is so rigid, or is there in the first place. When you (I mean, chia code - program) load those folders, you know what speed you are going with, so that refresh should be dynamic, based on the folder/plots count and loading times. It is just kicking the can toward the end-user to avoid coding a more flexible solution. In my opinion, there should be one entry there “scan: on/off” and a way to manually (UI) triggering that rescan. Also, there is a inotifywait (on Linux, but you can code similar thing on Windows as well)
tool that can be used to monitor folders for changes. That would let that rescan chore be handled by the tool, and have your code run only when the tool will notify you (so minimal code will be needed to do those rescans, if at all). However, that is a different story.

I found a better link in chia github sources. Just search for “plots_refresh_parameter” The other link that I put above is to a Chia fork (still based on the main Chia code, I think, so their explanations may apply to Chia as well).