ChiaGarden - A Toolkit for PoST Farming on Linux

Try gardenmount --mount --print-fstab

It won’t work without either --mount or --unmount.

Thanks for pointing that out. I should make sure that an error message is displayed if either of these two actions is missing.

Also, with that number of disks I would recommend not using fstab but using the systemd service instead. Let me know if you need help with it.

Also, I dare you to add the --mergerfs option. It will mount a single drive-like filesystem in /mnt/garden/, which makes organizing plots so much easier. Contrary to what many people believe, there is no performance disadvantage, specifically for lookup times. There is also a script in the repo to analyze lookup times.
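For example, pooling is just one extra flag on top of a normal mount (the same combination shows up again in the examples further down):

gardenmount --mount --mergerfs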

Thanks for the help. So, if I were to use the mergerfs option, it would mount all drives as a single drive? How would I find a drive that goes bad?

Also, don’t you have an option that takes all the drives once they are “full”, combines their leftover space into a single drive, and lets you plot to that?

Maybe some more command-line examples would be nice.

Also, when I run gardenmount --mount or --unmount with --print-fstab,
that doesn’t write to fstab, it just prints text to copy? Is there a way to run --print-fstab without actually mounting or unmounting the drives?

Hi,

Please have a look at the readme page on GitHub.

There are lots of examples there, too. If you use --mergerfs, it will mount all drives as a single drive-like filesystem under /mnt/garden, in addition to the drives being mounted individually at /media/root/CHI-[Serialnr]. So nothing changes if a disk goes bad: you can still identify and handle it via its individual mount point.

Once you are done plotting, unmount all drives with gardenmount --unmount. Then mount the drives again, this time including the --slack feature, for example: gardenmount --mount --mergerfs --slack
It will guide you through the process of utilizing the slack space. Only the first run takes a long time and involves a lot of text and prompts; mounting the drives again afterwards is fast. The entire space is then available in /mnt/garden/ and the slack space alone is available in /mnt/slack/.
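In other words, the post-plotting switch to slack boils down to these two commands (the first slack run being the long, interactive one):

gardenmount --unmount
gardenmount --mount --mergerfs --slack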

When mounting a larger number of drives, do not use fstab!! Use the systemd service gardenmount.service instead! The --print-fstab option only prints text, it does not change fstab. The option is only there because I never removed it after I wrote the systemd service files. I will not add any features around using fstab.

Here is a typical example of how you would use gardenmount with systemd after you are done plotting the disks and the slack space (a rough sketch of a complete unit file follows the flag explanations below):

The file you want to edit is /etc/systemd/system/gardenmount

ExecStart=/usr/local/bin/gardenmount --mount --slack --mergerfs --no-prompt --read-only

Explanation:
--mount → mount the drives
--slack → use the slack space (which you set up earlier on the command line)
--mergerfs → put everything (all drives and slack space) in /mnt/garden
--no-prompt → do not ask anything during slack setup, because this runs at boot
--read-only → mount everything read-only; we are done plotting, so this is for safety
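For the bigger picture, here is a rough sketch of what a complete unit file around that ExecStart line could look like. Only the ExecStart line is taken from above; the unit/install sections, the oneshot pattern and the ExecStop line are assumptions, and the gardenmount.service shipped with the installer may well look different:

[Unit]
# Description and ordering below are assumptions, not copied from the repo
Description=Mount CHIA drives with gardenmount
After=local-fs.target

[Service]
# Assumed pattern: run once at boot and stay "active" after the mounts are done
Type=oneshot
RemainAfterExit=yes
# This line is the actual example from above
ExecStart=/usr/local/bin/gardenmount --mount --slack --mergerfs --no-prompt --read-only
# Assumed counterpart so that "systemctl stop gardenmount.service" unmounts again
ExecStop=/usr/local/bin/gardenmount --unmount

[Install]
WantedBy=multi-user.target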

Everything else is standard systemd:
systemctl daemon-reload (after you make changes to the file, or reboot)
systemctl start|stop|restart gardenmount.service (start, stop, restart)
systemctl enable|disable gardenmount.service (enable or disable starting on boot)
Or just re-run installer.sh and then reboot; it will ask you which services to enable or disable.

As mergerfs exposes just one logical drive to the user-level API, wouldn’t it potentially slow down proof searches? The harvester may end up queueing all proof-search threads on one physical disk after another (at best blocking threads, at worst slowing disk access to a crawl).

Yes, you are right, in theory. But in real life it is not :wink:

There are plenty of theories about why mergerfs should be slower. And just from the perspective that it adds a layer of abstraction with some overhead, removing that layer should always be better performance-wise.

When I started my mergerfs journey around two years ago, I saw an increase of roughly 5 s in lookup times on a 32-disk JBOD. That is a lot, but in terms of the Chia blockchain it is still unproblematic. I moved away from mergerfs for that reason anyway and had another look at it around half a year ago with a later version installed.

With that newer version I did not encounter ANY increase in lookup times. I wrote a quick and dirty script in this repo called analyze_lookup, tested this thoroughly over a period of a week each on an array with 64 disks, and did not see any disadvantage. I really have no clue what exactly changed along the way that made mergerfs so much more suitable for farming than before. I am just happy that it is.

The author of mergerfs is well aware that some Chia community members are using his software, so maybe he implemented some improvements just for the farming use case. It is important to know that the version from the Ubuntu package maintainers is really outdated. That is why the ChiaGarden installer script has some logic that determines your CPU type and downloads and installs the latest mergerfs version from the mergerfs GitHub page.

Even if you decide against mergerfs for farming, there are still a lot of benefits to using it just from the perspective of organizing your farm. For example:

  • Want to move all current plots on all Chia disks to a subfolder called “bladebit”?
  • Want to create a new folder called “gigahorse” on all disks?
  • Finished plotting? Run chattr +i /mnt/garden/*.plot to set the immutable flag as extra protection against accidental deletion.
  • Want to count your plots on all disks real quick? ls -1 /mnt/garden/*.plot | wc -l
  • Or simply do not bother with mount points for Chia farming during plotting and do the dirty work of specifying mount points a bit later, when there is time for it.

I found it made my life as a Chia farmer so much easier, and I hope that with ChiaGarden more people get to use it, too.


Thank you.

I was not trying to imply that mergerfs is slower in any way or not useful. It should have the same performance as the underlying filesystem, as it only adds a “management” simplification. I really do like what mergerfs offers.

Actually, for plotting I don’t see any issues with using mergerfs, as the final destinations can be the individual subfolders in the mergerfs root, so one destination folder will cover one HD.

That said, from the GH or BB point of view, the harvester is most likely trying to see how many HDs are out there and spawn possibly as many proof-checking threads (if it can afford that many). This is where mergerfs basically strips the relevant info (the individual drives) from the API those harvesters are using.

Therefore, I don’t think improvements on the mergerfs side can help with performance; rather, it is a matter of letting @madMAx43v3r know about it. One partial workaround for the harvester could be to use subfolders as a substitute for drives when spawning those threads, if it sees just one big drive.

And for the time being, maybe let people know to check harvester performance when planning to switch to mergerfs, as performance may be setup dependent.

For plotting with mergerfs I have intentionally set mfs (most free space) as the default policy in the mount options. Theoretically, one could start with a set of empty disks that are all the same size, and with mergerfs set as the destination folder it will always choose a different disk for the next copy operation. I did this myself some time ago with uncompressed plots and it works really well.
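For reference, a hand-rolled mergerfs mount with that policy could look roughly like this; the branch paths are made up for illustration, and gardenmount assembles the real option set for you:

mergerfs -o category.create=mfs,moveonenospc=true,cache.files=off /media/root/CHI-AAAA:/media/root/CHI-BBBB /mnt/garden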

However, I think that for plotting Gigahorse the plotsink tool is better, since it picks the next drive depending on its load, not on its available space. For the lazy ones: the taco_list tool from this repo outputs your available CHIA drives in various listing formats, and the plotsink.service included here makes use of it. I really don’t want anyone to have to type disk paths manually.

About what you said regarding the number of threads for proof checking: I really like the train of thought, but I don’t know whether that is actually the case, or whether the real bottleneck is the number of available cores/memory per GPU.

I am interested in actually testing this. How would you check harvester performance if not by measuring lookup times?

On my setup (Ubuntu 22, 256 GB RAM, 3060 Ti), the built-in GH plot distribution (direct or via plot sink) is a disaster. Both bring the box down after about 3-5 hours of plotting, so a reboot is needed (restarting GH doesn’t recover the system from crawling). So I need to let GH leave plots on the NVMe and had to write my own scripts to transfer those to their destinations. Basically the same operations are performed, one using GH for plot transfers, the other a straight mv command, yet GH always dies after a short time. So, again, it boils down to the particular setup.

I also wish that GH would add regex support for destinations and/or HDs as subfolders. That said, the GH chia_plot_sink_disable file is a super helpful feature for juggling HDs while plotting.

The example of threads spawned per disk was just meant as a simplification. The point is that, for instance, when spawning 10 threads for 10 drives we could have two extreme scenarios. In the first, each thread is bound to just one HD (and processes only it). This would give the fastest proof checks. In the other (without knowing the HD assignment), all 10 threads could potentially sit on just one drive and fight over the stepper motor (head movement), bringing that HD’s read speeds to a crawl. In this case processing one drive would be really slow, and potentially all threads would move from one drive to the next at the same time, hitting proof checking with a double penalty (slow reads, no parallelization). Of course, we can assume that threads come from a fixed pool, so for small setups the penalty may not be visible, but bigger setups may be more prone to show degraded performance.

You are right, charting lookup times and lookup delays (these may show up with multiple harvesters using the same remote compute) is the best way to monitor it, although it is kind of a post-mortem process. Potentially iostat -d could be helpful for real-time checks, although I don’t know how it would behave with mergerfs.
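If anyone wants to pull those lookup times out of the logs without extra tooling, something along these lines should work against a standard Chia harvester debug.log; the log path and the exact wording of the log line are assumptions about a default install:

grep "eligible for farming" ~/.chia/mainnet/log/debug.log \
  | awk '{for (i = 1; i <= NF; i++) if ($i == "Time:") { sum += $(i+1); n++ }} END { if (n) printf "average lookup: %.3f s over %d lookups\n", sum/n, n }'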

EDIT: One thing missing in the above is the amount of data read to process a proof check. Even though the plot needs to be navigated (a few different locations, i.e. head positions), each chunk read potentially fits within one head movement, so actual head-movement congestion may not be that bad. Sure, some penalty will be there, but not as pronounced as when reading bigger sequential chunks where the heads constantly switch between different reads. So it may be that plot layout matters more than head-movement interference.

OK, I am starting to learn more! I plan on adding more drives to my farm. So if I use mergerfs now, what happens when I want to add another 24- or 48-drive JBOD?

You can add them anytime later and mount them with mergerfs the same way.

mergerfs is only an additional way of mounting your existing drives, on top of the normal mounting method.

Even though it’s called “fs” for filesystem, it’s not a different way of formatting a drive, and nothing is written onto the drives themselves.

So if you’re adding more drives later, the mountpoint /mnt/garden/ will simply appear as a bigger drive in the system.

Cool. I need to use more of these tools and learn more about them. One minor item I noticed: some of the commands have “chia” in front of them in your examples… e.g. ./chia_plot_counter is actually ./plot_counter. Not a big deal, but it took me a few minutes to figure out why it wouldn’t work. Like I said, I’m a Linux noob, so someone else probably wouldn’t even copy and paste the examples.


Thanks a lot for pointing that out! I updated the readme.


New version of chiainit is out:

looks nice, doesn’t it?

  • it’s all green checkmarks now if everything is fine, for a better overview
  • errors, if any, are displayed directly in the terminal
  • the logfile still captures everything

Important change: mounted disks are now processed, too. Destructive actions such as wipe and format won’t work on mounted disks anyway; the standard Linux tools used here will refuse to eat anything that is not dead, so at least now you all get error messages. This is especially important if you are using a distro that automounts drives, for example the desktop (not server) version of Ubuntu.

I also updated plot_over, the replot helper script.
The entire configuration now happens in the config file. The only argument left is the --config option, which loads the specified file. An example is included in the repo.
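A typical invocation then looks like this (the config path below is just a placeholder; point it at wherever you keep your copy of the example file):

plot_over --config /path/to/plot_over.config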

I’ve also fixed a bug which would cause wrong allocation of disk sizes in some scenarios.

Have fun! Let me know if you need help!

:seedling: efnats

If you are using plot_over, please update to the latest version.

The script now (finally!) detects fully replotted disks correctly.

Those are removed from the set of disks being monitored, which is better performance-wise.
But more importantly, the number of desired free disks is now auto-adapted accordingly, which matters at the end of the replotting phase when only a few disks are left as replotting candidates.

Compatibility with bladebit is currently broken. I’ll fix that very soon.


Compatibility with bladebit and gigahorse plot naming schemes is restored with the latest plot_over.

If someone is replotting from bladebit, it would be great to get positive feedback here.

Can you explain replotting more? Right now I have a bunch of C8 plots and want to replot to C18. Currently I pick 5 drives, run chiainit, and then plot those 5 drives as C18. How does this help me do that?

Of course!

If you replot your drives by reformatting them and then creating new plots on each drive, you will miss out on rewards as long as the drives are still quite empty.

By using a replot helper script like plot_over you can gradually replace old plots with newly created plots at the desired compression level. This means that you keep using your full disk capacity during the replotting phase.

So how do I make use of it? I went in and removed C18 and higher from the script, so it looks for anything below C18 and deletes those plots off the drive; in the meantime I’m plotting, and plotsink sees a drive with new empty space and puts plots there?

That is a great tool, man! You’re right, I miss out on full rewards for the 2 to 3 days it takes to replot 5 drives, depending on HD size.

Thank you!

Yes, this part of the config file takes care of that:
replot_levels="0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,18,19,20"

replot_levels lists the plot compression levels that will be gradually removed by the script. So in this case anything but level 17 will be removed; that is what you would use if you are replotting to C17, for example.
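For your case (replotting to C18), the same pattern suggests leaving 18 out of the list instead:

replot_levels="0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,19,20"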

This is how that would look: