Large Farm OS Choice

I am slowly but surely getting to the point where my current rig and operating system are reaching their scalability limits.

  • Running out of PCIe lanes. There are not enough x8 slots on a consumer motherboard to connect more HBAs
  • The more drives are connected in SAS JBODs, the more weird problems occur during POST. Sometimes it takes 5+ minutes to POST! And sometimes the virtualization technology gets disabled on its own, despite being configured as enabled in the BIOS (so Docker is unable to start)
  • 128GB RAM is the limit on a consumer motherboard with this processor. The more plots there are in the system, the more important the page cache becomes for reducing lookup times
  • Windows 10 Pro has a hard limit on the write cache. It seems to be limited to a certain percentage of available memory and caps out at ~16GB. When you move plots over a 10GbE network from your plotter, this becomes a bottleneck for the plotter (it is unable to “spit out” the plot fast enough and, as a result, gets bogged down by a protracted copy routine)
  • Windows 10 Pro is bloated with unnecessary software and, worst of all, it forces a restart when an update comes in. I know you can disable automatic restarts when running as a privileged user, but I run chia as a non-privileged user to minimize the risk of security issues

So with that being said, I have embarked on a journey to pick better hardware and software.

For hardware, here were my considerations:

  • A server-grade motherboard with a server-grade CPU and a decent number of cores
  • Lots of PCIe lanes and x8 or x16 slots
  • Single socket, to consume less power. This system is intended to primarily support the farm. Plotting with MadMax would be nice to have, if I have extra HDDs to fill
  • I’ll start with 256GB RAM and will buy more in the future if needed

For software, the main criteria are:

  • A server-oriented OS. No more consumer crap limitations and automatic restarts
  • Some sensible GUI option (either remote desktop or web interface) so I can easily see current performance stats and easily manage disks. I am reasonably proficient at managing Linux through the CLI, but it is a poor fit for tedious repetitive tasks like mounting and keeping track of all my disks

So I started researching and tried the following operating systems.

TrueNAS SCALE

A new kid on the block: a Linux-based OS that features ZFS. The promise of ZFS is appealing because it has a flexible caching algorithm and automatic data scrubbing (to prevent bit rot). It is intended to support extra-large file systems, where disks are simple building blocks. It has a nice web UI that lets you do exactly what the system is intended for: create ZFS volumes and easily manage the disks in them.

Another +1 for TrueNAS is that it has Chia software available as a 1-click install through their package system.

I installed it and found myself in a bind: it force-feeds you ZFS. You must format all your disks with ZFS, or else it is not going to work. I quickly realized that TrueNAS is intended for “appliance-like” servers, where all the management is done through the web GUI. If you try doing anything through the CLI, first of all you are on your own, and second, there are a variety of kernel-level limitations that make it impossible or very difficult to do anything. So for an existing farm of disks formatted with exFAT, this presented a painful data migration challenge.

But I wanted to give ZFS a shot. So on my plotter system (Debian) I inserted 12 HDDs and created 4 ZFS “striped” pools, each containing 3x 18TB disks. Normally, an 18TB HDD fits 165 plots, with ~40GB of wasted space left over. I figured that if I striped 3 HDDs into a pool, the leftover 40 + 40 + 40 GB would be enough for one extra plot. So I started plotting and filling up those pools. To my surprise, I realized that ZFS has a lot of overhead in terms of space usage: a pool of 3 disks couldn’t even fit 3x 165 plots. It wasted a lot of space on metadata and the striping machinery.
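For anyone who wants to reproduce the test: a “striped” pool of 3 disks is just a plain zpool with three top-level vdevs, something along these lines (a sketch; the pool name and device names are placeholders, and the tuning lines are commonly suggested tweaks rather than exactly what I ran):

# 3-disk stripe, no redundancy; device names are examples
sudo zpool create -m /mnt/pool1 pool1 /dev/sdb /dev/sdc /dev/sdd
# often-suggested tweaks for large sequential plot files
sudo zfs set recordsize=1M pool1
sudo zfs set atime=off pool1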

That was a deal breaker. I don’t want to put up with ZFS overhead at the price of fewer plots per disk/pool. While data integrity and scrubbing are nice, they are not that important for plots after all. You can easily replot a few disks if they start to “bit rot”.

So with ZFS out of the question, it automatically excluded TrueNAS.

Windows Server 2022

I decided to give it a try as this would be the easiest migration in terms of existing setup. And I didn’t want to be bogged down by consumer Windows 10 limitations.

It installed and worked with some quirks (regarding WSL and weird behavior in the terminal), but overall it was usable. What I liked most is that it doesn’t have any bloatware and is already optimized for the server use case. The write cache and page cache can take as much RAM as you can afford.

But Microsoft has a weird licensing model and limitations. They like to charge ludicrous amounts in “per-core” and “per-user” licensing fees that can quickly add up to $1000+.

There is a Windows Server 2019 “Essentials” edition with sort of a flat-rate license, but they don’t mention that the OS is limited to 64GB RAM and 16 cores (logical or physical? I don’t know). And there is no more “Essentials” edition for Win 2022.

Windows 11

Hey, why not give it a shot? Maybe it is better than Windows 10?

I was stopped in my tracks by the fact that my server motherboard didn’t allow for “secure boot”. I don’t know what the hell that is, and apparently I didn’t have a TPM chip either. Knowing that Windows 11 is likely as bloated as Windows 10, I didn’t want to go that route anyway.

Windows 10 Pro

So I briefly tried installing a Win 10 Pro system on my new server.

Actually, it offered better performance in RAM disk benchmarks than Win 2022 using the same CPU/memory settings. But it had bloatware. Lots of it.

So I tried one of those “de-bloating” scripts to disable most of the telemetry, Cortana, tracking, and God knows what else. Unfortunately, among other things it nuked the Terminal application, making the OS unusable for the farm. FUBAR. Time to try another OS.

Ubuntu

The next thing I tried was Ubuntu. It was easy to install. That’s about it :slight_smile:

The Gnome GUI is terrible. But what’s even more terrible is accessing it from Windows. I tried XRDP and it is completely bugged. It connects and authenticates, but it fails to display some of the “overlays” of the GUI, rendering it completely unusable.

But even when I connected through IPMI (Supermicro’s out-of-band management solution), the Gnome GUI programs sucked. The disk management program was a joke: it failed to recognize the hardware RAID-0 of my plotting SSDs, and actually destroyed all the contents on them.

The resource monitor was underwhelming compared to the Windows equivalent as well. You feel blind when using such a sparse GUI.

Red Hat Enterprise Linux

OK, I thought. I understand that Ubuntu is free and they have no incentive to make a good GUI.
But what if I tried a commercial solution like RHEL? Surely they must be better than that.

Installation of RHEL was difficult and riddled with bugs. It’s too long a story to describe here; maybe it deserves a separate post. But eventually I was able to install it.

So what do I get? The same piece of shit Gnome GUI. Except that XRDP is not supported, so I have to use TightVNC or TigerVNC. The only good thing about RHEL is that they have well-maintained documentation. It took me ~4 hours to set up a decent VNC connection to the GUI. Yet the GUI is still the same Gnome. The resource monitor is a joke. The disk management program is amateurish.

My conclusion about Linux is that the only way to use it efficiently is through SSH. And I am not ready for that: managing hundreds of disks on a huge farm only through the CLI.

Back to Win 2022

While trying all these other OSes, I actually called Microsoft and asked what the simplest license structure would be for 1 bare-metal server and 1 person using it. They struggled to answer, but referred me to one of their “partners”. It turns out those partners have a different licensing structure than “retail” Microsoft. They said something along the lines of “Give us $465 per server + $75 per person, we don’t care how many cores you have, and you are free to use any Win 2022 edition”.

I think I’m going to give it a try.

4 Likes

As far as TPM goes, I saw a couple of different ways to suppress the check. One was through some ini file (on the USB install drive), the other through the registry (before running an in-place upgrade). I have not tried either myself, though.

On the Red Hat side, I would use Rocky Linux (the CentOS successor: server oriented, very long support life, maybe the most deployed server OS, although it may be too SSH-centric - I have never tried to install a GUI on it) or Fedora (UI-focused, kind of the Ubuntu of the RH side; packages are the latest compared to Rocky Linux or CentOS, and stable / tested packages are later migrated to RHEL / Rocky). Both are free. If you don’t like Gnome, you can switch to KDE (not sure whether it is any better, though - maybe just a matter of preference).

The problems you described with your current setup look like a buggy BIOS and/or an unstable mobo. As far as PCIe lanes go, you have not mentioned what CPU you have, but Intel rather sucks compared to AMD here.

UPDATE
By the way, I am not sure whether this is the same thing, but when my plotter had only 64GB RAM, I tried to increase the Win 10 cache size by modifying the LargeSystemCache parameter. It didn’t help much, if at all. Then again, the plotter was all about caching on the write side, not really reading. Maybe you can check it out - How to increase cache memory in Windows: 10, 8, 7

Thanks for sharing your journey, lots of useful info :slight_smile:
I was wondering about TrueNAS, as I already use it at home. You saved me a lot of time !
Thanks

Why would anyone need a GUI on a server? :grinning:

If you really need to, try x2goserver on the Linux server and x2goclient on Windows. I use it to access a Linux server from a Linux client. Can’t say how well the Windows client works.

https://wiki.x2go.org/doku.php

5 Likes

I’m using CentOS 8 for the main node; CentOS 9 has been released since, though I haven’t wanted to change just to try it. I also have a liking for TrueNAS and have a remote harvester running on it; they use Docker, which I’m not familiar with. ZFS does work out to 1, maybe 2, plots less per disk when using a single disk as a pool. Much of the advantage is lost without redundancy, since it can’t correct errors, but because it still checksums, you will get a notification if a disk errors.

To plot to them I mounted them as NFS shares across the network. I found you need to turn off sync writes or the transfers are really slow.
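In case it saves someone a search, turning off sync writes on a ZFS dataset is a one-liner, roughly like this (a sketch; the dataset name is a placeholder, and it trades write safety for speed):

sudo zfs set sync=disabled tank/plots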

If I needed to expand I would set up another TrueNAS system; the original case only holds 16 disks.

I also tried Rockstor, which is based on SUSE Linux, but I had a problem when plotting to it and farming at the same time: the farmed plots became unresponsive. Maybe my hardware wasn’t enough, or I just wasn’t knowledgeable enough to set it up right.

Anyone used Unraid? I was interested in trying it out, as I believe you can combine disks into a larger capacity while only losing the data on one disk if it fails.

1 Like

I gave up my command line stuff 6 years ago, so I’m now a GUI guy with a mouse and a bottle of beer!!!

1 Like

If you use Windows 10 Pro, use the N version, Windows 10 Pro N. This gets rid of some of the bloatware and media players. Like you, I have been trying different software versions on a Dell R510, trying to settle on something I like. I always tend to lean back towards Windows, just because I’m more proficient in it.

So I decided to give Ubuntu another shot, this time around using the Xubuntu edition.
Thanks to this guide, I was able to set up an adequate remote desktop connection:

The XRDP + Remote Desktop combo works great so far. I am setting up and benchmarking the necessary hardware (RAID drivers for the temp drive, HBA card, 10GbE card…) and software (running a MadMax test benchmark; next on the list are the Chia software, chiadog, plot mover…)

I’ll post my findings later when everything is tested.

2 Likes

This is AWESOME! Thanks for posting it! Now I can monitor and work with all my Linux systems through my main Windows PC. Thank you thank you thank you! :smiley:

You forgot an easy one

Just what I installed

Except that Cortana, the Windows Store, forced telemetry, Xbox services, and much, much more are still present, running, and cannot be removed from Windows 10 Enterprise.

Linux is my choice for a large farm all day and every day, but if you must use Windows, use Windows 10 Enterprise LTSC.

1 Like

When you shut things down they aren’t running; hope you didn’t forget the extra ones. I guess you haven’t looked at services lately. But that’s ok.

1 Like

They’re not features you can turn on or off. Run services.msc and have a look there.

How to uninstall Cortana | Tom’s Guide (tomsguide.com)

2 Likes

You can also go to the Task Manager Startup tab and disable it there.

Installed Xubuntu

Followed the guide to connect remote desktop (as described earlier).

Installed RAID software to support Intel VROC RAID-0 (for plotting drives): https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/ssd-software/VROC-Ubuntu-Setup-UserGuide-342787-US.pdf
Used apt install ... for mdadm and ledmon instead of what is described in the manual.
I also followed Intel’s instructions on how to set up the ledmon service.
After a reboot, Ubuntu was able to recognize the RAID array as a block device and showed it on the desktop as mountable. I was able to mount it by right-clicking the icon and choosing Mount. It just worked.
However, it was previously formatted as NTFS, so I tried to delete the partitions using fdisk. But I didn’t know how to properly refer to the device. I tried various things with varying success, but eventually just rebooted the machine and re-created the RAID in the BIOS.
After the system booted, I was at a loss as to how to mount the RAID disk. The individual SSDs were showing up as /dev/sda and /dev/sdb, but there wasn’t a device that represented the hardware RAID-0 built on them.
GUI to the rescue! I installed the Gnome Disks program and it recognized the RAID right off the bat. I simply right-clicked to format the thing as an ext4 file system and the disk showed up as mountable.
Happy camper.
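For the record, the same array can also be inspected from the CLI, since VROC volumes are managed through mdadm (IMSM metadata). Roughly along these lines (a sketch, not an exact transcript of what I ran):

cat /proc/mdstat                  # lists any assembled md/IMSM arrays
sudo mdadm --detail --scan        # prints array identifiers (usable in mdadm.conf)
lsblk -o NAME,SIZE,TYPE,FSTYPE    # shows which block device represents the RAID volume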

Installed the MadMax plotter as described in the instructions.
Created a 108GB RAM disk:

sudo mount -t tmpfs -o size=108G tmpfs /mnt/ram/
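To have the RAM disk come back after a reboot, an fstab entry along these lines should work (an untested sketch on my part):

tmpfs  /mnt/ram  tmpfs  defaults,size=108G  0  0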

Created plot.sh script:

#!/bin/bash
# usage: ./plot.sh <number_of_plots> <destination_dir>
FARMER_KEY=***
CONTRACT=***

THREADS=22      # -r: plotting threads
T_MULTI_2=2     # -K: thread multiplier for phase 2
BUCKETS_1=512   # -u: number of buckets
BUCKETS_2=256   # -v: number of buckets for phases 3 and 4

~/chia-plotter/build/chia_plot -n $1 -r $THREADS -K $T_MULTI_2 -u $BUCKETS_1 -v $BUCKETS_2 -t /media/***/tmp/ -2 /mnt/ram/ -d $2 -c $CONTRACT -f $FARMER_KEY 2>&1

I used 22 threads since my CPU has 22 cores. The temp1 storage is a hardware RAID-0 array of two Intel S4610 SSDs.

Observations:

  • The plot completed in ~19 minutes. Using exactly the same hardware on Windows + WSL, the best I could do was 26 min. So bare-metal Linux gives the best performance you can get
  • Also, creating the RAM disk at /mnt/ram/ was instant, unlike creating a RAM disk on Windows

Chia Software

Downloaded the .deb package from the Chia website.

Thanks to the Xubuntu GUI, I simply double-clicked on it and it got installed. It unpacked files into /usr/lib/chia-blockchain/resources/app.asar.unpacked/daemon/

I don’t actually intend to run the GUI, so the next step was to migrate the existing farm. Since I have the chia database and keys located on a separate drive (NTFS, coming from Windows), I had to move all the contents to a temp directory first and then re-format the drive to ext4.
Since NTFS volumes get mounted with overly permissive permissions, I had to “fix” the permissions on all of the files migrated from the NTFS volume. Thanks to the GUI again, I was able to do that quickly using context menus (a CLI equivalent is sketched after the paths below).
Then I moved the contents back, ready to be deployed at the following paths:

/mnt/db/.chia
/mnt/db/.chia_keys
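The CLI equivalent of that permission cleanup is roughly this (a sketch; the username is a placeholder):

# take ownership and drop the wide-open permissions inherited from the NTFS mount
sudo chown -R myuser:myuser /mnt/db
chmod -R u=rwX,go= /mnt/db    # the keys and DB probably should not be readable by others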

Added the chia “unpacked” directory to $PATH in .bashrc so that I can run chia ... commands.
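That line in .bashrc looks roughly like this (using the install path the .deb created on my machine; adjust if yours differs):

export PATH="/usr/lib/chia-blockchain/resources/app.asar.unpacked/daemon:$PATH"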

HBA Cards

They got recognized out of the box in Ubuntu. It just works. As soon as you turn the JBOD on, the GUI starts displaying all the unmounted disks on the desktop.

Hard Disk Drives

The big task is to re-mount all the HDDs. Again, the GUI to the rescue.
You run Gnome Disks and, for each disk on the checklist, you do the following:

  • Copy the serial number
  • Select the partition to mount
  • Choose “mounting options”
  • Disable the “show in the user interface” option
  • Keep the “Identify as” option at its default (UUID)
  • Change the mount point to a directory under /mnt/

Congratulations, you changed the mounting instructions. You can knock yourself out by running this command to confirm that the instructions were indeed saved:

cat /etc/fstab

But all of that only applies after a reboot. To mount the disk right away, simply click the “Mount” button in the Gnome Disks UI.
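Under the hood, each of those GUI steps just writes a line to /etc/fstab. A typical entry looks roughly like this (the UUID and mount point are made-up examples; your options may differ):

UUID=ABCD-1234  /mnt/hdd001  exfat  defaults,nofail  0  0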

Weird observation:

  • If a disk was initialized and formatted using MS Windows, it would have a 17MB first partition (Microsoft Reserved, Unknown), followed by the actual big partition, followed by a tiny bit of “free space”
  • If instead you used fdisk to create a single partition spanning the entire device and then created an exFAT file system on it under Linux, there is a problem

Linux and/or Gnome Disks takes a lot more time to process this “big” partition when mounting. Even clicking on it in the Gnome Disks program gets noticeably slow, and meanwhile the resource monitor shows one of the CPU threads pegged at 100%.
Not sure how to work around it. Hopefully I won’t have to migrate all of my disks from exFAT to another file system.
After adding close to the 100th disk, Gnome Disks hung for a very long time. Eventually it got through, but it takes an uncomfortable amount of time.
I rebooted the machine and it successfully mounted all the HDDs. Not sure how this is going to work in the long run; maybe I’ll have to resort to CLI mounting.

TODO

  • Figure out how to run the chia farm at startup without logging in
  • Think about gradually reformatting disks from exFAT into a Linux-native file system, if it makes sense
2 Likes

I think there are several different examples floating around, but this is the systemd unit I use to initiate farming automatically at startup. On Ubuntu, put this at /lib/systemd/system/chia-harvester.service, run sudo systemctl daemon-reload && sudo systemctl enable chia-harvester.

[Unit]
Description=chia harvester
Wants=network-online.target
After=network.target network-online.target
StartLimitIntervalSec=0

[Service]
Type=forking
Restart=always
RestartSec=10
User=YOUR_USERNAME

# venv/bin must be in the PATH b/c the scripts expect to be able to call chia_harvester on cli
ExecStart=/usr/bin/env PATH=/home/USERNAME/chia-blockchain/venv/bin:$PATH chia start harvester -r
ExecStop=/usr/bin/env PATH=/home/USERNAME/chia-blockchain/venv/bin:$PATH chia stop harvester

[Install]
WantedBy=multi-user.target

That’s all it takes to farm at startup.

To start/restart: sudo systemctl restart chia-harvester

To see service status: sudo systemctl status chia-harvester

Nice write up and good luck with the finishing touches.

2 Likes

Whoa whoa, buddy.
Why not run each of them?
I think you should look into a real hypervisor. Type 1.
I’m gonna give you a tip here: install
PROXMOX at your soonest convenience.
Try out each install for yourself. Yes, every single OS has its own shortcomings, so it’s up to you to pick what best suits your needs.
Some general rules apply though:

Stick to what you’re most familiar with.

No reason to learn all of Linux just to farm chia.

I’m very skilled with Linux, yet I choose to have my full node virtualized in a Windows VM inside Proxmox, with LXC container harvesters. All virtual.
For a few reasons:
I personally want my chia farm rock-dumb simple, with lots of support. No terminal, easy to troubleshoot. C’mon - it’s Windows.
I run my harvesters as LXC Ubuntu containers, because they are just so easy to duplicate and redeploy.
And the real kicker on top of all that:
all my VMs and CTs live on a few small SSD ZFS pools orchestrated by Proxmox…
Now this is paramount for a few reasons.
1. Live daily incremental rotating backups. If there’s ever a problem, I set it back a day and just re-sync.
2. Real live migration with sync for the full node, etc.
If a server goes down,
another chia full node instance appears.

There are 100 reasons. Just trust me: make your bare-metal install Proxmox. DM me if you need any tips.
Proxmox is extremely touchy, and you can very easily break your whole install.

My vote:

PROXMOX
Virtualize everything.

And use one of those silly OSes inside of it. Any will do. Choose an easy one, though.

Looks like VMware Workstation (or is it more like ESXi on bare metal)? So if you have three VM harvesters and want to have 10 hard disks on each, the main machine has to handle all of this, correct? And you have to slice up your CPU per VM guest. For Xmas I want a dual 18-core system. :slight_smile: I’ll have to download it and play, I guess.

I have 3 hypervisor servers from 2009-2010, 2 Dells and an HP,
all over 3 GHz with 24 cores in each of them, in a bare-metal Proxmox cluster.
A boatload of fun.
Each physical machine has its own stuck-in-place harvester, with USB and internal hard disks allotted in a JBOD fashion with mhddfs.

But my main node floats around between the 3 servers and actually has no hard disks connected to it at all (else it couldn’t float).

Plus I can use the servers for many other things than just chia coverage:
Plex, HOOBS, Cloudron, websites, other cryptos, Minecraft servers.
One server has 50 Minecraft server containers… and a chia harvester LOL.

Gotta fill all them lanes lol.

I’m entirely self-taught. Everything kinda happened for me with chia magic. I even inherited a 100-disk 4-gigabit fiber SAN from 2006.

I set up all the bells and whistles in Proxmox just for fun and to make Minecraft servers.

I even virtualize my home router with pfSense.

Proxmox is the TRUTH
proxmox the TRUTH