I am slowly but surely getting to a point where my current rig and operating system are reaching their scalability limits.
- Running out of PCIe lanes. There are not enough 8x slots on a consumer motherboard to connect more HBAs
- The more drives are connected in SAS JBODs, the more weird problems start occurring during POST. Sometimes it takes 5+ minutes to POST! And sometimes the virtualization technology gets disabled on its own, even though it is configured as enabled in the BIOS (so Docker is unable to start)
- 128GB RAM is the limit on a consumer motherboard with this processor. The more plots are in the system, the more important the page cache becomes for reducing disk lookups
- Windows 10 Pro has a hard limit for the write cache. It seems to be limited to a certain percentage of the available memory, and caps out at ~16GB. When you move plots over a 10GbE network from your plotter, this becomes a bottleneck for the plotter (it is unable to “spit out” the plot fast enough, and as a result gets bogged down by a protracted copy routine)
- Windows 10 Pro is bloated with unnecessary software, and the worst part: it forces a restart when an update comes in. I know you can disable the automatic restart when running as a privileged user, but I run chia as a non-privileged user, to minimize the risk of security issues
So with that being said, I have embarked on a journey to pick better hardware and software.
For hardware, here were my considerations:
- A server-grade motherboard with a server-grade CPU, decent number of cores
- Lots of PCIe lanes and 8x or 16x slots
- Single socket, to consume less power. This system is intended to primarily support the farm. Plotting with MadMax would be nice to have, if I have extra HDDs to fill
- I’ll start with 256GB RAM and will buy more in the future if needed
For software, the main criteria are:
- A server-oriented OS. No more consumer crap limitations and automatic restarts
- A sensible GUI option (either remote desktop or a web interface), to be able to easily see current performance stats and easily manage disks. I am reasonably proficient in managing Linux through the CLI, but it is hard for tedious repetitive tasks like mounting and keeping track of all my disks
So I started researching and tried the following operating systems.
TrueNAS
A new kid on the block, a Linux-based OS that features ZFS. The promise of ZFS is appealing because it has a flexible caching algorithm and automatic data scrubbing (to prevent bit rot). It is intended to support extra large file systems, where disks are simple building blocks. It has a nice web UI that allows doing exactly what the system is intended for: creating ZFS volumes and easily managing the disks in them.
Another +1 for TrueNAS is that it has Chia software available as a 1-click install through their package system.
I installed it and found myself in a situation: it force-feeds you ZFS. You must format all your disks in ZFS, or else it is not gonna work. I quickly realized that TrueNAS is intended for “appliance-like” servers, where all the management is done through the web GUI. If you try doing anything through the CLI, first of all you are on your own, and second, there are a variety of kernel-level limitations that make it impossible or very difficult to do anything. So for an existing farm of disks formatted in exFAT, this presented a painful data migration challenge.
But I wanted to give ZFS a shot. So on my plotter system (Debian), I inserted 12 HDDs and created 4 ZFS “striped” pools, each containing 3x 18TB disks. Normally, an 18TB HDD is able to fit 165 plots, with ~40GB of wasted space remaining. I thought that if I striped 3 HDDs in a pool, the leftover 40 + 40 + 40 GB would be enough for +1 extra plot. So I started plotting and filling up those pools. To my surprise, I realized that ZFS has a lot of overhead in terms of space usage. A pool of 3 disks couldn’t even fit 3x 165 plots. It wasted a lot of space on all the meta information and striping stuff.
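The leftover-space reasoning above can be checked with a few lines of arithmetic. This is only a rough sketch, not a measurement: the plot size of about 108.8 GB is an assumption (the post does not state it), the function names are made up for illustration, and the 165 plots and ~40GB leftover per disk are the figures quoted above.

```python
# Back-of-the-envelope check of the "3 leftovers make 1 extra plot" idea.
GB = 10**9

PLOT_BYTES = 108_800_000_000   # ASSUMPTION: ~108.8 GB per plot, not from the post
PLOTS_PER_DISK = 165           # from the post
LEFTOVER_PER_DISK = 40 * GB    # from the post (approximate)

# Usable space on one standalone disk, reconstructed from the figures above.
DISK_USABLE = PLOTS_PER_DISK * PLOT_BYTES + LEFTOVER_PER_DISK


def pool_plots(disks, overhead_bytes=0):
    """Plots that fit in a pool of `disks` disks after subtracting overhead."""
    usable = disks * DISK_USABLE - overhead_bytes
    return max(usable, 0) // PLOT_BYTES


def max_overhead_to_break_even(disks):
    """Largest overhead (bytes) that still leaves room for disks * 165 plots."""
    return disks * LEFTOVER_PER_DISK


print("ideal 3-disk pool:", pool_plots(3), "plots")
print("break-even overhead:", max_overhead_to_break_even(3) // GB, "GB")

# Hypothetical overhead of 1% of the pool, purely for illustration.
one_percent = 3 * DISK_USABLE // 100
print("with 1% overhead:", pool_plots(3, one_percent), "plots")
```

Under these assumptions the whole pool has only about 120 GB of slack out of roughly 54 TB, which is about 0.2%. Any filesystem overhead bigger than that already drops the pool below 3x 165 plots, let alone the hoped-for extra one.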
That was a deal breaker. I don’t want to put up with ZFS overhead at the price of fewer plots per disk/pool. While the data integrity and scrubbing are nice, they are not that important for plots after all. You can easily replot a few disks if they start to “bit rot”.
So with ZFS out of the question, TrueNAS was automatically excluded too.
Windows Server 2022
I decided to give it a try as this would be the easiest migration in terms of existing setup. And I didn’t want to be bogged down by consumer Windows 10 limitations.
It installed and worked, with some quirks (regarding WSL and weird behavior in the terminal), but overall it was usable. What I liked most is that it doesn’t have any bloatware and is already optimized for the server use case. Write cache and page cache can take as much RAM as you can afford.
But Microsoft has a weird licensing model and limitations. They like to charge ludicrous amounts in “per-core” and “per-user” licensing fees, which can quickly add up to $1000+.
There is a Windows 2019 “Essentials” edition that has sort of a “flat rate” licensing, but they don’t mention that the OS is limited to 64GB RAM and 16 cores (logical or physical? I don’t know). And there is no more “Essentials” edition for Win 2022.
Windows 11
Hey, why not give it a shot? Maybe it is better than Windows 10?
I was stopped in my tracks by the fact that my server motherboard didn’t allow for “secure boot”. I don’t know what the hell it is, and apparently I didn’t have a TPM chip either. Knowing that Windows 11 is likely as bloated as Windows 10, I didn’t want to go that route anyway.
Windows 10 Pro
So I briefly tried installing a Win 10 Pro system on my new server.
Actually it offered better performance in RAM disk benchmarks than Win 2022 using the same CPU/memory settings. But it had bloatware. Lots of it.
So I tried one of those “de-bloating” scripts to disable most of the telemetry, and Cortana, and tracking, and God knows what else. Unfortunately, among other things, it nuked the Terminal application, making the OS unusable for the farm. FUBAR. Time to try another OS.
Ubuntu
Next thing I tried was Ubuntu. It was easy to install. That’s about it.
Gnome GUI is terrible. But what’s more terrible is accessing it from Windows. I tried XRDP and it is completely bugged. It connects and authenticates, but seems to be failing to display some of the “overlays” of the GUI, rendering it completely unusable.
But even when I connected to it through IPMI (Supermicro out-of-band management solution), the Gnome GUI programs sucked ass. The disk management program was a joke. It failed to recognize a hardware RAID-0 of my plotting SSDs, and actually destroyed all the contents on them.
The resource monitor was underwhelming compared to the Windows stuff as well. You feel like you’re blind when using this sparse GUI.
Red Hat Enterprise Linux
OK, I thought. I understand that Ubuntu is free and they have no incentive to make a good GUI.
But what if I tried a commercial solution from Red Hat? It must be better than that.
Installation of RHEL was difficult and riddled with bugs. It is too long to describe, maybe deserves a separate post. But eventually I was able to install it.
So what do I get? The same piece of shit Gnome GUI. Except that XRDP is not supported, so I have to use TightVNC or TigerVNC. The only good thing about RHEL is that they have well-maintained documentation. It took me ~4 hours to set up a good VNC connection to the GUI. Yet the GUI is still the same piece of shit Gnome. The resource monitor is a joke. The disk management program is amateur.
My conclusion about Linux is that the only way to efficiently use it is through SSH. And I am not ready for this - to manage hundreds of disks on a huge farm only through CLI.
Back to Win 2022
While trying all these other OSes, I actually called Microsoft and asked them what would be the simplest license structure for 1 bare-metal server and 1 person using it. They struggled to answer, but referred me to one of their “partners”. Turns out that those partners have a different licensing structure than “retail” Microsoft. They said something along the lines of “Give us $465 per server + $75 per person and we don’t care how many cores you have, and you are free to use any Win 2022 edition”.
I think I’m going to give it a try.