Best configuration for 1.8 PB - farm is slow

I have a server connected via SAS to a JBOD with 102 × 18 TB HDDs.
Each disk is configured as “single” and all are full of plots.
Another similar part of the farm is mounted on a QNAP NAS and acts as a harvester.
Unfortunately, on the JBOD (Windows Server 2019, disks formatted with ReFS) the average lookup times are about 3 seconds, sometimes even more than 5.
My question is: how do I optimize the farm? I think SAS is the safest connection, but it is extremely slow…

I guess the seek time increases as you scale the number of drives. Perhaps ReFS has a way to cache file metadata (so that it doesn't have to search for it on the HDDs)?
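
One way to check is to time a small random read on each drive yourself; if a handful of disks are much slower than the rest, that points at specific hardware rather than the file system. A rough Python sketch - the folder layout below is just a placeholder for your 102 mount points:

```python
import random
import time
from pathlib import Path

# Placeholder: the folders where the 102 drives are mounted - adjust to your layout.
PLOT_DIRS = [Path(r"D:\plots") / f"disk{i:03d}" for i in range(1, 103)]

def time_random_read(plot_path: Path, read_size: int = 64 * 1024) -> float:
    """Seek to a random offset in a plot and read a small block -
    roughly the kind of I/O a harvester lookup generates."""
    size = plot_path.stat().st_size
    offset = random.randrange(0, max(size - read_size, 1))
    start = time.perf_counter()
    with open(plot_path, "rb") as f:
        f.seek(offset)
        f.read(read_size)
    return time.perf_counter() - start

for directory in PLOT_DIRS:
    plots = list(directory.glob("*.plot"))
    if not plots:
        continue
    elapsed = time_random_read(random.choice(plots))
    print(f"{directory}: {elapsed * 1000:.1f} ms")
```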

I talked to someone who has a 9+ PiB farm and they said it is all a single node: one powerful computer running TrueNAS with the JBODs directly attached. They use ZFS with metadata caching set up in RAM (280 GiB just for the cache, and the rest for the system and scrubbing services).

Here you can read about these kinds of caches:

https://docs.oracle.com/cd/E19253-01/819-5461/ghbxt/index.html

Look for the ARC (aka primarycache) and L2ARC (secondarycache). By default they cache both data and metadata, but with plots it is not feasible to cache data, so they should be switched to metadata only. The L2ARC works as an overflow buffer (an L2 cache for when there is not enough space in the L1 cache). A good candidate for it is a fast NVMe.
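
If you do end up on ZFS, the change is just two dataset properties plus an optional cache device. A minimal sketch of the commands, wrapped in Python only so it is copy-pasteable here; the pool name `tank`, the dataset `tank/plots` and the NVMe device path are assumptions - substitute your own:

```python
import subprocess

POOL = "tank"                   # assumed pool name
DATASET = "tank/plots"          # assumed dataset holding the plot files
L2ARC_DEVICE = "/dev/nvme0n1"   # assumed fast NVMe used as L2ARC

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# ARC (L1, in RAM): keep only metadata - plot data is read once per lookup
# and never re-used, so caching it would just push useful metadata out.
run(["zfs", "set", "primarycache=metadata", DATASET])

# Same policy for the L2ARC overflow cache.
run(["zfs", "set", "secondarycache=metadata", DATASET])

# Attach the NVMe to the pool as an L2ARC (cache) vdev.
run(["zpool", "add", POOL, "cache", L2ARC_DEVICE])

# The total ARC size itself is capped separately, e.g. via the
# zfs_arc_max module parameter on Linux (OpenZFS).
```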

I know a migration to TrueNAS (or FreeBSD) to be able to use ZFS would be very difficult, but it is probably your best bet if you want to scale.
But first I'd research the path of least resistance: find out whether ReFS offers similar caching functionality.


PrimoCache might be your answer on Windows. It has the same concept of L1/L2 caches, and you can set up a strategy (read, write, or read-write), but I'm not sure how to make it cache metadata only. You may want to talk to its developers.


If you change to TrueNAS, I suggest you go with TrueNAS SCALE (Linux), as there is a Docker image for Chia. Unfortunately I'm running TrueNAS CORE (FreeBSD), and the newer releases of Chia don't work on FreeBSD.

The latency of SAS is very small, maybe 40 µs of overhead. The latency is mostly the disks, and what the harvester reports is the cumulative latency, through the entire stack (including the file system), of every plot that passes the filter. Is each individual drive ReFS, or are you using Storage Spaces in a spanned volume?
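
Assuming your harvester writes the usual "… Found N proofs. Time: X.xxxxx s. Total M plots" lines to debug.log at INFO level, a quick sketch like this can pull the lookup-time distribution out of the log so you can see whether it is a few outliers or a consistently slow stack (the log path below is a guess - adjust it to your install):

```python
import re
from pathlib import Path
from statistics import mean, median

# Placeholder path: on Windows the log usually lives under
# %USERPROFILE%\.chia\mainnet\log\debug.log
LOG_FILE = Path.home() / ".chia" / "mainnet" / "log" / "debug.log"

# Matches the "... Found N proofs. Time: X.XXXXX s. Total M plots" lines.
TIME_RE = re.compile(r"Time: ([0-9.]+) s")

times = [float(m.group(1))
         for m in TIME_RE.finditer(LOG_FILE.read_text(errors="ignore"))]

if times:
    print(f"lookups:  {len(times)}")
    print(f"mean:     {mean(times):.2f} s")
    print(f"median:   {median(times):.2f} s")
    print(f"worst:    {max(times):.2f} s")
    print(f"over 5 s: {sum(t > 5 for t in times)}")
```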

My configuration is 102 single drives on the HBA, each formatted with ReFS; the drives are mounted inside folders and the config file contains the list of all these folders…

P.S. The JBOD connected via SAS contains SATA disks.

Does anyone understand this? I need help; I'm willing to pay a little.

I still haven't found any solution other than switching to ZFS. I tried that before and was unsuccessful; I've asked on their forum and will see if anything interesting comes up.

I think my hardware configuration is the same as yours.
I was also on Windows Server 2019 with a 108-disk JBOD (16 TB drives) before.
Later I found frequent timeouts in the log, greater than 5 seconds and sometimes even 60 seconds.
I then changed the software scheme so that each HBA card got its own virtual machine, spreading the load of scanning the disks across the VMs.
Now there is a timeout each time I start the harvester, but it runs steadily after that.
