Why not use NAS and RAID for your Chia farms?

A quick update, now that plotting is pushing out new plots on a regular basis: splitting the plotting traffic away from the farming traffic has really helped. The harvester response times are not affected by plots being written to the NAS. So far, response times are consistently below 0.1 seconds, normally below 0.05 seconds. Every once in a while I will see a 1-second response; even then it does not correspond to when a file is being written. Just a random blip for no reason I can find, and it is very rare (maybe 1 in a thousand).
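For anyone who wants to track the same numbers, the harvester's lookup times show up in its debug log on lines ending in something like "Time: 0.01234 s". A minimal Python sketch for pulling them out, assuming the default log location and that line format (both may vary by version):

```python
import re
from pathlib import Path

# Default chia log location; adjust for your install.
LOG = Path.home() / ".chia" / "mainnet" / "log" / "debug.log"

# Harvester lines look roughly like:
#   "... 1 plots were eligible for farming ... Time: 0.01500 s. Total 100 plots"
# (exact wording is an assumption; check your own log).
pattern = re.compile(r"plots were eligible.*Time: ([\d.]+) s")

times = [float(m.group(1))
         for m in map(pattern.search, LOG.read_text().splitlines()) if m]
if times:
    print(f"lookups: {len(times)}, max: {max(times):.3f}s, "
          f"over 1s: {sum(t > 1 for t in times)}")
```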

Well, the temperature was not the issue. It looks like I am going to get 19 plots in a 24-hour period. But that is from a dead start, with plots still in progress as it rolls over the 24-hour mark. So maybe tomorrow, with the plots that are already going finishing up, it may reach the 24-plot target. Not sure; I'll let it run and see how it averages out.

As for the second system: I forgot I have a new SATA SSD (Samsung) sitting here. I'm going to put it to work while I wait for the NVMe to show up on Monday. It will be interesting to see the difference in speed. Also, I'm going to try Swar's Plot Manager for the first time. I'm going to start it tonight at the same time as I did the other last night, so I can make an apples-to-apples comparison of when the plot files are written.

Another question on this: would I benefit from using the NVMe as the temp1 drive and the SATA SSD as temp2 (vs. everything on the NVMe)? Or would the SATA drive slow it all down?
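(For context, temp1 and temp2 map to the `-t` and `-2` flags of the chia plotter, so the split being asked about would look something like this; the mount points are just examples:)

```bash
# temp1 (-t) takes the bulk of the I/O during phases 1-2;
# temp2 (-2) is mostly used while the final file is assembled in phase 3.
# Paths below are hypothetical examples.
chia plots create -k 32 -n 1 \
  -t /mnt/nvme/plot-tmp \
  -2 /mnt/sata-ssd/plot-tmp2 \
  -d /mnt/nas/plots
```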

I compared NVMe and SATA here… for one plot the difference is surprisingly small, but for multiple plots it will be enormous due to the heavy overlapped I/O.

I think people forget how fast a 20 Gbps USB port can be. However, you have to pick the right controller interface chip or things get ugly real quick. The ASM2364 is what I use and can recommend:

Equipped with ORICO's latest USB 3.2 Gen 2 main control chip ASM2364, which supports up to 20 Gbps transmission bandwidth, uses a PCIe 3.0 x4 interface with a theoretical bandwidth of 4 GB/s, fully meets the bandwidth requirements of USB 3.2 20 Gbps (2.5 GB/s), and supports NVMe.

Yeah, I run 8 in parallel on the NVMe; I'm cutting that back to 5 for the SATA to try to help it out. But at least I will get some plots while I wait for the NVMe.

This is fantastic logic and convinced me to move off of my software RAID using Windows Storage Spaces and back to just a bunch of disks mapped to folders. I gained a fifth of my storage back, and I found a disk that was almost dead in the process - if it had died, I would have had to scramble to replace it or risk losing the whole array. Now I sleep easier knowing that if a single drive dies, I just replace and replot it. It works especially well for me because I'm targeting lots (60+) of the smaller renewed drives that are still available, using SATA port multipliers and a custom "enclosure". :sweat_smile:

This specific use case is quite different from any other data storage, and sometimes it takes direct logic like this to open one's mind. Thanks @vitekorre!

Yup, with Chia it's like this:

If you have 8x 8TB drives, that’s 64TB.

  • All individual: you can farm with 64TB until one drive fails, then you are down to 56TB; if another fails after a year, you have 48TB.
  • RAID5: you can farm with 56TB; when one drive fails you still have 56TB, but if a second fails you have nothing (unless you replaced the first in time - and RAID5 on an array this big is not advised anyway).
  • RAID6: you can farm with 48TB, and even after two failures you still have 48TB.

Looking at it like this, you are always better off using single drives: you start out with the most space, and even if one or two were to fail, you would have farmed more space for longer. :slight_smile:
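Spelling the arithmetic out for the 8x 8TB example above, a minimal Python sketch (RAID5 assumed dead after a second failure with no rebuild in between):

```python
# 8 drives of 8 TB each, as in the example above.
n, size_tb = 8, 8

def farmable_tb(scheme: str, failed: int) -> int:
    """Usable farm space (TB) remaining after `failed` drive failures."""
    if scheme == "jbod":
        return (n - failed) * size_tb                    # lose only the dead drives
    if scheme == "raid5":
        return (n - 1) * size_tb if failed <= 1 else 0   # 2nd failure kills the array
    if scheme == "raid6":
        return (n - 2) * size_tb if failed <= 2 else 0   # survives two failures
    raise ValueError(scheme)

for failed in range(3):
    print(failed, "failed:",
          {s: farmable_tb(s, failed) for s in ("jbod", "raid5", "raid6")})
# 0 failed: {'jbod': 64, 'raid5': 56, 'raid6': 48}
# 1 failed: {'jbod': 56, 'raid5': 56, 'raid6': 48}
# 2 failed: {'jbod': 48, 'raid5': 0, 'raid6': 48}
```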

I had a good read through all this. On the note of backing plots up with RAID: are the files compressible, to save on backup storage in the event of a main drive failure? I'm not an expert at all, so it may be a dumb question.

No* because:

  • The plots are filled with effectively random data, and compression works because normal data has repeating patterns.
  • Surely the Chia devs would have built in the best available compression to save as much space as possible - there is even a "compression" phase in the log during plotting.

* Note: I have not actually tried compressing a plot file.
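To back up the first bullet with something concrete: random data simply doesn't compress. A minimal Python sketch, using random bytes as a stand-in for a plot's contents rather than a real plot file:

```python
import os
import zlib

# 10 MiB of random bytes standing in for plot data
# (a real k32 plot is ~101 GiB; the principle is the same).
sample = os.urandom(10 * 1024 * 1024)

compressed = zlib.compress(sample, level=9)
ratio = len(compressed) / len(sample)
print(f"original: {len(sample)} bytes, compressed: {len(compressed)} bytes")
print(f"ratio: {ratio:.4f}")  # ~1.000x -- no savings, just container overhead
```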

To add to what enderTown said, phase 3 of plotting already compresses the plot file from ~249GB down to ~108GB. I don't think it'll squeeze down any more than that.

I see, that makes perfect sense, thanks. :grinning_face_with_smiling_eyes:

Well, splitting the network between plot and farm traffic definitely helps, but it does not make everything perfect. As the plot count grows, so do the search times. They are still within reason, but they grow every day. So I spent yesterday setting up a harvester directly on one of the NAS systems, in a Docker container. Today I set up Chiadog in the same container with it. It is running really well. Now that I have the process down, I'm going to do the same thing on the other NASes. Farming over the network just isn't quick enough. Maybe with 10GbE, but I don't have that luxury. So it was a good experiment, but in the end not good enough. Local harvesters are the only way.
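For anyone wanting to try the same thing, here is a rough sketch of a containerized harvester, assuming the official chia-docker image (ghcr.io/chia-network/chia) and its service/farmer_address/farmer_port/ca environment variables - check the chia-docker README for the current names, and copy the farmer's CA certificates to the NAS first:

```bash
# Sketch only: image name, env vars, and paths are assumptions based on the
# chia-docker project; verify against its README before use.
#   /volume1/plots    -> plot directory on the NAS
#   /volume1/chia-ca  -> CA certificates copied from the farmer machine
#   192.168.1.50      -> example LAN IP of the farmer
docker run -d --name chia-harvester \
  -v /volume1/plots:/plots \
  -v /volume1/chia-ca:/certs \
  -e service=harvester \
  -e farmer_address=192.168.1.50 \
  -e farmer_port=8447 \
  -e ca=/certs \
  -e plots_dir=/plots \
  ghcr.io/chia-network/chia:latest
```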

In RAID5, what is the rebuild time of these large disks going to be?
I suspect it will be measured in days, not hours.

Days, and it very much depends on whether you want the NAS to keep serving or you shut it down for the rebuild. E.g., Synology quotes about 10 MB/s if the storage is still serving, and 125 MB/s if you go for a dedicated rebuild.
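Back-of-the-envelope numbers, assuming an 8TB member disk and the two Synology throughput figures quoted above:

```python
# Rough rebuild-time estimates for one 8 TB member disk.
disk_bytes = 8e12  # 8 TB

for label, mb_per_s in [("serving during rebuild", 10),
                        ("dedicated rebuild", 125)]:
    seconds = disk_bytes / (mb_per_s * 1e6)
    print(f"{label}: {seconds / 3600:.0f} h ({seconds / 86400:.1f} days)")
# serving during rebuild: 222 h (9.3 days)
# dedicated rebuild: 18 h (0.7 days)
```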

If it were purely for Chia, I would go for JBOD, but I'm just using spare space on my backup NAS, which is RAID5 at the moment.

IMHO: RAID5 gives you N-1 disks of space and keeps going if one disk fails. Non-RAID gives you N disks of space, and if one disk fails you are down to the same N-1.
So if you are running RAID5 and one disk has already failed, you have effectively gained nothing in this non-typical storage scenario.
P.S. I think the strategy should be to have as many plots as you can at any given moment, not to preserve those plots for as long as you can. Plots can always be re-plotted.

In practice N-2, as you would also keep a spare for emergency replacement :wink:

I added the 'P.S.' )))
The goal is not to preserve data that couldn't be restored. Everything we are storing on those disks can be "restored" )))

To be totally honest, I had new drives coming in for the small NAS, and the Synology units were not up yet when this discussion started. So I took everyone's feedback and did some research. In the end, I set up all of my Chia NASes as JBOD. I appreciate the information and am glad I was still at a stage where I could make the change.

Any drive you use for parity and not plotting isn't making money, guaranteed. Better to have it in service than to reserve unused space. JBOD is basically settled as the correct solution.

With zero doubt! I'm a bit stuck with two LDs I initially created when I decided to go bigger. Both are RAID5, 100TB raw capacity. Do I want to mess around for weeks breaking that up at the moment? Nope. So I leave darn near 20TB stranded in parity land for now.
