Why not use NAS and RAID for your Chia farms?

I have read through this thread because I am headed down the same road you went: multiple NAS devices. What I don’t understand is why, when there are clearly two solutions to your issue, you picked the path you did.

  1. Abandon your expensive but very solid storage setup and replace it with a very convoluted mess of USB drives (a setup I have seen cause issues in other threads).

  2. Just add a harvester for each NAS. For the cost of one of those USB drives, you could purchase a micro PC that could be a harvester.

I don’t understand your logic of taking the path you did. You say that multiple harvesters would make your setup overcomplicated, yet you are removing NAS devices and adding a butt-load of USB devices. It would have taken maybe an hour to set up the additional harvesters, and you are spending weeks (most likely) migrating onto a weaker setup. I am very confused.

I think the answer is riiiight here in those words. You see “convoluted mess” I see “extremely simple, easy to understand, and inexpensive to set up”.

USB is a fairly fast and sophisticated protocol these days in 3.2, and it will get even better in USB4, when Thunderbolt is merged in.

1 Like

Trying to be respectful, but if stacking 20+ USB drives were extremely simple or even remotely reliable, you would see server rooms all over the globe with thousands of USB drives. Well, it’s your setup; I don’t have to agree with it. It is just a huge step backwards when you had a very simple solution right in front of you.

I can’t imagine how frustrated/pissed off you had to be just watching rewards slip away.

Starting this weekend, I’ll be adding about 60 plots a day to my NAS setup. I’ll let you know how it goes as it grows. My first NAS has only 15TB, but my second has 120TB. I should fill them in a couple of weeks. I’m holding off on what comes after that until I see how it all performs and how dramatically the network space grows. I wish I had known about this when it was small.

But Chia isn’t your typical storage use case. It has its own peculiar requirements, and durability/reliability don’t matter… at all. Speed does though!

Reliability matters when you spend days/weeks filling a storage device and then a week later it dies. The reliability/durability is what protects the speed you put in. But then again, you didn’t have any protection set up on your NAS either. I always run RAID 5 at least. The extra cost is worth not losing weeks of plotting. All my opinion, of course.

Any sort of RAID is likely a waste (or insanely dangerous if we’re talking RAID0). Assume you’ll experience your first drive failure very soon, like, in one year. You’ve just lost a drive-year of storage storing parity, because RAID5. What does it buy you? However many drive-days it takes to fill up a replacement drive.

What if a second drive fails before you can replace the first failed one? You get to replot the entire array. Good times.
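
To put rough numbers on that, here’s a quick back-of-the-envelope sketch in Python; the drive count, drive size, and plotting rate are just illustrative assumptions, not anyone’s actual setup.

```python
# Back-of-the-envelope comparison: capacity "lost" to RAID5 parity over a
# year versus capacity lost while re-plotting a single failed drive in JBOD.
# All numbers below are illustrative assumptions, not figures from this thread.
drives = 10               # drives in the array
capacity_tb = 1.0         # capacity of each drive, in TB
replot_tb_per_day = 1.0   # assumed plotting throughput when refilling a drive

# RAID5: one drive's worth of capacity holds parity for the entire year.
parity_cost_tb_days = capacity_tb * 365

# JBOD: a single failure only idles that one drive's capacity while it is
# being re-plotted (roughly capacity / plotting rate, in days).
replot_days = capacity_tb / replot_tb_per_day
failure_cost_tb_days = capacity_tb * replot_days

print(f"RAID5 parity overhead: 1/{drives} of raw capacity, "
      f"{parity_cost_tb_days:.0f} TB-days per year")
print(f"JBOD cost of one failure: {failure_cost_tb_days:.0f} TB-days")
```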

Drive failures happen. Having a proper RAID setup protects you from that inevitable situation. I have been managing drive arrays for over 20 years, and I have never had a second drive fail before the first one was replaced. That situation just doesn’t happen; I’ve never seen it.

Statistically it can, but for example, Samsung SSDs are in practice MORE RELIABLE THAN INTEL CPUS. Let that one sink in. :exploding_head:

This is likely not true of spinning-rust HDDs, though; see Backblaze for the latest data…

3 Likes

For an array size of N, by running RAID5 you’re just wasting 1/N of your capacity. Assuming you don’t have drive failures on a weekly basis, you’ll clearly come out ahead by just re-plotting a replacement drive.

Your 20 years of dealing with storage arrays are biasing you. Just do the math on “Plot-Hours-Farmed” for a bunch of single drives versus a RAID5 array.

1 Like

Chia is really, really not a “typical” storage scenario. That’s problem number one.

2 Likes

Losing data, or even the possibility of losing data, makes me lose sleep. So, yeh, I’m programmed that way. I’ll “waste” a drive to keep it all running smoothly. I would rather not redo work already done. But we are drifting from the topic at hand. Codinghorror, keep us informed of how things go.

OK! We have a new topic to discuss the issue at hand. Note that the process of splitting the topic creates a link between the two topics automagically…

2 Likes

It’ll be interesting to hear your experience running NAS. Knowing that a single late response could cost one $2k, I’d be nervous unless I were seeing consistent farming numbers with plenty of margin.

Thanks for splitting this out. I really didn’t want to muddy your thread with that discussion. For anyone interested, I will continue to post updates here on how it is working for me. I keep track of my “plots were eligible for farming” response times regularly, so as my repository grows next week, we’ll see if it makes a difference.
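
For a rough sense of scale as the farm grows: assuming Chia’s 1-in-512 plot filter (my understanding of mainnet behaviour, not something from this thread), only a small fraction of plots actually gets a full lookup on each signage point, so lookup times shouldn’t blow up just because the plot count does. A quick sketch:

```python
# Rough expectation for "plots were eligible for farming" as a farm grows,
# assuming Chia's 1-in-512 plot filter. Only eligible plots get a full
# lookup, which is what the reported "Time:" mostly measures.
PLOT_FILTER = 512

for total_plots in (15, 322, 1200, 5000):
    expected_eligible = total_plots / PLOT_FILTER
    print(f"{total_plots:>5} plots -> ~{expected_eligible:.2f} eligible per signage point")
```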

Here’s what I think:

  • For one NAS you’re probably safe; it will work fine no matter how you set it up. Just don’t put a zillion plot files in one single :file_folder: folder, though. That makes many things go boom :boom:

  • Any amount of disk space spent on RAID-level redundancy (RAID 5, RAID 6, RAID 10) is totally wasted on a Chia farm.

  • If you are committed to running multiple NASes, you definitely should run a harvester per NAS to minimize possible performance issues.

It’s unfortunate that the tooling isn’t really there for remote harvesters. The most you get from your farmer’s logs is something like this:

2021-05-11T22:15:13.507 farmer farmer_server              : INFO     <- new_signage_point from peer 83e33fea63207b5d4531c3fc2bf576544ba50a5dd4fe051d62bf97cb703cb52b 127.0.0.1
2021-05-11T22:15:13.510 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 127.0.0.1 46e081be69034072cc6c9ab590e749a1b0df3352a584a139b227ef9f8449a0ab
2021-05-11T22:15:13.517 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 100.108.246.59 3ddedc31396a99fb0673f091693533d7d42fe15118fbf41b86b09fcf78a58bea
2021-05-11T22:15:13.519 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 100.73.29.120 dd28f17778b80f0880a2182386fb1157f24975920e26ba76f0667d9aa74ff658
2021-05-11T22:15:13.520 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 100.72.209.109 1688e5fbdc806d4dbc23c938911ca62905726d607c799de021444936900a7584
2021-05-11T22:15:13.521 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 100.101.194.3 7d63dd1fc8960e1f09ed787af50214b8aee97cc643a6885d1866e7f5fc9cafd2
2021-05-11T22:15:13.522 farmer farmer_server              : INFO     -> new_signage_point_harvester to peer 100.85.62.105 fd11dacb8fa9d9fe086f3d7fdae6812ea1d2b0e51753e1de475b76290ac0bdc2
2021-05-11T22:15:13.525 farmer farmer_server              : INFO     <- farming_info from peer 46e081be69034072cc6c9ab590e749a1b0df3352a584a139b227ef9f8449a0ab 127.0.0.1
2021-05-11T22:15:13.549 farmer farmer_server              : INFO     <- farming_info from peer 1688e5fbdc806d4dbc23c938911ca62905726d607c799de021444936900a7584 100.72.209.109
2021-05-11T22:15:13.555 farmer farmer_server              : INFO     <- farming_info from peer dd28f17778b80f0880a2182386fb1157f24975920e26ba76f0667d9aa74ff658 100.73.29.120
2021-05-11T22:15:13.689 farmer farmer_server              : INFO     <- farming_info from peer 3ddedc31396a99fb0673f091693533d7d42fe15118fbf41b86b09fcf78a58bea 100.108.246.59
2021-05-11T22:15:13.707 farmer farmer_server              : INFO     <- farming_info from peer fd11dacb8fa9d9fe086f3d7fdae6812ea1d2b0e51753e1de475b76290ac0bdc2 100.85.62.105
2021-05-11T22:15:13.854 farmer farmer_server              : INFO     <- farming_info from peer 7d63dd1fc8960e1f09ed787af50214b8aee97cc643a6885d1866e7f5fc9cafd2 100.101.194.3

Notably, it contains no info about how many plots each harvester has, how many passed the filter, etc.
So there’s no easy way to get feedback that things are actually working.
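
Until better tooling shows up, one low-tech workaround is to mine the farmer log yourself. Here’s a minimal sketch that just counts farming_info responses per harvester IP, assuming the default debug.log location and INFO-level logging:

```python
# Crude health check: count farming_info responses per harvester peer in the
# farmer's debug.log, so you can see that every harvester is still answering.
# Path is the usual default; requires INFO-level logging.
import re
from collections import Counter
from pathlib import Path

log_path = Path("~/.chia/mainnet/log/debug.log").expanduser()
pattern = re.compile(r"<- farming_info from peer \S+ (\S+)")

counts = Counter()
with log_path.open() as log:
    for line in log:
        match = pattern.search(line)
        if match:
            counts[match.group(1)] += 1  # keyed by peer IP address

for peer_ip, responses in counts.most_common():
    print(f"{peer_ip:>16}  {responses} farming_info responses")
```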

For anyone interested, that’s a full node running on a Synology in my home. The harvesters are remote (some on other continents). All of them participate in a WireGuard network (thanks to Tailscale, which is awesome), which is why you see the 100.x.x.x addresses for the remote harvesters.

Edit: that said, remote harvesters harvesting local-to-them plots seem to work well. Here’s an excerpt from a harvester log. This harvester harvests for the farmer described above.

2021-05-11T22:20:57.155 harvester harvester_server        : INFO     <- new_signage_point_harvester from peer 138295fbdabcfb873ceecec06813ef4e883e251e00d53b0d779803caa6072f0c 100.69.238.62
2021-05-11T22:20:57.250 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming 4d49f34a89... Found 0 proofs. Time: 0.09429 s. Total 322 plots
2021-05-11T22:20:57.251 harvester harvester_server        : INFO     -> farming_info to peer 100.69.238.62 138295fbdabcfb873ceecec06813ef4e883e251e00d53b0d779803caa6072f0c
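
If you want to track those lookup times across a whole log rather than eyeballing individual lines, a small script along these lines does the trick; it just pattern-matches the “plots were eligible” lines shown above, again assuming the default debug.log location and INFO-level logging:

```python
# Summarize lookup times from "plots were eligible for farming" lines in a
# harvester's debug.log. Path is the usual default; requires INFO logging.
import re
from pathlib import Path
from statistics import mean

log_path = Path("~/.chia/mainnet/log/debug.log").expanduser()
pattern = re.compile(r"plots were eligible for farming .* Time: ([\d.]+) s")

times = [
    float(match.group(1))
    for line in log_path.read_text().splitlines()
    if (match := pattern.search(line))
]

if times:
    print(f"lookups: {len(times)}")
    print(f"mean:    {mean(times):.3f} s")
    print(f"worst:   {max(times):.3f} s")
```

The rule of thumb I’ve seen is to keep even the worst-case lookups well under about 30 seconds, and ideally down in the fractions of a second like the excerpt above.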
1 Like

If using a NAS, RAID 5 is a requirement in my opinion. If you are using a NAS, you are putting a large number of plots in one spot. If a drive in the NAS dies, no wait… WHEN a drive in the NAS dies, you are losing a huge amount of work. RAID 5 protects that work. When using a USB drive or similar, the loss is limited to that smaller space.

I am going to start by putting all my plots in one folder. I know, you said that’s bad. But you also discovered that it was bad in the middle of other major issues that you never resolved, so I would like to see the issue for myself. I’ll know soon enough, and if the issue appears, it’s an easy fix.

Are you assuming RAID 0 (vs RAID 5)? Why not JBOD? When one dies you’ve lost the plots on that one drive, which you simply replace and re-plot.

1 Like

WHEN a drive in the NAS dies you are losing a huge amount of work. RAID 5 protects that work.

But that work is not really a “huge amount”. Yes, plotting to fill a replacement drive is non-trivial work, but it’s not that much compared to sacrificing a whole drive that you could be farming the whole time.

Over a one-year timeframe, for 10x 1TB disks:

mode    failure scenario        earned (TB-days)
JBOD    0 failures              3650
RAID5   0 failures              3285
JBOD    1 failed immediately    3285
RAID5   1 failed immediately    3285
JBOD    2 failed immediately    2920
RAID5   2 failed immediately    0
JBOD    3 failed immediately    2555
RAID5   3 failed immediately    0
JBOD    4 failed immediately    2190
RAID5   4 failed immediately    0

edit: The usual business-data RAID strategy of keeping spare drives is just wasted space for farming purposes.
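
For what it’s worth, the whole table falls out of a few lines of arithmetic. Here’s a minimal sketch that reproduces it under the same assumptions (10x 1TB drives, one year, failed drives never replaced or re-plotted):

```python
# Reproduce the JBOD vs RAID5 "TB-days earned over one year" table above.
# Same assumptions as the table: 10x 1TB drives, 365 days, failed drives are
# never replaced or re-plotted, and RAID5 loses everything past one failure.
DRIVES = 10
CAPACITY_TB = 1.0
DAYS = 365

def jbod_tb_days(failures: int) -> float:
    # Each failed drive stops earning; the surviving drives keep farming.
    return (DRIVES - failures) * CAPACITY_TB * DAYS

def raid5_tb_days(failures: int) -> float:
    # One drive's capacity always goes to parity; a second failure kills
    # the whole array along with every plot on it.
    return 0.0 if failures >= 2 else (DRIVES - 1) * CAPACITY_TB * DAYS

for failures in range(5):
    print(f"{failures} failed: JBOD {jbod_tb_days(failures):6.0f}   "
          f"RAID5 {raid5_tb_days(failures):6.0f}   TB-days")
```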

4 Likes