Troubleshooting failure to farm with lots of plots, due to harvester 30s timeout

I’ve been chatting with the guys who run chiadecentral, and they’re hosting a Zoom call tomorrow night; we should discuss this with them there!

I’ll send this thread their way

1 Like

The total I/O on idle plots looks trivial to me. This is what I see with iostat -m 60 (60 s samples). Disk sdi is idle with 54 K32s. Disk sdj has 36 K32s and is getting hit with chia plots check -n 5. It took just under 3 minutes total to run the -n 5 check on 36 plots.

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdi               0.30         0.01         0.00          0          0
sdj             121.33         1.68         0.00        100          0

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
sdi               0.00         0.00         0.00          0          0
sdj             123.47         1.70         0.00        102          0

The way I figure it, that’s roughly 1.8 MB of reads per challenge, right? It would have to be a huge number of challenges within a couple of seconds to overwhelm a NAS like that, wouldn’t it?
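As a rough sanity check on that figure, here’s the back-of-the-envelope math, assuming the -n 5 run issued 5 challenges against each of the 36 plots and took about 180 seconds at the ~1.7 MB/s that iostat reported for sdj:

# Rough arithmetic only; the duration and read rate are rounded assumptions
challenges=$(( 36 * 5 ))      # 180 quality lookups
duration=180                  # seconds, "just under 3m" rounded up
echo "challenges per second: $(echo "$challenges / $duration" | bc -l)"   # ~1.0
# At roughly one challenge per second, the MB read per challenge is simply the
# iostat MB/s figure, i.e. about 1.7 MB, which lines up with the ~1.8 MB estimate.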

Long enough to cause a 30s delay though? That seems like a lot to me.

If the requests happen in parallel instead of sequentially, it might help enough to get you under the 30 s threshold, at least as a temporary thing. I have no idea how the queries happen on a single harvester, though (sequential vs. parallel), so I think the only way you’ll get a definitive answer is to ask someone who knows what the per-harvester file access looks like (i.e., how does a harvester query the plots when it’s looking for proofs?).
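For what it’s worth, here’s a minimal simulated sketch of why it would matter. This is not Chia’s actual code path; sleep 6 just stands in for one quality lookup against a plot on one slow NAS, with five eligible plots spread across five devices:

time ( for i in 1 2 3 4 5; do sleep 6; done )          # sequential: ~30 s total
time ( for i in 1 2 3 4 5; do sleep 6 & done; wait )   # parallel:   ~6 s total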

RAID0 won’t give you any protection against a disk failure. Does that NAS claim to have some other type of protection or is that a typo or…?

Well, you can set up separate volumes per drive. Then a single drive failure only costs you the plots on that drive. JBOD presents everything as one big volume, as I understand it. The TerraMaster forums confirm this:

JBOD (abbreviated from “Just a Bunch Of Disks”/“Just a Bunch Of Drives”) is an architecture using multiple hard drives combined into one or more logical volumes using a volume manager like LVM or mdadm, or a device-spanning filesystem like btrfs; such volumes are usually called “spanned” or “linear | SPAN | BIG”. A spanned volume provides no redundancy, so failure of a single hard drive amounts to failure of the whole logical volume.

You can read my reply here which describes how to set up a ‘fail safe’ JBOD where each drive is individually addressable. You don’t get one large merged volume though!
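For anyone doing this on plain Linux instead of the TerraMaster UI, the same idea looks roughly like the sketch below; the device name and mount points are placeholders:

# Each drive gets its own filesystem and mount point, so losing one disk only
# loses the plots stored on it; /dev/sdb1 and the paths are placeholders
sudo mkfs.ext4 /dev/sdb1
sudo mkdir -p /mnt/plots-b
sudo mount /dev/sdb1 /mnt/plots-b
chia plots add -d /mnt/plots-b   # repeat per drive; the harvester scans every added directory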

They’re mixing a lot of terminology into a single run-on sentence there.

You did it exactly like I would have: 5 independent disks / volumes. That’s not RAID0, though. RAID0 is one big stripe where the disks are combined much like the “spanned volume” they describe, so if you lose one disk you lose the whole set.

The way you did it is way better and will be much easier to deal with while diagnosing than any kind of spanned volume or RAID set would be. Thank goodness! :slight_smile:

1 Like

I didn’t do it that way; I did striping because I wanted speed of ingestion for network launch! :wink:

It would be nice to have a magic button to convert raid 0 back into individual disks once “the need for speed” is over, though. Heh.

1 Like

Check this out

2021-04-24T21:07:13.557 harvester chia.harvester.harvester: INFO 8 plots were eligible for farming 8aed06162a... Found 1 proofs. Time: 35.58643 s. Total 3960 plots

:sob: … 35 seconds, and I lost a coin.

This is unbelievably frustrating.

My theory is that it’s hitting multiple plots on multiple different NASes (I have 5), and they all have to finish the work within 30 seconds. I have plenty of proof checks that finish in a few seconds; if it happens to pick plots that are all on the same NAS, it goes smoothly, but 5 files across 5 different NASes… that’s what I’m thinking. :man_shrugging:

4 Likes

The more I read of your experience here, the more I’m convinced these consumer NAS devices are toxic. The big boys running server-grade JBODs have no issues with 24 drives, and these small devices can’t even do what they claim with 4 drives. I picked up a used Buffalo Terrastation 3400D to play with and after 5 hours of experiments knew it was unusable: insane latency and transfer speeds over what was supposed to be a dual gigabit connection. :face_with_symbols_over_mouth:

1 Like

Remember, too, that things were fine until I added (x) NASes; one or two were no problem. Well, eventually I’ll get everything onto the Supermicro JBOD at the datacenter…

1 Like

That’s probably a reasonable guess, at least with a single harvester. You could check to make sure the NASes aren’t spinning the disks down for power saving, if you haven’t already.

Definitely not; all NASes default to “no spin down” mode out of the box. These do too.

JBOD shows each disk in the device as a separate mount point / drive letter. RAID 0 is one stripe, which means if one of your disks goes, you lose everything. Super risky. I am 100% confident in this, FWIW.

When my times go above 5 seconds, I’ve tracked it to network contention. If one of my plotters upstairs was copying a file to the NAS downstairs, my times spiked to 20-30-60 s; with 2 or more copying, significantly higher. When the traffic dies down, my times are back under 4 s. I have stopped all upstairs plotters from copying over the network; I’ll let them pile up plots on an external SSD and then carry them down to the NAS USB port. Just an anecdote, I guess, since I’m only approaching 300 plots. The times don’t spike when my 2 NUC plotters downstairs copy to the NAS (all connected to the same Alien router).

To add more context: I have an Amplifi Alien mesh system. Upstairs, the mesh node has 1 Ethernet port with a 5-port switch on it, and 4 devices share that 1 Gbps port… but even a wifi-connected device in the same room was causing contention. Tonight I ordered another Amplifi Alien (not the mesh node) to add upstairs, which will give me 4 more ports and hopefully more bandwidth. I’ve never had a home network situation where latency mattered this much before.

@codinghorror what’s your network traffic like?

2 Likes

You could run the harvesters on the NAS boxes themselves and likely get performance comparable to those “4 disks on a Pi” setups, which by all accounts seem to be fine.

You can probably SSH in, and if they happen to have Python 3.7+ installed, you’re all set. If they don’t, TerraMaster advertises both VirtualBox VM and Docker container support on these models. (There are official Docker images for Chia.)
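For anyone who wants to try it, this is roughly the documented remote-harvester recipe, assuming you can get a Chia install running on the NAS over SSH; the hostnames and paths below are placeholders:

# On the main farmer: copy its private CA to the NAS so the harvester’s new
# certificates are signed by the same authority (mainnet default path shown)
scp -r ~/.chia/mainnet/config/ssl/ca nas-box:/tmp/farmer-ca

# On the NAS: re-init against that CA, point the harvester at the farmer, start it
chia init -c /tmp/farmer-ca
chia configure --set-farmer-peer farmer-host:8447
chia start harvester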

2 Likes

I think it would help to separate the 5 NAS boxes and connect a maximum of 1 NAS per node, then see how that goes.

You can run one full node on each of your PCs and just use a VPN if you’re worried about conflicts. I run 3 nodes (all 3 attached to the same NAS, for redundancy) and I have no problem getting proofs back in under 1 second. Only 1 node is port forwarded; the other 2 are behind a VPN, and all 3 are syncing fine.

Also, wouldn’t RAID 0 make things slower? It would need to fetch pieces of every plot from every disk, for all of your plots, at the same time.

2 Likes

I tried to read through all the replies but I might have missed some information so sorry if this has been asked before.

For some context: I run harvesters remotely against plots hosted in a different datacenter and have no issues. So harvesting from a NAS should be very possible, but you must have a bottleneck somewhere.

First off: does the NAS itself allow you to run software? If it does, you could run a harvester on each NAS, making the disks local. That should eliminate the issue if the bottleneck is the SMB connection.

One other thing it could be, although it is not super likely: some (NAS) OSs come with an energy-saving mode that turns off power to the disks when they are unused. Spinning up disks can introduce latency, although not this much… usually.

1 Like

Yeah, mostly I am abandoning all my NAS devices altogether… unfortunately it is difficult to pull 90 TB of data off these devices. Luckily I can attach two 18 TB drives via USB and run rsync to copy to them simultaneously, and I can also copy over the network at the same time. That gives me 3 simultaneous methods of egress at 18 TB drive copy speeds, more or less, around 16 mins per plot.
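In case it helps anyone else, the dual-USB drain is just two rsyncs running at once; all paths here are placeholders:

# rsync can be re-run safely if a copy gets interrupted
rsync -av --progress /mnt/nas-c-share/plots/ /mnt/usb1/plots/ &
rsync -av --progress /mnt/nas-d-share/plots/ /mnt/usb2/plots/ &
wait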

In the meantime here’s what needs to happen @Maran

  • The GUI client needs to tell you if proofs are taking longer than 30 seconds since that is :warning: MAJOR ERROR STATE :warning: … you can literally never win a chia if that’s the case! Never ever! You could farm infinite chia plots and never win for the rest of history!

  • Log level for these over-30s harvester proof events should not be INFO; it should be WARN or (really, IMO) ERROR

  • In the logs, when the harvester chooses certain plots for proofs, tell us which plots specifically, so we can isolate which device / drive is particularly slow. (There’s a stopgap log-grep sketch just below.)
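Until something like that lands, a stopgap for surfacing the slow events from the existing INFO logs (this assumes the default mainnet log location):

# Print every "plots were eligible" line whose lookup time exceeded 5 seconds;
# awk splits on "Time: " so $2 begins with the seconds value, and +0 forces a
# numeric comparison
grep "plots were eligible" ~/.chia/mainnet/log/debug.log | awk -F'Time: ' '$2+0 > 5'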

My theory is that as the number of NAS devices proliferated, it would pick winning plots from (x) different devices, and every one of those devices adds its own speed and performance variables… the more variables, the more danger.

I really wish there were a “smoking gun” of one really slow device, but as you can see from my tests above when running

chia plots check -n 100 -g {filename}

I didn’t see radically slow times from any particular NAS device (times are min:sec for the full run):

  • NAS A → 1:57
  • NAS B → 1:37
  • NAS C → 2:21
  • NAS D → 2:13
  • NAS E → 1:16

(well, actually, C and D look slow here and they were the most full… but not so slow that it’s like OMG! Clearly This Is The Problem™)

Network drive shares on the AMD machine did much better, about 2x better:

  • K share → 0:53
  • L share → 0:53
  • M share → 0:55
  • N share → 1:00

All of the above is coming over the network; nothing is a local drive to the farmer. It’s all network-attached drives in one way or another…
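To script that comparison across every device in one go, something like the loop below should work; the mount points are placeholders, and -g just substring-matches plot file names, so any unique part of a name will do:

# Time the same 30-challenge check against one plot on each mount point
for d in /mnt/nas-a /mnt/nas-b /mnt/nas-c /mnt/nas-d /mnt/nas-e; do
  plot=$(ls "$d"/*.plot | head -1)        # pick any plot on that device
  echo "== $d =="
  time chia plots check -n 30 -g "$(basename "$plot")"
done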

4 Likes

I’m still unclear about one thing. I assumed you had one PC and mapped the NASes to it as network drives.

If you could harvest directly on the NAS itself, you could bypass the network effect. Wouldn’t this solve it?

Yes that is 100% correct

I guess; but unfortunately I am not smart enough to be able to figure that out :frowning: … I guess I can try?

In the meantime I am desperately exfiltrating data from the two fullest NASes, C and D, via two 18 TB USB-connected drives and the USB copy app. It does about 16 mins per plot, 165 plots per drive, so 2640 minutes, or 44 hours, or about 2 days.

This is another excellent point that I missed, thank you for highlighting it.

My primary farmer is connected via a native 2.5 Gbps uplink (it’s an Intel Phantom Canyon NUC) on these Zyxel switches, but I have been doing a fair bit of copying data around at 2.5 Gbps, which could cause contention :scream:

All the more reason the GUI needs to be sounding a big ol’ :rotating_light: siren if proof times are taking more than 30 seconds!

I guess… don’t copy things over the network in a way that could saturate your network interface and interfere with harvester proof checks… this validates my “attach two 18 TB drives directly to the NAS and exfiltrate that way” strategy!

2 Likes

If you tell me a bit about the NAS, I can give you some pointers. This wiki page explains the process.

1 Like