Is your plotting hanging randomly on Linux? Then read this

dctech · November 30, 2021, 12:29am

Sorry for the length of this post, but I hope it helps folks who have experienced a similar issue. I wanted to provide all the details and mention what I have tried so far, so bear with me.

A little background: When I started with Chia I was plotting on Win 10 and moved finished plots to external HDDs (NTFS formatted; I did try exFAT but that has a much higher CPU overhead on Linux) attached to my Ubuntu farmer/harvester. I did not see any problems plotting K32s in parallel on Windows with SWAR. Then I upgraded my RAM from 64G to 128G when MadMax came out and started spitting out multiple plots an hour, which was awesome (kudos to the dev)! I noticed that MadMax in Linux gave significantly better performance (better than in WSL), so I set up a dual boot on my system (5950X), started using the Pop!_OS 21.04 (Ubuntu) distro for my daily use, and continued to plot with MadMax in Linux.

So all the drives are filled now and my OCD kicked in seeing all this <100G unused space on each drive which I wanted to fill. I tried creating a single K34 in Linux (using the same NVMe for temp 1 and 2, formatted with Ext4) to get an idea of performance before trying SWAR, and I ran into an issue: plotting appears to hang at random parts of stage 3 with 1 plotter thread pegged at 100% CPU.

I knew it was not my NVMe or the RAM, as I’ve been using these heavily to plot in Win and Linux until now. I even checked SMART stats on the NVMe to make sure it was not dying on me with all this plotting to date; the NVMes are at ~50% life left with no sign of any critical issues according to SMART data. I also have no issues with anything else running on this box under Linux or Windows, so I started digging.

I first tried tweaking the number of buckets and buffer allocation (FYI I’m plotting in CLI, not GUI), but that did not change anything and plotting would still hang at stage 3 on a random bucket. I know that I had enough RAM (128G), as I was allocating 2x the maximum u_sort min I could find across all stages and I was not seeing any QS on buckets except the forced QS on the last buckets in a table, which is normal. But still hanging, WTF!!!

I then started monitoring CPU, RAM, and IO utilization of the Chia Python threads with htop and iotop, and I noticed that SWAP was slowly growing during plotting. I added the M_SWAP column to htop and left it plotting overnight. The next day plotting appeared to have hung again, so I looked at htop and to my surprise my SWAP partition (8G) was FULL and the main chia plotter thread had a LARGE SWAP mem usage!!! It looks like the plotter is just hanging because of SWAP space exhaustion, even with 128G RAM and even though 1 concurrent plotter can easily fit in physical RAM. I tried turning off the SWAP and plotting again, and it works every time without getting stuck!
Plots check with -n 1000 passes on the completed K34 plots so they are all good.
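
For anyone who would rather script the swap check than watch the M_SWAP column in htop, here is a minimal sketch. It assumes a Linux /proc filesystem and reads the VmSwap field from each process's status file; the function names are made up for illustration and are not part of Chia or htop.

```python
import re
from pathlib import Path


def parse_vmswap_kib(status_text: str) -> int:
    """Return the VmSwap value (KiB) from the text of /proc/<pid>/status.

    Returns 0 when the field is missing (for example, kernel threads).
    """
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def swap_by_process(proc_root: str = "/proc") -> dict[int, int]:
    """Map PID -> swapped-out memory in KiB for every readable process."""
    usage = {}
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            kib = parse_vmswap_kib(status.read_text())
        except OSError:
            continue  # process exited or is not readable; skip it
        if kib:
            usage[int(status.parent.name)] = kib
    return usage


if __name__ == "__main__":
    # Print the five processes with the most memory in swap.
    top = sorted(swap_by_process().items(), key=lambda kv: kv[1], reverse=True)[:5]
    for pid, kib in top:
        print(f"pid {pid}: {kib / 1024:.1f} MiB in swap")
```

Running it every few minutes during stage 3 should show whether the plotter process is the one whose swap usage keeps growing.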

So is this an issue with the plotter or with the OS/kernel/env config? I’m plotting using the current Chia v1.2.11, but I do not have a wallet set up on this install as it is only used for plotting NFT plots. I tried switching vm.swappiness to 0 (default = 10) but the plotter main thread would still leak to SWAP mem. Apart from disabling SWAP completely I could not find any workaround for this issue. Any suggestions on what else I can try to prevent the Chia plotter from leaking to SWAP and hanging?

While watching RAM usage during plotting I noticed that RAM is almost completely used non-stop, but the majority of it is used by buffers & cache, which I thought the kernel should release automatically when a thread/process needs more active memory. Maybe that is not happening when SWAP is enabled? Or maybe this is something specific to Python or the Chia plotter?
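
To separate "RAM is full" from "RAM is full of reclaimable cache", a rough sketch like the one below can help. It assumes a Linux /proc/meminfo with the usual MemTotal, MemAvailable, Buffers, Cached, SwapTotal and SwapFree fields; the helper names are hypothetical.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field name: value}, sizes in KiB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def summarize(info: dict[str, int]) -> dict[str, float]:
    """Return a few headline figures in GiB."""
    kib_per_gib = 1024 * 1024
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total": info["MemTotal"] / kib_per_gib,
        "available": info["MemAvailable"] / kib_per_gib,
        "buffers_cache": cache / kib_per_gib,
        "swap_used": (info["SwapTotal"] - info["SwapFree"]) / kib_per_gib,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        for key, value in summarize(parse_meminfo(fh.read())).items():
            print(f"{key:>14}: {value:6.1f} GiB")
```

MemAvailable is the kernel's own estimate of how much memory could be handed to a process without swapping, so a high "available" figure next to growing "swap_used" would point at the swapping policy rather than a real shortage of RAM.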

Jacek · November 30, 2021, 1:38am

It looks like the plotter has memory leak(s) and it is blowing the swap space up. You should report it on chia’s github / Issues page, to have devs look at it.

Assuming that this is the case, disabling swap is just making the problem happen faster.

You may want to check how much memory plotter is using during that phase, as most likely only that phase is broken. Once you see your RAM gone, that would be the time to start looking at swap space to see it grow.

xkredr59 · November 30, 2021, 5:53am

Bullet #1 in the release notes on Ubuntu 21.04: Better anonymous memory management to reduce swapping

Maybe it’s an OS thing; memory management is, or should be at least.
I’m on Ubuntu 20.04 LTS and did a k.34 yesterday to get an idea about the time it takes on my system.
No problems (but the 5.5 hours…). Didn’t look at it while running.
I’ll try another one and check resources used once in a while, specifically phase 3.
Only running the chia client plotters cmd for the k.34, a chia harvester and system monitor.
Intel i7 8C/16T with 128GB RAM and 2TB NVME for temp space.

chia plotters madmax -k 34 -n 1 -r 12 ..... etc

seymour.krelborn · November 30, 2021, 6:00am

I am creating plots via the command line, in Windows 10.

When I first attempted to create a K34 plot, the job crapped out relatively early.
I do not recall the error message. But it was memory related. Perhaps a buffer message?

I ran:
chia plots create --help
and discovered a “-b” option, which allows you to change the size of the “slot/plot buffer” (whatever that is). But I had a suspicion that that might be a factor.

The default “-b” value is “3389”. I was using it because I had never set the option before, and it worked fine for K32 plots.

Since K34 plots are roughly 4X the size of K32 plots, I figured that maybe Chia would need 4X the “-b” value (4X the buffer value).

I added:
-b 13556 to the command line, and the plotting works every time.
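
The arithmetic behind that value, as a small sketch. It encodes only the rule of thumb from this post (4X the buffer for K34); doubling per k step, which would make K33 2X, is an extrapolation and not an official Chia recommendation. The function name is made up.

```python
DEFAULT_BUFFER_MIB = 3389  # default -b value quoted in this post
BASE_K = 32


def scaled_buffer_mib(k: int, base_mib: int = DEFAULT_BUFFER_MIB) -> int:
    """Scale the -b buffer by 2 for every k step above 32 (rule of thumb only)."""
    if k < BASE_K:
        raise ValueError("k must be at least 32")
    return base_mib * 2 ** (k - BASE_K)


print(scaled_buffer_mib(34))  # prints 13556
```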

chia plots check -g blahblahblah.plot always shows a valid K34 plot.

My conclusion is that although my plotter (my hardware) had enough memory (64 GB), Chia declined to use it, until I included the higher “-b” value.

Although my fix worked, I have no idea if I picked the best value for “-b”?

Perhaps this is related to @dctech Linux version of creating a K34 plot?
Perhaps if he used a higher -b value, then maybe his chia app would have taken advantage of his available memory? …and not gone to his swap file, when his memory (kind of) ran out?

xkredr59 · November 30, 2021, 8:10am

So you used the ‘old’ chia plots create.
This uses the single-threaded phase 1 plotting, which takes way longer than madmax with its multi-threaded approach. I used it once too, unaware of the v1.2.11 alternative, and thought I’d never use it again because it took soooo long for k.33 and k.34.

From the chia github:
There is a new chia command for creating plots called plotters. For compatibility, the original command for creating plots chia plots create remains in place, however, this will always use the reference chiapos plotter. In order to use the other plotters, you must use the new chia plotters command. Command line options differ with each plotter, so be sure to check the available options using chia plotters <plotter> -h. Available plotter values include “chiapos”, “bladebit”, and “madmax”.

See chia plotters madmax -h if you’re interested.

A bit typical of Chia not to mention this alternative command explicitly in the v1.2.11 release notes but kind of hide it in -More plotting info here.-
Must admit I missed it at first.
Well, rtfrnt I guess… read the f#cking release notes twice

seymour.krelborn · November 30, 2021, 8:21am

I know that other plotters are much faster.
However, a lot of what you quoted from github goes over my head, and it is the main reason I stick with only the original Chia app.

Also:

What I have is mostly automated. It works. I am afraid to tinker with other software.
I am at the end of my budget with hard drive purchases, and their impact on my electric bill. So after filling up a few more drives that I have, my plotting days will be behind me. I will be performing only farming/harvesting.

If I were just starting out, I would probably go for the much faster app plotting options. But at this point, based on what I do understand, and what I do not understand, I feel it is best to leave well enough alone.

xkredr59 · November 30, 2021, 8:24am

Which plotter (or plot command) do you use for creating the k.34?

xkredr59 · November 30, 2021, 8:34am

I am where you are I guess, invested more than enough for this hobby (but happy for it) and not doing any more serious plotting.
But just as a hobby I like to stay up to date and enjoy this forum and all to learn from it.

For others however, the new chia plotters command is the original chia app, not additional software.
Just an extra option to use the integrated madmax and bladebit plotters developed by their great developers. So in your automated workflow you’d just need to replace chia plots create ... with chia plotters madmax ... (even the parameters are almost the same, as you can see with the -h option for both commands).
Happy farming

seymour.krelborn · November 30, 2021, 8:55am

I have been through every command and their options, applying the “--help” argument, to see how the item in question works. I have never seen any help output mention a “madmax” argument.

So to use madmax, I have nothing new to download; nothing new to install?
Just change the command line?

When I ran:
chia plotters --help
I got a usage error. chia.exe did not accept the “plotters” argument.

Please advise.

xkredr59 · November 30, 2021, 9:02am

Which version do you run? The plotters command was introduced only in v1.2.11.

seymour.krelborn · November 30, 2021, 9:26am

I am running version 1.2.9.

I read that the newer versions were rushed out, and not clean.
Also, there is the mandated password “feature”, that I do not want to deal with.

But thanks for clearing up my madmax question.
I always assumed it was some 3rd party add-on or replacement for Chia. Knowing that it “is” Chia, I am comfortable with using it, if I upgrade.

I just can’t deal with anything potentially going wrong with what I have working. Together with not having too many more plots to create, I will be holding off on making any changes.

Thank you.

dctech · November 30, 2021, 9:49am

I’m using the classic “chia plots create”, but I also tried “chia plotters chiapos” (available in 1.2.11) with the same results: the SWAP mem leak. I have not tried the new built-in MadMax (available in 1.2.11), as I’ve always run a standalone build of MadMax on this PC with Pop!_OS/Ubuntu 21.04 without any problems.

As for the buffers (-b), I’ve allocated way more than should be needed (46000MiB) when using 64 buckets. I did this because I noticed that some steps allocate only half of that, so I wanted to be on the safe side. Buffer allocation does not appear to be the problem, as from my testing I do get an out of mem error if I set this too low. With the above buffers+buckets and SWAP turned off, plotting a K34 completes without a problem.

I should mention that I assign 6 threads (-r) to the plotter, which appear to be used only when writing to the tmp storage, while reading uses only 1 regardless of how many threads you assign. You can confirm this by monitoring the plotter threads in htop after adding the IO_READ_RATE & IO_WRITE_RATE columns.

You may be correct and this may be unique to the Ubuntu 21.04 “better” mem management. Usually I try to stick with LTS releases, but because I wanted to use this machine for gaming as well I went with the cutting edge version. I will try reverting Chia to a previous version to see if I hit the same problem, as I did come across at least one other forum thread mentioning some plotting issues in 1.2.11.
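
The per-thread read/write observation can also be checked without htop. The sketch below assumes Linux exposes /proc/<pid>/task/<tid>/io with read_bytes and write_bytes fields (reading it normally requires being the process owner or root); the function names are hypothetical.

```python
import time
from pathlib import Path


def parse_io(text: str) -> tuple[int, int]:
    """Return (read_bytes, write_bytes) from the text of a /proc/.../io file."""
    fields = dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)
    return int(fields["read_bytes"]), int(fields["write_bytes"])


def thread_io(pid: int) -> dict[int, tuple[int, int]]:
    """Map thread ID -> cumulative (read_bytes, write_bytes) for one process."""
    totals = {}
    for io_file in Path(f"/proc/{pid}/task").glob("[0-9]*/io"):
        try:
            totals[int(io_file.parent.name)] = parse_io(io_file.read_text())
        except OSError:
            continue  # thread exited or permission denied
    return totals


def io_rates(before, after, seconds):
    """Per-thread (read, write) rates in bytes/second between two samples."""
    return {
        tid: ((after[tid][0] - r0) / seconds, (after[tid][1] - w0) / seconds)
        for tid, (r0, w0) in before.items()
        if tid in after
    }


def sample_rates(pid: int, seconds: float = 5.0):
    """Take two samples `seconds` apart and return the per-thread rates."""
    before = thread_io(pid)
    time.sleep(seconds)
    return io_rates(before, thread_io(pid), seconds)
```

Calling sample_rates() with the plotter's PID while it is in a read-heavy step should show whether only one thread has a non-zero read rate.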

If anyone is interested in manually tweaking buffers & buckets during plotting I found this very informative article a while back which explains everything very well.

dctech · November 30, 2021, 10:01am

You are correct, I feel the same way and 1.2.10/11 feels slightly buggy to me. For example I’m still not able to get Chia CLI to read the new passphrase from a file so I had to disable this new functionality on my farmer/harvester which I created a service for in Ubuntu so that it auto starts on boot. Was going to start a new thread for this but did not get around to it and it was not on top of my priority list as my farmer has other protections in place and farms everything to a cold wallet anyway

dctech · November 30, 2021, 10:30am

The additional plotters like Madmax available in 1.2.11 are 3rd party and were included due to popular demand with permission from their developers. You do not have to use them but they are now available in Chia as a lot of ppl used them anyway.

seymour.krelborn · November 30, 2021, 10:35am

Thanks for the clarification.

I trust that, since the developers include the 3rd party Madmax code with the general installation, the developers reviewed the code to ensure compatibility and the absence of “mnemonics” phoning home.

xkredr59 · November 30, 2021, 10:55am

As promised I plotted another k.34 and observed memory/swap usage during the phases, here are my findings.

For the record, Ubuntu 20.04 LTS, i7 8c/16t cpu, 128GB RAM, 2TB NVME as temp storage. Only running chia harvester and system monitor ‘steady state’ memory usage is appr. 2GiB, swap is 980MiB. I have chia version 1.2.11 installed.

Starting chia plotters madmax -k 34 -n 1 -r 12 -t -d -c -f with appropriate parameters memory usage is around 20GiB for phases 1 and 2, swap around 1.2MiB.
In phase 3 console output gives 2 subphases for the 7 tables to be processed.
During subphase 1 memory is around 20GiB, climbing to around 40GiB during subphase 2 for all tables (i really only observed tables 4 to 7 but assume similar behaviour for 1 to 3…). Swap doesn’t move a muscle here.
So in phase 3 it’s memory usage/release/usage/release/… (7 times)
If the memory release is not working properly (OS driven, I think), that could explain the swap usage and the hang in further processing?!?

Maybe not so easy given your dual boot config, but I’m curious whether switching to Ubuntu 20.04 (or Pop!_OS 20.04 if you want the nicer desktop) will solve the issue for you.

Now for some other mystery, my first k.34 plot on the same machine, same cli command took some 5.5 hours. This second one took a little under 4!!! Faster times for all phases.
Did they implement machine learning/AI also?

dctech · November 30, 2021, 11:44am

Hmm, my tests were done using the old-fashioned slow plotter and not Madmax, as I thought it needed a much bigger RAM temp storage than the 110G I’ve been using with K32, so I will have to give this a try. When using standalone Madmax during K32 plotting I did see a little SWAP utilization but assumed it was expected due to other things I had running, like Firefox. The SWAP just blew up when I tried plotting K34 with the legacy plotter.

What are you using for tmp1 & 2 when plotting K34 with Madmax? Is 110G RAM drive enough and it’s just using more of tmp on NVME?

xkredr59 · November 30, 2021, 12:21pm

No, madmax for K.33 and K.34 requires much more than 110GB, so both temp and temp2 on my NVME.
If you omit the -2 parameter, temp2 defaults to the location given by -t; at least the standalone madmax does that.
To be sure I use

chia plotters madmax -k 34 -n 1 -r 12 -t /mnt/chiaplots/ -2 /mnt/chiaplots/ -d /mnt/chiaplots/temp1/ -c xch1cxwykg5c7rqaxh2ne604r2hv93xxxxxxxxxxxxxxxxxxxxxxx -f b4db259690463b0f7849b039cbe0c9e3fbcdb8b9a32bb4e99703ac3581615082ayyyyyyyyyyyyyyyyyyyyy
Also my destination directory was on the NVME for this test.
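
For an automated workflow, the same command can be assembled in a script. This is only a sketch: the flags are the ones in the command quoted above, nothing else is assumed about the plotter (check chia plotters madmax -h on your version), and the function name and placeholder strings are made up.

```python
import shlex


def build_madmax_command(k, count, threads, tmp_dir, tmp2_dir, dest_dir,
                         contract_address, farmer_key):
    """Build the argument list for `chia plotters madmax` from the flags quoted above."""
    return [
        "chia", "plotters", "madmax",
        "-k", str(k),          # plot size
        "-n", str(count),      # number of plots
        "-r", str(threads),    # threads
        "-t", tmp_dir,         # temp dir
        "-2", tmp2_dir,        # temp2 dir, set explicitly to be sure
        "-d", dest_dir,        # destination dir
        "-c", contract_address,
        "-f", farmer_key,
    ]


cmd = build_madmax_command(34, 1, 12, "/mnt/chiaplots/", "/mnt/chiaplots/",
                           "/mnt/chiaplots/temp1/",
                           "<contract address>", "<farmer key>")
print(shlex.join(cmd))
```

The list form can be passed straight to subprocess.run(cmd, check=True), which avoids shell quoting problems with paths.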

Suppose you could use the new plotters madmax first on your current linux install. If this works out ok could you come back with plot times, just curious how my i7 is doing against your 5950x;-)

Oh yes, a single K.34 adds some 11TB to the TBW on your temp NVMe or SSD!

dctech · November 30, 2021, 12:31pm

That’s good news, I did not think of trying the new madmax K33/34 plotting with NVME tmp as I just assumed it needed a RAM drive. Will definitely give this a try once my legacy plotter is done with current K34 plot which took only 100286.848 seconds last time around (using -r 6 -u 64 -b 57000). Will be interesting to see what I can get for K34 as standalone madmax would take ~21min for K32 on this system.

xkredr59 · November 30, 2021, 12:42pm

24,685.7143 seconds I would guess, based on my 36 minutes k.32 and close to 4 hours for k.34.
Give or take a few. But lots and lots faster than 100286.848 for sure! Those AMD cores are gonna want some air…
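
For reference, the proportional estimate as a short sketch. It assumes K34 time scales with K32 time by the same factor on both machines, which is a simplification, and takes "close to 4 hours" as 240 minutes. Note that the 24,685.7 s guess matches 4 h × 36/21, while applying the ratio the other way (21/36) gives a much lower number.

```python
def scale_estimate(ref_k32_min, ref_k34_min, my_k32_min):
    """Estimate K34 seconds by applying the reference machine's K34/K32 ratio."""
    return my_k32_min * ref_k34_min / ref_k32_min * 60


# Reference: 36 min K32 and ~240 min K34 (i7); target: 21 min K32 (5950X).
estimate = scale_estimate(ref_k32_min=36, ref_k34_min=240, my_k32_min=21)
print(f"{estimate:.0f} s (~{estimate / 3600:.1f} h)")  # prints 8400 s (~2.3 h)
```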