I got my Dell R720 running on Red Hat. Make sure to use an OS that Dell's driver packs support, or you will face major issues. The biggest issue was getting the right glibc version: it is locked to the RHEL release unless you install the developer tools packs. I recommend doing that, then updating to cmake3 and removing cmake so that cmake3 is the one that gets called. After all this, madmax will compile.
I played around with numactl a bit, thinking that pinning each plotting instance to CPU0 or CPU1 would reduce traffic between the CPUs and each CPU's memory. However, pinning CPUs and memory does not improve plotting time. Telling numactl to use local memory (-l) for whichever CPU is running the thread gives even worse results.
Right now, with 2 instances running under numactl --all, I get 4.5-hour plot times on my R720 (2 CPUs, 20 cores total).
I should also note that although Dell lists 768 GB as the R720's maximum RAM, you can actually go up to 1.5 TB with the latest BIOS revision. Cloud Ninjas sells a 1.5 TB kit.
Overall, I would not recommend the R720: I get the same times on my Z440 running Windows with no ramdisk, and it is a lot cheaper than the R720.
I'm wondering if anybody knows the best numactl configs or other ways to decrease plot times with my setup. I am debating trying all-NVMe without the ramdisk to see if that helps.
You have not provided enough info about your R720 setup (CPUs, RAM, NVMes), so it is hard to suggest anything.
I have a Dell Precision T7610 with two E5-2695 v2 CPUs and 256 GB RAM running Ubuntu (basically the same hardware as yours), and when I run two MM instances, I get 40 minutes per instance (effectively 20 minutes per plot for the whole box). I haven't plotted in a long time, and didn't use that plotter much. That said, I didn't try numactl. However, if you search this forum for numactl, you will find posts where people reported improved times with it. Some of those people also increased RAM to 512 GB and got even better plot times with a single BB instance (I think BB may handle NUMA binding in its own code).
Also, in my case, I had to water-cool my CPUs; otherwise the box was thermally throttling, plus it was just too noisy.
Also, I would rather go with Rocky Linux (kind of a new version of CentOS, from the same people, who were pissed at Red Hat for basically killing CentOS).
2x E5-2690 v2, 1.5 TB DDR3, 2x 2 TB NVMe; NVMe for temp1 and ramdisk for temp2.
Are you doing k33 or k34? I am doing k34.
I would love to use a different OS, but every other install (Debian, CentOS, etc.) fails due to missing driver packs for that OS. I wish Dell wouldn't do that…
Sorry, my bad. I only created k32 plots, and those other threads were most likely also about k32.
That said, if plotting time scales linearly with plot size (maybe it should, given enough RAM / NVMe), my single k32 time would translate to 160 minutes per k34, since a k34 holds about 4x the data of a k32. I'm not sure whether that difference can be attributed just to the differences in our CPUs (they differ, but not by that much). I also have DDR3 1866 RAM, but speed alone doesn't tell you much about RAM, so maybe this is irrelevant.
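The 160-minute figure is just size scaling: each +1 in k roughly doubles the plot, so k32 to k34 is a factor of 2^2 = 4. A minimal sketch, assuming time scales exactly with plot size (real plots grow slightly more than 2x per k step, so treat the result as an estimate):

```python
def extrapolate_minutes(minutes: float, k_from: int, k_to: int) -> float:
    """Scale a measured plot time from one k-size to another.

    Assumption: plotting time is proportional to plot size, and plot size
    doubles with every +1 in k.
    """
    return minutes * 2 ** (k_to - k_from)


# 40 minutes per k32 (the T7610 figure above), extrapolated to k34
print(extrapolate_minutes(40, 32, 34))  # 160
```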
Also, regarding your NVMes: did you check your motherboard layout and use PCIe slots that are connected directly to those two CPUs (each NVMe on a different CPU)?
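On Linux you can check which socket each NVMe hangs off without opening the case: every PCI device in sysfs exposes a `numa_node` file. A small sketch, assuming the standard `/sys/class/nvme/<ctrl>/device/numa_node` layout; a value of -1 means the kernel reports no NUMA affinity for that device:

```python
from pathlib import Path
from typing import Dict


def nvme_numa_nodes(sysfs_root: str = "/sys") -> Dict[str, int]:
    """Map each NVMe controller (nvme0, nvme1, ...) to its NUMA node."""
    nodes: Dict[str, int] = {}
    class_dir = Path(sysfs_root) / "class" / "nvme"
    if not class_dir.is_dir():
        return nodes  # no NVMe controllers, or not Linux
    for ctrl in sorted(class_dir.iterdir()):
        numa_file = ctrl / "device" / "numa_node"
        try:
            nodes[ctrl.name] = int(numa_file.read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable: skip this controller
    return nodes


if __name__ == "__main__":
    for name, node in nvme_numa_nodes().items():
        print(f"{name}: NUMA node {node}")
```

On a dual-socket board you would want the two temp drives to report different nodes (0 and 1).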
By the way, I am a current / former CentOS person, but I'm switching to Rocky Linux (as CentOS 8 is dead). That said, on my Dell I have Ubuntu (latest, not stable), and I didn't have any issues with it, although some people who were trying the stable version reported problems (as you pointed out).
I guess, if you want to get the most out of your box, I would just drop down the k-value and go with whichever one is the fastest / biggest (per TB). Even if you go with k32, Chia still needs to prove that it won't go belly up before those k32 plots have to be replotted.
Actually, I just had an exchange with Max (the developer behind MM) on his Discord about the advantages of bigger k-value plots, and most likely the only time they come into play is when plots sit on a slow NAS or a long daisy-chained JBOD. From what he said, problems become visible when using k26 plots on 20 TB drives, which translates to roughly k32 plots sitting on 1.3 PB (petabyte) drives.
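The k26-on-20 TB to k32-on-1.3 PB equivalence comes from keeping the number of plots per drive constant: going from k26 to k32 is six doublings, a factor of 2^6 = 64. A quick check, assuming plot size exactly doubles per k step:

```python
def equivalent_drive_tb(drive_tb: float, k_small: int, k_large: int) -> float:
    """Drive size at which k_large plots give the same plot count per drive
    as k_small plots on a drive_tb drive (assumes 2x size per k step)."""
    return drive_tb * 2 ** (k_large - k_small)


tb = equivalent_drive_tb(20, 26, 32)
print(f"{tb} TB = {tb / 1000} PB")  # 1280 TB = 1.28 PB
```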
What size ramdisk and temp1 are you guys using for k34? What's the bare minimum?
I am using a 2 TB NVMe for tmp1, and for tmp2 a 1 TB ramdisk for 2 plots at once, which is more than needed. For example, k32 only needs a 110 GB ramdisk; k34 would need around 475 GB, I think. Plus, the Chia plotting process itself takes a decent bit more RAM. 512 GB should be enough for plotting a single plot at a time.
I tried tuning my k34 plot times further by changing the -u and -v buckets from 512 to 256 and 128, and mixes of those. I also tried 1024. 512 seems to be the sweet spot.
I also saw some recommendations to use 75% of threads instead of the core count (50% of threads), or even 100% of threads, and those results were terrible. The sweet spot is definitely the core count, plus -K 2 if you have 2 threads per core.
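The rules of thumb above (-r at the physical core count, -K 2 on CPUs with 2 threads per core, 512 buckets for -u and -v) can be turned into a command line. This is a sketch: the flag names are madmax's as I remember them from `--help` (check against your build), the binary name `chia_plot` and the paths are hypothetical, and the farmer/pool keys and final directory a real run needs are left out:

```python
import shlex
from typing import List


def madmax_args(cores: int, threads_per_core: int, tmp1: str, tmp2: str,
                k: int = 34, buckets: int = 512,
                binary: str = "chia_plot") -> List[str]:
    """Build a madmax argument list following the tuning reported above."""
    args = [binary, "-k", str(k),
            "-r", str(cores),                          # threads = physical cores
            "-u", str(buckets), "-v", str(buckets),    # 512 was the sweet spot
            "-t", tmp1, "-2", tmp2]
    if threads_per_core >= 2:
        args += ["-K", "2"]  # phase 2 thread multiplier on SMT/HT CPUs
    return args


# Hypothetical example: dual E5-2699 v3 (36 physical cores, 2 threads per core)
print(shlex.join(madmax_args(36, 2, "/mnt/nvme0/", "/mnt/ramdisk/")))
```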
I tried using numactl to pin each instance to a physical CPU, bind its memory to that CPU, and give it a ramdisk of its own. That did not work as intended. Using numactl --all and the cores from both processors is way more effective, for me at least, and numastat shows very few misses.
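For comparison, here are the two launch styles from the post expressed as argument lists: `numactl --all` (what worked best here) versus pinning CPUs and memory to one NUMA node with `--cpunodebind` / `--membind`. A sketch only; `plot_cmd` is a hypothetical placeholder for your full madmax command:

```python
import shlex
from typing import List, Optional


def numactl_prefix(node: Optional[int] = None) -> List[str]:
    """numactl prefix for one plotting instance.

    node=None -> 'numactl --all', as reported in this thread
    node=N    -> pin both CPU scheduling and memory allocation to node N
    """
    if node is None:
        return ["numactl", "--all"]
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"]


plot_cmd = ["chia_plot", "-r", "20"]  # hypothetical placeholder command

for node in (None, 0, 1):
    print(shlex.join(numactl_prefix(node) + plot_cmd))
```

To actually start an instance you would pass the same list to `subprocess.Popen`. Afterwards, the `numa_miss` counters in `numastat` show how often allocations landed on the wrong node.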
I also tried going all NVMe, using one NVMe for tmp1 and my other NVMe for tmp2 (alternating between the two active plotting instances), but even with only DDR3 ECC RAM, a ramdisk in Linux was faster than NVMe. I should note that the R720 is PCIe 3.0, not 4.0.
I have a 5950X on the way to try plotting with a non-server CPU. That will of course be all NVMe. I looked at the new 5945WX and 5955WX, and for the price they're not worth it: the 5950X is in fact faster than the 5945WX in both single-core and multi-core testing. Plus, I would need a new motherboard etc., whereas I have a PC with an older 3900X, and that's the only component I need to upgrade.
I also played with buckets today, and 512 and 1024 both took 31-34 minutes to make a plot. I'm trying 512 on another machine as well.
TR Pro 3955WX, Win10 Pro:
- I’ve found 2x MM at 32 threads each works well; they soak up each other's wasted CPU cycles.
- Agreed, setting core affinity works, but then it wastes CPU cycles (see the first point).
- 2 fast PCIe 4.0 NVMes work best of anything I tried, even a RAMDisk.
- The big difference is having multiple x16 PCIe 4.0 slots, super nice. Most Ryzen motherboards limit PCIe 4.0 NVMes and available slots.
The 5950X with both tmp1 and tmp2 on NVMe takes 4.5+ hours for 1 plot.
So is the bottleneck the number of cores/threads rather than the CPU itself? I have CPUs of very different architectures and generations, and it seems like what helps most is thread count.
I have a 2x E5-2690 v2 with a DDR3 ramdisk for tmp2 taking the same time on two plots at once, and an E5-2699 v3 (single CPU) with alternating NVMes for tmp1 and tmp2 taking the same time on two plots (all 3 computers are within 20-30 minutes of each other).
- 5950X: 16 cores / 32 threads @ 4.5 GHz
- 2x E5-2690 v2: 20 cores / 40 threads @ 3.0 GHz
- E5-2699 v3: 18 cores / 36 threads @ 2.3 GHz
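One crude way to compare these three machines is aggregate cores x clock. The counts listed above are physical cores, and each of these CPUs runs 2 threads per core. This proxy ignores IPC, memory bandwidth, and storage, so it is only a rough yardstick, not a predictor of plot time:

```python
from typing import Dict, Tuple

# (physical cores, threads, clock in GHz) as listed in the post
CPUS: Dict[str, Tuple[int, int, float]] = {
    "5950X": (16, 32, 4.5),
    "2x E5-2690 v2": (20, 40, 3.0),
    "E5-2699 v3": (18, 36, 2.3),
}


def core_ghz(cores: int, ghz: float) -> float:
    """Aggregate cores x clock: a crude throughput proxy."""
    return cores * ghz


for name, (cores, threads, ghz) in CPUS.items():
    print(f"{name:>14}: {cores} cores, {threads} threads, "
          f"{core_ghz(cores, ghz):.1f} core-GHz")
```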
Both the number of cores and the thread count matter.
On my 5950X, I am plotting two k34 plots simultaneously via madmax, and they each take approximately 6 hours. So every 6 hours, I have two new k34 plots.
Note that I have two 2 TB NVMe drives. My times would be much slower if both plots were grinding on a single NVMe drive.
It has been a long time since I plotted a single k34 plot, but I believe it took approximately 3½ hours. So creating two plots one at a time would take 7 hours, whereas creating two plots simultaneously takes 6 hours.
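The same comparison expressed as daily throughput, using the 3½-hour (one at a time) and 6-hour (two at once) figures from the post:

```python
def plots_per_day(hours_per_batch: float, plots_per_batch: int = 1) -> float:
    """Plots completed per 24 hours when each batch of plots takes hours_per_batch."""
    return 24 / hours_per_batch * plots_per_batch


sequential = plots_per_day(3.5)   # one k34 at a time
parallel = plots_per_day(6, 2)    # two k34s running simultaneously
print(f"sequential: {sequential:.2f}/day, parallel: {parallel:.2f}/day")
# sequential: 6.86/day, parallel: 8.00/day
```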
If my memory is correct about a single k34 plot taking 3½ hours, then your times can probably be improved.
I use the “-r” option and set it to 14. I could probably use 15, but never tried.
The “-r” option tells madmax how many threads to use. The 5950X has 32 threads, and two instances at 14 threads each use 28 of them, leaving 4 free for whatever else the computer might need them for.
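The -r value above comes from splitting the CPU's threads between instances after holding a few back. A small sketch of that arithmetic; the 4 reserved threads follow the post, and `os.cpu_count()` reports logical threads:

```python
import os


def threads_per_instance(total_threads: int, instances: int, reserved: int = 4) -> int:
    """-r value per madmax instance after reserving threads for the rest of the system."""
    usable = total_threads - reserved
    if usable < instances:
        raise ValueError("not enough threads for that many instances")
    return usable // instances


print(threads_per_instance(32, 2))  # 14 (5950X, two instances)
print(threads_per_instance(os.cpu_count() or 1, 1, reserved=0))  # this machine, one instance
```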
If you are not using the -r option, then you are using 4 threads (madmax's default).
That would probably result in slower plotting times. I never tested it, but it stands to reason that fewer threads means less processing.
Also, my memory is running at 3200 MHz. The default is (I believe) 2600 MHz.
I do not know if that makes a significant difference. Probably helps a bit. But the thread usage is probably significant.
Have you checked taskmgr.exe?
If your CPU usage is not 100% for most of phase 1 and phase 3, then increasing your -r value might help.
Also, if your CPU is waiting on your NVMe drive, then there is not much you can do to push your CPU to 100%.
What NVMe drive are you using?
I just tried this: 512 GB of RAM, ramdisk set to 475G on Linux Mint, single-instance k34 plot.
MM exited after P1 Table 3 completed; it just said "Killed".
Memory usage is 88.8%, with 445.2 GiB cached, and the ramdisk folder shows 32 GB free.
Any idea why? I'm presuming it ran out of memory.
Just got my K34 plot time down to under 2 hours for a single plot.
T7910 dual E5-2699v3, 512GB ram, DC P3700 1.6TB drive
Phase 1 took 3067.28 sec
Phase 2 took 1263.53 sec
Phase 3 took 2493.03 sec, wrote 87499248595 entries to final plot
Phase 4 took 298.525 sec, final plot size is 461498668316 bytes
Total plot creation time was 7122.59 sec (118.71 min)
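To see where the time goes, the phase lines from a madmax log can be parsed and compared with the total. A sketch that uses the log lines above as sample input; the regexes assume the exact wording shown there:

```python
import re
from typing import Dict, Optional

LOG = """Phase 1 took 3067.28 sec
Phase 2 took 1263.53 sec
Phase 3 took 2493.03 sec, wrote 87499248595 entries to final plot
Phase 4 took 298.525 sec, final plot size is 461498668316 bytes
Total plot creation time was 7122.59 sec (118.71 min)"""

PHASE_RE = re.compile(r"Phase (\d) took ([\d.]+) sec")
TOTAL_RE = re.compile(r"Total plot creation time was ([\d.]+) sec")


def parse_phases(log: str) -> Dict[int, float]:
    """Map phase number -> seconds."""
    return {int(n): float(s) for n, s in PHASE_RE.findall(log)}


def parse_total(log: str) -> Optional[float]:
    """Total plot time in seconds, or None if the line is missing."""
    m = TOTAL_RE.search(log)
    return float(m.group(1)) if m else None


phases = parse_phases(LOG)
total = parse_total(LOG)
for n, secs in phases.items():
    print(f"Phase {n}: {secs / 60:.1f} min ({secs / total:.0%} of total)")
print(f"sum of phases: {sum(phases.values()):.2f} s, reported total: {total} s")
```

Phases 1 and 3 account for most of the run, which is where the thread count matters most.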
I have been using madmax to create k34 plots on my laptop for several months, via MX Linux (Debian-based).
I use the “-n -1” option, so the laptop creates the plots until I cancel the job.
Over the months, I have gotten that unexpected “Killed” message perhaps twice.
I do not have an answer for you as to why you and I got that message. But you are not alone.
Also, other than that rare “Killed” message, my laptop is running madmax K34 plotting for weeks at a time, with no hiccups.
Like mine, your “Killed” message might be a rarity. If no one chimes in with the cause of the message and the resulting exit of the plotting job, then I suggest you start again and see if it happens again. If your experience ends up like mine, it will be a long time before it recurs.
Thanks for the reply. If I'm around tomorrow and the sun is shining (free electricity), I'll give it another go. There's currently no need to plot, so I'm just experimenting since the electricity was free.
I only had a single 2 TB NVMe at the time, connected to the CPU. I added another NVMe in a PCIe slot that is also connected directly to the CPU, and my times improved; now I can do two at once with times similar to yours.
It still surprises me that these much older Xeon chips I'm using perform better, when in CPU comparison charts they do far worse than the 5950X.
To close this out: if you have an NVMe in a chipset-connected slot, you will get bad results, at least with AMD. I'm on an X570.
Finally, the Xeon generation doesn't seem to matter much; it's mostly the core count and frequency. At least my Z440 (18 cores, DDR4) does roughly the same as a dual-socket R720 (10 cores each, DDR3).
I just tried it again: same result, but this time after P1 Table 2 completed. It just says "Killed".
I did another run, this time monitoring free memory: it hits 0%, goes back up, then hits 0% free again. Yep, it's definitely caused by running out of memory, so I might have to decrease the size of the RAM drive a bit. But that's for another day; I'm going out to enjoy the nice weather.
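The free-memory monitoring above can be scripted by reading `MemAvailable` from `/proc/meminfo`. As far as I know, files in a tmpfs ramdisk show up under `Cached`/`Shmem` but cannot be evicted like ordinary page cache, so a large "cache" number does not mean that memory is reclaimable. A minimal Linux-only sketch:

```python
import time
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; values are kB where a unit is shown."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_pct(info: Dict[str, int]) -> float:
    """Share of RAM the kernel estimates is available, in percent."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]


def watch(interval: float = 5.0, samples: Optional[int] = None,
          path: str = "/proc/meminfo") -> float:
    """Print available memory every `interval` seconds and return the lowest % seen.

    samples=None keeps polling until Ctrl-C.
    """
    lowest = 100.0
    n = 0
    try:
        while samples is None or n < samples:
            with open(path) as f:
                pct = available_pct(parse_meminfo(f.read()))
            lowest = min(lowest, pct)
            print(f"available: {pct:5.1f}%  (lowest so far: {lowest:5.1f}%)", flush=True)
            n += 1
            if samples is None or n < samples:
                time.sleep(interval)
    except KeyboardInterrupt:
        pass  # stop polling and report the minimum
    return lowest
```

Run `watch()` in a second terminal while the plot is going; a minimum near 0% lines up with the kernel's OOM killer ending the process with "Killed".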
Does anyone know the absolute minimum space needed for tmp2 for a k34?
I lowered the ramdisk to 450 GB and still ran out of memory, but the ramdisk had 475 GB of files in it while stating 8 GB free, which seems odd. It seems that 512 GB of RAM is not enough to do k34 tmp2 on a ramdisk.
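One possible explanation for the odd numbers, offered as a guess: if the ramdisk is a tmpfs created with `size=450G`, that size is in binary GiB, and 450 GiB is about 483 decimal GB, which matches 475 GB of files plus 8 GB free if the file manager reports decimal GB. The sketch below does that conversion and the leftover-RAM arithmetic, assuming the 512 GB of RAM is 512 GiB (DIMM capacities are binary):

```python
GIB = 1024 ** 3  # bytes per GiB (tmpfs size suffixes are binary)
GB = 1000 ** 3   # bytes per decimal GB (what some tools display)


def gib_to_gb(gib: float) -> float:
    """Convert binary GiB to decimal GB."""
    return gib * GIB / GB


def headroom_gib(total_ram_gib: float, ramdisk_gib: float) -> float:
    """RAM left for madmax and the OS once the ramdisk is completely full."""
    return total_ram_gib - ramdisk_gib


print(f"450 GiB tmpfs = {gib_to_gb(450):.1f} GB")   # 450 GiB tmpfs = 483.2 GB
print(f"left over: {headroom_gib(512, 450)} GiB")  # left over: 62 GiB
```

If that leftover is smaller than what madmax plus the OS needs at peak, the OOM killer fires no matter how the ramdisk size is labeled.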
If you have two NVMe drives, then direct temp1 to NVMe drive #1, and direct temp2 to NVMe drive #2, and monitor the disk usage.
One plot will be enough to let you know what size RAM disk you will need for either of the two temp locations.
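Measuring that is mostly a matter of recording peak usage on each temp location during one plot. A sketch using `shutil.disk_usage`, which reports the whole filesystem, so it only measures the plot's temp files when each temp location has its own drive; the mount points in the usage line are hypothetical:

```python
import shutil
import time
from typing import Dict, Iterable, Optional


def peak_usage_gib(paths: Iterable[str], interval: float = 10.0,
                   samples: Optional[int] = None) -> Dict[str, float]:
    """Poll each path's filesystem and return the peak 'used' space in GiB.

    samples=None keeps polling until Ctrl-C.
    """
    paths = list(paths)
    peak = {p: 0 for p in paths}
    n = 0
    try:
        while samples is None or n < samples:
            for p in paths:
                peak[p] = max(peak[p], shutil.disk_usage(p).used)
            n += 1
            if samples is None or n < samples:
                time.sleep(interval)
    except KeyboardInterrupt:
        pass  # stop when the plot finishes and report the peaks
    return {p: used / 1024 ** 3 for p, used in peak.items()}
```

Usage: start `peak_usage_gib(["/mnt/nvme0", "/mnt/nvme1"])` just before the plot, press Ctrl-C when it finishes, and size the ramdisk from the peak reported for the temp2 drive (plus a margin).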