Slow plot times with Dual Xeon 2699 (72 threads) + 256gb Ram + 8TB NVME (Windows + madMax)

The Ubuntu vs. Windows difference is real, at least in my experience with dual Xeon systems.
On normal desktops, not so much.

My 3900X plotter had basically the same MM plotting time on Windows and Ubuntu.
On my dual Xeon 2680 v2 system, the difference was 45 minutes (Windows) vs. 26 minutes (Ubuntu).

At first I believed it was due to the ramdisk not being any good on Windows, but after seeing some comments here on the forum I found that it was the operating system that caused this. Maybe you can tweak Windows so it behaves better, I don’t know. What I do know (as a complete Linux newbie) is that installing Ubuntu is very easy.
There were a few learning points along the way about mounting drives and getting madMax running, but all in all it was fairly easy.

Thanks! I have an HP Z420 with an E5-2680 v2 and 128 GB DDR3. I use a no-name NVMe for -t and a 110 GB ramdisk for -2. It does 48 minutes a plot under Win10. That is about the same speed as yours, comparing 1:1.

My R720xd (dual E5-2660 v2, 360 GB ramdisk, Win 10) does 65 minutes a plot with -r 40. I just started trying it, so I don’t have much data to share.

So, you used -r 24 for one MM on your T7610? How do you assign those 24 threads to one CPU?

So, your box has 72 and you set 30. Can you see whether the CPU is fully utilized? Thanks!

I don’t have the machine anymore, as I have since upgraded. But when I did check it before, it showed the CPU was mostly utilized. Even on the MM GitHub page, the author states:

“Make sure to crank up [threads] if you have plenty of cores, the default is 4. Depending on the phase more threads will be launched, the setting is just a multiplier.”

If I set the threads too high, it would either be slow or the process would crash. I hope that helps!

For me it usually worked out best to set -r a little above the CPU core count.
So for a 20-core, 40-thread CPU, I used 22-26 for -r.
Going much above that either doesn’t help or makes it worse.
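
For what it’s worth, that rule of thumb can be written down as a tiny helper. This is only a rough sketch: the function name is made up, and reading “a little above” as +10% to +30% of the core count is my own assumption, picked because it reproduces the 22-26 range for 20 cores. The right value still has to be found by timing plots on the actual box.

```python
from typing import Tuple


def suggest_r_range(physical_cores: int) -> Tuple[int, int]:
    """Return a (low, high) range of -r values worth trying.

    Assumption (not from madMax itself): "a little above the core count"
    means roughly +10% to +30% of the physical core count.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    low = physical_cores + physical_cores // 10          # about +10%
    high = physical_cores + (3 * physical_cores) // 10   # about +30%
    return low, high


# Example: the 20-core / 40-thread CPU mentioned above.
low, high = suggest_r_range(20)
print(f"try -r between {low} and {high}")
```

Start at the low end and only go up while plot times keep improving.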

Actually, I used -r 12 -K 2. My understanding (possibly wrong) is that when you specify -r 24, it just grabs 24 threads, which boils down to 24 physical cores. When you specify -r 12 -K 2, it goes after 12 physical cores and is more aware of hyperthreading. Again, this may be only my wishful thinking, though.

To explicitly pin it to a given CPU, NUMA needs to be used. Otherwise, you trust the OS to do its best, which may not be what we want to see.
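
On Linux, one common way to do that pinning is the numactl tool. Below is a minimal sketch that only builds and prints such a command line. The binary path ./chia_plot, the directory names, and the helper name are placeholders I made up, and the farmer/pool key options are left out on purpose, so treat it as an illustration rather than a ready-to-run plotting command.

```python
import shlex
from typing import List


def numa_pinned_command(node: int, threads: int,
                        tmp_dir: str, tmp2_dir: str, final_dir: str) -> List[str]:
    """Build an argv list that runs one plotter instance bound to one NUMA node."""
    return [
        "numactl",
        f"--cpunodebind={node}",  # schedule only on the CPUs of this node
        f"--membind={node}",      # allocate memory only from this node
        "./chia_plot",            # assumed location of the madMax binary
        "-r", str(threads),
        "-t", tmp_dir,            # tmp dir, e.g. the NVMe
        "-2", tmp2_dir,           # tmp2 dir, e.g. the ramdisk
        "-d", final_dir,
    ]


# Example: one instance on node 0 with -r 12 (all paths are hypothetical).
cmd = numa_pinned_command(0, 12, "/mnt/nvme/", "/mnt/ramdisk/", "/mnt/plots/")
print(shlex.join(cmd))
```

Running `numactl --hardware` first shows how many nodes the box has and how much RAM is attached to each one.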

Are you trying to say that games are not compute intensive, and not that well optimized? And you are trying to support that notion with an example of 2,000+ (Linux) servers that mostly run idle while serving from time to time some WordPress pages? Sure, I also only run headless CentOS servers, but computing performance is the last item on my checklist (if at all on that list) to decide what OS to run.

I think the problem is rather the opposite. Games potentially have the tightest code for a given platform, and where needed, the code is assembly and hand optimized. There is a reason that Linux is not that successful as a gaming platform, and it is not only that it is less used by the gaming community.

On the other hand, the MM plotter is general-purpose code that is just re-wrapped for the Win platform in the hope that it will somehow run, and this is most likely the main reason that it doesn’t work that well on the Win side.

Maybe you are right; I am not a programmer and don’t know much about cross-platform code migration. But the facts are what they are: about a 100% boost, which I measured a couple of times on different hardware. I was a big Windows fan for 25 years of my life until April last year, when I went all in on Chia and took out a loan to buy a 24-core Threadripper build that turned out to run horribly under Windows 10 and 11. Not a week goes by without a random BSOD, a random disk removal, or a ton of other things that have cost me thousands of dollars while trying to farm Chia and forks. I am still transitioning to Linux, but I bet all these driver, stability, compatibility, and strange OS issues will be nonexistent after I move to a Debian-based or Arch Linux distro and find alternatives for my favourite programs.

Hey, no worries. I also couldn’t get MM to run on my dual processor box when running Win, so I switched to Ubuntu right away. It is also possible that your Threadripper is pushing the limits on the H/W side, so subtle compatibility issues with your motherboard or RAM could be at play. BSODs are kernel related and that is where the code meets the H/W.

I also think that we are conflating two separate problems. The first is how well the code is optimized for a given platform (as we can see, running Linux MM gives ~10% faster plots than running it on Win). The second problem manifests itself when the box has more than one processor, and for whatever reason MM code just breaks down on Win. If I remember right, people reported running BB fine on Win boxes with multiple processors, so this problem is most likely MM related. Most likely, this problem is exactly with those cross-platform libs that MM is using, not really with MM code written by Max (some thread wrappers giving long timeouts).

Although, having said that, we could try to “partition” the MM code into two components. The first would be the number-crunching part, and this most likely runs at the same speed on all platforms. The second is the thread-handling part, and potentially this is where all those differences are coming from. Not sure whether this is the case, but that would be my bet. Trying to optimize the second part for Win may not be a trivial task, though.

What I am trying to say is that we should not generalize based on just one program, especially as we know that MM development is done purely under Linux, and Stotik is just wrapping it up in a Win binary without making many, if any, code changes. Sure, Max is using general-purpose cross-platform libs to make it run virtually everywhere, but that is not the same as hand-optimizing the code for a given platform. Some or maybe most of those libraries provide lowest-common-denominator optimization and are usually tilted toward the platform the original developer was more proficient in.

Actually, the driver issue you brought up is a good example. Usually, under Linux, there is a “native” driver (e.g., for Ethernet or USB chips), and as long as the H/W is not really butchered badly, all works fine. On the other hand, on the Win side, most of the drivers are derived from Win sample code that is just that, sample code. What this means is that most low-cost parts run rather deficient drivers. IMO, this is where Linux really has fewer issues. Although, if there is no good match for a given piece of H/W, most likely that H/W is just not usable under Linux (whereas virtually every piece of H/W has some sort of Win driver).

Are you using storage in a RAID configuration for plotting?

If so, I would recommend you try creating a few plots on an NVMe as a “stand-alone” drive attached directly to the motherboard. RAID adds overhead to the plotting process, given the very high I/O.

Good luck

Pete

I have those ducts, and as far as I can tell using Sensors in Linux Mint my ram temperatures are quite low - it gives four lines of temperatures, each prefixed SODIMM so I’m presuming that is my ram temperature, although I have 16 sticks.

Now that it’s been running for a while, I have 24, 40, 47, and 27 °C.

As you mentioned in the other thread, your box takes DDR4 that runs cooler. I have 4 sticks per CPU (as I have only 256 GB).

With this 140mm fan blowing over my RAM, I have (2x SODIMM):

  1. 56, 31, 68
  2. 72, 33, 75

With that big fan, the case fans are running really low, so basically there is no air coming from them. When I didn’t have that fan, temps were pushing 90C, and case fans were at full speed. Still, not much air was coming out of those air duct connectors. I put some test air ducts there, but temps were not really dropping.

I am not sure what one sensor line represents, though. Although, it looks like one sensor line represents sticks on one side of both processors (two rows going across the mobo).

By the way, I use the Psensor program. It should run on your box as well. I would say it is worth checking out: Psensor - A Graphical Hardware Temperature Monitoring Tool for Linux

I’m pretty sure I tried Psensor, but it kept asking for the password every few minutes.

That would be a real bummer. It doesn’t ask at all on my box. Not sure what the difference could be.

Just installed it and it seems to be working fine, thanks. I may have confused it with something else I tried for monitoring.

Hello bros. I am still experimenting with dual Xeon / Ubuntu.
HP Z620, dual 2660 v2, 128 GB RAM.
madMax does 35 minutes a plot at its fastest.
That plotting time is not sustained, though. Sometimes (most of the time, in fact), P1 table 3 and table 4 take a long time, which adds 10-20 minutes to the total plotting time.
Any suggestions?

What RAM do you have, and how are those DIMMs connected?

Install psensor to get RAM/CPU/NVMe temps, provide a screenshot. Extend monitoring duration to 40 mins, so you will see the whole plot in the graph area.

Most likely, you will need water cooling (120mm AIO per CPU will do it, as long as it fits LGA 2011 socket). You will also most likely need to get some extra air over your RAM.

MM may run better as 2 instances in parallel, each NUMA-bound to a given CPU and the RAM connected to it.

What brand / model is your NVMe?

Screenshot from your Resource Monitor of CPU utilization would be helpful. Change sampling rate to like 5 sec to get longer duration.

Provide MM command line you are currently using.

Z620 has 12 DDR3 slots. I use 8 for 128GB

Memory is cooled with fan. It is a pretty well designed computer.

CPU is 95W each. Heat sink is not that hot when running. I have 135W 2696v2 running on Z420. That is hot. But the Z420 holds well.

I just could not make two MM instances work in parallel. I have not figured out the NUMA thing.
The Z620 is a crippled dual Xeon. Its 12 slots take 192 GB RAM at most. I don’t know how the RAM is assigned to each CPU.
128 GB to CPU1 and 64 GB to CPU2?

NVMe is Inland performance brand. It works well on single CPU system.

So, you think it is a memory problem? I will add more, up to 192 GB.

Do you have a link to manual for it that would also have mb layout / architecture?

There is a reason to have 12 slots, but I am not really sure what it is. Those CPUs have 4 RAM channels, so each should have 4 or 8 DIMMs. I don’t know what 6 DIMMs per CPU would mean, so that could be what is crippling it. Still, for MM (or any other plotter), you want to have all DIMMs populated.

Also, based on what I found, it takes RAM up to 1,600 MHz, but that doc doesn’t say what type (e.g., LRDIMM, …). That page also lists the max as 96 GB RAM, which is odd as well.

That 95W is just a paper number. Once you push that CPU, my take is that it will easily draw 150W if not more (given that the heatsink removes that heat to prevent temp throttling). If you do not vent that heat out to the side, it will bring the RAM and everything else around it up by a few degrees, so throttling may happen on any component.

The fact that the box runs cool implies that it is not really pushed much, or it is throttling a lot.

My suggestion would be to try with just 1 CPU, but most likely it will just see half of your RAM, so that would be SOL.

To run 2 MM instances, you will need to have at least 256 GB RAM (2x 110-115 GB per MM instance). So, you are SOL here as well.
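
The RAM arithmetic behind that can be checked with a few lines. This is just a back-of-the-envelope sketch: the function name is made up, 110 GB per instance is the lower end of the figure above, and the 16 GB held back for the OS and the plotter itself is my own assumed headroom, not a measured number.

```python
def max_instances(installed_gb: int,
                  per_instance_gb: int = 110,
                  headroom_gb: int = 16) -> int:
    """How many ramdisk-backed MM instances fit into the installed RAM.

    per_instance_gb: ramdisk size per instance (110-115 GB quoted above).
    headroom_gb: assumed reserve for the OS and plotter buffers.
    """
    usable_gb = installed_gb - headroom_gb
    if usable_gb <= 0:
        return 0
    return usable_gb // per_instance_gb


# Compare a few typical RAM sizes for these dual Xeon boxes.
for installed in (128, 192, 256):
    print(f"{installed} GB -> {max_instances(installed)} instance(s)")
```

With these assumptions, going from 128 GB to a fully populated Z620 still leaves room for only one instance; two instances need the 256 GB class of box.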

I would really clean that box nicely, sell it and purchase another similar box that takes 256/512 GB RAM and the same gen CPU (e.g., that z420, or Dell t7610). If you step up to v4, you can get potentially big speed improvement, but at much higher cost, so may not be worth it.

A couple of options to try: one would be BB Disk (in my case, it is much slower than MM, at least on dual v2 CPUs running Ubuntu). Another would be to use Win with something like Primo Cache to front the t2 folders. In both cases, upgrading your RAM to the max would be helpful (again, I couldn’t make MM work in Win on my box, as it was 4-6x slower with 2 CPUs than with just one installed). Although, that money could potentially be better spent on that other box.

A 2Rx4 ECC registered DDR3 PC3-12800 16 GB module will work in the Z620.

12 slots will do 192 GB (12 x 16 GB). I will add more memory. Thanks!

I will look for a Z820, Dell T7610, or Lenovo D30 as the next project.