Madmax optimization/bottlenecks

Here’s my log. I’d like to find my sweet spot.
CPU tops out at about 75% according to sysmon.

Multi-threaded pipelined Chia k32 plotter - b18f5c7
Final Directory: /mnt/TEMP/
Number of Plots: infinite
Crafting plot 1 out of -1
Process ID: 4714
Number of Threads: 64
Number of Buckets P1: 2^8 (256)
Number of Buckets P3+P4: 2^8 (256)
Pool Public Key:
Farmer Public Key:
Working Directory: /mnt/220G/
Working Directory 2: /mnt/110G/
Plot Name: plot-k32-2021-06-30-19-50-xxxx
[P1] Table 1 took 11.5813 sec
[P1] Table 2 took 129.519 sec, found 4294904454 matches
[P1] Table 3 took 138.067 sec, found 4294810808 matches
[P1] Table 4 took 144.499 sec, found 4294674133 matches
[P1] Table 5 took 142.531 sec, found 4294419885 matches
[P1] Table 6 took 141.886 sec, found 4293853818 matches
[P1] Table 7 took 130.787 sec, found 4292771111 matches
Phase 1 took 838.913 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 8.36622 sec
[P2] Table 7 rewrite took 56.0519 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 32.9999 sec
[P2] Table 6 rewrite took 31.3148 sec, dropped 581403344 entries (13.5404 %)
[P2] Table 5 scan took 23.5202 sec
[P2] Table 5 rewrite took 29.1175 sec, dropped 762123015 entries (17.7468 %)
[P2] Table 4 scan took 19.9207 sec
[P2] Table 4 rewrite took 27.8411 sec, dropped 828975432 entries (19.3024 %)
[P2] Table 3 scan took 20.3378 sec
[P2] Table 3 rewrite took 27.2522 sec, dropped 855126389 entries (19.9107 %)
[P2] Table 2 scan took 19.1003 sec
[P2] Table 2 rewrite took 27.0511 sec, dropped 865587139 entries (20.1538 %)
Phase 2 took 356.752 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 145.016 sec, wrote 3429317315 right entries
[P3-2] Table 2 took 28.9748 sec, wrote 3429317315 left entries, 3429317315 final
[P3-1] Table 3 took 117.08 sec, wrote 3439684419 right entries
[P3-2] Table 3 took 31.2064 sec, wrote 3439684419 left entries, 3439684419 final
[P3-1] Table 4 took 119.554 sec, wrote 3465698701 right entries
[P3-2] Table 4 took 30.3118 sec, wrote 3465698701 left entries, 3465698701 final
[P3-1] Table 5 took 118.319 sec, wrote 3532296870 right entries
[P3-2] Table 5 took 31.281 sec, wrote 3532296870 left entries, 3532296870 final
[P3-1] Table 6 took 115.684 sec, wrote 3712450474 right entries
[P3-2] Table 6 took 32.9761 sec, wrote 3712450474 left entries, 3712450474 final
[P3-1] Table 7 took 145.489 sec, wrote 4292771111 right entries
[P3-2] Table 7 took 37.4429 sec, wrote 4292771111 left entries, 4292771111 final
Phase 3 took 963.204 sec, wrote 21872218890 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 126.385 sec, final plot size is 108805736889 bytes
Total plot creation time was 2285.4 sec (38.0899 min)
Started copy to /mnt/TEMP/plot-k32-2021-06-30-19-50-xxxx.plot
Copy to /mnt/TEMP/plot-k32-2021-06-30-19-50-xxxx.plot finished, took 186.69 sec, 555.814 MB/s avg.

Any suggestions on how to improve?

Remaining memory (including SYS): 54GB

What CPU, memory, and drives do you have, and what command are you using?

Dell R820, 4x E5-4657L, 384GB RAM, temp1/temp2 in RAM, final storage on SAS SSD
command: chia_plot -n -1 -r 64 -u 256 -t /mnt/220G/ -2 /mnt/110G/ -d /mnt/TEMP/ -p * -f *

If you are getting 38 min then something is wrong, or at least I would expect it to be. I have an R620 with only 40 threads and 128GB RAM and I get low-30-minute plots.

I would expect running fully in RAM to be much faster. What speed is your RAM?

I would suggest you do some tests as follows:

Start with -r 32 -u 256. If that shows an improvement, then test different combinations of

  -u, --buckets arg    Number of buckets (default = 256)
  -v, --buckets3 arg   Number of buckets for phase 3+4 (default = buckets)

to get the absolute best time.
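
A rough sketch of how such a test grid could be written down before running anything. It only prints candidate command lines; it does not launch the plotter. The thread and bucket values are example choices, the paths are the ones from the original command, and `<pool_key>` / `<farmer_key>` are placeholders, not real values.

```python
from itertools import product


def sweep_commands(threads, buckets, buckets3,
                   tmp1="/mnt/220G/", tmp2="/mnt/110G/", final="/mnt/TEMP/"):
    """Build one chia_plot command line per (-r, -u, -v) combination.

    Uses -n 1 so every test run produces exactly one plot to time.
    Key arguments are placeholders and must be filled in by hand.
    """
    commands = []
    for r, u, v in product(threads, buckets, buckets3):
        commands.append(
            f"chia_plot -n 1 -r {r} -u {u} -v {v} "
            f"-t {tmp1} -2 {tmp2} -d {final} "
            f"-p <pool_key> -f <farmer_key>"
        )
    return commands


# Example grid (assumed values, adjust to taste).
commands = sweep_commands(threads=[16, 32, 64],
                          buckets=[128, 256, 512],
                          buckets3=[128, 256])
for command in commands:
    print(command)
```

Running each printed command once and noting "Total plot creation time" gives a table to pick the fastest combination from.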

The problem I see is that the CPUs top out at around 70%.
Can you post your ph1/ph2/ph3/ph4 times, so I can see whether I need to work on ph1/2 or ph3/4 for optimization?
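
To make phase times easy to compare between machines, here is a minimal sketch that pulls the "Phase N took ..." lines out of a plotter log and shows each phase's share of the total. The sample text is copied from the log at the top of the thread; the function names are made up for this example.

```python
import re

# Phase summary lines copied from the log posted above.
LOG = """\
Phase 1 took 838.913 sec
Phase 2 took 356.752 sec
Phase 3 took 963.204 sec, wrote 21872218890 entries to final plot
Phase 4 took 126.385 sec, final plot size is 108805736889 bytes
"""

PHASE_RE = re.compile(r"^Phase (\d) took ([\d.]+) sec", re.MULTILINE)


def phase_times(log_text):
    """Return {phase number: seconds} for every phase summary line found."""
    return {int(n): float(sec) for n, sec in PHASE_RE.findall(log_text)}


def phase_shares(times):
    """Return {phase number: fraction of the summed phase time}."""
    total = sum(times.values())
    return {n: sec / total for n, sec in times.items()}


times = phase_times(LOG)
shares = phase_shares(times)
for n in sorted(times):
    print(f"Phase {n}: {times[n]:8.1f} sec  {shares[n]:6.1%}")
```

For the log above, phases 1 and 3 together account for about 79 percent of the plotting time, so those are the phases where a change in -r, -u or -v should show up most clearly.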

DDR3 low-voltage 1600

If you have 384GB RAM, shouldn’t you only need to create one big RAM drive? I am not sure whether using -2 for a second temp drive is helping or hurting in your situation.

As the other poster mentioned, I would definitely try different -r, -u and -v values (start with 16 threads to get baseline times) to find your sweet spot. If your CPUs aren’t maxing out at 64 threads, then it looks like your memory just can’t keep up with the threads. On Windows there is something called NUMA, where memory size and CPU grouping come into play. I wonder if this needs to be tuned on Linux too (and/or you need a setting), since you have so many threads.

Sounds like you are RAM bottlenecked. With 384GB RAM you only have 96GB per CPU. That means it needs to use RAM attached to the other CPUs’ NUMA nodes, which is slow.
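
The per-CPU arithmetic can be sketched like this. It assumes the RAM is split evenly across the four sockets and that the two tmpfs mounts are 220GB and 110GB, which is only a guess based on the mount point names /mnt/220G/ and /mnt/110G/. It gives a best-case lower bound, not a measurement.

```python
def numa_split(total_ram_gb, sockets, tmpfs_gb):
    """Estimate how much of a full tmpfs must live on remote NUMA nodes.

    Assumes RAM is divided evenly between sockets (an assumption, not
    something read from the machine). Best case for one socket: its
    whole local node holds tmpfs data and everything else is remote.
    """
    per_node_gb = total_ram_gb / sockets
    local_gb = min(tmpfs_gb, per_node_gb)
    remote_gb = tmpfs_gb - local_gb
    return per_node_gb, remote_gb, remote_gb / tmpfs_gb


# 384GB over 4 sockets, tmpfs sizes guessed from the mount names.
per_node_gb, remote_gb, remote_fraction = numa_split(384, 4, 220 + 110)
print(f"RAM per node:      {per_node_gb:.0f} GB")
print(f"Remote tmpfs data: {remote_gb:.0f} GB ({remote_fraction:.0%} of tmpfs)")
```

Under those assumptions, any single CPU sees well over half of a full temp space as remote memory, which would fit the observation that the threads are not fully busy.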

I don’t know enough about your server to say whether you can move RAM around asymmetrically so that you have 128GB on one CPU… or just get more RAM.

And FYI, I am RAM bottlenecked even with 128GB DDR4 @ 2133 on my E5-2680 v3. I get very similar times to others who have more cores and higher clock speeds on the same CPU generation (2100 s for a single plot, and 2260-2300 s looped with the built-in transfer to a USB HDD).

My thinking on RAM was the following: 48 cores x 8GB/core. I will try mounting tmpfs as a single drive and see if speed improves. Thank you all for your suggestions. Will post updates.

Changing to 32 threads seems to have reduced plotting time to 29 min.
I will play around and see if I can reduce it any further.
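
For a sense of what that change is worth per day, a quick back-of-the-envelope sketch. It assumes the copy to the final directory overlaps with the next plot, so only the plot creation time limits throughput, and it takes "29 min" as exactly 1740 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def plots_per_day(seconds_per_plot):
    """Plots per day if plots run back to back with no idle time."""
    return SECONDS_PER_DAY / seconds_per_plot


before = plots_per_day(2285.4)   # 64 threads, time taken from the log above
after = plots_per_day(29 * 60)   # 32 threads, "29 min" taken as exact

print(f"64 threads: {before:.1f} plots/day")
print(f"32 threads: {after:.1f} plots/day")
print(f"Gain:       {after / before - 1:.0%}")
```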