There are a lot of knobs we can turn when trying to optimize plotting performance. So far, only a few have been tested in head-to-head benchmarks, and that information is scattered all over the place. I figured we should have a list of possible plotting performance tweaks we have benchmarked or should benchmark.
This list focuses on software/configuration matters since those are things you can adjust without buying anything.
Memory per plot
Buckets are sorted with quicksort when not enough memory is available for uniform sort. Somewhere above the default of 3389 MiB - I used 4608 MiB - uniform sort is always used. What the performance implications are, what the exact threshold is and whether adding memory past it has any effect is unclear to me (see the example commands after the next item).
Threads per plot
Whether going beyond 4 threads helps is unclear. Worth comparing on systems with a lot of excess hardware threads at their maximum sensible number of parallel plots. Note that multithreading currently only applies to phase 1, so each parallel plot in the other phases only counts for one thread.
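For reference, these two knobs correspond to the chia CLI's -b (buffer size in MiB) and -r (thread count) flags. A sketch of a head-to-head benchmark run; the temp and destination paths are placeholders:

```
# Baseline: default memory, 4 threads (paths are placeholders).
chia plots create -k 32 -b 3389 -r 4 -u 128 -t /mnt/nvme/tmp -d /mnt/hdd/plots -n 1

# More memory: the plotter log should then report uniform sort
# instead of quicksort (QS) for the buckets.
chia plots create -k 32 -b 4608 -r 4 -u 128 -t /mnt/nvme/tmp -d /mnt/hdd/plots -n 1
```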
Sort buckets
Fewer buckets need more RAM (halving the bucket count roughly doubles the memory requirement, I believe). The usual recommendation is to keep it at the default of 128, but 64 seems worth trying for people who happen to have a lot of excess RAM anyway. Definitely do not sacrifice parallel plot count for fewer buckets.
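Bucket count is the -u flag in the chia CLI. A hedged example pairing 64 buckets with roughly double the default memory; the exact -b value needed may differ:

```
# 64 buckets need roughly twice the RAM of the default 128 (value illustrative).
chia plots create -k 32 -u 64 -b 6800 -r 4 -t /mnt/nvme/tmp -d /mnt/hdd/plots -n 1
```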
Windows vs. Linux vs. macOS
Test in an equal environment (NTFS, no software RAID) to benchmark the plotting code itself, and in practical environments for practical purposes.
Linux filesystems
Impact of disabling journaling in Ext4 (no performance guide recommends this, but that’s because it’s too risky for any use case except ours)
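For anyone benchmarking this, journaling can be disabled at filesystem creation time or on an existing unmounted filesystem. The device name below is a placeholder, and this should only ever be done on a disposable plotting temp drive:

```
# Create ext4 without a journal (DATA-LOSS RISK: disposable temp drives only;
# /dev/nvme0n1p1 is a placeholder).
mkfs.ext4 -O ^has_journal /dev/nvme0n1p1

# Or strip the journal from an existing, unmounted ext4 filesystem.
tune2fs -O ^has_journal /dev/nvme0n1p1
```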
TRIM strategy
The usual Linux distribution default - TRIM once a week - is obviously bad for plotting, but continuous TRIM (the discard mount option) and manual TRIM between plots both sound sensible.
May vary by filesystem, disk and firmware version.
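A sketch of both strategies, assuming the temp drive is mounted at /mnt/nvme/tmp:

```
# Strategy 1: continuous TRIM via the discard mount option (example fstab line).
# /dev/nvme0n1p1  /mnt/nvme/tmp  ext4  defaults,discard  0  2

# Strategy 2: manual TRIM between plots.
fstrim -v /mnt/nvme/tmp

# The distro default being compared against is usually a weekly systemd timer:
systemctl status fstrim.timer
```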
poll_queues NVMe driver parameter on Linux
Intel recommends tweaking this in an Optane performance guide. I’ve heard the suggestion to set it to the number of CPU cores. I believe this feature is off by default.
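Setting it persistently would look roughly like this; the parameter is real, but the value of 4 is just an illustration, and since the nvme driver is usually built into the kernel, the kernel command line is the more reliable route:

```
# Allocate 4 NVMe polling queues at module load (value illustrative).
echo 'options nvme poll_queues=4' | sudo tee /etc/modprobe.d/nvme-poll.conf

# If nvme is built in, add nvme.poll_queues=4 to the kernel command line instead.
# Verify after reboot:
cat /sys/module/nvme/parameters/poll_queues
```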
Impact of CPU side channel mitigations
Defaults vs. the mitigations=off kernel parameter on Linux. Results would only be valid for the tested CPU microarchitecture.
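On a typical Debian/Ubuntu GRUB setup, the comparison would look roughly like this. Purely for benchmarking - this weakens security:

```
# Add mitigations=off to the kernel command line (benchmark use only).
# Edit GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"
sudo update-grub && sudo reboot

# Verify which mitigations are active:
grep . /sys/devices/system/cpu/vulnerabilities/*
```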
Core pinning with parallel plots
No pinning vs. allowing one hyperthread of each core vs. allowing one hyperthread of a subset of cores (on many-core systems)
Note that you should definitely pin each process to one CPU on multi-CPU systems, and make sure each process only uses memory from one NUMA node on NUMA systems (multi-CPU boards and Threadripper); see the sketch below.
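A sketch using taskset and numactl; core and node numbers are placeholders, so check your topology with lscpu first:

```
# Pin a plotter to the first hyperthread of four physical cores
# (core IDs are placeholders; inspect the topology with lscpu -e).
taskset -c 0,2,4,6 chia plots create -k 32 -r 4 -t /mnt/nvme/tmp -d /mnt/hdd/plots

# On NUMA systems, bind both CPU and memory to one node per process.
numactl --cpunodebind=0 --membind=0 chia plots create -k 32 -r 4 -t /mnt/nvme/tmp -d /mnt/hdd/plots
```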
Most of these most likely have little, if any, effect in practice. But you never know.
Unfortunately this will have to be an Ideas Guy post, as I did my plotting on rented hardware I’ve already returned and own no useful hardware myself. But hey, someone had to write it…
I’d like this to be a community resource, so I’m happy to update this post with any additions or corrections you guys have, and I wouldn’t mind moderators making content edits either.
Generally you are CPU bound or I/O bound. Once the CPU is at max saturation, you will not see further scaling in TB/day output. Hyperthreading does not do much, so just target the number of processes to the core count. If the CPU is always busy, then the number of threads and staggering will not make a difference. I stagger mostly to relieve stress on the I/O for the destination drive. For the SSD, use a good data center NVMe and enable discard. Ideally more drives will provide higher IOPS per terabyte, but fewer drives are easier to attach.
I see too many people obsessing over the -r value and staggering. Plot times are not important in isolation (they are good for comparing run versus run), but total system output is.
Assuming you have a good (i.e. not outdated) CPU and an ample amount of RAM, the main bottleneck is usually the NVMe. Not all NVMe drives are created equal. No matter how you tweak kernel, memory or filesystem settings, if the NVMe cannot cope with the read/write requirements of multiple plots, iowait will go up and slow things down drastically.
My plotter is a 10th gen i7 6c/12t with 16GB of DDR4 2666MHz RAM and a 1TB WD SN550 running Ubuntu Server 21.04; it can do 12/day using the default plot settings. Still working on squeezing out a bit more to get past 12.
My setup isn’t the most efficient or the best performing plotter, but I’m good with it. I don’t want to spend money on a really good plotter.
I have a system with 96GB of RAM and can confirm that increasing RAM per plot to around 8-10GB yields noteworthy performance improvements (I only have a 1TB NVMe, so I am limited to 3-4 concurrent plots). Also, I realized that ensuring only one plot is in phase 1 at a time, with the max number of threads (I have 12, so I dedicate 8 to the phase 1 plot), makes things very smooth, as all other phases only use one thread anyway…
Thanks for the wiki, super handy! Maybe worth adding something on NVMe drives whose write caches, once exhausted, reduce sustained write speeds - which, I agree with the OP, continues to be a bottleneck even on NVMe.
Been playing with buckets. 128 as the default works great with 3300 MiB per 4 threads. 5200 MiB per 4 threads is only marginally better, but better.
Tried 64 buckets. Jesus, you need like 4x the RAM: running with 12800 MiB works, but anything lower and the 64-bucket plots just drag. Unless someone has like 128GB of memory for an 8-core, I don’t see how 64 buckets would help.
I have a 3rd gen i7 spitting out 2/day in queue (consistent output rate) on SSD - I will try 8 threads and 8GB RAM with 64 buckets on that, and will report any noticeable improvements (near or above 30 mins).
From what I’ve seen so far, RAM speed is important (only) on Ryzen systems.
Someone explained it somewhere on the forum; I don’t remember where anymore.
Something about the limitations of the Infinity Fabric capping the bandwidth, which can only be helped by having faster RAM to increase the throughput.
Refer to the chart on DDR4 write MB/sec, which behaves similarly to Chia plotting. This also affects the 5950X, as AMD uses the same Infinity Fabric design. For optimization, use Linux and overclock the memory clock speed. Use multiple SSDs instead of one SSD to avoid a bottleneck on PCIe or SATA. Maybe RAID 0 will help too (see the sketch below).
Otherwise, sell your AMD CPU and switch to an Intel one for plotting.
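If someone wants to test the RAID 0 idea, a minimal mdadm sketch; device names are placeholders, and this destroys existing data on both drives:

```
# Stripe two NVMe drives into one temp volume (placeholder device names;
# this wipes both drives).
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.ext4 /dev/md0
sudo mount /dev/md0 /mnt/nvme/tmp
```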
Well, I think I’m running a pretty OK dollars-spent-per-plots-a-day ratio.
I’m not sure at this point if there is much difference between Intel and AMD in that metric.
I might just sell the RAM and get something faster instead, but I’m much more interested in storage space atm.
Intel plots faster; with AMD you get more cores per dollar.
If you plot faster you can save a bit on temp space, but then again Intel has many limitations on M.2 slots and SATA ports. So there are pros and cons on either side.
But a lot of the building advice people are looking at, on sites like Chia Decentral, is Intel-focused. So it’s good for people to know that when going for AMD, they should get fast memory.
I based my build on the info I was reading, which focused on the number of cores and SSD endurance.
Only later did I find out about sustained write speed, the massive importance of single-core performance, and the fact that you need fast memory for Ryzen.
Many articles say things like: just get the cheapest memory you can find…