The bottleneck of plotting is not CPU, SSD or the amount of RAM!

I have a lot of old servers and plenty of space to store plots. So I thought: why not give it a shot?

I want to share some interesting experience for those who think buying super-fast SSDs will help. I think it won’t.

Here is my setup:
My desktop PCs are all AMD FX-8350 (8C/8T) at 3.5 GHz with 32-64 GB RAM.
My servers are all AMD Opteron 6380 (16C/16T) at 2.6 GHz with 64-128 GB RAM.

First test:
Desktop plotting: 2 plots, 4 cores per plot, 8 GB RAM each. The temp drive is a 1 TB SSD shared by both plots, connected via SATA3 (6 Gbit/s). Time to finish both plots: 10 h.
Second test:
Server plotting: 2 plots, 4 cores per plot, 8 GB RAM each. The temp drive is a 1 TB SSD shared by both plots, connected via SATA2 (3 Gbit/s). Time to finish both plots: 13 h.

Why is the server PC slower? CPU speed or the slow SATA interface? Let’s find out:
Server plotting: 2 plots in parallel, 4 cores per plot, 8 GB RAM each, two separate 1 TB SSDs connected via a SATA3 controller (PCIe x4). Time to finish both plots: 13 h.

So, SSD speed or interface is NOT the bottleneck! A single SSD connected via SATA2 gives the same speed as two separate SSDs connected via SATA3.

And here is where it gets really interesting. Let’s create 4 plots at once:
Server PC, 4 plots in parallel, 4 cores per plot, 8 GB RAM each, 4 separate 1 TB SSDs connected to a SATA3 controller (PCIe x4). Time to finish all four plots: 26 h.

That’s what really surprised me: CPU cores do not matter either. I tried everything, but I cannot utilize all the cores. I can create just 2 plots at once at full speed, no matter how many cores the CPU has, how much RAM is available, or how fast the SSDs are. The third plot always increases the time by 50%, the fourth by 100%, and so on… I think the real limitation of plotting is the memory interface/bandwidth.
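The reported numbers fit a simple rule of thumb. Here is a hypothetical model of that scaling (the 13 h baseline and the +50% per extra plot are taken from the tests above, not from any official source):

```python
# Hypothetical model of the scaling reported above: with 2 parallel plots
# as the baseline, every extra plot adds ~50% of the baseline wall time.
def parallel_plot_time(n_plots: int, baseline_hours: float = 13.0) -> float:
    """Estimated wall-clock hours to finish n_plots in parallel."""
    if n_plots <= 2:
        return baseline_hours
    return baseline_hours * (1 + 0.5 * (n_plots - 2))

for n in (2, 3, 4):
    print(n, parallel_plot_time(n))  # 2: 13.0 h, 3: 19.5 h, 4: 26.0 h
```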

What do you think?

Are the “separate 1 TB SSDs” all connected to the same SATA3 controller? Not all controllers are created equal. If it is the same controller, I’d guess that is your bottleneck.

I just started four plots in parallel with a 1 TB SSD drive for each. Two drives are connected to onboard SATA2 and the other two SSDs are on a separate SATA3 controller. I am curious whether there is any speedup over the 26 hours. I will report tomorrow.


It’s really a combination of all those things. I actually see plotting speed differences between my 3900X and 5900X purely because the 5900X is faster, but you’ll only see that if no other bottleneck limits performance first (such as the NVMe drive, etc.).

So yeah, in the end you’ll always have “a” bottleneck, but by building a well-rounded system you can at least plan for what that is going to be.

The 4 plots are doing well now. They might not finish within 13 hours, but it will be way faster than 26 h.

I always thought that SATA3 means each channel has 6 Gbit/s, but that’s actually the bandwidth of the controller, and it’s shared among all channels. So my statement was right: the bottleneck is not CPU, SSD, or the amount of RAM. It’s actually the SATA controller.
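A quick back-of-envelope sketch of that shared-controller argument, using assumed figures (~500 MB/s of usable bandwidth per PCIe Gen2 lane, ~550 MB/s sequential per SATA3 SSD; the actual controller generation wasn’t stated in the thread):

```python
# Back-of-envelope check of the shared-uplink argument. The numbers are
# assumptions, not measurements from this thread.
PCIE_GEN2_LANE_MBPS = 500  # usable MB/s per Gen2 lane after encoding overhead
SSD_SEQ_MBPS = 550         # typical SATA3 SSD sequential throughput

def per_drive_share(n_drives: int, lanes: int = 4,
                    lane_mbps: int = PCIE_GEN2_LANE_MBPS) -> float:
    """MB/s each drive gets if the controller's uplink is split evenly."""
    uplink = lanes * lane_mbps
    return min(SSD_SEQ_MBPS, uplink / n_drives)

# One or two drives run at full SSD speed; with four, the shared uplink
# (4 x 500 = 2000 MB/s) can no longer feed 4 x 550 MB/s of demand.
for n in (1, 2, 4):
    print(n, per_drive_share(n))
```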

Yes, this is fairly well understood now. There was a weird hyper-emphasis on disk speed in earlier versions of the plotter, before they optimized away a lot more of the disk activity.

It’s mostly cores and clock speed these days, and reasonably fast disks.

Look up the topics here on ramdisks for proof. That’s an infinitely fast disk.

I plot in a ramdisk on an enterprise-scale server machine (160 CPUs, 6 TB RAM, large L1-L3 caches), and my best time for a single k=32 plot is 25 ksec (about 7 hours). I tried various plot options and a few simple system tweaks (e.g., CPU affinity to avoid NUMA issues) with modest gains. I thought I saw someone post a plot time of 4 hrs, not quite sure how.

Anyway, I’m working on 2 different methods in the one area I noticed that may be ripe for speedup. Stay tuned…


I’d have to disagree with that. Although an 8-port SATA controller might have less than 8x 6 Gbit/s of backend bandwidth, it’s certainly not common for Intel or AMD motherboards to have really heavy limitations there.

But you can easily check: what chipset are you using, and what kind of bus connection does it have to the CPU? Maybe you are running more devices through the same chipset link, slowing your SATA down.

This is a big reason to go for an X570 AM4 motherboard over a B550, since the B550 “only” has an x4 Gen3 PCIe link to the chipset while the X570 has an x4 Gen4 link! Especially when you are running lots of USB3 and/or SATA devices plus an NVMe in the second slot, these all share bandwidth, so Gen4 can really help in some situations.

But generally it’s not that 6 motherboard SATA ports are limited to 6 Gbit/s in total; they are 6 Gbit/s apiece.


That’s super helpful to know, hadn’t even considered that.

I’m going for a big USB farm, so I’ll probably switch out my Prime B550-Plus for a Prime X570-P soon. It actually has fewer ports on the USB3 root hubs, but if I’m understanding you correctly, all that matters is the PCIe bandwidth?

USB root bridges are hard to figure out in terms of backend bandwidth. For instance, although my little box only has a single root bridge, its motherboard ports could do more than 5 Gbit/s combined. So the backend connection was for sure higher than 1x 5 Gbit/s, although it’s hard to say what it actually is.

For Ryzen and X570, some of the USB ports come directly off the CPU and some come from the chipset, but with x4 Gen4 lanes going to it you have lots and lots of bandwidth to play with, so it’s not going to slow down any time soon. Just try to spread the drives over the available root bridges and you should be perfectly fine, especially for farming, which is a low workload anyway. :slight_smile:


A very, very fast single-threaded CPU. My i9-9900KS does 4.5 hours to an NVMe SSD!


^^^^ This
I’m finding clock speed to be the biggest gain on plot times. Obviously more cores can give you more TB per day, but core clock is where it’s at for speed per plot.


Have you tried using AS SSD Benchmark?

If you run it once on each drive, then run multiple copies simultaneously on all drives, you can see if they are bottlenecking each other.

I did this with my motherboard, which has 2 NVMe slots, and noticed that they share the same lanes. Using an adapter to put one of the NVMe drives in a PCI-Express slot doubled the speed of both drives when running simultaneously.

This is why a lot of people say that RAIDed NVMe drives offered no performance boost: they do not realise the underlying bottleneck in the motherboard design.


6 Gbit/s per channel. Motherboards frequently put two SATA ports per channel, which are, as you stated, run through a common PCIe x4 link with all the other peripheral IO: on Intel chipsets that’s the DMI, and on AMD the uh… forget… too lazy to look it up. You get the idea. You’re absolutely correct; there is just a nuance to the statement of “6 Gbit/s”.

Yep. I overclocked my CPU to ridiculously high, unstable levels, and it cranked through a plot straight to RAM disk lickety-split, but… I get more plots per day out of parallel plotting to SSD at lower clock speeds that don’t turn my workstation into a toaster oven capable of creating delicious pizza.


But I love pizza :’(

Yeah, this is the way to go: lots of fairly fast storage space, then reducing down to 1 thread per plot using fast single-core performance (5 GHz+). It seems to work very well here; I just wish I had more fast storage haha

I think some old dual-Xeon servers (3.5 GHz+, 48 threads) could seriously rip through plotting if they were paired with some fairly fast RAIDed SSDs (10 TB+??).

I am currently running 30 plots in parallel, 4 threads per plot, staggered at 20-minute intervals at around 3 GHz, across 8 NVMe SSDs in RAID0. So far it is doing pretty well, though I am still tuning and tweaking the plotting process. At 30 parallel plots the workstation is still usable and I can get my daily work done, or play Diablo III whenever I feel like it.

Damn, what are you using that has 120 threads???

Only phase 1 makes use of multiple threads. Phases 2-4 are single-threaded, and those other threads sit idle (i.e., not using a core). So with proper phasing one can actually run seemingly more threads than cores/hyperthreads, since only the processes in phase 1 will be trying to use more than one thread.
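That phasing argument can be made concrete with a toy calculation (the phase-1 length and stagger interval below are assumptions for illustration, not measurements):

```python
# Toy illustration of the phasing argument: with a fixed stagger, only a
# fraction of jobs are inside the multithreaded phase 1 at any moment,
# so the total threads requested can safely exceed physical cores.
import math

def max_phase1_overlap(phase1_hours: float, stagger_hours: float) -> int:
    """Most jobs simultaneously inside phase 1 for a given stagger interval."""
    return math.ceil(phase1_hours / stagger_hours)

# e.g. an assumed 4-hour phase 1 with plots started every 30 minutes:
print(max_phase1_overlap(4, 0.5))  # 8 jobs in phase 1 at once
```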