Performance loss plotting on multiple HDDs in parallel

Is anyone else doing the same thing: using a bunch of HDDs as temp drives and plotting in parallel on them, one plot on each HDD? I have many small SATA HDDs (~3TB) but only one 512GB SSD, so I tried using the HDDs as temp drives.

Actually, with just one HDD the speed is pretty acceptable: around 10 hrs, compared to 7 hrs on an NVMe SSD (i7-10700, 32GB RAM, -r 2, -b 4000).

But then comes the problem: when I try to add more plots in parallel (with or without a 1-2 hr stagger, -r 2, -b 4000) on more HDDs, one on each, the speed drops dramatically: about 16 hrs with 4 and ~20 hrs with 6, and the more I add, the slower it gets. Judging from the time spent on each step, every phase slows down. I tested the same thing on other platforms such as an R5-3600 and an i5-10400, and the symptoms are the same.
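For reference, the way I launch them is roughly equivalent to this Python sketch; the paths are placeholders and I'm assuming the stock `chia plots create` CLI with the same -r/-b settings as above:

```python
import subprocess
import time

# Placeholder paths: one temp dir per HDD, plus a final destination.
TEMP_DRIVES = ["D:\\plot-tmp", "E:\\plot-tmp", "F:\\plot-tmp", "G:\\plot-tmp"]
FINAL_DIR = "H:\\plots"
STAGGER_SECONDS = 2 * 60 * 60  # optional 2 hr stagger between launches

procs = []
for i, tmp in enumerate(TEMP_DRIVES):
    if i:
        time.sleep(STAGGER_SECONDS)  # stagger so the plots aren't all in the same phase
    cmd = [
        "chia", "plots", "create",
        "-k", "32",        # standard plot size
        "-r", "2",         # 2 threads per plot
        "-b", "4000",      # 4000 MiB buffer per plot
        "-t", tmp,         # temp dir on its own HDD
        "-d", FINAL_DIR,   # final destination
    ]
    procs.append(subprocess.Popen(cmd))

for p in procs:
    p.wait()
```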

I really wonder where the bottleneck is: the CPU has more threads than the plots are allocated, and CPU usage is only 20-40% most of the time; RAM is abundant; and I ran read/write benchmarks on all the HDDs at the same time (even 8 together), and their speed is the same as when tested solo.

Is anyone aware of the reason behind this and how to mitigate it? Much appreciated!

To clarify, all these HDDs are CMR, not SMR, with similar performance and similar plot times when a single plot runs on one of them as the temp drive.

There’s only so much bandwidth available on all your SATA ports combined. I think this is it. It’s why dedicated controllers or M.2 drives are preferable: there’s more bandwidth available. Controllers have their own PCIe lanes, as do most M.2 slots (this all varies greatly by board).

This is the issue: SATA saturates very quickly if more than one drive is being used at 100%.

If you have lots of these drives to use, it might be worth looking into whether a SATA add-on card would give you more I/O bandwidth for the drives. Seems like it would? 10 hrs is certainly pretty reasonable. I’m surprised more people aren’t talking about HDD plotting.

We’re talking HUGE differences in bandwidth (width of the pipe) between 6 Gbps SATA and a PCIe 4.0 M.2.

Thanks for the reply bro, but one SATA HDD can only run up to 200MB/s, and all the onboard SATA ports combined should be sharing an overall bandwidth of about 4GB/s. Actually, I ran 4-6 instances of AS SSD Benchmark or HD Tune Pro, testing the read/write rate of the drives at the very same time, and there doesn’t seem to be any noticeable difference compared to testing them alone. Even if I add one SATA SSD, which by itself can hit 400-500MB/s in the test, to the pack, there’s still no speed loss.

How come 4 HDDs slow things down so much? That isn’t explained yet.
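Just to put numbers on it, the back-of-envelope math (using the ~200MB/s per HDD and ~4GB/s shared figures above, which are rough assumptions rather than measurements) says the shared link should be nowhere near saturated:

```python
# Rough sanity check with the figures quoted above.
per_hdd_mb_s = 200        # sequential throughput of one SATA HDD
shared_link_mb_s = 4000   # approx. combined bandwidth behind the onboard SATA ports

for n_drives in (1, 4, 6):
    aggregate = n_drives * per_hdd_mb_s
    print(f"{n_drives} HDDs: {aggregate} MB/s "
          f"({aggregate / shared_link_mb_s:.0%} of the shared link)")
```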

Yes, and I’m aware of that; however, plotting itself isn’t THAT fast, since it’s bottlenecked by so many factors.
Even with a single HDD as the temp drive, one plot can be finished within 10 hrs, and that gap is much smaller than the gap between the raw speeds of an HDD and a PCIe M.2 drive.

It seems as if, when plotting on multiple HDDs in parallel, the CPU constantly stalls (as a whole) waiting for one HDD or another to respond, and therefore CPU utilization stays really low (usually 40-60% in the log, which I believe represents a percentage of just one thread, since with a single plot it often exceeds 100%).
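One way to check that theory is to watch per-disk busy time next to CPU load while the plots run. A rough sketch using psutil (assuming it’s installed; on Windows the cumulative read/write times are reported in milliseconds):

```python
import time
import psutil

INTERVAL = 5  # seconds between samples

psutil.cpu_percent(interval=None)          # prime the CPU counter
before = psutil.disk_io_counters(perdisk=True)
time.sleep(INTERVAL)
after = psutil.disk_io_counters(perdisk=True)
cpu = psutil.cpu_percent(interval=None)    # average CPU load over the interval

for disk, b in before.items():
    a = after[disk]
    # read_time/write_time are cumulative milliseconds spent servicing I/O.
    busy_ms = (a.read_time - b.read_time) + (a.write_time - b.write_time)
    print(f"{disk}: ~{100.0 * busy_ms / (INTERVAL * 1000):.0f}% busy")
print(f"CPU: {cpu:.0f}%")
```

If the HDDs show up near 100% busy while the CPU sits at 20-40%, that would back up the "CPU waiting on disks" explanation.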

Is sequential the correct benchmark to test against? Watching the writes, it’s definitely not one large chunk all at once. Are seek times on the HDDs a factor here? I mean, folks are doing RAID 0 across four M.2 drives, or a bunch of enterprise drives tied together, without issue. Maybe it’s just something with the SATA interface?

That’s better than I expected. SSD vs NVMe is covered here; TL;DR NVMe is 13% faster. But that is all single-plot testing; none of that matters if you are trying to do parallel plots!

Well, I tested 4K random R/W simultaneously as well, and the results are still the same: no difference between a single run and simultaneous runs.
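If seek time is the suspect, random 4K read latency can also be measured directly on a temp drive instead of relying on the benchmark GUIs. A quick sketch; the file path is just a placeholder for any large file sitting on the HDD under test, and results will be optimistic unless the file is much larger than RAM (OS caching):

```python
import os
import random
import time

PATH = r"D:\plot-tmp\some-large-file.tmp"   # placeholder: any multi-GB file on the HDD
READS = 200
BLOCK = 4096

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY | getattr(os, "O_BINARY", 0))
try:
    start = time.perf_counter()
    for _ in range(READS):
        os.lseek(fd, random.randrange(0, size - BLOCK), os.SEEK_SET)
        os.read(fd, BLOCK)
    elapsed = time.perf_counter() - start
finally:
    os.close(fd)

print(f"avg random 4K read: {1000 * elapsed / READS:.1f} ms")
```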

@miguel look at this topic: Extremely slow results from Samsung 980 m.2 SSD - #11 by Jesion and read gerhard’s reply; maybe it will help you. When I enabled TRIM my speed went from 9 h to 5.5 h per plot. 1 plot = 4 threads, 3408 buffer. Now I create 3 plots with a 2 h stagger. I have the same CPU as you and 32GB RAM at 3600, but a 1TB NVMe disk.
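For anyone wanting to verify TRIM on Windows before changing anything, the usual check is fsutil (a result of 0 means delete notifications, i.e. TRIM, are enabled); a tiny sketch wrapping it:

```python
import subprocess

# DisableDeleteNotify = 0 means TRIM (delete notifications) is enabled.
result = subprocess.run(
    ["fsutil", "behavior", "query", "DisableDeleteNotify"],
    capture_output=True, text=True,
)
print(result.stdout.strip())
```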

I think the problem is almost solved.
I tried PrimoCache, which uses some RAM as a buffer for the HDDs, and performance increased dramatically: now I can finish 6 parallel plots in 12 hrs. So it seems to be a problem caused by heavy simultaneous random access across multiple HDDs: the CPU and/or other parts constantly have to wait for the slow HDDs, and that drags the speed down. With the RAM buffer this situation improves greatly.
Now I’ll try to tweak the parameters for even better performance. Thanks, guys!
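To illustrate what I think the RAM buffer is doing (just a toy sketch of the write-behind idea, not how PrimoCache actually works internally): small scattered writes get absorbed into RAM and flushed to the HDD in bigger sequential chunks, so the writer never blocks on the slow disk.

```python
import queue
import threading

class WriteBehindBuffer:
    """Toy write-behind cache: callers hand over small chunks without blocking;
    a background thread coalesces them into large sequential writes."""

    def __init__(self, path, flush_bytes=8 * 1024 * 1024):
        self._q = queue.Queue()
        self._flush_bytes = flush_bytes
        self._worker = threading.Thread(target=self._drain, args=(path,), daemon=True)
        self._worker.start()

    def write(self, data: bytes):
        self._q.put(data)      # returns immediately; the disk never stalls the caller

    def close(self):
        self._q.put(None)      # sentinel: flush what is left and stop
        self._worker.join()

    def _drain(self, path):
        pending, pending_size = [], 0
        with open(path, "ab") as f:
            while True:
                item = self._q.get()
                if item is None:
                    if pending:
                        f.write(b"".join(pending))
                    break
                pending.append(item)
                pending_size += len(item)
                if pending_size >= self._flush_bytes:
                    f.write(b"".join(pending))  # one big sequential write
                    pending, pending_size = [], 0

# Usage: thousands of 4KB writes end up as a handful of 8MB sequential flushes.
buf = WriteBehindBuffer("scratch.bin")
for _ in range(10_000):
    buf.write(b"x" * 4096)
buf.close()
```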

I wondered about doing something like that because the workload seemed to have bursts followed by idle. I did over 300 plots directly to HDD and 12h is really good. The best I could get was about 14h with most of mine taking 17-18h because of being CPU limited. I have older, slower CPUs than yours though.

If you’re not doing it already, try using the same directory for -t and -d. I think it might save copying the plot at the end (unless the software is smart enough now to do a move if it’s the same filesystem).

What the heck is “primocache”?

It’s software. I glanced at it. I didn’t look closely, but I’m guessing it enables disk write caching and/or ReadyBoost.

PrimoCache - Excellent Software Caching Solution to Accelerate Storage (romexsoftware.com)

That’s not a recommendation. I wouldn’t buy it personally since enabling write caching on each disk should accomplish the same thing for free.

Not quite. It’s so much more than drive write caching or ReadyBoost, and it covers thoroughly different use cases.

PrimoCache is the secret sauce for a lot of applications, especially in build systems that use Windows, or when you need lots of storage but also high-speed writes to that storage, e.g. computer vision analysis or non-linear video handling. The closest equivalent on Linux is bcache, or the new tiering feature of Btrfs.

For my configuration: I have my plotter set up to plot to eight separate Intel DC P3700 NVMe drives for the -t. All eight of the -t drives have separate, dedicated 16GB RAM-based write-only caches in front of them.

A single larger Intel DC P4610 for the -2. The -2 has its own dedicated 256GB RAM-based write-only cache.

And then a 10TB HDD with a 100GB+ L1 RAM cache and a 1TB L2 SSD cache that sits in front of the HDD for the -f.

On the farmer, I have a 4TB DC SATA SSD with a dedicated 1TB partition to act as an L2 cache that sits in front of the farm HDDs which lets the HDDs keep up with the 10Gbit connection to the NAS.

The tiered caching is provided by PrimoCache. I’m not saying it makes a huge difference, but it has helped out my configuration.

The company also makes PrimoRAMDisk, which is kind of neat and exactly what you’d expect, but obviously there’s not much to say about that because, well, it’s a RAM disk.

AFAIK -2 to -f is a move operation when they exist on the same physical volume. But -d to -f, even on the same volume, is always a copy operation. But don’t quote me on that, it’s been a while since I tested that and it could have also changed since then.

You might want to try creating a separate read cache and a separate write cache in PrimoCache and setting your delayed writes to around 20 seconds. You can also use a separate, small SSD as an L2 cache. Though be aware, PrimoCache will obliterate everything on the SSD when you do that.