Optimizing plotting on an AMD Ryzen 5950x (or any other 16c/32t CPU)

Plus, remember: on my son’s 5950X, which is only running 8 plots in parallel (to 3 target drives, also 980 Pros), plot times are between 20k and 21k seconds per plot.

It looks like 14 parallel plots push plot times to 32k seconds each. That’s probably still more plots generated overall; CPU usage is still “only” 80%.

I think there’s a point of diminishing returns with so many plots at once; I’d be terrified of trying to optimize a 32c/64t machine for this… whoo boy. That’s a lot of I/O endpoints and a lot of potential CPU/RAM crosstalk. I think 14 parallel is as far as I wanna push things, personally.

I also have too many drives in this machine. I think 4 plots per drive is fairly low risk on the 2TB drives, so I could get away with 4 drives instead of 6. I may pull drives out of this machine to put in another 5950X I am building up.

We can actually math this out. There are 86,400 seconds in a day, therefore:

  • 21k secs = 4.1 plots/day × 8 parallel = 33 plots/day (!)
  • 27k secs = 3.2 plots/day × 9 parallel = 28.8 plots/day
  • 30k secs = 2.88 plots/day × 12 parallel = 34.6 plots/day
  • 32k secs = 2.7 plots/day × 14 parallel = 37.8 plots/day
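
The same arithmetic as a quick Python sketch. It assumes each parallel job runs plots back to back with no idle gaps (so the ramp-up at the start of the day is ignored); the function name `plots_per_day` is just for illustration.

```python
SECONDS_PER_DAY = 86_400


def plots_per_day(plot_time_s: float, parallel: int) -> float:
    """Steady-state daily output if `parallel` jobs each run plots back to back."""
    per_job = SECONDS_PER_DAY / plot_time_s  # plots one job finishes per day
    return per_job * parallel


# (seconds per plot, parallel plots) taken from the list above
scenarios = [(21_000, 8), (27_000, 9), (30_000, 12), (32_000, 14)]

for plot_time_s, parallel in scenarios:
    per_job = SECONDS_PER_DAY / plot_time_s
    total = plots_per_day(plot_time_s, parallel)
    print(f"{plot_time_s // 1000}k s: {per_job:.2f} plots/day x {parallel} = {total:.1f} plots/day")
```

The first row comes out as 32.9, which the list rounds to 33; the other rows match.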

There we go! I guarantee you there’s no I/O bottleneck here, unless the 980 Pro firmware is an issue. However, it does look a bit odd, because my son’s rig is doing better with the native I/O on the motherboard M.2 slots, so perhaps I misconfigured the M.2 PCIe card… and at some point I will update the firmware.

(Thing is though, I’m not seeing big plot time differences on the plots that happen on the native M.2 ports on this machine. Drives D and E are mobo M.2 ports.)

I do not see any path to 50 plots/day. Maybe on a 24c/48t machine, or 32c/64t machine…

This could certainly be the case with Windows, but Linux appears to be handling it better. In my case, when increasing the number of concurrent plots, my SSDs (4x PCIe 3.0 x4) hit their “limits” before the CPU does. By “limits” I mean the point at which progress noticeably slows down and iowait shoots up.

Hopefully the firmware update and Samsung NVMe driver will improve matters, but like you say there does seem to be a separate issue at play here - something in your BIOS settings perhaps.

Haha - I’ve hit 51 (now 52 after the queue levelled out) plots a day (12 parallel) on my Linux 5950X machine (not overclocked) so I think there’s something that’s being missed here (other than just Linux). I’ll post my setup details later if it helps.

Well done! :beers: So it is possible. Just not easy… so you are hitting 50 plots/day with 12 parallel? That’s good info to know.

What are your I/O targets for the 12 parallel plots?

Please send us the configuration used.

Hmm, pretty similar system. I am running the 3900X stock on the AMD Prism cooler, 64GB DDR4-3200 CL16, and also have multiple destination drives.

The big difference seems to be in the SSDs, though. Those Micron disks cost a pretty penny, but I reckon they are a good choice for plotting. I’m using 3x WD SN750 1TB at the moment; they at least work a lot better than the ones I had before.

Also, I guess you run on Linux; I’m on Windows. It seems like you are over-assigning threads; when I do that, it tends to end in a BSOD. Anyway, thanks for the info, I’ll keep trying to bump up the count.

@the rest, sorry for hijacking the topic for a 3900x discussion :innocent:

Yeah, so that’s the thing - my targets aren’t all that special:

2x Samsung 970 Pro 1TB M.2 NVMe PCIe 3.0
2x Intel P3600 1.2TB AIC NVMe PCIe 3.0 (7-year-old drives, average performance by today’s standards)

These are installed in a B550 motherboard, so 1x 970 Pro and 1x P3600 share a single PCIe 3.0 x4 link via the chipset. The other 2 each have dedicated x4 lanes to the CPU. I could use my Asus Hyper card so ALL SSDs get their own x4 lanes (or switch to an X570 board), but given my results I don’t feel I need to, so that card is going into another machine.

I run Ubuntu Server 21.04, which is supposed to have better support for the latest AMD CPUs. I connected a video card only to set up the BIOS (enabled XMP for the 64GB Crucial Ballistix 3600MHz RAM and disabled unwanted things like audio), then removed the card, and now run it headless over SSH.

Each pair of drives is in Btrfs RAID0 with asynchronous TRIM (discard=async), so discard commands are queued rather than blocking. I believe XFS discard does this as well; I don’t know about Windows.

sudo mkfs.btrfs -f -d raid0 -m raid0 /dev/nvme0n1 /dev/nvme2n1
sudo mount -t btrfs -o ssd,nodatacow,discard=async,space_cache=v2,nobarrier,noatime /dev/nvme0n1 /mnt/pool

I use plotman as below. I settled on 12 parallel plots because the iowait numbers start to rise beyond that (the 1s and 2s appearing in dstat’s fourth CPU column, wai).

directories:
        tmp:
                - /mnt/pool
                - /mnt/pool2
        dst:
                - /mnt/hdd1
scheduling:
        tmpdir_max_jobs: 6
        global_max_jobs: 12
        global_stagger_m: 25
        polling_time_s: 20
plotting:
        k: 32
        e: False
        n_threads: 8
        n_buckets: 128
        job_buffer: 4500
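
If you want to watch the same iowait signal without dstat, here is a sketch that computes the iowait share from two samples of the aggregate `cpu` line in `/proc/stat`, which is roughly what dstat shows in its wai column. It is Linux-only, the helper names are made up for illustration, and where you draw the line (12 plots here) is still a judgment call.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal guest guest_nice
IOWAIT = 4
N_CORE_FIELDS = 8  # user..steal; guest/guest_nice are already counted inside user/nice


def parse_cpu_line(line: str) -> list:
    """Turn the aggregate 'cpu' line into a list of integer tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b[:N_CORE_FIELDS], a[:N_CORE_FIELDS])]
    total = sum(deltas)
    return 100.0 * deltas[IOWAIT] / total if total else 0.0


def sample_iowait(interval_s: float = 1.0) -> float:
    """Linux only: read the first line of /proc/stat twice, `interval_s` apart."""
    with open("/proc/stat") as f:
        before = f.readline()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.readline()
    return iowait_percent(before, after)
```

Calling `sample_iowait()` in a loop while adding plots would show the point where iowait starts climbing.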

Part way through the day:

+-------+----+-------------+--------------+-------------+-------------+-------------+---------------+
| Slice | n  |   %usort    |   phase 1    |   phase 2   |   phase 3   |   phase 4   |  total time   |
+=======+====+=============+==============+=============+=============+=============+===============+
| x     | 35 | μ=100.0 σ=0 | μ=6.6K σ=106 | μ=5.0K σ=40 | μ=8.3K σ=69 | μ=561.8 σ=9 | μ=20.4K σ=129 |
+-------+----+-------------+--------------+-------------+-------------+-------------+---------------+

Key points

  1. I don’t know if the RAID0 makes a difference as I set it up like this from day one and don’t want to change it now.
  2. I’m using older PCIe 3.0 x4 SSDs (2 of which are bandwidth-restricted!) so there’s room for improvement here.
  3. CPU usage averages about 60% and occasionally spikes to 80% so there’s some headroom here, even more with overclocking.
  4. I use 8 threads for my plots simply because I saw this suggested somewhere. Don’t know if it’s optimal. Usually no more than 4 plots are in phase 1 at one time.
  5. Higher RAM speeds appear to be beneficial, so overclocking the RAM is an option (see 2666mhz vs 3200mhz RAM Plotting Results - #5 by KoalaBlast)

Since there is more headroom available, adding more SSDs to bump up the IOPS will probably get it to 55-60 plots per day; only at that point will the CPU be fully utilized.
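
A rough budget check on the plotman settings above, as a sketch. It assumes `job_buffer` is in MiB and that `n_threads` only matters in phase 1 (my understanding of how plotman hands these to the chia plotter as `-b` and `-r`), and it uses this machine’s 64GB of RAM and 32 hardware threads. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlotmanSettings:
    # Values copied from the plotman config above
    global_max_jobs: int = 12
    job_buffer_mib: int = 4500  # plotman 'job_buffer', assumed passed to the plotter as -b (MiB)
    n_threads: int = 8          # plotman 'n_threads', assumed passed to the plotter as -r


def worst_case_ram_gib(s: PlotmanSettings) -> float:
    """RAM if every running job used its full buffer at the same time."""
    return s.global_max_jobs * s.job_buffer_mib / 1024


def phase1_threads(s: PlotmanSettings, jobs_in_phase1: int) -> int:
    """Threads requested by jobs in phase 1 (the multithreaded phase)."""
    return jobs_in_phase1 * s.n_threads


cfg = PlotmanSettings()
print(f"worst-case buffers: {worst_case_ram_gib(cfg):.1f} GiB of 64 GiB installed")
print(f"threads with 4 jobs in phase 1: {phase1_threads(cfg, 4)} of 32 hardware threads")
print(f"threads if all 12 were in phase 1: {phase1_threads(cfg, cfg.global_max_jobs)}")
```

With at most 4 jobs in phase 1, 8 threads per plot adds up to exactly the 5950X’s 32 threads, which might be part of why this works; if all 12 jobs landed in phase 1 at once, they would ask for 96.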

Well, you’ve convinced me: it is possible to achieve 50 plots/day on a 16c/32t machine. It is surprisingly tricky, though. My simple approach of just throwing a bunch of 4t/6GB parallel plots and tons of I/O bandwidth (six full-bandwidth NVMe ports, which is crazy overkill) at the problem didn’t really work.

My gut feeling says the specifics of the stagger must be important, the timing?

(Also, I’ve read that the 970 Pro is actually one of the best plotting drives there is. It’s incredibly fast on my i9-9900KS… I get 4.5-hour plot times even running 3 in parallel.)

I have since removed 2 of the drives (so it has 4 now, instead of 6) and am moving on to throwing more machines in the mix.

My gut feeling says the specifics of the stagger must be important, the timing?

In order to finish 50/day, we have to start at least 50/day. So, the maximum allowable stagger would be 28.8 minutes.
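
The bound is just minutes per day divided by the target. A small sketch, assuming a single global stagger gates every start (as plotman’s `global_stagger_m` does); it ignores everything else that can delay a start, so it is only a ceiling. The function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def max_stagger_minutes(target_plots_per_day: float) -> float:
    """Largest gap between plot starts that still allows the target."""
    return MINUTES_PER_DAY / target_plots_per_day


def start_rate_ceiling(stagger_minutes: float) -> float:
    """Most plots per day that can even be started with a fixed gap between starts."""
    return MINUTES_PER_DAY / stagger_minutes


print(max_stagger_minutes(50))  # stagger needed for 50/day
print(start_rate_ceiling(30))   # ceiling for a 30-minute stagger
```

This gives 28.8 minutes for 50/day, and a 30-minute stagger caps the start rate at 48 plots/day.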

That doesn’t make sense, we are not using a single plotter! We always use plotters in parallel.

The absolute fastest I’ve seen a plot finish myself, on hardware I own, is 270 minutes (4.5 hours); that is 5.3 plots per day. With that machine we’d need about 9.4 (so really 10) parallel processes, and no speed degradation as we increase the number of plotters… fat chance of that! So it’s a question of:

  • how many plotters
  • when do the plotters start
  • how fast do the plotters finish once fully loaded

The general rule of thumb I use is to stagger so you are starting a new plot as soon as phase 1 completes; that is 44% of the total plot time.
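
Written out, the rule is total plot time multiplied by the share of it spent in phase 1. A sketch; the 0.44 default is the rule of thumb above rather than a measured constant, and the function name is illustrative.

```python
def stagger_for(plot_time_min: float, phase1_fraction: float = 0.44) -> float:
    """Rule of thumb: start the next plot when the previous one leaves phase 1."""
    return plot_time_min * phase1_fraction


# The 270-minute plot mentioned above
print(f"{stagger_for(270):.1f} min")
# Same rule using the phase 1 share from the plotman table earlier (6.6K of 20.4K s)
print(f"{stagger_for(20_400 / 60, 6_600 / 20_400):.1f} min")
```

The first line gives 118.8 minutes. The second gives 110.0 minutes, which is simply the 6.6K-second phase 1 mean, since the rule amounts to staggering by the phase 1 duration.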

I read that 8 threads is actually worse than 4, so I am puzzled by a lot of this. @harris, perhaps there’s some magic in 8 threads when everything is under full load? I always do 4-threaded plots, except on my 4c/8t machines where I do 2.

I’ve hit 51 (now 52 after the queue levelled out) plots a day (12 parallel) on my Linux 5950X machine (not overclocked)

So 12 parallel to hit 50… means each of those 12 must be doing 4.16 plots per day, or 346 minutes per plot, under 21000 seconds per plot or 5h46m? :dizzy_face:

On another Ryzen 5950x I just set up, I see 21000 seconds or under for the very first plot (low contention from other plotters), but 25000, 26000 seconds for the subsequent plots. To achieve 50 per day, at 12 plotters, all of the plotters must be doing 21000 seconds or better.
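
Working the target backwards, as a sketch: divide the daily target across the parallel jobs, then convert to a per-plot time limit. It assumes steady state with no gaps between plots; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def required_plot_time_s(target_per_day: float, parallel: int) -> float:
    """Longest per-plot time that still hits the target with `parallel` jobs."""
    plots_per_job = target_per_day / parallel
    return SECONDS_PER_DAY / plots_per_job


def plots_per_day(plot_time_s: float, parallel: int) -> float:
    """Steady-state daily output of `parallel` back-to-back jobs."""
    return SECONDS_PER_DAY / plot_time_s * parallel


limit = required_plot_time_s(50, 12)
h, rem = divmod(round(limit), 3600)
m, s = divmod(rem, 60)
print(f"50/day on 12 jobs needs <= {limit:.0f} s per plot ({h}h{m:02d}m{s:02d}s)")

# 20.4K s is the plotman mean posted earlier; 25K-26K s is the slowdown seen above
for t in (20_400, 25_000, 26_000):
    print(f"{t} s x 12 jobs -> {plots_per_day(t, 12):.1f} plots/day")
```

The limit works out to 20,736 seconds (5h45m36s), so the 20.4K-second mean lands at about 50.8/day, while 25,000 seconds drops to about 41.5/day.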

Yes, but this is possible. I have very little storage bottleneck left; it only becomes a factor in phase 4, which basically means I can’t write to my USB3-connected disks any faster. :wink:

This is a 5900x with up to 12 plots going at the same time.

I’m still not done tuning, so I don’t know the exact figure per day; I expect it’s between 40 and 45.

Hey Quindor,

What are your current plotman settings for that configuration?

Have a similar setup without the USB drives haha

I think the most important settings you are asking about would be:

        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 5

        # Don't run more than this many jobs at a time on a single temp dir.
        tmpdir_max_jobs: 12

        # Don't run more than this many jobs at a time in total.
        global_max_jobs: 12

        # Don't run any jobs (across all temp dirs) more often than this.
        global_stagger_m: 30
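
As I understand plotman’s scheduler, the `tmpdir_stagger_phase_*` settings mean: don’t start another job on a temp dir if `tmpdir_stagger_phase_limit` jobs there have not yet reached phase `major:minor` (here 2:1). The function below is a simplified, hypothetical restatement of that rule for illustration, not plotman’s actual code.

```python
from typing import List, Tuple

Phase = Tuple[int, int]  # (major, minor), e.g. (1, 4) = phase 1, step 4


def tmpdir_permits_new_job(phases: List[Phase],
                           stagger_phase_major: int = 2,
                           stagger_phase_minor: int = 1,
                           stagger_phase_limit: int = 5,
                           tmpdir_max_jobs: int = 12) -> bool:
    """Hypothetical check for one temp dir, given the phases of its running jobs."""
    milestone = (stagger_phase_major, stagger_phase_minor)
    # Jobs that have not reached the milestone yet (tuples compare lexicographically)
    early_jobs = sum(1 for p in phases if p < milestone)
    if early_jobs >= stagger_phase_limit:
        return False
    # Hard cap on jobs per temp dir
    if len(phases) >= tmpdir_max_jobs:
        return False
    return True
```

With these values, at most 5 jobs can be sitting before phase 2:1 on a temp dir, which presumably spreads out the phase 1 load even when `global_stagger_m` would otherwise allow a start.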

What you’re saying makes sense to me, and your observations about plot times and speed degradation are consistent with what I see. But I’m not too concerned with individual plot time.

My thinking is that our total throughput will be limited by our stagger time in an otherwise unconstrained system, correct? For example, if the stagger time is set to 1 hour, it doesn’t matter whether each plot finishes in 10 seconds or 10 hours; the system will not exceed 24 plots in a 24-hour period.

If we are targeting 50/day, any stagger value greater than 28.8 minutes guarantees that the system will fall short of our target.

Does that make sense, or perhaps I am misunderstanding something?

Sure, I can follow that train of thought. :slight_smile:

No, I’m not following; the stagger only affects the start time for the first plot, and all subsequent plots kick off immediately afterwards. I think we have different understandings of the word “stagger”. For me it is a one-time value that determines how offset the plotters are against each other, and it is only set once, forever, e.g. “plotter 5 will be on phase 1, while plotter 6 is on phase 3; they’ll always be four hours apart”.

There might be ways of doing stagger every single time every plotter begins a plot, but I personally only use stagger values when starting the plotter initially.

Hmm now I’m confused about how you’re staggering plots. Do your plots start in a way that is consistent with Quindor’s screenshot from post #112? Or are there times when you have more than 1 plot starting at once?

In particular, the “wall” column in plotman reports the time a plot has been running. So for plots #0-5 we can see his system was unconstrained and each plot is offset by exactly 30 minutes, his stagger time. Plots 6-10 have some additional variability in the timing offset, presumably due to some secondary plot limit (indicating a minor system constraint).

So in this case, with a stagger of 30 minutes, Quindor would be able to achieve 48 plots/day as an absolute maximum. In practice it looks like he would land just shy of 48, because some plots (e.g. plot 9) are having to wait up to 35 minutes before starting.

That is not correct: it will always wait until its stagger time is up and then check the other values to see if it’s allowed to start a new job. If not, it waits until it’s allowed to, and then the stagger starts counting again. Check my stream shot again; you see the stagger time is 1024s/1800s :slight_smile:
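
A toy simulation of the scheduling behaviour described here: a new job starts only once the stagger since the last start has elapsed and the job limit allows it. The model, the names, and the fixed 340-minute plot time (about the 20.4K-second mean posted earlier) are assumptions for illustration; real plot times stretch as more jobs run, so this only shows where the ceilings are.

```python
def simulate_starts(days: int, stagger_min: int, max_jobs: int, plot_min: int) -> int:
    """Count plot starts over `days`, stepping one minute at a time.

    Hypothetical, simplified model: a job may start only when at least
    `stagger_min` minutes have passed since the last start AND fewer than
    `max_jobs` jobs are running. Every plot takes exactly `plot_min` minutes,
    i.e. no slowdown from contention.
    """
    end = days * 24 * 60
    finish_times = []  # finish time (in minutes) of each running job
    last_start = None
    starts = 0
    for t in range(end):
        finish_times = [f for f in finish_times if f > t]  # retire finished jobs
        stagger_ok = last_start is None or t - last_start >= stagger_min
        if stagger_ok and len(finish_times) < max_jobs:
            finish_times.append(t + plot_min)
            last_start = t
            starts += 1
    return starts


# ~20,400 s plots (340 min), 12 jobs, averaged over 10 days
for stagger in (30, 20):
    per_day = simulate_starts(10, stagger, 12, 340) / 10
    print(f"stagger {stagger} min -> {per_day:.1f} plots/day")
```

With a 30-minute stagger the model never exceeds 48 starts per day even though 12 jobs could absorb more; with 20 minutes the job limit becomes the bottleneck at around 51/day (12 jobs every 340 minutes works out to about 50.8).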

I see, so we’re using different versions of “stagger”. I should probably say “offset”; I let everything constantly plot, just offset from everything else by {x} minutes.

It’s possible I’ve been using the word wrong! Let’s clarify here

Excellent data! Thanks for sharing! I just got my 4x 970 Pros in. I can only use 3 for now until I get an NVMe-to-PCIe card. I will be testing 4 once that arrives. If I can plot 15 in parallel on 4x 1TB, I’ll stick with that, but if not, I’ll do 5x 1TB. I may do 5x 1TB anyway just to get more I/O bandwidth.

I just started using the 970 Pros but they already seem to be ~25-30% faster than the 2TB Firecuda 520s I’ll be sending back. I will have to see the final results when the plots finish, and will post the results here.

Anyone have a motherboard they recommend for the 5950x? Just got one today (Cambridge Microcenter has 20+)