Optimizing plotting on an AMD Ryzen 5950x (or any other 16c/32t CPU)

adam2112 · May 14, 2021, 6:31am

Ryzen 5950x
64gb Patriot Viper Steel 3600mhz RAM
Intel 3.2tb 4610 x2
Running Swar in Windows

I have been running 20 plots in parallel, 10 plots concurrent per 3.2tb Intel 4610 with max of 3 plots in phase 1. 45 minute delay. 8 threads/plot, 4000 RAM/plot. No buffer drive. CPU usage hovers around 75%.

Phase 1 times are 3hrs 30min, phase 2 2hrs 45min, phase 3 4hrs 40min, with total plot time around 11.5 hours or so.

Getting 41-42 plots/day.

It seems like I am running a lot more plots in parallel than most of the comments here. I have not tried dialing it back yet, just wanted to load it up and see how it did. Is the plot time of 11.5hrs inefficient? Have any of you tried pumping it up to 20 or 22 plots in parallel?

I am planning on switching over to linux, which would hopefully bump current setup to 45 plots/day.

DougieC · May 14, 2021, 6:50am

Odd that even with a single gen3, I found my best plots/day was also at around 21.5k secs per plot (8 plots). Increasing the number of plots obviously increased the time per plot, but it also lowered my plots/day.

Voodoo · May 14, 2021, 8:45am

What kind of plot settings are you using on the 3900x? And what are the plot times?

I can’t seem to get past 30/day on the 3900x, and I’ve tried quite a few different options already.

Harris · May 14, 2021, 8:52am

Don’t evaluate the efficiency based on the plot time. Longer plot times are fine if you’re fully utilizing your subsystems without overloading them, as this will result in a higher daily plot rate (which is what really matters). You need to identify if there’s a bottleneck in your pipeline and adjust the workload based on this.

CPU - 75% is okay and there is a bit of headroom there. You have the option to overclock if this starts to creep past 90%.
RAM - You may be hitting your RAM limit and swapping might be occurring, so you need to take a look at this.
Disk IO - Check the iowait (I don’t know the Windows equivalent but in Linux you can use glances or dstat). You should aim for this value to be as low as possible as you don’t want the CPU waiting on the SSDs.
PCIe bandwidth - 2x PCIe 3.1 x4 drives is okay if they are on dedicated lanes (not shared)
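
To put a rough number on the RAM point above, here is a minimal sketch of a worst-case buffer budget. It assumes the per-plot RAM figure is in MiB, that every job could hold its full buffer at the same time (an upper bound, not what a plotter actually uses at every moment), and that "64gb" means 64 GiB. The function names are made up for illustration.

```python
def worst_case_buffer_mib(parallel_plots: int, buffer_mib_per_plot: int) -> int:
    """Upper bound on buffer memory if every job held its full buffer at once."""
    return parallel_plots * buffer_mib_per_plot


def fits_in_ram(parallel_plots: int, buffer_mib_per_plot: int, installed_gib: int) -> bool:
    """True if the worst-case buffer total is no larger than installed RAM."""
    installed_mib = installed_gib * 1024
    return worst_case_buffer_mib(parallel_plots, buffer_mib_per_plot) <= installed_mib


if __name__ == "__main__":
    # Figures from the opening post: 20 parallel plots, 4000 per plot, 64 GB installed.
    print(worst_case_buffer_mib(20, 4000))  # 80000
    print(fits_in_ram(20, 4000, 64))        # False
    print(fits_in_ram(16, 4000, 64))        # True, but with almost nothing left for the OS
```

A result of False does not prove swapping is happening; it only says the configured buffers could exceed physical RAM, which is why checking actual usage is still worthwhile.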

41-42 plots a day is a bit low considering the cost of those SSDs. You probably have a disk IO limitation due to “only” using 2 drives. Usually, 4 smaller drives would be better due to twice the amount of controllers and a higher total IOPS - THIS gets the iowait down so the SSDs stop being the bottleneck.

In your case you should reduce the number of parallel plots to get your iowait down. Or, if you have more breathing room here (unlikely) then you can slowly increase your parallel plots up to the capabilities of the CPU, depending if you have enough RAM without needing to swap.
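
For anyone who wants the iowait figure without installing glances or dstat, here is a minimal Linux-only sketch that samples the aggregate `cpu` line of `/proc/stat` twice and reports the share of time spent in iowait. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration, and it does not cover Windows.

```python
import time


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal only; guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval_s apart, and return the iowait percentage."""
    with open(path) as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

Calling `sample_iowait(5)` while adding parallel plots one at a time gives a simple way to spot the point where the value starts to climb.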

PAVoutsinas · May 14, 2021, 11:45am

I have hit my best plotting times with an all-core overclock on my 5950x. My CPU runs at a constant 4.64 GHz. My motherboard BIOS does not have the DOC bs gimmick, so it was a little tricky to find the right combo of settings. I found that following the steps in the YouTube video works … if you don’t have the DOC option in the BIOS, then just enable TPU II after all other settings are set. It seems to trigger the same settings as DOC.

Quindor · May 14, 2021, 12:46pm

For anyone running Linux and wanting to update the firmware on their Samsung drive: Samsung’s boot ISO is horribly broken. It won’t even detect SATA SSDs or give you a working USB keyboard if you are running AMD. Check here for a workaround: Firmware update Samsung SSD in Linux - Intermittent Technology

dmurphydrtc · May 14, 2021, 1:37pm

What is the 5950x clock speed set to? Thanks

Xerror · May 14, 2021, 1:37pm

I’ll be sure to update if I get it sorted.
At the moment I’m just keeping an eye on it with bpytop and manually editing my script when a drive is getting full. You have plenty of time to edit, even when a copy is in progress, so you can easily 100% the drives.
My internal rust on the 5800x is getting maxed out, so I have just set up a NAS with a JBOD array, which kind of negates the issue for the time being, as it sees all the drives as a single directory.
But if I have time I’ll still try and get a python sub-routine hashed out that should do the job.
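
A minimal sketch of what such a sub-routine might look like (not Xerror’s actual script): it walks a list of destination directories and returns the first one with room for another plot. The plot size constant is an approximation for a k=32 plot, the function name and the `free_bytes` hook are made up for illustration, and copies already in flight are not accounted for.

```python
import shutil

# Assumption: a finished k=32 plot needs roughly 102 GiB; adjust as needed.
PLOT_SIZE_BYTES = 102 * 1024**3


def pick_destination(candidates, needed_bytes=PLOT_SIZE_BYTES, free_bytes=None):
    """Return the first directory with room for one more plot, or None."""
    if free_bytes is None:
        # Default: ask the operating system how much space is free on that path.
        free_bytes = lambda path: shutil.disk_usage(path).free
    for path in candidates:
        if free_bytes(path) >= needed_bytes:
            return path
    return None
```

With hypothetical mount points, `pick_destination(["/mnt/hdd1", "/mnt/hdd2"])` would return the first of the two that still has space, which could replace the manual edit when a drive fills up.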

Xerror · May 14, 2021, 1:41pm

I’m on a lowly 5800X atm.
I chose not to overclock, as in my initial tests it did not play well with plotting, although I may revisit this.
Letting the processor sort out its own boosting it sits at 4.5-4.6 pretty consistently.
I don’t understand all these complaints about the 5800X running hot…
Mine sits at 50C on a noctua air cooler quite happily.

adam2112 · May 14, 2021, 2:47pm

Thank you for the input.

I have another 64gb of RAM I will be installing at some point once I find a good time to shut down the rig. Currently RAM is around 60-65% or so.

I will be adding 2 additional 2tb Sabrent Rocket 4.0 gen 4 SSD drives using an Asus Hyper M.2 PCIe adapter.

I just reduced the parallel plots down to 16 from 20, so this hopefully will reduce any bottlenecks temporarily until I can install the things above.

Thanks again. I will report back with updated performance.

legcramp · May 14, 2021, 4:13pm

I am running the 3900X stock on AMD Prism cooler, 64GB DDR4 3600 CL18.

3.2TB Micron 7300 MAX Raid 0 (2x drives) - 8 concurrent - 5 in phase 1, no stagger.

1.8TB LITEON NVME Raid-0 (2x drives) - 5 concurrent, 3 in phase 1, no stagger

1.8TB Micron 5300 Max SSD - 3 concurrent - 2 in phase one max, no stagger

I counted 38 plots yesterday, so the 40+ count isn’t consistent from the day before. They are finishing around 8-9 hours depending on the drive. I do have 10x 16TB destination drives on this one system though so they are sending the finished plots to different drives locally.

codinghorror · May 14, 2021, 7:08pm

Resource Monitor – it’s very awesome:

Start, Resmon, click disk tab, sort by write column is my rec. I kinda feel sorry for people running Linux, the tooling is so easy to use on Windows. But, joke may be on me if there’s that 10% Windows perf penalty… still haven’t seen conclusive benchmarks, but perhaps.

Quindor · May 14, 2021, 7:51pm

Ah no, in Windows check the disk queue length. Anything between 1 and 2 can be considered OK (depending on how many processes are writing to that specific disk); anything above that and your disk subsystem is slowing down whatever is reading/writing.

findingmyglasses · May 15, 2021, 1:00am

hey man, I’ve seen a few of your posts now, just want to say thanks for your input. I can only speak for myself, but I’m sure the community agrees: your posts are educational!!
I’ve actually got an Intel p5510 and still haven’t managed to optimise it lol, so I know I’m doing something wrong. A friend gave me the SSD out of a server so that was a blessing, I just feel I’ve let down the team by not plotting the optimal jobs. (I’d HODL if I won any hahah)

codinghorror · May 15, 2021, 2:06am

Plus remember

On my son’s 5950x which is only doing 8 in parallel, (to 3 target drives, also 980 pro), the numbers are between 20-21k per plot.

It is looking like 14 parallel plots pushes plot times to 32k secs each. That’s probably still more plots being generated all told; CPU usage is still “only” 80% overall.

I think there’s a point of diminishing returns with so many plots at once; I’d be terrified of trying to optimize a 32c/64t machine for this… whoo boy. That’s a lot of I/O endpoints and a lot of potential CPU/RAM crosstalk. I think 14 parallel is as far as I wanna push things, personally.

I also have too many drives in this machine. I think 4 plots per drive is fairly low risk on the 2tb drives, so I could get away with 4 drives instead of 6. I may pull drives out of this machine to put in another 5950x I am building up.

We can math this out, actually? 86,400 seconds in a day, therefore:

21k secs = 4.1 plots/day × 8 parallel = 33 plots/day (!)
27k secs = 3.2 plots/day × 9 parallel = 28.8 plots/day
30k secs = 2.88 plots/day × 12 parallel = 34.6 plots/day
32k secs = 2.7 plots/day × 14 parallel = 37.8 plots/day
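
The same arithmetic as a small script, for plugging in other plot times and parallel counts. It assumes a steady state where every parallel slot is always busy, so it gives an upper estimate; the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def plots_per_day(plot_time_s: float, parallel: int) -> float:
    """Steady-state daily output when `parallel` jobs each take `plot_time_s` seconds."""
    return SECONDS_PER_DAY / plot_time_s * parallel


if __name__ == "__main__":
    # The four (plot time, parallel count) pairs from the list above.
    for plot_time_s, parallel in [(21_000, 8), (27_000, 9), (30_000, 12), (32_000, 14)]:
        rate = plots_per_day(plot_time_s, parallel)
        print(f"{plot_time_s} secs x {parallel} parallel = {rate:.1f} plots/day")
```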

There we go! I guarantee you there’s no I/O bottleneck here, unless the 980 pro firmware is an issue. However it does look a bit odd because my son’s rig is doing better with the native I/O on the mobo M.2 so perhaps I misconfigured the M.2 PCI card… and at some point I will update firmware.

(Thing is though, I’m not seeing big plot time differences on the plots that happen on the native M.2 ports on this machine. Drives D and E are mobo M.2 ports.)

I do not see any path to 50 plots/day. Maybe on a 24c/48t machine, or 32c/64t machine…

Harris · May 15, 2021, 2:09pm

This could certainly be the case with Windows, but I think Linux appears to be handling it better. In my case, when increasing the number of concurrent plots my SSDs (4x PCIe 3.0 x4) hit their “limits” before the CPU does. By “limits” I mean the point at which progress noticeably slows down and iowait shoots up.

Hopefully the firmware update and Samsung NVMe driver will improve matters, but like you say there does seem to be a separate issue at play here - something in your BIOS settings perhaps.

Haha - I’ve hit 51 (now 52 after the queue levelled out) plots a day (12 parallel) on my Linux 5950X machine (not overclocked) so I think there’s something that’s being missed here (other than just Linux). I’ll post my setup details later if it helps.

codinghorror · May 15, 2021, 8:23pm

Well done! So it is possible. Just not easy… so you are hitting 50 plots/day with 12 parallel? That’s good info to know.

What are your I/O targets for the 12 parallel plots?

samxsas · May 15, 2021, 9:02pm

Please send us the configuration used.

Voodoo · May 15, 2021, 9:18pm

Hmm, pretty similar system. I am running the 3900X stock on AMD Prism cooler, 64GB DDR4 3200 CL16, and also have multiple destination drives.

Big difference seems to be in the SSDs though. Those Micron disks cost a pretty penny, but I would reckon they are a good choice for plotting. I’m using 3x WD sn750 1TB atm; they at least work a lot better than the ones I had before.

Also I guess you run on Linux, I’m on Windows. Seems like you are over-assigning threads; when I do that it tends to end in a BSOD. Anyway, thanks for the info, I’ll keep trying to bump up the count.

@the rest, sorry for hijacking the topic for a 3900x discussion

Harris · May 15, 2021, 10:32pm

Yeah, so that’s the thing - my targets aren’t all that special:

2x Samsung 970 Pro 1TB M.2 NVMe PCIe 3.0 Gen 3
2x Intel P3600 1.2TB AIC NVMe PCIe 3.0 Gen 3 (7 year old drives, average performance by today’s standards)

These are installed in a B550 motherboard, so 1x 970 Pro and 1x P3600 share just one PCIe 3.0 Gen 3 x4 bus via the chipset. The other 2 each have dedicated x4 lanes to the CPU. I could use my Asus Hyper card so ALL SSDs get their own x4 lanes (or switch to an X570 board), but given my results I don’t feel I need to so that card is going into another machine.

I run Ubuntu Server 21.04 which is supposed to have better support for the latest AMD CPUs, and I connected a video card only to set up the BIOS (enabled XMP for the 64GB Crucial Ballistix 3600MHz RAM) and disabled unwanted things like audio. Then I removed the card and run it headless over SSH.

Each pair of drives is in Btrfs RAID0 with asynchronous TRIM so commands are queued to avoid blocking. I believe XFS discard does this as well, don’t know about Windows.

sudo mkfs.btrfs -f -d raid0 -m raid0 /dev/nvme0n1 /dev/nvme2n1
sudo mount -t btrfs -o ssd,nodatacow,discard=async,space_cache=v2,nobarrier,noatime /dev/nvme0n1 /mnt/pool

I use plotman as below. I settled on 12 parallel plots as the iowait numbers start to rise beyond that (the 1 and 2 in the fourth column of dstat).

directories:
        tmp:
                - /mnt/pool
                - /mnt/pool2
        dst:
                - /mnt/hdd1
scheduling:
        tmpdir_max_jobs: 6
        global_max_jobs: 12
        global_stagger_m: 25
        polling_time_s: 20
plotting:
        k: 32
        e: False
        n_threads: 8
        n_buckets: 128
        job_buffer: 4500
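
As a rough sanity check on the scheduling values above, here is a minimal sketch of two ceilings on daily output. It assumes `global_stagger_m` means the minimum number of minutes between any two job starts, and it uses the 20.4K secs mean total time from the plotman summary in this post. The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 86_400


def stagger_ceiling(global_stagger_m: float) -> float:
    """Most jobs that can start per day if one starts every global_stagger_m minutes."""
    return MINUTES_PER_DAY / global_stagger_m


def concurrency_ceiling(global_max_jobs: int, plot_time_s: float) -> float:
    """Most plots per day if all global_max_jobs slots stay busy."""
    return global_max_jobs * SECONDS_PER_DAY / plot_time_s


def daily_ceiling(global_stagger_m: float, global_max_jobs: int, plot_time_s: float) -> float:
    """The lower of the two limits is the one that applies."""
    return min(stagger_ceiling(global_stagger_m),
               concurrency_ceiling(global_max_jobs, plot_time_s))


if __name__ == "__main__":
    print(round(stagger_ceiling(25), 1))               # 57.6
    print(round(concurrency_ceiling(12, 20_400), 1))   # 50.8
    print(round(daily_ceiling(25, 12, 20_400), 1))     # 50.8
```

Under these assumptions the 12-job limit, not the 25-minute stagger, is what caps this configuration at about 51 plots a day.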

Part way through the day:

+-------+----+-------------+--------------+-------------+-------------+-------------+---------------+
| Slice | n  |   %usort    |   phase 1    |   phase 2   |   phase 3   |   phase 4   |  total time   |
+=======+====+=============+==============+=============+=============+=============+===============+
| x     | 35 | μ=100.0 σ=0 | μ=6.6K σ=106 | μ=5.0K σ=40 | μ=8.3K σ=69 | μ=561.8 σ=9 | μ=20.4K σ=129 |
+-------+----+-------------+--------------+-------------+-------------+-------------+---------------+

Key points

I don’t know if the RAID0 makes a difference as I set it up like this from day one and don’t want to change it now.
I’m using older PCIe 3.0 Gen 3 x4 SSDs (2 of which are bandwidth restricted!) so there’s room for improvement here.
CPU usage averages about 60% and occasionally spikes to 80% so there’s some headroom here, even more with overclocking.
I use 8 threads for my plots simply because I saw this suggested somewhere. Don’t know if it’s optimal. Usually no more than 4 plots are in phase 1 at one time.
Higher RAM speeds appear to be beneficial so overclocking this is an option (see 2666mhz vs 3200mhz RAM Plotting Results - #5 by KoalaBlast)

As there is more headroom available, adding more SSDs to bump up the IOPS will probably get it to a daily rate of 55-60 plots; only at that point will the CPU be fully utilized.