Plotman: powerful CPU, lots of RAM, only 1 NVMe - how can I optimize?

menehune · May 21, 2021, 1:35am

My CPU is a Threadripper 3960X with 24 cores, 48 threads. I have 128 GB of RAM.
For plotting, I use a 2TB Inland Premium NVMe. I’m assuming this is the bottleneck in my system. Can you suggest how I can make the most out of this configuration?
Currently using the default Plotman settings:

tmpdir_stagger_phase_major: 2
tmpdir_stagger_phase_minor: 1
# Optional: default is 1
tmpdir_stagger_phase_limit: 1

# Don't run more than this many jobs at a time on a single temp dir.
tmpdir_max_jobs: 3

# Don't run more than this many jobs at a time in total.
global_max_jobs: 12

# Don't run any jobs (across all temp dirs) more often than this, in minutes.
global_stagger_m: 30

Since I only have 1 plotting drive but a CPU and RAM that should be able to handle at least 12 plots at a time, would you suggest upping tmpdir_max_jobs to 12?

Thanks, grateful for any advice!

danarbraz · May 21, 2021, 1:55am

You’re most limited by the single NVMe device. Assuming you are dedicating this system to plotting, I’d start here for your system:

tmpdir_stagger_phase_limit: 2
tmpdir_max_jobs: 4
global_max_jobs: 4
n_threads: 6
job_buffer: 16000

That should get you a plot time of roughly 6.5 hours. If it does, and these settings produce little to no IO wait, bump tmpdir_stagger_phase_limit, tmpdir_max_jobs and global_max_jobs by 2 each until you see unacceptable IO wait times, i.e. until your NVMe is totally pegged.
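To put rough numbers on that starting point: with 4 plots in parallel at about 6.5 hours each, steady-state output is a simple ratio. The sketch below is plain arithmetic, not Plotman code. The function name is made up for illustration, and it assumes every job takes the same time, which stops being true once the NVMe saturates.

```python
def plots_per_day(parallel_jobs: int, hours_per_plot: float) -> float:
    """Steady-state plots per day if every job takes hours_per_plot."""
    return parallel_jobs * 24.0 / hours_per_plot


# Suggested starting point: 4 parallel jobs at roughly 6.5 hours per plot.
baseline = plots_per_day(4, 6.5)
print(f"4 jobs at 6.5 h each: {baseline:.1f} plots/day")

# Going from 4 to 6 jobs only pays off if each plot stays under this duration;
# at exactly this duration, 6 jobs produce the same output as 4.
break_even_hours = 6.5 * 6 / 4
print(f"6 jobs break even at {break_even_hours:.2f} h per plot")
```

If bumping the job count pushes plot time past the break-even figure, the extra jobs are costing throughput rather than adding it.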

Keep in mind that no matter how you name it or how many folders you create on it, your setup has a single tmp dir, so tmp dir jobs = global jobs. Also keep in mind that, for now, only phase 1 is multi-threaded, so you won’t see 6 threads per plot for the whole run.
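The "tmp dir jobs = global jobs" point can be checked with a quick budget calculation. This is a minimal sketch, not how Plotman itself computes anything. It assumes job_buffer is in MiB, that a k=32 plot needs about 239 GiB of temp space (a commonly quoted figure, not something stated in this thread), and that the 2TB drive holds 2e12 bytes before formatting overhead. The helper name is hypothetical.

```python
GIB = 1024 ** 3


def effective_max_jobs(n_tmpdirs: int, tmpdir_max_jobs: int, global_max_jobs: int) -> int:
    """Concurrent jobs are capped by whichever limit is hit first."""
    return min(global_max_jobs, n_tmpdirs * tmpdir_max_jobs)


# Default settings with one tmp dir: global_max_jobs of 12 is never reached.
print(effective_max_jobs(1, 3, 12))

# Suggested settings: tmpdir_max_jobs 4, global_max_jobs 4.
jobs = effective_max_jobs(1, 4, 4)

# Upper bound on busy threads; only reached if every job is in phase 1 at once.
peak_threads = jobs * 6

# job_buffer of 16000, treated as MiB (assumption), converted to GiB.
buffer_gib = jobs * 16000 / 1024

# Assumed temp space per k=32 plot, in GiB (not from this thread).
TEMP_GIB_PER_PLOT = 239
nvme_gib = 2e12 / GIB
max_jobs_by_space = int(nvme_gib // TEMP_GIB_PER_PLOT)

print(jobs, peak_threads, buffer_gib, max_jobs_by_space)
```

Under these assumptions, 4 jobs stay well inside 48 threads and 128 GB of RAM, and temp space rather than CPU sets the hard ceiling on a single 2TB drive.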

menehune · May 21, 2021, 2:02am

Thank you @danarbraz! Yes, this system is currently only used for plotting. I’ll give those settings a try. By IO wait times, do you mean the “io” column in “plotman interactive”, or are you using something else to determine those?

danarbraz · May 21, 2021, 2:05am

Yes, that’s IO wait. You don’t want it consistently in the high double digits of minutes. Some IO wait is OK, but the more you have, the less efficient the run, which means fewer plots over time.

danarbraz · May 21, 2021, 2:08am

…and also know that the speed of your destination (or lack of it) matters. The final copy alone can lead to wait times. That’s totally fine, but you need to be able to tell wait time during plotting apart from wait time while a completed plot copies to the destination drive. If you only see wait time increase in phase 4, that’s going to be the result of the copy to dst.
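A rough way to judge whether the destination can keep up is to compare the copy time for one plot against how often a plot finishes. The numbers in this sketch are assumptions, not measurements from this thread: a final k=32 plot of about 108.8 GB and a sustained write speed of 150 MB/s for the destination drive.

```python
PLOT_BYTES = 108.8e9    # assumed size of one finished k=32 plot
DST_WRITE_BPS = 150e6   # assumed sustained write speed of the destination drive

# Time to copy one finished plot to the destination.
copy_minutes = PLOT_BYTES / DST_WRITE_BPS / 60

# With 4 parallel jobs at about 6.5 hours each, one plot finishes this often.
minutes_between_plots = 6.5 * 60 / 4

# The destination keeps up as long as a copy finishes before the next plot lands.
keeps_up = copy_minutes < minutes_between_plots
print(f"copy: {copy_minutes:.1f} min, interval: {minutes_between_plots:.1f} min, keeps up: {keeps_up}")
```

If the copy time ever approaches the interval between finished plots, copies start to queue up and the wait time at the end of each job grows.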

menehune · May 21, 2021, 2:31am

Final drive is a 12TB spinning platter drive connected locally (SATA). I’ll keep an eye out for those wait times…

menehune · May 21, 2021, 6:41am

Is there a way to tell how long my plots are taking? Other than glancing at my terminal whenever possible… I tried “plotman analyze” but that didn’t seem to show me the total time.
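One possible approach is to read the total time straight out of the plotter logs that Plotman writes. This is a minimal sketch, assuming each finished job's log contains a line of the form "Total time = <seconds> seconds" and that the logs end in ".log". The function names and the sample line are illustrative, and the directory you pass in is whatever log directory your Plotman config points at.

```python
import re
from pathlib import Path

# Assumed log format: "Total time = 23400.000 seconds. CPU (...) <date>"
TOTAL_RE = re.compile(r"Total time = ([\d.]+) seconds")


def total_hours(log_text: str):
    """Return the plot's total time in hours, or None if the job has not finished."""
    match = TOTAL_RE.search(log_text)
    return float(match.group(1)) / 3600 if match else None


def report(log_dir: str) -> dict:
    """Map each finished log file name in log_dir to its total plot time in hours."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        hours = total_hours(path.read_text(errors="replace"))
        if hours is not None:
            results[path.name] = hours
    return results


# Illustrative log line, not copied from a real run.
sample = "Total time = 23400.000 seconds. CPU (120.000%) Fri May 21 08:00:00 2021"
print(total_hours(sample))
```

Calling report() with your own log directory returns one entry per finished plot; logs for jobs still in progress are skipped because they have no total yet.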