Plotman: Let's share settings

tl;dr post your rig (CPU, # of cores, RAM, temp drive storage size) and your Plotman settings to compare.


I just started using Plotman last night, and while I got it to work, I’m not entirely sure I’m using it to its full potential. Most of the settings in plotman.yaml I understood…except for the scheduling part.

schedule:
  tmpdir_stagger_phase_major: 2
  tmpdir_stagger_phase_minor: 1
  # Optional: default is 1
  tmpdir_stagger_phase_limit: 1

What? I have a vague inkling this has to do with the stage of plotting, but the Plotman docs don’t explain it well.

Anyway, after blundering for an hour setting things up, I settled on these scheduling settings:

schedule:
  tmpdir_stagger_phase_major: 3
  tmpdir_stagger_phase_minor: 1
  # Optional: default is 1
  tmpdir_stagger_phase_limit: 3

Other settings:

  • global_max_jobs: 6 (6 because I have 6 cores on my CPU)
  • global_stagger_m: 10

With my 6-core CPU, two 2 TB NVMe temp drives, and 44 GB RAM, I ended up with 3 plots running on each temp drive, 10 min apart. Seems to be working well; the first plot should be done soon.

PROTIP: Use spaces, not tabs. When editing plotman.yaml I used tabs, and that caused an error. Use 4 spaces instead of a tab. You also now know where I fall on that debate :laughing:
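If you want to sanity-check a config for tab indentation before Plotman chokes on it, here’s a quick stdlib-only sketch (the function is my own invention, not part of Plotman):

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab.

    YAML forbids tabs in indentation, so any hit here will break parsing.
    """
    bad = []
    for i, line in enumerate(text.splitlines(), start=1):
        # Grab just the leading whitespace of the line.
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(i)
    return bad

config = "schedule:\n\ttmpdir_stagger_phase_major: 2\n    tmpdir_stagger_phase_minor: 1\n"
print(find_tab_indented_lines(config))  # → [2]
```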

Update: My first 6 plots finished. I checked plotman interactive and now there’s only one plot job running. These settings don’t achieve the desired “plot 6 plots simultaneously non-stop until the drive is full.”


I’m still struggling to understand the part below. Do you have any detailed explanation for it? I think it waits for a phase to be reached before starting a new plot in the same tmp dir, but I’m still confused. Any detailed explanation would be great!

schedule:
  tmpdir_stagger_phase_major: 3
  tmpdir_stagger_phase_minor: 1
  # Optional: default is 1
  tmpdir_stagger_phase_limit: 3

I drafted one here: [Documentation] The tmpdir_stagger_phase_ options are very hard to understand · Issue #151 · ericaltendorf/plotman · GitHub

I don’t suggest using major: 3 and minor: 1 (3:1), because that limits the plots running in phases 1 and 2. Generally speaking, people want to limit the plots running in phase 1 (so the setting would be 2:1).


Oh, thank you! I will check tonight.

Thanks! I will read this.

I don’t mess with the tmpdir stagger settings. I’m using an ML350 Gen8 with 2x 1.6 TB datacenter PCIe SSDs and 8x 900 GB SAS drives.
I found the disks were causing RAM issues with caching, so now I just use 4 folders on each NVMe SSD and make each of those 8 my temp dirs, alternating in order in my config file.
I then set an hour between staggers and 8 global jobs. A bad RAM slot dropped me to 112 GB of RAM, so 3400 each is enough with overhead, but htop keeps showing my yellow RAM bar fully using RAM as cache.
I get crashes sometimes, and if I start a plot manually, plotman interactive breaks because it can’t find the logs for it and crashes out.
Still pretty great; I don’t have to baby it too hard.

I used the example settings in that Github discussion but it’s still not doing what I want.

How do I set it so that I’m continuously running 6 plots at any given point in time? Right now it completes each plot until there’s just one left, and then starts them in parallel again up to the max of 6 I set. It’s very unclear :sweat:

Would you share your config file? I can take a look.

And what do you mean by “at any given point in time”? If you describe your desired plotting scenario, we can find a solution together.

Maybe I understand this…maybe I don’t. Here’s what I think is going on:

When plotting, you go through 4 major phases. Each major phase has minor phases. You can see some more detail about what these phases do by searching for “phase” here and by checking the Chia consensus doc for the technical explanation. Knowing what the minor phases do isn’t important for what I’m talking about.

These are the default phase settings for plotman in plotman.yaml:

        tmpdir_stagger_phase_major: 2
        tmpdir_stagger_phase_minor: 1
        # Optional: default is 1
        tmpdir_stagger_phase_limit: 1

Here’s the explanation of those settings in the plotman.yaml:

        # Run a job on a particular temp dir only if the number of existing jobs
        # before tmpdir_stagger_phase_major tmpdir_stagger_phase_minor
        # is less than tmpdir_stagger_phase_limit.
        # Phase major corresponds to the plot phase, phase minor corresponds to
        # the table or table pair in sequence, phase limit corresponds to
        # the number of plots allowed before [phase major, phase minor]

I think the key thing is this:

phase limit corresponds to the number of plots allowed before [phase major, phase minor]

Using the default settings as an example:
tmpdir_stagger_phase_limit → 1
[tmpdir_stagger_phase_major, tmpdir_stagger_phase_minor] → [2, 1]

What these settings mean is that another plot won’t start on your temp drive until all the currently running plot(s) have reached phase [2, 1].

This implies that if you set global_stagger_m to, for example, 30 minutes, but you don’t reach phase [2, 1] until 75 minutes, you won’t start a new job at 30 min; you’ll start a new job 75 minutes after the first job. ( ← I think I’m correct on this because, as I’m writing this, I’m running with the stagger set to 15 min, but I’m still in phase [1, 2] and no new plot has started.)
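If my reading is right, the gating rule could be sketched like this in Python (the function and data shapes are my own invention, not Plotman’s actual code):

```python
def can_start_on_tmpdir(job_phases, major=2, minor=1, limit=1):
    """Sketch of the tmpdir stagger rule as I understand it.

    job_phases: list of (major, minor) phase tuples for jobs on this temp dir.
    A new job may start only if fewer than `limit` jobs are still *before*
    phase [major, minor]. Tuples compare lexicographically, which matches
    how phases are ordered.
    """
    early = sum(1 for p in job_phases if p < (major, minor))
    return early < limit

# One job still in phase [1, 2]: the default settings block a second job.
print(can_start_on_tmpdir([(1, 2)]))   # → False
# The job has reached [2, 1]: a new job may start.
print(can_start_on_tmpdir([(2, 1)]))   # → True
```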

Can anyone double-check my understanding?


I think you have it sorted out. To reiterate:

tmpdir_stagger_phase_major: 2
tmpdir_stagger_phase_minor: 1
# Optional: default is 1
tmpdir_stagger_phase_limit: 1

This would not allow a new plot to start until the total number of plots before phase 2:1 is <1 (the tmpdir_stagger_phase_limit). If you wanted to allow 6 plots simultaneously before 2:1 (i.e., 6 in phase 1), you would change to tmpdir_stagger_phase_limit: 6

It sounds like you want 6 plots simultaneously, but not necessarily before phase 2:1. The simultaneous plot count is governed by:

# Don't run more than this many jobs at a time in total.
global_max_jobs: 6

Thank you! You made me realize I didn’t change the global_max_jobs in my config just now.

Something doesn’t seem right. I have 2 NVMes, temp00 and temp01. Because of my CPU limitations, I want to run 3 plots on each NVMe.

I used this config:

tmpdir_stagger_phase_major: 2
tmpdir_stagger_phase_minor: 1
# Optional: default is 1
tmpdir_stagger_phase_limit: 6

with a max global of 6, but right now it only started 1 job on temp00 and no jobs on temp01. I thought it would start one job on each temp drive at the same time.

The global stagger still applies, so with the default setting you would get 1 plot on temp00 at t=0. A second plot would be added (to temp01) at t=30. Every 30 minutes a new plot would be added until you have 6 plots total.

Relevant lines from the default are copied below

# Don't run any jobs (across all temp dirs) more often than this, in minutes.
global_stagger_m: 30

# Don't run more than this many jobs at a time on a single temp dir.
tmpdir_max_jobs: 3

Because you want 3 on each temp drive, you can comment out the tmp_overrides section.
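To illustrate that timeline, here’s a toy sketch (not Plotman code; the function name is made up) of round-robin job starts under a global stagger:

```python
def start_times(num_jobs, stagger_m, tmpdirs):
    """Return (minute, tmpdir) pairs for jobs started one stagger apart,
    assigned to temp dirs round-robin."""
    return [(i * stagger_m, tmpdirs[i % len(tmpdirs)]) for i in range(num_jobs)]

print(start_times(4, 30, ["temp00", "temp01"]))
# → [(0, 'temp00'), (30, 'temp01'), (60, 'temp00'), (90, 'temp01')]
```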

Oh duh. Of course, the global stagger is the culprit. I’ll just decrease that.

So I could just set the global stagger to zero minutes, and two jobs would start at once, one on each NVMe; then with the phase limit of 1, the 2nd plot on each NVMe won’t start until [2:1] is reached.
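In other words, if I understand the thread so far, each potential start has to pass both gates. A toy sketch of that combined check (my own names, not Plotman internals):

```python
def can_start(now_m, last_start_m, stagger_m, tmpdir_phases,
              major=2, minor=1, limit=1):
    """A new job starts only when the global stagger has elapsed AND fewer
    than `limit` jobs on the target temp dir are before phase [major, minor].

    tmpdir_phases: list of (major, minor) tuples for jobs on that temp dir.
    """
    stagger_ok = (now_m - last_start_m) >= stagger_m
    early = sum(1 for p in tmpdir_phases if p < (major, minor))
    return stagger_ok and early < limit

# Stagger elapsed and the temp dir is empty: a job can start.
print(can_start(30, 0, 30, []))        # → True
# Stagger elapsed, but a job there is still in phase [1, 3]: blocked.
print(can_start(30, 0, 30, [(1, 3)]))  # → False
```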

Is it a bad idea to set the global stagger to zero?


I haven’t tested a global stagger of 0, but I would guess it works as you are describing.

I’m going to do a global stagger of 15 min. It would be slow writing two 100+ GB files onto an HDD at the same time. Thanks for the help :grinning_face_with_smiling_eyes:

@chianudist, I’m at a loss here…I just checked and there are only 2 plots running: one on temp00, one on temp01. It’s been 90 min since it started plotting on temp00, and it started plotting on temp01 a few minutes ago.

Here are my settings, using two temp drives:

tmpdir_stagger_phase_major: 2
tmpdir_stagger_phase_minor: 1
# Optional: default is 1
tmpdir_stagger_phase_limit: 8

# Don't run more than this many jobs at a time on a single temp dir.
tmpdir_max_jobs: 4

# Don't run more than this many jobs at a time in total.
global_max_jobs: 8

# Don't run any jobs (across all temp dirs) more often than this.
global_stagger_m: 30

# How often the daemon wakes to consider starting a new plot job
polling_time_s: 20

Based on my understanding this should work… :confused:

That can be a big issue! I learned that the hard way. Now I try to avoid having multiple finished plots writing to the same drive. It was painful watching 5 completed plots transfer at 100 bytes/sec.


Are you leaving “plotman interactive” running? With a stagger of 30 you would have to wait at least an hour to see a third plot start.

Does closing plotman interactive affect anything? I only open it periodically to check what’s going on.

Right, it was over an hour (90 min) and only 2 plots though, both in phase 1. I was expecting a 3rd plot already at 90 min+.