Plotting results on a Ryzen 9 3900X with 128GB DDR4-3200 and an Optane 900P

Hi folks,

Despite being interested in Chia late last year, it wasn’t until the introduction of madMAx’s plotter and official pooling that I felt comfortable making a modest investment. So I’m still quite new but wanted to post a few observations.

Fortunately, I already had an X570-based system in my workshop with a U.2 port, 2x2TB 860 EVOs, etc. running Ubuntu 21.04. I upgraded the CPU to a 3900X, which is ideal for madMAx because it’s the least expensive Ryzen CPU with two CCDs—and therefore twice the memory write bandwidth of e.g. a 3700X or 5800X. I also found good deals on a 128GiB kit of DDR4-3200 CL16 and a 280GB Optane 900P.

The 3900X is not overclocked per se, but I have the usual enhancements (PBO, Fmax Enhancer, etc.) turned on. The 900P is formatted with f2fs and mounted with nodiscard (Optane does not need to be trimmed); I haven’t tried fsync_mode=nobarrier yet, but I don’t expect it to make much of a difference. The 860 EVOs are striped to quickly receive and hold the final plots until they can be copied over to a harvester.

Here are my best times (-r 12 -K 2 -u 256 -v 256):

  • Phase 1 took 653.426 sec
  • Phase 2 took 374.992 sec
  • Phase 3 took 336.229 sec, wrote 21877233523 entries to final plot
  • Phase 4 took 50.7809 sec, final plot size is 108835864169 bytes
  • Total plot creation time was 1415.47 sec (23.5912 min)
  • Copy to /chia-plots/plot-k32-<snip>.plot finished, took 100.213 sec, 1035.73 MB/s avg.
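
For anyone who wants to compare their own logs against these numbers, here is a rough Python sketch that breaks the run down by phase and derives the copy throughput and a best-case daily plot count. The figures are copied from the list above; the variable names are only illustrative, and the plots-per-day value assumes back-to-back plots with no slowdown from copying, which is optimistic.

```python
# Figures copied from the log lines above (seconds and bytes).
phases = {
    "Phase 1": 653.426,
    "Phase 2": 374.992,
    "Phase 3": 336.229,
    "Phase 4": 50.7809,
}
total_sec = 1415.47          # "Total plot creation time" as reported
plot_bytes = 108835864169    # final plot size
copy_sec = 100.213           # duration of the final copy

# Share of the total run spent in each phase.
shares = {name: 100 * sec / total_sec for name, sec in phases.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}% of total")

# Copy throughput in decimal MB/s and in binary MiB/s.
copy_mb_s = plot_bytes / copy_sec / 1e6
copy_mib_s = plot_bytes / copy_sec / 2**20
print(f"copy: {copy_mb_s:.2f} MB/s = {copy_mib_s:.2f} MiB/s")

# Upper bound on daily output (assumes no gaps and no copy interference).
plots_per_day = 86400 / total_sec
print(f"at most {plots_per_day:.1f} plots/day")
```

Dividing by 2^20 reproduces the 1035.73 figure from the copy line, so the "MB/s" in that log line appears to be in binary units (MiB/s).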

Watching the system during plotting, I’ve noticed that there are some periods of time when the CPU is significantly under-utilized (i.e. <50%) and the 900P isn’t busy reading or writing. My assumption is that the tmpfs is the bottleneck during these moments. My next step is to try a mild RAM overclock to 3600 CL18 to try to alleviate this bottleneck somewhat. This should also help speed things up between the CCDs, CCXs, and the I/O die thanks to the higher FCLK.

The other issue I’m dealing with is that any other I/O during plotting absolutely kills performance. Unfortunately, this includes copying the plots over to the harvester. It makes perfect sense… the NICs and SATA controllers hang off the chipset, along with the U.2 port, so they’re all drinking from the same bucket. If the RAM overclock doesn’t help, I might try moving the 900P so it’s directly attached to the CPU, but I use this system for other things and want to keep it workable when I’m not actively plotting.

I also have an X299-based system with a 7960X at my disposal, so if push comes to shove I can move the RAM and the 900P over there to take advantage of its additional memory channels and PCIe lanes. I don’t see a lot of folks plotting on X299 so that might be interesting to try regardless.

If you made it this far, thanks for reading, and please let me know if you have any questions or suggestions!

Mike

Good times. That’s around what I get with a 5950X, 128GB of DDR4-2666, and a 512GB 980 Pro, including copying etc., using 512/256 buckets.

It might be possible to improve on the 23.6 minutes by modifying the madMAx plotter to use memory-mapped (mmap) I/O for file reads. Modifying madMAx to improve its write performance is harder to achieve, in my experience.
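
To illustrate what memory-mapped reads look like in general, here is a minimal Python sketch comparing an ordinary read() loop with an mmap-based pass over the same file. This is not madMAx code (the plotter is written in C++), and the function names and the temporary demo file are hypothetical stand-ins; it only shows the access pattern, not any expected speedup.

```python
import mmap
import os
import tempfile

def checksum_with_read(path, chunk_size=1 << 20):
    """Sum all bytes using ordinary read() calls into a fresh buffer."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += sum(chunk)
    return total

def checksum_with_mmap(path, chunk_size=1 << 20):
    """Sum all bytes by mapping the file and walking the mapping in chunks."""
    total = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in on first access.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), chunk_size):
                # Slicing copies the chunk in Python; C++ would use the pointer directly.
                total += sum(mm[offset:offset + chunk_size])
    return total

# Hypothetical stand-in for a plotter temp file (4 MiB of random bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(4 << 20))
    demo_path = tmp.name

read_sum = checksum_with_read(demo_path)
mmap_sum = checksum_with_mmap(demo_path)
os.remove(demo_path)
print("results match:", read_sum == mmap_sum)
```

Whether this helps in practice depends on the access pattern and on how much of the file is already in the page cache, so treat it as something to benchmark rather than a guaranteed win.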

With 128 GiB of memory it would be possible to partially or completely avoid using the filesystem in some phases of the madMAx plotter.

I get about 22 minutes with a 5950X, 128GB of DDR4 at 2400MHz, and a Seagate FireCuda 520:

[P1] Table 1 took 10.8133 sec
[P1] Table 2 took 72.383 sec, found 4294908693 matches
[P1] Table 3 took 96.8459 sec, found 4294826899 matches
[P1] Table 4 took 119.142 sec, found 4294802102 matches
[P1] Table 5 took 116.864 sec, found 4294700701 matches
[P1] Table 6 took 110.398 sec, found 4294338643 matches
[P1] Table 7 took 77.7619 sec, found 4293759234 matches
Phase 1 took 604.214 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 4.63847 sec
[P2] Table 7 rewrite took 15.205 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 25.5051 sec
[P2] Table 6 rewrite took 40.1182 sec, dropped 581322334 entries (13.5369 %)
[P2] Table 5 scan took 25.1944 sec
[P2] Table 5 rewrite took 39.3965 sec, dropped 762053467 entries (17.744 %)
[P2] Table 4 scan took 25.8621 sec
[P2] Table 4 rewrite took 37.8558 sec, dropped 828885464 entries (19.2997 %)
[P2] Table 3 scan took 25.8207 sec
[P2] Table 3 rewrite took 37.5701 sec, dropped 855088924 entries (19.9097 %)
[P2] Table 2 scan took 25.9261 sec
[P2] Table 2 rewrite took 36.6886 sec, dropped 865548773 entries (20.1529 %)
Phase 2 took 350.032 sec
Wrote plot header with 252 bytes
[P3-1] Table 2 took 27.0352 sec, wrote 3429359920 right entries
[P3-2] Table 2 took 19.9361 sec, wrote 3429359920 left entries, 3429359920 final
[P3-1] Table 3 took 31.4857 sec, wrote 3439737975 right entries
[P3-2] Table 3 took 20.4022 sec, wrote 3439737975 left entries, 3439737975 final
[P3-1] Table 4 took 33.2462 sec, wrote 3465916638 right entries
[P3-2] Table 4 took 20.5513 sec, wrote 3465916638 left entries, 3465916638 final
[P3-1] Table 5 took 33.6825 sec, wrote 3532647234 right entries
[P3-2] Table 5 took 21.2035 sec, wrote 3532647234 left entries, 3532647234 final
[P3-1] Table 6 took 35.2327 sec, wrote 3713016309 right entries
[P3-2] Table 6 took 22.2251 sec, wrote 3713016309 left entries, 3713016309 final
[P3-1] Table 7 took 28.436 sec, wrote 4293759234 right entries
[P3-2] Table 7 took 25.4538 sec, wrote 4293759234 left entries, 4293759234 final
Phase 3 took 321.89 sec, wrote 21874437310 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 39.0808 sec, final plot size is 108819155551 bytes
Total plot creation time was 1315.26 sec (21.921 min)

As suspected, it didn’t affect my plot times whatsoever.

My 3900X struggled to hit 1800/3600 at four ranks per channel so I had to settle for 1766/3533. Here are the results:

  • Phase 1 improved by ~17s, or about 2.6%
  • Phase 2 improved by ~7s, or about 1.9%
  • Phase 3 improved by ~8s, or about 2.4%
  • Phase 4 worsened by ~1s, or about 1.9%
  • Total plot creation time improved by ~31s, or about 2.2%

While the theoretical bandwidth increased by about 10.4%, memory access latency also increased by about 1.9%, so it was basically a wash. That’s a little disappointing. I’m definitely going to try plotting on the 7960X to see how literally doubling the memory bandwidth affects performance.
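
The two percentages above can be reproduced with a short calculation. This sketch assumes the overclock ran at CL18, as planned earlier in the thread, and compares first-word CAS latency only; the helper names are illustrative. Peak theoretical bandwidth scales linearly with the data rate when the channel count stays the same.

```python
def cas_latency_ns(data_rate_mt_s, cas_cycles):
    """CAS latency in ns; the memory clock is half the DDR data rate."""
    clock_mhz = data_rate_mt_s / 2
    return cas_cycles / clock_mhz * 1000

def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

stock_rate, stock_cl = 3200, 16   # DDR4-3200 CL16 kit from the first post
oc_rate, oc_cl = 3533, 18         # achieved data rate; CL18 is assumed from the stated plan

bandwidth_gain = pct_change(stock_rate, oc_rate)
latency_change = pct_change(cas_latency_ns(stock_rate, stock_cl),
                            cas_latency_ns(oc_rate, oc_cl))

print(f"theoretical bandwidth: {bandwidth_gain:+.1f}%")
print(f"CAS latency:           {latency_change:+.1f}%")
```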

Well, I think the overclock helped a little… with copying, etc. I’m at ~24.8 minutes per plot. Still not great.

I think those RAM speeds are really holding you back!

I know, but it is a compromise: I will run out of storage very soon, so speed does not matter that much.

I get a little less with a similar setup. Your times are good.

Yup, pretty much this. I already had 128GB of DDR4-2666 on hand, so there was no point in upgrading to shave off a few minutes.

FYI, I’m getting 20 min (+/- a few seconds) on my 5950X with 3600MHz RAM (Crucial Ballistix) and a 2TB Sabrent TLC NVMe 3.0 drive in Pop!_OS (Ubuntu 21.04), vs. 27 min in Win 10.