Plotting results on a Ryzen 9 3900X with 128GB DDR4-3200 and an Optane 900P

bambinone · August 13, 2021, 3:58pm

Hi folks,

Despite being interested in Chia late last year, it wasn’t until the introduction of madMAx’s plotter and official pooling that I felt comfortable making a modest investment. So I’m still quite new but wanted to post a few observations.

Fortunately, I already had an X570-based system in my workshop with a U.2 port, 2x2TB 860 EVOs, etc. running Ubuntu 21.04. I upgraded the CPU to a 3900X, which is ideal for madMAx because it’s the least expensive Ryzen CPU with two CCDs—and therefore twice the memory write bandwidth of e.g. a 3700X or 5800X. I also found good deals on a 128GiB kit of DDR4-3200 CL16 and a 280GB Optane 900P.

The 3900X is not overclocked, per se, but I have the usual PBO, Fmax enhancer, etc. enhancements turned on. The 900P is formatted with f2fs and mounted with nodiscard (Optane does not need to be trimmed); I haven’t tried fsync_mode=nobarrier yet but I don’t expect it to make much of a difference. The 860 EVOs are striped to quickly receive and hold the final plots until they can be copied over to a harvester.

Here are my best times (-r 12 -K 2 -u 256 -v 256):

Phase 1 took 653.426 sec
Phase 2 took 374.992 sec
Phase 3 took 336.229 sec, wrote 21877233523 entries to final plot
Phase 4 took 50.7809 sec, final plot size is 108835864169 bytes
Total plot creation time was 1415.47 sec (23.5912 min)
Copy to /chia-plots/plot-k32-<snip>.plot finished, took 100.213 sec, 1035.73 MB/s avg.
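
For anyone who wants to compare runs, here is a minimal Python sketch that pulls the per-phase times out of a log like the one above and shows each phase's share of the total. The regular expressions are based only on the summary lines shown in this post. The function name `phase_shares` is made up for illustration and is not part of madMAx.

```python
import re

# Summary lines copied from the log above.
LOG = """\
Phase 1 took 653.426 sec
Phase 2 took 374.992 sec
Phase 3 took 336.229 sec, wrote 21877233523 entries to final plot
Phase 4 took 50.7809 sec, final plot size is 108835864169 bytes
Total plot creation time was 1415.47 sec (23.5912 min)
"""

# Patterns assume the summary-line wording shown above.
PHASE_RE = re.compile(r"^Phase (\d) took ([\d.]+) sec", re.MULTILINE)
TOTAL_RE = re.compile(r"^Total plot creation time was ([\d.]+) sec", re.MULTILINE)


def phase_shares(log):
    """Return ({phase: fraction of total time}, total seconds)."""
    phases = {int(n): float(sec) for n, sec in PHASE_RE.findall(log)}
    total = float(TOTAL_RE.search(log).group(1))
    return {n: sec / total for n, sec in phases.items()}, total


shares, total = phase_shares(LOG)
for n, share in sorted(shares.items()):
    print(f"Phase {n}: {share:.1%} of {total:.0f} s")
```

Phase 1 is a bit under half of the total here, so that is where any tuning has the most room to pay off.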

Watching the system during plotting, I’ve noticed that there are some periods of time when the CPU is significantly under-utilized (i.e. <50%) and the 900P isn’t busy reading or writing. My assumption is that the tmpfs is the bottleneck during these moments. My next step is to try a mild RAM overclock to 3600 CL18 to try to alleviate this bottleneck somewhat. This should also help speed things up between the CCDs, CCXs, and the I/O die thanks to the higher FCLK.

The other issue I’m dealing with is that any other I/O during plotting absolutely kills performance. Unfortunately, this includes copying the plots over to the harvester. It makes perfect sense… the NICs and SATA controllers hang off the chipset, along with the U.2 port, so they’re all drinking from the same bucket. If the RAM overclock doesn’t help, I might try moving the 900P so it’s directly attached to the CPU, but I use this system for other things and want to keep it workable when I’m not actively plotting.

I also have an X299-based system with a 7960X at my disposal, so if push comes to shove I can move the RAM and the 900P over there to take advantage of its additional memory channels and PCIe lanes. I don’t see a lot of folks plotting on X299 so that might be interesting to try regardless.

If you made it this far, thanks for reading, and please let me know if you have any questions or suggestions!

Mike

legcramp · August 13, 2021, 7:07pm

Good times. That’s around what I get with a 5950X, 128GB of DDR4-2666, and a 512GB 980 Pro, including copying etc., using 512/256 buckets.

atomsymbol · August 13, 2021, 9:39pm

It might be possible to improve on the 23.6 minutes by modifying the madMAx plotter to use mmapped I/O for file reads. Modifying madMAx to improve its write performance is harder to achieve, in my experience.

With 128 GiB of memory it would be possible to partially or completely avoid using the filesystem in some phases of the madMAx plotter.

hawamahal · August 13, 2021, 9:41pm

I get about 22 minutes with a 5950X, 128GB of DDR4-2400, and a Seagate FireCuda 520:

[P1] Table 1 took 10.8133 sec
[P1] Table 2 took 72.383 sec, found 4294908693 matches
[P1] Table 3 took 96.8459 sec, found 4294826899 matches
[P1] Table 4 took 119.142 sec, found 4294802102 matches
[P1] Table 5 took 116.864 sec, found 4294700701 matches
[P1] Table 6 took 110.398 sec, found 4294338643 matches
[P1] Table 7 took 77.7619 sec, found 4293759234 matches
Phase 1 took 604.214 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 4.63847 sec
[P2] Table 7 rewrite took 15.205 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 25.5051 sec
[P2] Table 6 rewrite took 40.1182 sec, dropped 581322334 entries (13.5369 %)
[P2] Table 5 scan took 25.1944 sec
[P2] Table 5 rewrite took 39.3965 sec, dropped 762053467 entries (17.744 %)
[P2] Table 4 scan took 25.8621 sec
[P2] Table 4 rewrite took 37.8558 sec, dropped 828885464 entries (19.2997 %)
[P2] Table 3 scan took 25.8207 sec
[P2] Table 3 rewrite took 37.5701 sec, dropped 855088924 entries (19.9097 %)
[P2] Table 2 scan took 25.9261 sec
[P2] Table 2 rewrite took 36.6886 sec, dropped 865548773 entries (20.1529 %)
Phase 2 took 350.032 sec
Wrote plot header with 252 bytes
[P3-1] Table 2 took 27.0352 sec, wrote 3429359920 right entries
[P3-2] Table 2 took 19.9361 sec, wrote 3429359920 left entries, 3429359920 final
[P3-1] Table 3 took 31.4857 sec, wrote 3439737975 right entries
[P3-2] Table 3 took 20.4022 sec, wrote 3439737975 left entries, 3439737975 final
[P3-1] Table 4 took 33.2462 sec, wrote 3465916638 right entries
[P3-2] Table 4 took 20.5513 sec, wrote 3465916638 left entries, 3465916638 final
[P3-1] Table 5 took 33.6825 sec, wrote 3532647234 right entries
[P3-2] Table 5 took 21.2035 sec, wrote 3532647234 left entries, 3532647234 final
[P3-1] Table 6 took 35.2327 sec, wrote 3713016309 right entries
[P3-2] Table 6 took 22.2251 sec, wrote 3713016309 left entries, 3713016309 final
[P3-1] Table 7 took 28.436 sec, wrote 4293759234 right entries
[P3-2] Table 7 took 25.4538 sec, wrote 4293759234 left entries, 4293759234 final
Phase 3 took 321.89 sec, wrote 21874437310 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 39.0808 sec, final plot size is 108819155551 bytes
Total plot creation time was 1315.26 sec (21.921 min)

bambinone · August 14, 2021, 12:13am

As suspected, it didn’t affect my plot times whatsoever.

My 3900X struggled to hit 1800/3600 at four ranks per channel so I had to settle for 1766/3533. Here are the results:

Phase 1 improved by ~17s, or about 2.6%
Phase 2 improved by ~7s, or about 1.9%
Phase 3 improved by ~8s, or about 2.4%
Phase 4 worsened by ~1s, or about 1.9%
Total plot creation time improved by ~31s, or about 2.2%

While the theoretical bandwidth increased by about 10.4%, memory access latency also increased by about 1.9%, so it was basically a wash. That’s a little disappointing. I’m definitely going to try plotting on the 7960X to see how literally doubling the memory bandwidth affects performance.
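
The percentages above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: dual-channel DDR4 with a 64-bit (8-byte) bus per channel, CL16 at 3200 MT/s for the stock settings, and CL18 at the 3533 MT/s overclock (CL18 is the target mentioned in the first post). "Latency" here means first-word CAS latency only, not a measured end-to-end figure.

```python
def peak_bandwidth_gbs(mt_per_s, channels=2, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (assumes dual channel, 64-bit bus)."""
    return mt_per_s * channels * bus_bytes / 1000


def cas_latency_ns(mt_per_s, cl):
    """CAS latency in ns; the memory clock is half the transfer rate."""
    return cl * 2000 / mt_per_s


stock = (3200, 16)  # DDR4-3200 CL16
oc = (3533, 18)     # DDR4-3533 CL18 (CL18 is an assumption, see above)

bw_gain = peak_bandwidth_gbs(oc[0]) / peak_bandwidth_gbs(stock[0]) - 1
lat_gain = cas_latency_ns(*oc) / cas_latency_ns(*stock) - 1

print(f"bandwidth: {bw_gain:+.1%}, CAS latency: {lat_gain:+.1%}")
```

Under those assumptions the bandwidth gain works out to about 10.4% and the CAS latency increase to about 1.9%, matching the figures quoted above.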

Well, I think the overclock helped a little… with copying, etc. I’m at ~24.8 minutes per plot. Still not great.

bambinone · August 14, 2021, 2:45am

I think those RAM speeds are really holding you back!

hawamahal · August 14, 2021, 6:23am

I know but it is a compromise, as I will run out of storage very soon, so speed does not matter that much.

jjs · August 15, 2021, 4:26am

I get a little less with a similar setup. Your times are good.

legcramp · August 15, 2021, 7:23pm

Yup, pretty much this, and I already had 128GB of DDR4-2666 on hand, so there was no point in upgrading to shave off a few minutes.

dctech · August 19, 2021, 9:27pm

FYI, I’m getting 20 min (give or take a few seconds) on my 5950X with 3600MHz RAM (Crucial Ballistix) and a 2TB Sabrent TLC NVMe 3.0 drive in Pop!_OS (Ubuntu 21.04), vs. 27 min in Win 10.

antbot · September 5, 2022, 11:56pm

I wonder what the impact of the 900P is vs. the ramdisk.

Have you tried without the ramdisk (or a ramdisk with a cheaper M.2 alternative)?

(I have a 3900X and my 980 Pro M.2 is about to die; should I replace it with a 900P, or buy RAM and another “cheap” 980 Pro?)

xkredr59 · September 6, 2022, 6:56am

How can you tell? My FireCuda 510 is in the 90% TBW range now, and I’m wondering what signs I should look for. Will it slow down, or can plots turn out to be corrupted?

antbot · September 6, 2022, 10:01am

I know from experience.

My last M.2 was showing 100% TBW (CrystalDiskInfo), and then it took a couple of weeks to die (that was after I noticed the 100% wear, so I don’t know when it actually reached 100%). To make matters worse, I had reused it to install Win10 on a PC (that was stupid), and I had to do a fresh install on a new drive.

Now, for example, the current one is already at 100% TBW, and after plotting approx. 100 plots I have to stop plotting for 2h or so to let it cool down, since plotting times increase a bit (one minute or so).

xkredr59 · September 6, 2022, 11:58am

Thanks! I’ll keep an eye out for issues with plotting and won’t use it as an OS drive;-)

Bones · September 7, 2022, 12:38pm

From all the reports on the forum, I can safely say many of us have far exceeded the expected M.2 lifespan (TBW) with no detrimental effects on plotting.
I certainly wouldn’t use mine to store data I valued, though, or install an OS on it.

antbot · September 7, 2022, 5:17pm

I agree, M.2 drives do exceed their life expectancy. Here is proof from the one I’m using at the moment (while plotting):

I’ll post when it dies

Bones · September 7, 2022, 5:23pm

See here for more info.

antbot · September 8, 2022, 11:28am

I destroyed a 1TB SanDisk (I bought it new) by plotting around 70-100 TB on it (I’m not sure about the exact number of plots, since I used it at the beginning of my Chia adventure last year).
It died a few months later in a PC that had Win10 installed on it. I can’t remember the TBW when I stopped plotting on it, but I’m sure it had passed the warranty limit (I remember the red color in CrystalDiskInfo).

bambinone · October 6, 2022, 3:30am

I added a few new disks to my farm just for kicks, so here are some updates.

I’m plotting on two machines right now. The first machine is a 5900X with 64GiB DDR4-3733 CL16 and an Optane 905P 960GB. I’m using BladeBit Disk with -b 64 -a --cache 35G, writing out to an SSD-backed NFSv4 share exported via 10GbE from my harvester. Each plot takes about 24 minutes, and because I’m writing directly to the harvester that includes the network copy. Hopefully that answers @antbot’s question about how it runs without the RAMdisk, although the hardware is upgraded from my original post.

The second machine is a 5950X with 128GiB DDR4-3733 CL18 and an SK hynix Platinum P41 2TB. I’m using madMAx with -r 16 -K 2 -D and a 110GiB RAMdisk, writing out to an SSD-backed NFSv4 share exported via 10GbE from my harvester. Each plot takes about 19.5 minutes, and because I’m writing directly to the harvester that includes the network copy.

I have a script running on my harvester to distribute the incoming plots landing on the fast share to the disks with available space.
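
A script like that can be quite small. The following Python sketch is not the script I run; it is just one way the idea could look. The function names, the `.tmp` suffix, and the example paths are hypothetical. It also assumes that any file in the landing directory ending in `.plot` is complete and safe to move.

```python
import shutil
from pathlib import Path


def pick_destination(dest_dirs, needed_bytes):
    """Return the directory with the most free space that fits needed_bytes, or None."""
    best, best_free = None, -1
    for d in dest_dirs:
        free = shutil.disk_usage(d).free
        if free >= needed_bytes and free > best_free:
            best, best_free = Path(d), free
    return best


def distribute(landing_dir, dest_dirs, reserve_bytes=0):
    """Move finished plots from landing_dir to whichever disk has the most room."""
    moved = []
    for plot in sorted(Path(landing_dir).glob("*.plot")):
        size = plot.stat().st_size
        dest = pick_destination(dest_dirs, size + reserve_bytes)
        if dest is None:
            break  # no disk has enough room for this plot
        # Copy under a temporary name, then rename, so anything scanning
        # the destination for *.plot never sees a partially copied file.
        tmp = dest / (plot.name + ".tmp")
        shutil.move(str(plot), str(tmp))
        tmp.rename(dest / plot.name)
        moved.append(dest / plot.name)
    return moved


# Hypothetical usage (paths are placeholders):
# distribute("/mnt/fast-share", ["/mnt/disk1", "/mnt/disk2"], reserve_bytes=10**9)
```

Picking the disk with the most free space keeps the disks filling at roughly the same rate. Filling one disk at a time would work just as well if you prefer that.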

Other notes:

Ubuntu 22.04
Both tmpdir SSDs are connected via CPU lanes, not through the PCH
Both tmpdirs are formatted XFS with -m crc=0 and mounted with noatime,discard
The processors in both plotters have PBO2 enabled and have gone through a rigorous curve optimization
The RAM kits in both plotters have been overclocked from DDR4-3200 CL16
I use my plotting machines for lots of other things, so that’s why they’re (relatively) over-built and tuned up
I haven’t really played around too much with buckets, etc… my times seem good compared to what I’ve seen from others and I should be done plotting in a few days

I definitely think BladeBit Disk is worth exploring if you’re plotting in Linux and have less than 128GiB RAM. The other transparent read-write caching options are not great. madMAx is still better than BladeBit Disk if you have 128GiB RAM (by about 2.5 minutes per plot or nine plots per day on my hardware). Optane is probably overkill but I have it so I’m going to use it.
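
For reference, here is the arithmetic behind the plots-per-day comparison as a short Python sketch. It assumes the 2.5-minute difference applies to the 19.5-minute madMAx time on the same machine, and that the plotter runs back to back for a full 24 hours with no gaps.

```python
def plots_per_day(minutes_per_plot):
    """Plots completed in 24 hours of back-to-back plotting."""
    return 24 * 60 / minutes_per_plot


madmax = plots_per_day(19.5)           # madMAx time from the post
bladebit = plots_per_day(19.5 + 2.5)   # assumed: 2.5 min slower on the same box
diff = madmax - bladebit

print(f"madMAx: {madmax:.1f}/day, BladeBit Disk: {bladebit:.1f}/day, diff: {diff:.1f}")
```

With those inputs the gap comes out between eight and nine plots per day.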

Let me know if you have any questions!

Nathan15038 · February 25, 2024, 10:24am

Wait, wait, wait, I am very late to this, but you’re telling me that it’s not dead yet? It is at 0% with almost 2.6 PB of data written and it’s still running strong. Man, that is definitely a beast. I for sure haven’t gotten anywhere close to that many bytes written on my drive and probably won’t hit that anytime soon. I am now very confident in my drives, especially because I’m kind of using one as my main drive too.