In May this year I started building the Ultimate Single Socket (x86) Chia Plotter.
Here is the specification:
AMD Ryzen Threadripper PRO 3995WX
ASUS Pro WS WRX80E-SAGE SE WIFI
8 x 32GB Corsair Vengeance LPX Black DDR4 DRAM 3200MHz C16 (non ECC)
18 x Corsair MP600 Gen4 PCIe SSD (only 10 in use - AMD RAID limitation)
2 x ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card
EVGA SuperNOVA 1600 T2
At the beginning, I would like to mention that neither MadMax nor BladeBit was available in May, so I used Swar Chia Plot Manager to create 40 plots in parallel. This generated 77 plots per day, or 1 plot every 18.7 minutes (CPU load was between 30-65% and temperatures between 60-77°C). Unfortunately, plotting 40 in parallel was not an ideal solution.
The motherboard did not support AMD DOCP DRAM before BIOS version 0504 - that was a nightmare.
After shutting down the computer, it cannot be turned back on - you need to reset the BIOS and set everything up again - seriously ASUS? I reported it twice and didn’t get any reply from ASUS technical support.
I have 18 x SSD NVMe and can only use 10? This is because you must use PCIe RAID mode for all 4 SSDs in each PCIe adapter to be detected. But the computer will not boot if it detects more than 10 NVMe SSDs because AMD RAID does not support that number of drives. By switching the PCIe RAID mode to 8x/8x you can bypass this limitation, but each PCIe adapter will only detect 2 SSDs. Theoretically, it is possible to mount 7 x PCIe M.2 adapters with 2 x SSDs each (in 8x/8x mode) + 3 x SSDs in built-in M.2 ports = total 17 SSDs. But this guy uses the same motherboard and is able to boot with 23 x NVMe drives - how does he do that??
I have switched to MadMax in parallel.
1 plot = 28 min
2 plots = ~35 min each = ~17.5 min / plot (+25% longer)
4 plots = I need 512GB of RAM, but I guess it will be ~45 min each = ~11 min / plot
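The per-plot figures above come from dividing the wall-clock time of one batch by the number of plots running in parallel. A minimal Python sketch of that arithmetic, with function names made up for this example (they are not part of any plotter):

```python
MINUTES_PER_DAY = 24 * 60


def minutes_per_plot(batch_minutes: float, parallel_plots: int) -> float:
    """Effective time per finished plot when several plots run side by side."""
    return batch_minutes / parallel_plots


def plots_per_day(batch_minutes: float, parallel_plots: int) -> float:
    """Plots finished in 24 hours, assuming the rate stays steady."""
    return MINUTES_PER_DAY / minutes_per_plot(batch_minutes, parallel_plots)


if __name__ == "__main__":
    # (parallel plots, minutes per batch) taken from the timings listed above;
    # the 4-plot figure is the poster's guess, not a measurement.
    for plots, batch in [(1, 28), (2, 35), (4, 45)]:
        print(
            f"{plots} parallel: {minutes_per_plot(batch, plots):.2f} min/plot, "
            f"{plots_per_day(batch, plots):.0f} plots/day"
        )
```

The same conversion run backwards gives the earlier Swar figure: 1440 minutes divided by 77 plots per day is about 18.7 minutes per plot.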
Talk about overkill but 11 min is BladeBit-like …that all must have cost a pretty penny!
Couple questions… is Primo Ramdisk Server 6.5.0 actually x2-x3 faster than any other RAM disk you tried? That’s truly amazing if so!! And how do you attach these external SSDs to the motherboard? I see your ext. heatsink (awesome), but what is the device below it where all wires go? Last, for MadMax and 4 plots in parallel, why do you need 512GB RAM? For me, with a 3955WX and two in parallel I never see more than ~15GB used?
Now - heehee – what next to go faster still (in Windows)?
I also tested different RAM disk software on my system: ImDisk, OSFMount, Primo. The latter shows best performance in synthetic benchmark, but ImDisk provided best times in real world chia plotting.
3995WX has 64 cores… That’s insane. But you don’t have enough RAM to really saturate it and make use of MadMax, unfortunately.
I would try just 2 MadMax processes with 2 RAM disks, allocate affinity of half of the cores to each. You won’t even need more than two NVME for this setup. You should try running 512 buckets on phase 1-2 and 256 on phase 3-4, with 32 threads per each process (if running out of RAM, reduce it to 16 or so).
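The two-process layout suggested above could be sketched like this in Python. This is only an illustration under several assumptions: a Linux host with `taskset` available, the `chia_plot` flag names as I recall them from the MadMax help output (`-r` threads, `-u` buckets, `-v` buckets for phase 3+4, `-t` tmpdir, `-2` tmpdir2, `-d` final dir, `-f`/`-p` keys), and placeholder paths and keys. The script only builds and prints the command lines; it does not start anything.

```python
import shlex


def split_cpus(cpu_count: int, processes: int) -> list[str]:
    """Split CPU indices 0..cpu_count-1 into contiguous ranges, one per process."""
    per_proc = cpu_count // processes
    return [f"{i * per_proc}-{(i + 1) * per_proc - 1}" for i in range(processes)]


def build_commands(cpu_count, threads, tmp_dirs, ram_disks, final_dir,
                   farmer_key, pool_key):
    """Return one argv list per plotter process, each pinned to its own CPU range."""
    ranges = split_cpus(cpu_count, len(tmp_dirs))
    commands = []
    for cpu_range, tmp, ram in zip(ranges, tmp_dirs, ram_disks):
        commands.append([
            "taskset", "-c", cpu_range,   # pin this process to its share of CPUs
            "chia_plot",
            "-r", str(threads),           # threads per process
            "-u", "512",                  # buckets for phase 1-2
            "-v", "256",                  # buckets for phase 3-4
            "-t", tmp,                    # tmpdir on NVMe
            "-2", ram,                    # tmpdir2 on a RAM disk
            "-d", final_dir,
            "-f", farmer_key,
            "-p", pool_key,
        ])
    return commands


if __name__ == "__main__":
    # All paths and keys below are placeholders for illustration.
    cmds = build_commands(
        cpu_count=64, threads=32,
        tmp_dirs=["/mnt/nvme0/", "/mnt/nvme1/"],
        ram_disks=["/mnt/ram0/", "/mnt/ram1/"],
        final_dir="/mnt/plots/",
        farmer_key="<farmer-public-key>", pool_key="<pool-public-key>",
    )
    for argv in cmds:
        print(shlex.join(argv))
```

The contiguous split is a simplification: how CPU numbers map to physical cores depends on the OS and on whether SMT is enabled, so it is worth checking the topology (for example with `lscpu -e`) before trusting the ranges. On Windows the same idea would need a different affinity mechanism.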
Some tips for this setup:
MadMax documentation says it needs 110 GB RAM disk, but in reality you can make 107 GB and it will work
Use Ubuntu on WSL v1 (Windows Subsystem for Linux) and you will get +15% raw performance
My setup is Ryzen 5900x + 128 GB DDR4 3600 CL16 and I run only 1 MadMax process with 12 threads, and getting plot times consistently sub-26 minutes.
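On the RAM question raised earlier in the thread: when MadMax uses a RAM disk as a temp directory, most of the memory goes to that RAM disk rather than to the plotter process itself. Here is a rough budget in Python using figures quoted in this thread (a 110 GB or 107 GB RAM disk per plot, roughly 16 GB per process). The 8 GB OS reserve is my own placeholder, not a measured value, and the function names are hypothetical.

```python
def ram_needed_gb(parallel_plots: int, ramdisk_gb: int = 110,
                  process_gb: int = 16, os_reserve_gb: int = 8) -> int:
    """Rough total RAM for N parallel plots, each with its own RAM disk."""
    return parallel_plots * (ramdisk_gb + process_gb) + os_reserve_gb


def fits(installed_gb: int, parallel_plots: int, **kwargs) -> bool:
    """True if the rough budget fits into the installed RAM."""
    return ram_needed_gb(parallel_plots, **kwargs) <= installed_gb


if __name__ == "__main__":
    for plots in (1, 2, 4):
        for ramdisk in (110, 107):
            need = ram_needed_gb(plots, ramdisk_gb=ramdisk)
            print(f"{plots} plot(s), {ramdisk} GB RAM disk each: ~{need} GB")
```

With these assumptions, four parallel plots come out at roughly 512 GB, which is in line with the estimate in the original post. Treat the result as a planning aid rather than a guarantee, since actual process memory depends on thread and bucket settings.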
If you are on Win with MadMax, even with emulated Linux, you get a 15-30% performance hit from my observation. I’m on a 5950X, 128GB 3600 Ballistix, and 1x2TB NVMe 3.0 on a Dark Hero, which does ~20min per plot under Pop!_OS 21.04 (Ubuntu 21.04). I used to do SWAR on multiple NVMes and I’m glad MadMax came along; huge thanks to the dev, as it saves the life of my NVMe!
Howdy! I’ve been working on the 3995x for a while now. Given your setup, with a few minor tweaks you could get 4x plots in parallel on MadMax at around 31min/37min per set. PM me if you wish to go over my build and compare notes!
For a high-end plotter, forget NVMe; that is a lot of money going to waste. In my setup I have an HPE DL-580 G8 with 1TB of DDR3 acting as a RAM disk for the temp1 folder, plus the same motherboard as you (the ASUS Sage) with a TR Pro 3975 and 256GB of fully manually tuned 3200 RAM. The temp2 folder resides on a local RAM disk, and between the RAM disk server and the plotter I have a direct 100Gbit InfiniBand connection. I can plot in 18 minutes. I was about to try 2 plots simultaneously but I ran out of space; soon I will be expanding the farm and doing more tests. With your CPU and 512GB you can probably do 4 at a time at most likely 20 min each, giving a beastly output, and most importantly you can plot an infinite number of plots for less money than all those NVMe drives.
I appreciate the fact that you built this system before madMAx/BladeBit, but it might be time to liquidate and build some cheaper plotters. I’m getting 76 PPD with my bargain-bin X299 system: i9-7960X, 128GB DDR4-3200, Optane 900P 280GB. Those four parts including the motherboard cost about $1500 after tax and shipping. You could build two similar systems complete with PSUs, cheap GPUs, etc. for less than the cost of the 3995WX.
185 PPD, still in testing. A 5900x build is less than 1500 and easily outperforms the 7960, has PCIe 4.0 and many other fun upgrades. Point is that optimizing and applying more than just chia itself to the system is the goal. There’s literally nothing you can’t do with a nice 39xx build. Price is high, but take into account the extensive list of items you will not have to get, since you have 128 PCIe 4.0 lanes to add stuff to… networking, JBOD, plotter, fork farms, etc. all in one package. You’d be lucky to do 10% of what these systems are capable of with what’s currently on the market.
My board and CPU cost about $700, which is more-or-less what you’d pay for a half-decent X570 board and 5900X. X299 has 44 or 48 usable PCIe Gen 3.0 lanes and quad-channel memory, X570 has 20 usable PCIe Gen 4.0 lanes and dual-channel memory. I think a Ryzen 5000-series system would make a great plotter and general-use system when you’re done plotting. (I’d have gone that route if I wasn’t doing VFIO.)
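To put those lane counts side by side, here is a back-of-the-envelope Python comparison of aggregate PCIe bandwidth. The per-lane rates follow from the PCIe 3.0 and 4.0 signalling rates (8 and 16 GT/s with 128b/130b encoding). The helper name is made up for this example, and real-world throughput is lower because of protocol overhead.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_SEC = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding used by Gen 3 and Gen 4


def aggregate_gb_per_sec(generation: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s for a number of lanes."""
    per_lane = GT_PER_SEC[generation] * ENCODING_EFFICIENCY / 8  # bits to bytes
    return per_lane * lanes


if __name__ == "__main__":
    # Lane counts as quoted in the post above.
    for label, gen, lanes in [("X299", 3, 44), ("X299", 3, 48), ("X570", 4, 20)]:
        print(f"{label}: {lanes} x Gen {gen} lanes = "
              f"{aggregate_gb_per_sec(gen, lanes):.1f} GB/s")
```

By this rough measure, the 44 Gen 3.0 lanes and the 20 Gen 4.0 lanes end up in the same ballpark for total bandwidth; the larger difference is how many separate devices you can attach.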
To be absolutely clear, I am not saying that Skylake-X is just as good or even comparable to Threadripper (Pro) 3000-series. The 7960X isn’t even in the same galaxy as the 3995WX. My point is that the 7960X (or the 5900X) is about 1/10th the price of the 3995WX, and you can run madMAx with 128GB DDR4 and one NVMe. Or you can throw together a cheap C602 or C612 dual-socket system and run BladeBit with 416+GB DDR3 or DDR4. If you’re building a system for a finite amount of plotting and don’t have other use-cases in mind, the cost per plot of the 3995WX is way too high.
I get this. The tinkering is most of the fun. And the learning opportunities are golden.
I get this, too. I built my X299 workstation with other use-cases in mind, it just happened to be a good fit for madMAx. And it was fun to overclock.
I think you could just as well do all that stuff with 44 PCIe Gen 3.0 lanes, and again, at a fraction of the cost. (As long as you’re not running a whole bunch of NVMe drives, which is no longer necessary for plotting.) But maybe I’m not thinking big enough.
At some point it just comes down to budget and preference. If you can afford a $5K CPU and you’ll get lots of use out of it when you’re done plotting, go nuts. I just hate the perception that you have to spend that much to start plotting. I went with a $400 CPU and bought more hard drives with the $4,600 I saved. To each his or her own…
I understand what you’re saying. I originally went with the 3955x over other AMD/Intel builds based on the TDP and expansion. Now that I’ve climbed into the 95, XCH is child’s play compared to what you can actually accomplish with it. I wouldn’t say go out and buy high-end 64c CPUs just to plot. You can get a k32 a day on an RPi lmao. 20min plots on a headless 2697 v3 with 128GB+ DDR4 RAM is well under 2k.
You both are rational & pretty spot on re: choices between systems. I might add that I found the 3955x underwhelming using the authentic chia plot GUI in Windows. I went from 11 PPD on a Ryzen 3600 to only 30 PPD on the TR 3955 pro, even with 144GB memory. It made more plots possible, but not so very impressive considering the TR was >$2K. Plus too much effort to manage getting even that.
But I’ve grown very fond of the 3955WX now w/MadMax @70 PPD. TR Pro is a workhorse, and is stable as a boulder. MM barely used 16GB, so I knocked it down to 64GB. I can punch out chia drives easily, just leave the thing for days on end, plugging away. With its built-in 10Gbps network card, I connect to my 3600 farmer with no issues (at 2.5Gbps).
The last thing is stealth, it looks like an HP desktop from Staples, low key power, without all the hardware mess and noise. I’m all for that, as is the wife
Seriously: I just joined the forum and read weird consumer grade posts?
Nice you checked benchmark bandwidth. Did you check IOPS and latency during operations for the nvmes? Did you compare ram disk latency?
ASUS boards are still consumer boards for WX.
Seriously: non-ECC RAM in a memory heavy application on crypto keys?
AMD RAID is one of the worst ideas possible: performance, reliability, migration,… Broadcom Raid controller, sw-raid are the way to go, but then with enterprise grade nvmes. Under Linux mdadm or even better a decent sw-raid outperforms this setup without questions asked - by leaps and bounds. Decent sw-raid does 30GB/s read and 15GB/s write on 4x enterprise grade PCIe 4.0 nvmes at ~150 us latency, up to ~400 us (4x Kioxia CM6-V 3.2TB or Samsung PM1735 3.2 TB - the Samsung is bad under sustained load, so the Kioxia is way better). Take 4x Intel P5800X and you get down to ~14 - 130 us with 30GB/s read and write. This “ultimate system” - except the CPU - is a piece of consumer sh…, errr crapola.
Is Windows really the ideal platform to mine? Maximum system performance? Even heavily tuned and with optimized code, Windows users are lucky to achieve 92% of comparable Linux performance. But that’s only for real Windows experts setting up the system, and there are no tuning guides for that level of performance in some weird how-tos on the Internet…
To put things in perspective: a 24C/48T Epyc Milan under Linux does approx. 100 PPD with 4 plot NVMes on the classic Chia GUI at 36 - 44 plots in parallel. Considering TR Pro specs, that 64C TR Pro CPU can easily do 200+ PPD, probably even 300 PPD. As said above: with 4 NVMe drives you can achieve way higher performance than with your RAM disks. This “Ultimate Single Socket Chia Plotter” on Windows is like beating a dead horse to win the Olympics - not gonna happen.
Best option: maybe madmax plotter with 512GB RAM, running 2 plots fully in ramdisk for tmp and tmp2 may save some of your investments. Still you can put 14 of your 18 nvmes on hold anyway.
So with all due respect, sorry to burst your bubble: besides the CPU I consider your “ultimate build” a fan project - far away from performance reality. I suggest adding some LED strips.
PS-Edit: All the above numbers are valid as of August 2021.
Seriously: no respect - at all. I’ll suggest you put your holier-than-thou ‘expertise’ on the shelf, troll. When you have something created in this, our reality, to show for yourself, you might be slightly worth listening to, but I’ll say it’s not looking good.