The latest high-end consumer CPUs have up to 32 threads with DDR5 RAM, and GPUs have 10K+ CUDA cores. The raw performance is there, but the bottleneck is RAM capacity: CPUs cap out at 128G and GPUs at 24G max.
So I designed a network RAM disk that uses Thunderbolt 4 or 10G Ethernet to distribute RAM capacity across multiple PCs.
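For a rough sense of scale, the sketch below converts the nominal line rates of the two links into GB/s and compares them with one channel of DDR4-3200 (3200 MT/s x 8 bytes = 25.6 GB/s peak). These are headline figures, not measured throughput; the 40 Gbit/s Thunderbolt 4 number is the nominal link rate, and real networking over Thunderbolt is usually lower.

```python
# Nominal link rates in gigabits per second (headline figures, assumed;
# real achievable throughput is lower).
LINKS_GBPS = {
    "10G Ethernet": 10,
    "Thunderbolt 4": 40,
}

# One DDR4-3200 channel: 3200 MT/s x 8 bytes per transfer = 25.6 GB/s peak.
DDR4_3200_CHANNEL_GBS = 3200e6 * 8 / 1e9


def gbps_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


for name, gbps in LINKS_GBPS.items():
    gbs = gbps_to_gbytes(gbps)
    share = gbs / DDR4_3200_CHANNEL_GBS
    print(f"{name}: {gbs:.2f} GB/s peak, {share:.0%} of one DDR4-3200 channel")
```

Even at line rate, the network link is a small fraction of local memory bandwidth, so the remote RAM disk mainly buys capacity rather than speed.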
The full design and the arc_plot prototype binary are hosted on GitHub.
If anybody is interested in this project, please help me test it and send feedback via GitHub issues.
Btw, arc_plot also works well with a traditional SSD or a server RAM disk.
The same topic can also be found @ https://www.reddit.com/r/chia/comments/z2hxdj/proof_of_concept_network_ram_disk_plotting_with/
Does arc_plot use the GPU to plot? This is important!
If I have a GPU in a server with 256G, will it work faster than using the network block device?
arc_plot's goal is a GPU plotter. It is still in the proof-of-concept phase; I'm trying to figure out the best architecture to implement it. A GPU needs lots of bandwidth to feed it data, and even an SSD seems too slow for that. The new plotter was designed for consumer-level hardware like PCs, because nobody plots forever; when we stop, the PC can still be used for general daily purposes.
Yes, you can definitely use a server too if you already have one. lol And if your server runs Ubuntu 22, you should give the arc_plot alpha a try. A 256G RAM disk on your server will make it fly. lol
Sadly no source on GitHub. Nothing to see here…
Just did a benchmark of NRD (network RAM disk):
Does that mean that on Linux, arc_plot is the fastest one?
It looks like it comes down to hardware; the fastest times still needed a ton of RAM (although the pure SSD and 32GB RAM numbers were impressive for what they were, IMO!). Short answer: on this system, yes (12.7 minutes in the link).
Question: why are your “Bladebit 96G cache + SSD” plot times so slow? Is your final drive an SSD or an HDD? Even if it's an HDD, they're too slow. My BB + 120G + SSD (256B) times with an SSD final drive are 35% lower, and below your MM 100G + SSD times. My system is roughly half your spec. Something doesn't make sense. Your BB disk times (seemingly) should be somewhere near your arc_plot times, given what you have. Any ideas?
CPU: AMD Ryzen Threadripper 3970X 32-Core
RAM: 8 x 32 GB DDR4-3200
SSD: 2 x Solidigm P41 Plus 512 GB
OS: Ubuntu Server 22.04
TIME: 1366 sec
CPU: AMD Ryzen Threadripper Pro 3955WX 16-Core
RAM: 8 x 16 GB DDR4-3200
SSD: Samsung 980 Pro 1 TB
OS: Windows 10 Pro
TIME (typical): 887 sec
I see the problem here. My P41 SSD is QLC and your 980 Pro is MLC. My assumption is that if an SSD performs well out of cache, BB will perform well; if not, the SSD's speed drops from a few GB/s to a few hundred MB/s, and BB slows down too.
Do you have a QLC drive to swap in for the 980 Pro and retest to confirm this assumption? I just don't have a 980 Pro.
Oh, I just realized that your 3955 is the WX version, so I have the answer. My Threadripper has 4-channel DDR and your Threadripper Pro has 8-channel DDR. Plotting is an application that depends heavily on memory performance, so if CPU performance is the same, 8 channels vs 4 channels should in theory give about 80% higher performance.
So if the above assumption is correct, the expected arc_plot time on your system would be X / 762 = 887 / 1366, so X is around 495 seconds (762 s being my 12.7-minute arc_plot time).
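The estimate above is a simple proportion. The sketch below spells it out, assuming that 762 s is the arc_plot time on the 3970X system (the 12.7-minute figure) and that both TIME figures are BladeBit runs, so arc_plot would speed up by the same factor as BladeBit between the two machines. That scaling is an assumption, not a measurement.

```python
def scale_time(my_time_s: float, my_ref_s: float, your_ref_s: float) -> float:
    """Proportional estimate: X / my_time = your_ref / my_ref."""
    return my_time_s * your_ref_s / my_ref_s


arc_plot_mine = 762    # s, arc_plot on the 3970X system (12.7 min)
bladebit_mine = 1366   # s, BladeBit time on the 3970X system
bladebit_yours = 887   # s, typical BladeBit time on the 3955WX system

estimate = scale_time(arc_plot_mine, bladebit_mine, bladebit_yours)
print(f"estimated arc_plot time: {estimate:.0f} s (~{estimate / 60:.1f} min)")
```

This prints roughly 495 s, about 8.2 minutes.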
A couple of years ago, when QLC came out, I bought a Samsung 2TB 970 QVO because it was cheap. What a thing that was! A flash drive in an SSD case with 150MB/sec post-cache writes. If your P41 is like that, I understand lol! I sent mine back. I guess QLC is getting better, but I haven't owned one since.
Do you have a better non-QLC drive to try?
Interesting point about 4 vs 8 channels. What I found on Google shows minor increases, single-digit stuff, so I'm not sure about 80% better.
BTW, where does the GPU come into your project? I didn't see any mention of it; are better GPUs better, or…? The arc_plot network RAM disk concept is something; not sure the extra $$, hardware, and energy are sensible, but it is interesting.
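On the 4 vs 8 channel question, the theoretical peak is simple arithmetic: each DDR4 channel is 64 bits (8 bytes) wide, so peak bandwidth is transfer rate x 8 bytes x channels. The sketch below applies that to the two Threadrippers in this thread and compares it with the measured times quoted above (1366 s vs 887 s). Peak numbers are upper bounds, and the two systems also differ in core count, SSD, and OS, so the measured ratio is not a clean bandwidth test.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Each DDR4 channel is 64 bits (8 bytes) wide, so peak bandwidth is
    transfers per second x bytes per transfer x number of channels.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


tr_3970x = peak_bandwidth_gbs(3200, channels=4)       # Threadripper 3970X
tr_pro_3955wx = peak_bandwidth_gbs(3200, channels=8)  # Threadripper Pro 3955WX

print(f"4-channel: {tr_3970x:.1f} GB/s, 8-channel: {tr_pro_3955wx:.1f} GB/s")
print(f"peak bandwidth ratio: {tr_pro_3955wx / tr_3970x:.2f}x")
# Measured times from the two systems in this thread (seconds).
print(f"observed speedup: {1366 / 887:.2f}x")
```

Peak bandwidth doubles (102.4 vs 204.8 GB/s), while the measured times differ by about 1.54x.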
QLC is really good now if everything fits inside the cache, but the P41's post-cache write speed is around 80MB/sec.
For the GPU implementation, my first try on a 3090 didn't work well, because the GPU needs even more bandwidth to saturate its raw power: not only PCIe 4.0 x16, but also memory used as a buffer between CPU <-> GPU. I am still in the PoC phase. It could be done by simply requiring 256G RAM for GPU plotting, but then most people wouldn't be able to use it at all. So it's a pretty tough challenge to design a GPU plotter for a gaming machine (in the 32G RAM range).
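To see why post-cache speed dominates, here is a minimal two-tier model of sequential writes to an SSD with an SLC-style cache: the first part of the write goes at cached speed, the rest at post-cache speed. The 80 MB/s figure comes from the post above; the 50 GB cache size, 2000 MB/s cached speed, and 100 GB write size are hypothetical placeholders, not P41 specifications.

```python
def write_time_s(total_gb: float, cache_gb: float,
                 cached_mb_s: float, post_cache_mb_s: float) -> float:
    """Time to write total_gb sequentially to an SSD with an SLC-style cache.

    The first cache_gb are written at cached speed; anything beyond that
    falls back to the (much slower) post-cache speed. Decimal units.
    """
    fast_gb = min(total_gb, cache_gb)
    slow_gb = max(0.0, total_gb - cache_gb)
    return fast_gb * 1000 / cached_mb_s + slow_gb * 1000 / post_cache_mb_s


# Hypothetical drive: 50 GB cache at 2000 MB/s, 80 MB/s once the cache is full.
print(f"10 GB write:  {write_time_s(10, 50, 2000, 80):.0f} s")
print(f"100 GB write: {write_time_s(100, 50, 2000, 80):.0f} s")
```

Under these assumptions, the 100 GB write spends 25 s on the cached half and 625 s on the rest.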
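The bandwidth gap described above can be put in numbers. PCIe 3.0 and later use 128b/130b encoding, so a PCIe 4.0 x16 link (16 GT/s per lane) peaks at about 31.5 GB/s per direction. The RTX 3090's GDDR6X runs at 19.5 Gbit/s per pin on a 384-bit bus, about 936 GB/s. This sketch only compares peak figures; it says nothing about how arc_plot actually stages data.

```python
def pcie_gbs(gt_per_s: float, lanes: int) -> float:
    """Peak one-direction PCIe bandwidth in GB/s with 128b/130b encoding (PCIe 3.0+)."""
    return gt_per_s * lanes * (128 / 130) / 8


pcie4_x16 = pcie_gbs(16, 16)     # about 31.5 GB/s each way
rtx3090_vram = 19.5 * 384 / 8    # GB/s: 19.5 Gbit/s per pin x 384-bit bus

print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s per direction")
print(f"RTX 3090 VRAM: {rtx3090_vram:.0f} GB/s")
print(f"VRAM is ~{rtx3090_vram / pcie4_x16:.0f}x faster than the bus feeding it")
```

With roughly a 30x gap, any data that has to cross the bus repeatedly becomes the bottleneck, which is why buffer memory on both sides matters.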
Is that performance related only to memory bandwidth? If memory bandwidth doubles, expecting an 80% plotting improvement (almost 2x) would suggest that the plotting process is more or less purely bandwidth-bound, and that the calculations done on the CPU barely matter.
That would also suggest that, when looking at CPU utilization on a 4-channel CPU, the CPU would sit idle for quite a bit of the time waiting for RAM (or it could be about 2x slower and still give the same plotting results).
That would also imply that core count within one line of CPUs doesn't matter that much, since RAM bandwidth is the same (as long as one core can handle the bandwidth, the cheapest such CPU would give the best price/performance).
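The reasoning above is essentially a roofline-style argument: runtime is bounded by whichever is slower, moving the data or computing on it. This minimal sketch assumes memory traffic and compute overlap perfectly, which real plotters only approximate; the 10 TB of traffic and 60 s of compute are hypothetical numbers chosen to illustrate the effect, not measurements of any plotter.

```python
def runtime_s(bytes_moved: float, bandwidth_gbs: float,
              compute_s: float) -> float:
    """Roofline-style bound: the slower of memory traffic and compute wins.

    Assumes memory transfers and computation overlap perfectly.
    """
    memory_s = bytes_moved / (bandwidth_gbs * 1e9)
    return max(memory_s, compute_s)


BYTES_MOVED = 1e13   # hypothetical total memory traffic for one plot
COMPUTE_S = 60.0     # hypothetical pure compute time

t4 = runtime_s(BYTES_MOVED, 102.4, COMPUTE_S)  # 4-channel DDR4-3200 peak
t8 = runtime_s(BYTES_MOVED, 204.8, COMPUTE_S)  # 8-channel DDR4-3200 peak
print(f"4-channel: {t4:.1f} s, 8-channel: {t8:.1f} s, speedup {t4 / t8:.2f}x")
```

With these numbers, doubling bandwidth gives about 1.6x rather than 2x, because the 8-channel case becomes compute-bound. Only a purely bandwidth-bound workload would see the full 2x.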
If you plot on a 256G full-RAM rig, it's easy to test. Set your RAM to 1333, 2666, and 3200, and you'll get the answer.
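To read the results of that test, it helps to know what a purely bandwidth-bound plotter would do: its time would scale with 1 / memory speed. This sketch computes that prediction from a single measurement at DDR4-3200. The 1000 s baseline is a hypothetical placeholder; substitute your own measured time.

```python
def predicted_time_s(base_time_s: float, base_mt_s: int, mt_s: int) -> float:
    """Time if plotting were purely bandwidth-bound: it scales with 1 / memory speed."""
    return base_time_s * base_mt_s / mt_s


BASE_MT_S = 3200
base_time = 1000.0  # hypothetical measured plot time at DDR4-3200, in seconds

for speed in (1333, 2666, 3200):
    t = predicted_time_s(base_time, BASE_MT_S, speed)
    print(f"DDR4-{speed}: {t:.0f} s if purely bandwidth-bound")
```

If the measured times grow noticeably less than this prediction as the memory speed drops, compute or latency is also limiting, and the core-count argument above would not hold.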