I was thinking of writing a few lines on my plotting experience with bladebit on a dual Epyc 7513 with half a terabyte (512 GB) of ECC DDR4 configured at 2933 MHz. There is a brief YouTube video you can watch while one of the fastest K32 plots is achieved on that system.
Bladebit is run with 128 threads, which matches the total number of hardware threads on that configuration (2x 32 cores with SMT). The server also has 10x 3.2 TB NVMe drives installed, configured as 5x RAID-0 volumes of 2 drives each (buffer1 through buffer5), but no other storage; it has to move the finished plots to the actual farmer over a 10 GbE network.
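In case you want to rebuild that buffer layout, mdadm does it in a couple of lines; a minimal sketch, where the device and mount names are placeholders, not necessarily my exact setup:

```bash
# One of the five 2-drive RAID-0 buffer volumes (repeat for md2..md5)
mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
mkfs.xfs /dev/md1
mkdir -p /mnt/buffer1
mount /dev/md1 /mnt/buffer1
```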
At 5:15 minutes per fully functional K32 plot (the fastest observed) and 6:28 minutes (the slowest observed), that plotter produces between roughly 220 and 280 K32 plots in 24 hours. I am going to use the lower-end number for the following explanations so I do not need to keep quoting ranges.
220x K32 plots are roughly 22 TiB of plot data that needs to be moved fast enough so that the “ex-plotting” local NVMe drives, the 5 buffers mentioned earlier, do not run full. To get a daily 22 TiB of newly created plot data from the plotter to the farmer, the average network load needs to be about 267 MiB/s (22 x 1,048,576 MiB / 86,400 s ≈ 267 MiB/s). On my farmer the hard drives are all individually exposed via NFS. If you have mounted all hard drives under the same directory, for example /chia/plots/disk001, /chia/plots/disk002, …, /chia/plots/disk{nnn}, then this one line added to /etc/exports will do the trick:

```
/chia/plots 192.168.1.0/16(rw,no_root_squash,async,fsid=0,no_subtree_check,crossmnt)
```
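On the plotter side the whole export can then be mounted in one go; a sketch, assuming the farmer answers at 192.168.1.10 (the address is a placeholder). With fsid=0 the export becomes the NFSv4 pseudo-root, and crossmnt pulls in the per-disk mounts below it:

```bash
# Mount the farmer's exported root; each diskNNN appears underneath
mkdir -p /mnt/farmer
mount -t nfs4 192.168.1.10:/ /mnt/farmer
```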
One single hard drive cannot sustain storing 267 MiB/s of data, so you need at least 2-3 processes that offload plot data from the buffers to the farmer in parallel. In the end I came up with a small script (sketched below) that runs in a loop for 5 iterations, producing 33 plots into a buffer before it moves on to filling the next buffer, while kicking off a background job that moves the plots from the previous buffer to one of the exported farmer hard drives. Conveniently, an 18 TB hard drive holds 165x K32 plots, that is 5 x 33 K32 plots.
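Roughly, it looks like this; take it as an illustration rather than my exact version. The keys and mount points are placeholders, and you should check `bladebit --help` for the exact flags of your version:

```bash
#!/usr/bin/env bash
# Sketch of the buffer-rotation loop: each run fills one 18 TB farmer
# drive with 5 x 33 = 165 plots. While a background job drains the
# previous buffer over NFS, bladebit already fills the next one.

FARMER_KEY="<farmer public key>"
POOL_KEY="<pool public key>"
dest="$1"                 # e.g. /mnt/farmer/disk001 (NFS mount)

for i in 1 2 3 4 5; do
    buf="/mnt/buffer${i}"
    bladebit -t 128 -n 33 -f "$FARMER_KEY" -p "$POOL_KEY" "$buf"
    mv "$buf"/*.plot "$dest"/ &     # offload in the background
done
wait    # let the last mover finish before the script exits
```

Called as `./fill_disk.sh /mnt/farmer/disk001` (the script name is made up), one run fills exactly one 18 TB drive.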
This amazingly fast plotter (on this same system I managed about 130x K32 per day with the stock plotter, chiapos, and about 170x with mad-max) can only be fully leveraged if you manage to get the newly created plot files onto the final hard drives fast enough, without jamming the plotting process. Bladebit requires about 416 GiB of free RAM to run, and it has to first flush its memory buffers to storage before it can create a new plot. While bladebit is operating it fully loads your system, all cores mostly indicate 100% load; it is then only slowed down by how fast it can store the previously created plot before continuing with a new one.
A closer look at the phases of bladebit and its IO activity actually shows that the process is not only storing the plot to disk at the very end; it does so as soon as it can, starting in phase 3. So when it reaches the final point of storing the plot to disk, only an estimated 30 GiB of the final file remain to be written, amazing! So it is RAM plotting, but I chose to still use flash for my buffers before moving plots to final HDD storage.
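You can watch that behaviour live with iostat from the sysstat package; the md device names below are assumptions matching the five RAID-0 buffers:

```bash
# Extended per-device stats in MiB/s every 5 seconds; during phase 3
# the active buffer volume already shows sustained write traffic
iostat -xm md1 md2 md3 md4 md5 5
```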
As for testing the newly created plot files: same process as when I first started using the mad-max plotter. I extracted the plot id and memo from a bladebit plot and successfully recreated it using chiapos. Then I ran 20 million check iterations on the plot and found no significant or unexpected differences between a bladebit and a chiapos plot. The small differences were discussed at length in the Chia keybase#plotting chat. I think the fair summary is: table entries that exceed the 32 bit address range are dropped; my understanding is that bladebit and mad-max both do that.
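The checking itself is the standard chia CLI routine; a sketch with a placeholder plot name and a small -n, not my 20 million run:

```bash
# Prints the plot id and memo from the header and fetches proofs for
# -n random challenges; -g greps for the plot among the configured
# plot directories
chia plots check -n 100 -g <part-of-plot-filename>
```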
edit: I have meanwhile noticed that using 128 threads on this system of 64 cores / 128 threads is “thrashing” the system to the point where it slowed down the network data transfer. Over the days, plotting became faster than moving data to the farmer. I am going to restart the plotting process, likely with fewer threads; I don’t really know how many I need to spare for steady data transfer, I may try 112 threads next. Meanwhile, since all buffers were full, I suspended the plotting process:

```bash
kill -STOP $(pidof bladebit)
```

waited a few hours, and resumed it:

```bash
kill -CONT $(pidof bladebit)
```

The theory that bladebit was slowing down the network data transfer was immediately confirmed: as soon as the plotting process was suspended, the transfer rates multiplied 2-3x…
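If you would rather not do that suspend/resume dance by hand, a small watchdog along these lines could automate it; the thresholds and the buffer mount point are guesses, not tested values:

```bash
#!/usr/bin/env bash
# Watchdog sketch: pause bladebit when the buffers run full so the
# movers can catch up, resume it once enough space is free again.
# Checks buffer1 as a stand-in for all five buffers.

while sleep 60; do
    pid=$(pidof bladebit) || continue    # nothing to do if not running
    used=$(df --output=pcent /mnt/buffer1 | tail -1 | tr -dc '0-9')
    if [ "$used" -gt 90 ]; then
        kill -STOP $pid                  # buffers nearly full: pause
    elif [ "$used" -lt 50 ]; then
        kill -CONT $pid                  # drained enough: plot again
    fi
done
```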