Bladebit question: how to archive plots to HDDs without interrupting plotting?

I started using Bladebit recently, and I can produce one plot in around 12 minutes, but the disk copy takes another 10 minutes. The system has 36 Xeon cores and 448GB of memory. I am getting this message:

Running Phase 1
Generating F1…
Finished F1 generation in 2.23 seconds.
Sorting F1…
Finished F1 sort in 19.22 seconds.
Forward propagating to table 2…

Pairing L/R groups…
Finished pairing L/R groups in 11.7890 seconds. Created 4294907317 pairs.
Average of 236.1373 pairs per group.
Computing Fx…
Finished computing Fx in 13.0030 seconds.
Waiting for last plot to finish being written to disk…

Ideally I would like the copy to be done in the background and never stall plotting. It seems that a 108GB buffer is required, which could be either memory or NVMe SSDs.

So if I upgrade the memory to 576GB and plot directly to HDDs, will the plotting stall? I have not seen any testing done on this.

Another approach could be an NVMe buffer that then archives plots to multiple HDDs simultaneously. Maybe Plotman can do the trick, or I may need to write a custom rsync script.
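For what it's worth, the mover part of such a script can be quite small. Here's a sketch (paths and the copy-then-rename convention are my own assumptions, not anything Bladebit requires) that drains finished plots from a buffer to one destination disk:

```shell
#!/bin/sh
# One-shot archive pass (a sketch; directory names are hypothetical).
# Copies each finished plot under a temporary name first, then renames it,
# so the farmer never picks up a half-copied plot file.
archive_pass() {
    src="$1"    # NVMe buffer directory holding finished plots
    dst="$2"    # destination HDD mount point
    for plot in "$src"/*.plot; do
        [ -e "$plot" ] || continue              # glob matched nothing
        name=$(basename "$plot")
        cp "$plot" "$dst/$name.tmp" &&          # slow bulk copy, temp name
            mv "$dst/$name.tmp" "$dst/$name" && # rename only once complete
            rm "$plot"                          # free the buffer slot
    done
}

# Wrap it in a loop (or a cron job) to run in the background, e.g.:
#   while true; do archive_pass /mnt/nvme-buffer /mnt/hdd1; sleep 30; done
```

rsync with `--remove-source-files` would do much the same job, with the bonus of resumable transfers.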

I guess I could also use 8–12 15K/10K SAS drives in RAID 0 as a buffer. The transfer speed won't be as good as NVMe, but it should be just good enough, and not so resource-intensive that it interferes with plotting.

Besides the above, any other ideas?

This is definitely tricky - it would be a lot nicer if Bladebit played nicely with network shares, but I've not gotten it to write directly to a share as a destination without random I/O errors. Here's my workflow:

  1. Bladebit creates the plot and writes it to a local SSD (an old Intel 400GB SLC DC SSD, ~400MB/s write speed). This takes a little over 15m total.
  2. As soon as the plot is finished, my script moves that plot file to a network share named “Drop.” This share is backed by a RAID 0 array, so it's very fast. Each server has two 10Gb NICs, so network speed is not an issue. This transfer takes 5-8m depending on traffic. The important thing is that it's faster than the 15m plot time, or else plots start to back up. If I wrote directly to the destination drive, it would take too long because I'd be limited by the destination drive's transfer rate.
  3. A script on the server watches for new plots arriving in the RAID 0 “Drop” share and moves each one to the appropriate destination farming disk. This script is multi-threaded, so it can copy multiple plots to multiple disks at once - otherwise it would eventually fall behind too. By using this RAID 0 “drop” buffer, I can ingest plots quickly and then have a little more time to get them out onto the destination disks, because I can write to several destination disks at once from the buffer.
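In outline, the fan-out in step 3 can be sketched with plain POSIX shell and background jobs (directory names here are placeholders, not my actual paths; my real script does more bookkeeping):

```shell
#!/bin/sh
# Sketch of a parallel drainer: fans the plots sitting in the "Drop" buffer
# out across several destination disks, one background copy per plot, then
# waits for all of them to finish.
drain_drop() {
    drop="$1"; shift              # buffer dir; remaining args = dest disks
    ndest=$#
    i=0
    for plot in "$drop"/*.plot; do
        [ -e "$plot" ] || continue
        idx=$(( i % ndest + 1 ))  # round-robin over the destination list
        eval "dst=\${$idx}"
        (
            name=$(basename "$plot")
            cp "$plot" "$dst/.$name.tmp" &&     # hidden temp name while copying
                mv "$dst/.$name.tmp" "$dst/$name" &&
                rm "$plot"
        ) &                       # each plot copies in its own process
        i=$(( i + 1 ))
    done
    wait                          # block until every copy has finished
}
```

A smarter version would pick destinations by free space rather than round-robin, and cap the number of concurrent copies per disk.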

I’m able to keep 4 BladeBit plotters going this way.

Hope that helps!

Thanks for your detailed approach!

Apparently your solution is a lot more complex than what I had in mind. Are you trying to offload the plots to another machine as quickly as possible so that BladeBit can regain full system resources?

Today I tried the approach of using a 512GB NVMe drive to archive plots to multiple HDDs simultaneously with a custom-written script. The results are as good as I expected so far, though stability still remains to be tested.

So I have two archive disks set up to receive plots from the NVMe drive in turn. The copy processes run simultaneously, and each takes about 20-25 min to complete. My plotting time has increased from 12.5 min to 15.5 min. I am not sure why the disk copies interfere with Bladebit, but this is probably the fastest stable time I can get.
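If the slowdown is I/O contention, one thing worth trying is demoting the copies to the kernel's idle I/O scheduling class so they only use bandwidth the plotter leaves free. This wrapper is a sketch, not something I've benchmarked, and whether it helps depends on the disk scheduler in use:

```shell
#!/bin/sh
# Demote a copy to idle I/O priority via ionice (util-linux, Linux-only),
# falling back to a plain cp when ionice is not available.
throttled_copy() {
    if command -v ionice >/dev/null 2>&1; then
        ionice -c3 cp "$1" "$2"    # class 3 = idle I/O scheduling class
    else
        cp "$1" "$2"
    fi
}
```

rsync's `--bwlimit` option is another way to cap the copy rate if priority alone doesn't help.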

One of my goals in upgrading to Bladebit was to stop burning NVMe drives, so it is annoying that buffering the plots is still required…