Bladebit disk plotting performance

On a dual E5-2680 v2 with 128 GB RAM and t1 on an 800 GB SATA SSD, it takes 53.4 minutes for me, using 18 threads (leaving 2 cores free), a 100G cache, and 128 buckets.
I have ordered more RAM and will see if 256 GB makes any difference.


PrimoCache works well. They have a trial.


Four 32 GB DDR3 sticks are $100.

Don’t; BB does not perform well on older platforms.
I have a dual 2680 v2 with 384 GB of RAM and get about 50 minutes.
I can do 28 minutes with MM (and copy from SSD to HDD in the background).


For those who are plotting for MMX, I have created a ticket on the BB GitHub page to add MMX support (Adding support for MMX · Issue #243 · Chia-Network/bladebit · GitHub). I would say the chances are next to zero (as they haven’t done it so far), but maybe you could chime in there as well.

Also, the BB v2 production release came out earlier today. It looks like both the BB RAM and BB Disk branches should be working in this release.


I have similar times for MM plots (~28 minutes) on a 2670 v2 with 256 GB RAM.

bladebit is a piece of shit garbage


Yes, it does not perform well on older hardware, for some reason.

If you are trying to run it on Linux, you may need to use -b 64 for the bucket count to get closer to MM performance (v2.0.1 on Ubuntu 22.04, 2x E5-2695 v2). On that setup, both BB Disk and BB RAM are slower than MM.
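For reference, a diskplot run with 64 buckets might look like the sketch below. This is just a hedged example: the keys and paths are placeholders, and the flag names are what I see in bladebit v2's diskplot options, so verify them against bladebit diskplot --help on your version.

    # All paths and keys below are placeholders; adjust for your system.
    bladebit -t 24 -f "$FARMER_PUBLIC_KEY" -c "$POOL_CONTRACT_ADDRESS" \
        diskplot -b 64 --cache 100G -t1 /mnt/ssd/tmp /mnt/hdd/plots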

That is really odd, considering how many people use old hardware, especially for Bladebit plotting. I wonder what hardware exactly they test on?

Harold developed/tested Bladebit at its inception on an AWS (EC2) 128-core ARM system.

Interesting… I never realized that: “EC2 M1 Mac instances are powered by a combination of Apple silicon Mac mini computers”.

Take a look at this article, Proof Points section - Announcing Bladebit 2.0 - Chia Network. It kind of shows where the optimization focus is: high-end servers, as the rest is more or less a wash with MM.


Adding some data points here.
Over the weekend I finally spent some time figuring out Ubuntu.
HP Z620, an old DDR3 system.
Dual Xeon 2660 v2, 110 GB RAM disk: MadMax takes 43 minutes a plot.
Bladebit with a 110 GB cache: 63 minutes.

At the beginning of plotting, all 40 threads were utilized near 100%. After a few minutes, they fluctuated around 50%.


What kind of SSD are you using? If it’s not MLC, the time taken differs from run to run, as with TLC or QLC the out-of-cache performance drops from a few GB/s to MB/s.
You need to run “sudo fstrim /path/to/your/ssd/mount/point” to clean up the SSD before testing it again.

What kind of improvement do you see when running trim? I can’t say that I saw much, if any. Also, once the plotter is asked to do multiple plots, trimming becomes a bit tricky, basically making those test plots just one-off runs.

fstrim won’t improve performance; it only recovers your SSD from slower IO back to normal. lol

That was my question. If it is not manifesting in anything, then it is not really recovering from slower IO, as that would show up in SSD write speeds.

My understanding is that with the amount of data the plotter writes to the NVMe, trim is really not doing much, as the plotter basically writes contiguous blocks to Flash, thus rendering trim kind of redundant.

First of all, if you run fstrim during plotting, it won’t help anything; instead it will slow you down even more.

Then you need to know your SSD’s cache size. For MLC it’s OK, as its out-of-cache speed is still good enough for plotting (some good MLC drives have 4 GB of DDR4 RAM for cache too). But if you are using TLC or QLC, it’s pretty tricky, as they use a few GB of cells to simulate MLC (instead of storing 3 or 4 bits, those cells store only 2 bits, for performance). If that simulated-xLC cache is larger than the temp files the plot needs, plotting won’t slow down; if not, it will. And even when the cache is big enough, it will still fill up after a few plots, so running fstrim before the next plotting round helps keep plotting speed consistent.
If your SSD’s cache size is in the tens of GB instead of hundreds of GB, using multiple SSDs in RAID 0 will solve the problem (a rough sketch below), but you still need to run fstrim between plotting rounds.
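As a sketch of that RAID 0 route: the device names here are assumptions, and mdadm --create wipes whatever is on those drives.

    # Assumes two spare NVMe drives at /dev/nvme0n1 and /dev/nvme1n1 (WIPES them).
    sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 \
        /dev/nvme0n1 /dev/nvme1n1
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /mnt/plot-tmp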
Most Linux filesystems like f2fs, btrfs, and ext4 have filesystem-based trim, but that runs far too infrequently: plotting burns through your SSD too fast for the fs module to have time to trim it. So manually calling fstrim in your plotting script, as sketched below, is still the best practice.
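Something like this minimal loop, for example; the mount points, keys, and plot count are placeholders, and the bladebit flags should be double-checked against your version’s help output:

    #!/bin/bash
    # Sketch: trim the temp SSD between plotting rounds.
    TMP=/mnt/ssd/tmp          # temp drive mount point (placeholder)
    DST=/mnt/hdd/plots        # final plot destination (placeholder)
    FARMER_KEY="your_farmer_public_key"
    CONTRACT="your_pool_contract_address"
    for i in 1 2 3 4 5; do
        bladebit -t 24 -f "$FARMER_KEY" -c "$CONTRACT" \
            diskplot --cache 100G -t1 "$TMP" "$DST"
        sudo fstrim -v "$TMP"   # discard freed blocks before the next plot
    done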


Thank you for the info, much appreciated.

I think the MLC distinction is getting muddier with every new NVMe release. I have a Samsung 970 EVO Plus that is advertised as 3-bit MLC. However, that 3-bit clarification is basically what TLC is. So, I am not sure where that sucker fits into your description. Although, IMO, it is a (very) good drive.

On the other hand, my understanding of trim is that when the OS decides to delete some data, it marks just that portion as free and moves on. However, from the Flash point of view, the organization is in blocks / pages, and regardless of whether just one byte was deleted or the whole block, the whole block needs to be erased before the next single-byte write (and thus the NVMe may decide to use a different block / page for that new write to avoid erasing the whole block - blocks will be marked dirty at a single-byte level). This is where trim kicks in - in my understanding, preparing those cells for faster writes in the future. This works fine in normal OS use, where a few bytes are deleted here and there between reboots, and running trim from time to time has a long-lasting effect.
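For anyone who wants to check whether their drive and filesystem actually accept trim before theorizing further, the standard util-linux tools can show it (the mount point below is a placeholder):

    # Non-zero DISC-GRAN / DISC-MAX values mean the device supports discard (trim).
    lsblk --discard
    # Trim a mounted filesystem once; -v reports how many bytes were trimmed.
    sudo fstrim -v /mnt/plot-tmp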

However, with plotting, the data is not written as a bunch of small (byte-size) temp files, but rather as big contiguous chunks (e.g., in the case of t1 and the MM plotter) covering several Flash blocks. For a k32 plot that is ~400 GB worth of data, so two runs basically cover a 1 TB NVMe. I guess running trim between every other plot would help those writes, but my take is that the time needed for trim, plus stopping / starting the plotter, will be about the same as when a new Flash block needs to be written over a previously deleted block, if not worse. So there would really be zero gain here, or rather a negative impact on the process.

That is my read of how trim works (in conjunction with plotting), and how it may benefit (or not) plotting times. I don’t know much about Flash, so potentially this is just a primitive look at the problem.

Actually, I have never seen any reports that provide stats about the impact of trim on NVMes in high-IO setups. With plotting, we are potentially charting new territory, where the old rules may not apply.

Anyone have any tips for getting good results with Bladebit Disk?

I have a Dell T5810: CPU E5-2683 v4 (16C/32T), 128 GB of DDR4 RAM, and a 400 GB Intel S3700 SATA SSD for the temp drive.

Running Ubuntu Desktop.

I’m getting 42-minute plots with MM (including the time taken to copy the final plot to its final destination).

But Bladebit Disk gives me 60-minute plots… (also including the time taken to copy the final plot to its final destination).

I’m thinking of tinkering with Bladebit Disk more, since Chia’s soon-to-be-released compressed plots are built on Bladebit.

I know I don’t have much other information, since I’ve only tried making two plots with Bladebit Disk… I’ll have to tinker a bit more with it.

Any advice or tips would be welcomed.

The fallback would be to just stick with MM and go with MadMax’s compressed plots…

Thanks!