NVMe Sustained Writes discussion

I wanted to capture an interesting discussion that happened yesterday morning on the Chia Team public Keybase. Sadly, it is in screenshots (sorry @codinghorror), but we’ve been talking about sustained writes when purchasing NVMe drives. I believe you will have to click the image to make it bigger. It includes some thoughts from @storage_jm.


So… watch a video to learn what I could read in like 30 seconds? Thanks, but no thanks…

TL;DW look at sustained read/write speeds not TBW. I guess that’s good advice?


At the current speed of netspace growth, most SSDs do not get a chance to wear out before the farm becomes unprofitable.

In our experience, some NVMe drives are worse than premium SATA drives.

Nice summary :grin:

The guys from the Chia team also pointed out the importance of sustained write speed in the livestream when trading started.

Another point he makes in the video is that TBW is not a standardized test, so basically it doesn’t tell you all that much about the real durability.

THIS a thousand times. Often I see people running common “off the shelf” benchmarks on SSDs but the plotting workload is quite unique. It’s not difficult or time consuming to run a few plots to determine the optimum number of concurrent plots per drive without overloading them.
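As a rough sketch of how those test runs could be compared: time a few batches at different concurrency levels on the same drive, then pick the level that yields the most plots per day. The function names and the sample timings below are hypothetical, made up purely for illustration, and are not measurements from any real drive.

```python
from typing import Dict


def plots_per_day(concurrent: int, hours_per_plot: float) -> float:
    """Daily output when `concurrent` plots each take `hours_per_plot` hours."""
    if concurrent <= 0 or hours_per_plot <= 0:
        raise ValueError("concurrent and hours_per_plot must be positive")
    return concurrent * 24.0 / hours_per_plot


def best_concurrency(measurements: Dict[int, float]) -> int:
    """Return the concurrency level with the highest plots per day.

    `measurements` maps the number of concurrent plots to the measured
    wall-clock hours per plot at that level.
    """
    return max(measurements, key=lambda n: plots_per_day(n, measurements[n]))


# Hypothetical timings (NOT real data): hours per plot at 1 to 4 concurrent plots.
sample = {1: 6.0, 2: 6.5, 3: 8.0, 4: 12.0}
for n, hours in sample.items():
    print(n, round(plots_per_day(n, hours), 2))
print("best:", best_concurrency(sample))
```

With these made-up numbers, per-plot time climbs sharply once the drive is overloaded, so total output peaks before the highest concurrency level.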

And if the results are not satisfactory, then it comes down to one of two reasons:

  1. Not configured correctly
    or
  2. The drives aren’t good enough

I love jm’s remarks about NVMe: the good things (easy to find SSD metadata in /proc) and the bad (docs are written by drive controller engineers and kernel driver developers, and you can tell).


The SSD space is plagued with a lot of Dunning–Kruger effects. The problem I had with the video was that he seemed very informed and correctly identified sustained write bandwidth as the metric that tracks well with plotting performance. Then he goes on to make some large assumptions that are flat out incorrect.

He says something along the lines that all TLC NAND is the same. This could not be further from the truth. TLC NAND can vary in program/erase (PE) cycles from 1500 all the way up to 10k cycles for eTLC (enterprise grade). This depends on the 3D NAND process, maturity, wafer binning, and controller ECC.

The other thing, where he is close but doesn’t quite understand, is WAF (write amplification factor) and how TBW is rated. All SSD vendors do, in fact, use the same measuring stick, and it is called the JESD219 spec from JEDEC. This is the specification that dictates how SSD manufacturers state their rated TBW. Real endurance will depend on WAF. In the Keybase, I’ve pointed out dozens of times that some vendors correctly track percent used against natural endurance using PE cycles and real measured WAF. The cheap models that use the stock firmware from Phison, Marvell, etc., just go off host writes. THIS IS INCORRECT. He points out that Samsung may track with the real WAF - bravo! However, that doesn’t change the fundamental PE cycle and endurance difference.
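To make the relationship concrete, here is a minimal sketch under a simplified model: host-write endurance is roughly capacity times PE cycles divided by WAF. It ignores over-provisioning, retention requirements, and anything vendor-specific, so treat it as back-of-the-envelope arithmetic rather than how any firmware actually computes its wear indicator. The function names, the 1 TB capacity, and the WAF values are my own assumptions for illustration; only the 1500 and 10k cycle figures come from the comment above.

```python
def endurance_tbw(capacity_tb: float, pe_cycles: int, waf: float) -> float:
    """Approximate host writes (TB) before the NAND reaches its PE cycle rating."""
    if capacity_tb <= 0 or pe_cycles <= 0 or waf <= 0:
        raise ValueError("capacity_tb, pe_cycles and waf must be positive")
    return capacity_tb * pe_cycles / waf


def percent_used_host_only(host_writes_tb: float, rated_tbw: float) -> float:
    """Wear estimate that only counts host writes against the rated TBW."""
    return 100.0 * host_writes_tb / rated_tbw


def percent_used_nand(host_writes_tb: float, waf: float,
                      capacity_tb: float, pe_cycles: int) -> float:
    """Wear estimate based on actual NAND writes (host writes times measured WAF)."""
    return 100.0 * host_writes_tb * waf / (capacity_tb * pe_cycles)


# Same assumed 1 TB capacity and WAF of 2.0, different NAND grades.
print(endurance_tbw(1.0, 1500, 2.0))    # consumer-grade TLC
print(endurance_tbw(1.0, 10000, 2.0))   # enterprise-grade eTLC

# Assumed example: 300 TB of host writes on a 1 TB, 1500-cycle drive rated
# at 600 TBW, where the real workload WAF turns out to be 1.0.
print(percent_used_host_only(300.0, 600.0))
print(percent_used_nand(300.0, 1.0, 1.0, 1500))
```

In this assumed example the two wear estimates disagree because the host-only counter bakes in the WAF implied by the rating, while the NAND-based one uses the WAF of the actual workload.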


Read my comment. It most definitely is standardized, and real endurance is well understood in proportion to WAF.


You need both sustained write speed AND high TBW

Yes, I read it, thanks for clarifying that. Now halfway through one of your videos.
