Slow enterprise NVMe in PCIe 3

I’m using a 3.2TB Micron 9300 MAX U.2 NVMe enterprise SSD (MTFDHAL3T2TDR-1AT1ZABYY)
and a U.2 to PCIe adapter card.

Not sure if I can post a link here, but the PCIe card is from UK Amazon, listed as "GINTOOYUN U.2 to PCIE Expansion Card,SFF 8639 to PCIE 3.0 x4 Riser Card,PCI-E 3.0 X4 SATA Adapter,For 2.5" U.2 NVME SSD And 2.5" SATA SSD."

I’m not able to do more than 3 plots before the drive throttles. The CPU is an Intel i9-10900K, no overclock, plotting with 2 threads per plot. I’m using atop on Ubuntu to watch utilization (pic in the link). The NVMe hits 100% busy with just 3 plots running.

What can I do to use the full performance of the nvme?

I know that NVMe drives can throttle when they overheat, so maybe checking the temperature with SMART tools is the first thing to do here.
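For example, something like this should show the drive's composite temperature (assuming it shows up as /dev/nvme0 and smartmontools is installed):

# print the full SMART report; look for the Temperature line
sudo smartctl -a /dev/nvme0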

What read and write speeds can you get in tests?

To test read:
sudo hdparm -tT --direct /dev/nvme0n1

Write:
dd if=/dev/nvme0n1 bs=1M count=10240 of=/dev/null


Well, for a start, enterprise SSDs require active cooling, so get some airflow going across it. nvme-cli or hddtemp will tell you if you are throttling. Above 70% of OT (operating temp) is cause for concern; on the Micron 9300 that would be around 49C. It probably isn’t throttling at that temp, but you will want to start mitigating.
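For example, with nvme-cli installed (device name assumed):

# dump the NVMe SMART log and pull out the temperature lines
sudo nvme smart-log /dev/nvme0 | grep -i temp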

I’m not familiar with atop for disk monitoring, but iotop will tell you what is happening in terms of IO utilization and reads & writes, and the --only flag will show only those processes that are hitting the drive you want to examine.

ioping will tell you your drive latency; watch for spikes whilst plotting. sar and iostat will give you tps, including blocks read/written. Low tps but high latency means you are throttling. High tps and low latency means you are doing well. High tps and high latency means you are really hitting the limit of the drive, which is a good thing because you are using everything it can offer; however, if your QD (queue depth) is low, it means your drive is being slammed with too many small random requests at once, and you will need to start triaging.
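For example (device name assumed):

# average latency over ten requests against the raw device
sudo ioping -c 10 /dev/nvme0n1
# per-device tps at one-second intervals, five samples
sar -d -p 1 5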

Generally the cheap U.2 AICs are straight pass-through with no PLX switch or anything else; they just talk the regular NVMe protocol over the PCIe bus to the CPU. This doesn’t apply to your build (i9 CPU), but if you ever use an older motherboard or a server motherboard, which often come with strange slot configurations, make sure you actually plugged the drive into at least an x4 slot and not an x1 slot.
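If in doubt, lspci will show the negotiated link width; the bus address in the second command is just an example, take the real one from the first command's output:

# find the NVMe controller's bus address
lspci | grep -i non-volatile
# check the negotiated link; you want to see Width x4
sudo lspci -vv -s 01:00.0 | grep -i lnksta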

P.S. launch iotop with sudo.


Thank you, it turns out I was just misreading the atop numbers.

With iotop I’m hitting between 60–100% utilization with 7 concurrent plots. ioping shows microsecond latencies when idle and up to 200ms when plotting. Read speed with hdparm is 2000 MB/s, but write speed seems slow…

sudo dd if=/dev/nvme0n1 bs=1M count=10240 of=/dev/null

10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 12.4217 s, 864 MB/s

Shouldn’t the write be faster?
And is there a way I can see utilization in %, similar to iotop, but for just the one drive?

Hundreds of milliseconds of latency means that many of your plots are stalling while waiting on IO. Sub-millisecond is desirable, but a couple of milliseconds is what you are shooting for.

I would check your read/write speeds against the drive’s datasheet. Also, a block size of 1M is kinda small.

That command you posted is for reading the drive.
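If you want to benchmark writes, the safe way is to write a file on the mounted filesystem rather than to the raw device (a dd onto /dev/nvme0n1 itself would destroy your data). A minimal sketch, assuming the drive is mounted at /mnt/nvme (hypothetical path):

# write 10 GiB with direct IO so the page cache doesn't flatter the result
sudo dd if=/dev/zero of=/mnt/nvme/ddtest bs=1M count=10240 oflag=direct
sudo rm /mnt/nvme/ddtest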

Some DC drives are made for extensive reading, not writing; you’d need to check that on the datasheet. My Intel DC P3700 drives have pretty good read speeds for five-year-old U.2 drives over a gen 3 x4 link, and amazing endurance measured in tens of PiB, but the write speeds are lousy. It’s why I put a RAM cache in front of each one and then run them through a Highpoint SSD7180 controller with RAID0.

Not for iotop, no. iostat will give you a device-specific snapshot, but there is no utility that will give you a single point of reference.
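That said, the extended iostat output does include a per-device %util column, which is about as close as you’ll get (device name assumed):

# extended stats for one device, refreshed every second; %util is the last column
iostat -x nvme0n1 1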

Much like plotting in parallel across multiple cores or multiple CPUs is better, plotting in parallel across multiple drives is better too.


You can use
sudo iotop -o
which limits the output to active threads actually doing I/O. So if you don’t run anything except Chia plotting on the machine, it will be limited to the Chia processes, and you can also grep the output.

Another useful command is iostat, which provides aggregated I/O stats per disk.
