How to resolve persistent I/O bottleneck issues leading to poor plotting performance

Hello everybody.
Thanks in advance to everyone who volunteers support on the forum! In my quest to learn and solve the problems on my chia farm, all of it has been very helpful!

After a lot of work and hours of learning over many weeks, everything is OK on my chia farm, with the exception of the plots/day performance of my plotting machine.

After searching this entire forum for answers, and learning every detail, today I finally found that I have two issues:

  1. I/O performance is far below the MINIMUM expected of the NVMe SSD (Adata XPG S40G 4TB M.2): its factory spec is 3500 MB/s write, and the Tom’s Hardware review explains that this drops to as low as 500 MB/s when the drive is more than half full. In my case, it never manages to cross the 300 MB/s write line.
  2. This I/O rate, which is already low, is not shared well between multiple plots running in parallel. The older plots, which have reached phase 2 or 3, are starved, while all the I/O goes to the newer plotting processes in phase 1. Then they all pile up in phases 2 or 3, accumulating hours of I/O wait in total.
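For anyone who wants to check the first issue independently of the plotter, here is a minimal sketch that times a sustained sequential write and reports the rate. The function name, the file name, and the default sizes are made up for illustration; it is only a rough cross-check against what iostat shows, not a proper benchmark, and a short run will mostly measure the drive's write cache.

```python
import os
import time

def measure_write_mib_s(directory, total_mib=1024, chunk_mib=64):
    """Write about total_mib MiB into `directory` and return the sustained MiB/s.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # incompressible data
    path = os.path.join(directory, "write_test.tmp")  # made-up file name
    written = 0
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            while written < total_mib:
                f.write(chunk)
                written += chunk_mib
            f.flush()
            os.fsync(f.fileno())  # make sure the data reached the device
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)  # clean up the test file
    return written / elapsed

# Example call (the mount point is an assumption, adjust to your temp drive):
# print(measure_write_mib_s("/mnt/plot-tmp", total_mib=8192))
```

To see the behaviour after the write cache is exhausted, total_mib would have to be much larger than the cache, which also means extra wear on the drive.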

In case anyone is wondering, believe me: I’ve read all the official plotting documentation I could find, plus the unofficial material on this forum and on the web in general.

I’ve tried several things to reduce the very high I/O wait times:

  • Added fstrim every hour.
  • Changed disk mount options in fstab.
  • Monitored the temperature of all HW components.
  • Reduced the RAM space that the system reserves for swapping to almost nothing.
  • And many more attempts that are not even worth mentioning.

Only thing not yet tried:

  • I confess I haven’t yet tried reformatting the temporary NVMe disk, and maybe replacing the current xfs filesystem with ext4.

REQUEST 1: paid assistance:
If there is anyone here interested in providing PAID assisted technical advice to help me with this problem, I would accept the service.

REQUEST 2: voluntary aid:
If nobody is interested in a paid arrangement but you want to help voluntarily, all help is welcome. Judging by the help request posts I have read, many other people here have the same question and difficulty!

Some details of my plotting system:
SW:

  • Ubuntu desktop 20 LTS.
  • All the work is done via CLI; I have never even opened Chia’s GUI on the machine.
  • All work is done via SSH on LAN.
  • All plots are made automatically by the plotman tool.
  • Tools I used to monitor resource consumption:
      • iostat.
      • top.
      • iotop.
      • glances.
  • Setup of plots:
      • r=4; b=6000; stagger between plots=50 min; (tmp disk is SSD NVMe).

HW: (inspired by a Tom’s Hardware post which states this could reach 30 plots/day)

  • CPU: Intel Core i9 10850K (10 cores/20 threads, operating at the default clock of 3.6GHz).
  • Motherboard: MSI MEG Z490 Gaming Carbon Wifi.
  • Case Coolers: 5x Corsair AF120 fans.
  • CPU Cooler: CoolerMaster Hyper T4 RR-T4-18PK-R1.
  • RAM: 2x32GB Team T Force Zeus 3200 MHz (64 GB RAM in total).
  • MOST IMPORTANT: 1x SSD Adata XPG S40G 4 TB M.2.
  • The OS and all SW are on another SSD: 512 GB Sandisk.
  • PSU: 600W Thermaltake, Smart Series, 80 Plus White.
  • Case: Corsair Obsidian 750D Full Tower.
  • Destination HDDs: External HDD: 8x Seagate Portable 5TB.

There is no RAID between the 2 disks: the regular SSD is just for the OS and SW, and the NVMe is 100% dedicated to the plotting process.

IMPORTANT NOTE: with just ONE plot running, the full job finishes in about 3.8 hours (13600 s), using 4-8 CPU threads and 8 GB RAM. So it can indeed make 6 plots/day currently.

BUT Tom’s Hardware states that this system can run 10 plots in parallel, while in my case I cannot run even 2 in parallel without losing all performance to endless I/O waiting. So it is supposed to reach 30/day, although I would already be quite happy with 20/day.
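The arithmetic behind those figures can be sketched as follows. The 13600 s single-plot time and the 10 parallel plots come from the post; the function names are made up, and the model naively assumes every parallel plot takes the same wall-clock time.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def plots_per_day(seconds_per_plot, parallel=1):
    """Plots finished per day if `parallel` plots each take seconds_per_plot."""
    return parallel * SECONDS_PER_DAY / seconds_per_plot

def max_seconds_per_plot(target_per_day, parallel):
    """Longest each plot may take to still hit target_per_day."""
    return parallel * SECONDS_PER_DAY / target_per_day

# One plot at a time, 13600 s each (measured in the post):
print(round(plots_per_day(13600), 2))        # 6.35

# With 10 plots in parallel, how slow may each plot become?
print(max_seconds_per_plot(30, 10) / 3600)   # 8.0 hours for 30 plots/day
print(max_seconds_per_plot(20, 10) / 3600)   # 12.0 hours for 20 plots/day
```

In other words, the 30 plots/day target tolerates each plot slowing down to roughly twice the single-plot time, as long as 10 of them really can make progress in parallel.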

IF YOU READ ALL THIS POST, THANK YOU SOO MUCH!!!

Is your SSD Adata XPG S40G 4TB M.2 mounted on PCIe 3.0 or PCIe 4.0?

PCIe 3.0? “PCIe 4.0 is twice as fast as PCIe 3.0. PCIe 4.0 has a 16 GT/s data rate, compared to its predecessor’s 8 GT/s. In addition, each PCIe 4.0 lane configuration supports double the bandwidth of PCIe 3.0, maxing out at 32 GB/s in a 16-lane slot, or 64 GB/s with bidirectional travel considered.”
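As a rough check of those figures, the usable bandwidth per direction can be computed from the transfer rate, the 128b/130b line encoding that both PCIe 3.0 and 4.0 use, and the lane count. This is only a sketch that ignores protocol overhead; the function name is made up. Note that an M.2 NVMe drive uses at most four lanes, so the x4 rows are the ones that matter for the SSD.

```python
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b encoding (PCIe 3.0 and 4.0)

def pcie_gb_per_s(gt_per_s, lanes):
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    return gt_per_s * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte

for name, rate in (("PCIe 3.0", 8), ("PCIe 4.0", 16)):
    for lanes in (4, 16):
        print(f"{name} x{lanes}: {pcie_gb_per_s(rate, lanes):.2f} GB/s")
```

A PCIe 3.0 x4 link works out to a little under 4 GB/s, so the link itself sits far above the 300 MB/s ceiling described in the original post.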

Excellent.

What is your delay set to when parallel plotting?

50 min at minimum. But there is also another trigger: because I use plotman, even after 50 min have passed, it will wait if any plot is still before phase 1:4.
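The two triggers described here can be sketched like this. It is not plotman's actual code, only an illustration of the rule as stated in the reply; the Job class, the function name, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    phase: tuple  # (major, minor), e.g. (1, 4) for phase 1:4

def may_start_new_plot(jobs, seconds_since_last_start,
                       stagger_s=50 * 60, gate_phase=(1, 4)):
    """Return True only if both conditions from the reply hold."""
    # Trigger 1: the minimum stagger between plot starts has elapsed.
    if seconds_since_last_start < stagger_s:
        return False
    # Trigger 2: no running plot is still before the gate phase.
    # Tuples compare element by element, so (1, 3) < (1, 4) < (2, 1).
    return all(job.phase >= gate_phase for job in jobs)

# 60 min since the last start, but one plot is still in phase 1:2: wait.
print(may_start_new_plot([Job((3, 1)), Job((1, 2))], 60 * 60))  # False
# Same timing, youngest plot has reached phase 1:5: start a new one.
print(may_start_new_plot([Job((3, 1)), Job((1, 5))], 60 * 60))  # True
```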

Copy that.

I looked for simple solutions but have found none as of yet. My mid-range plotter (4 K33s a day) does not compare to yours, and I cannot help much more with the high-end questions you are asking.

There are many others having high level conversations about issues like these. I would suggest you spend a bit more time searching the Chia (and reddit r/chia) forums and see if you find some answers or good places to ask your questions.

Good luck!

I’m not sure where you got that. From their Adata XPG S40G review:

Like many SSDs, the Spectrix S40G features a write cache to absorb drive writes at high speed, but performance degrades after extended writing. While testing the empty drive, the S40G wrote at a rate of 2.2GB/s for a total of 377GB before its performance degraded. But it displayed extremely inconsistent results after the performance dropped. At the worst it would write at 4-6 MBps for a second or two before it would pick back up, but soon fall again. This resulted in an average speed of roughly 250 MBps after it exhausted the write cache.
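To get a feel for what those review numbers mean for long writes, here is a small two-stage model: 2.2 GB/s until 377 GB have been written, then an average of 250 MB/s. It is a simplification that assumes an initially empty drive and no cache recovery between writes, and the 1000 GB example amount is an arbitrary illustration, not a measured Chia figure.

```python
CACHE_GB = 377         # written at full speed before degrading (from the review)
FAST_GB_PER_S = 2.2    # speed while the write cache lasts (from the review)
SLOW_GB_PER_S = 0.25   # average speed after the cache is exhausted (from the review)

def seconds_to_write(total_gb):
    """Time to write total_gb in one continuous burst under the two-stage model."""
    fast_part = min(total_gb, CACHE_GB)
    slow_part = max(total_gb - CACHE_GB, 0)
    return fast_part / FAST_GB_PER_S + slow_part / SLOW_GB_PER_S

def average_mb_per_s(total_gb):
    """Effective write speed in MB/s over the whole burst."""
    return 1000 * total_gb / seconds_to_write(total_gb)

# The cache alone is exhausted in under three minutes of full-speed writing:
print(round(seconds_to_write(CACHE_GB)))   # 171

# Hypothetical 1000 GB of continuous writes:
print(round(seconds_to_write(1000)))       # 2663
print(round(average_mb_per_s(1000)))       # 375
```

The longer the continuous writing goes on, the closer the average falls toward 250 MB/s, which is in the same range as the 300 MB/s ceiling reported in the original post.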

He should get a decent MLC SSD. Even on PCIe 3.0 they will be light-years faster, something like a 970 Pro or Evo Plus. I use the Evo Plus and a Kingston KC1000; both handle the Chia workload really well.

Another option could be to get 64GB of extra RAM and use the MadMax plotter. I think that, properly configured, plotting could be contained in the SLC cache and RAM. That would mean not just significant endurance gains on the SSD, but probably also a plot every 20 mins on this system, so that could be 72 plots a day.