Question regarding High CPU I/O Waiting

Hi there,

I’m a new farmer who has just joined the community. I’ve been following the "[Build a Budget Chia Cryptocurrency Plotting Rig]" guide from Chia Decentral to set up my plotting machine. I’m using CLI commands instead of the GUI to set up my plotting jobs. Everything went smoothly and my machine has started farming.

However, after about 3 hours, when I used glances to check on my system status, the bottom of the screen showed “High CPU I/O waiting” and “CRITICAL on CPU_IOWAIT”. The top of glances shows my current system usage:
CPU: around 38%, fluctuating
iowait: bouncing between 3% and 14%
Mem: 37%

Currently the machine has 5 plotting jobs running, started 1 hour apart. The total number of jobs was set to 9, so the last one should start 8 hours after the first.

Is this High CPU IO Waiting normal? Or is there anything I can tweak to solve this problem?

Below are my plotting machine specs:
CPU: Intel i9-10900
MB: Gigabyte Z590 VISION G
RAM: 4 * 16GB D4-3200 Kingston
HDD: 9 * WD 10TB Ultrastar
SSD: 250GB WD Blue (OS installed here), 2TB WD Blue (plotting second temp disk)
M.2: 2 * 2TB WD Black SN750 NVMe
PSU: 850W Corsair Gold

Chia Version: 1.1.2

Any help will be much appreciated.
Thank you.

Edit: added chia version.

Could you provide info on the number of plots you are running in parallel, and on which drives you have the temp dirs? Is it the 2 * 2TB M.2 drives listed?

1 Like

I have 2 * 2TB NVMe drives and one 2TB WD Blue SATA SSD set up for plotting.

nvme1 is named ssd001.
nvme2 is named ssd002.
The WD Blue SSD is named ssdtemp.

I have 9 * 10TB HDDs named hdd001 through hdd009.

ssd001 plots for hdd001 to hdd004
ssd002 plots for hdd005 to hdd009

ssdtemp is the second temp drive shared by both the ssd001 and ssd002 jobs.

Below are two lines from my bash script showing the ssd001 and ssd002 plotting commands.

screen -d -m -S chia1 bash -c 'cd /home/eric/Documents/Chia/chia-blockchain && . ./activate && sleep 0h && chia plots create -k 32 -b 5000 -e -r 4 -u 128 -n 83 -t /mnt/ssd001/temp1 -2 /mnt/ssdtemp -d /mnt/hdd001 |tee /home/eric/Documents/Chia/chialogs/chia1_1_.log'

screen -d -m -S chia5 bash -c 'cd /home/eric/Documents/Chia/chia-blockchain && . ./activate && sleep 4h && chia plots create -k 32 -b 5000 -e -r 4 -u 128 -n 83 -t /mnt/ssd002/temp5 -2 /mnt/ssdtemp -d /mnt/hdd005 |tee /home/eric/Documents/Chia/chialogs/chia5_2_.log'
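For reference, the nine staggered lines can be generated with a small loop instead of writing each one by hand. This is only a sketch: the ssd-to-hdd mapping, paths, and one-hour stagger follow the two example commands above, and the log file naming is simplified (an assumption on my part).

```shell
#!/bin/bash
# Sketch: print the nine staggered "screen" commands. Jobs 1-4 use
# ssd001, jobs 5-9 use ssd002; job i sleeps (i-1) hours, matching the
# "sleep 0h" / "sleep 4h" lines above.
gen_jobs() {
  for i in $(seq 1 9); do
    if [ "$i" -le 4 ]; then ssd=ssd001; else ssd=ssd002; fi
    delay=$((i - 1))   # one-hour stagger: job 1 sleeps 0h, job 9 sleeps 8h
    echo "screen -d -m -S chia$i bash -c 'cd /home/eric/Documents/Chia/chia-blockchain && . ./activate && sleep ${delay}h && chia plots create -k 32 -b 5000 -e -r 4 -u 128 -n 83 -t /mnt/$ssd/temp$i -2 /mnt/ssdtemp -d /mnt/hdd00$i | tee /home/eric/Documents/Chia/chialogs/chia$i.log'"
  done
}
gen_jobs
```

Piping the output into a file gives the full bash script; reviewing it before running is a good idea.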

Here is a screen shot of my glances:
https://imgur.com/P78maz4

Any help would be greatly appreciated.
Thank you.

Thanks for all the details. One clarification: for the commands you pasted, are those the only two, or do you have multiple similar ones?

How many plots are running in parallel on the system at any time?

There are a total of 9 commands in one bash script. The only change between commands is which HDD each M.2 SSD is assigned to plot to: nvme1 plots to hdd001 through hdd004 (4 commands), and nvme2 plots to hdd005 through hdd009 (5 commands).

These are the only 9 plots running in parallel, with a 1-hour delay to stagger them.

Thanks again

My strategy for iowait issues has just been to continuously add more SSDs, and to avoid M.2 slots that go through the chipset. Have you also tried changing the sector size of the SSDs to the highest available? And changed the filesystem to XFS?
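To see whether a drive even offers a larger sector (LBA) size, nvme-cli can list the supported formats. A sketch, with the device name assumed, and note that `nvme format` erases the drive, so it must be done before creating the filesystem:

```shell
# List the LBA formats the drive supports (needs the nvme-cli package).
# Entries with a larger "Data Size" (e.g. 4096) are the candidates.
sudo nvme id-ns -H /dev/nvme0n1 | grep "LBA Format"

# Switch to LBA format index 1 -- THIS ERASES THE DRIVE, so only run it
# on an empty disk, then recreate the filesystem afterwards:
# sudo nvme format /dev/nvme0n1 --lbaf=1
# sudo mkfs.xfs /dev/nvme0n1
```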

1 Like

Hi andrew,

I’ve just checked, and the M.2 slots I’ve been using go through the CPU. Also, can you explain a little more about “adding more SSDs”?

With my HDDs, SSD, and NVMes, I used the following format commands:

hdd & ssd:
sudo mkfs.ext4 -m 0 -T largefile4 -L <drivename> /dev/sda

nvme:
mkfs.xfs /dev/nvme0n1
mount -t xfs -o discard /dev/md0 /mnt/ssd

My question is: does the regular (non-M.2) SSD also need to be mounted with `mount -t xfs -o discard`?
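To make the question concrete, this is the kind of permanent mount entry I mean; the device name and mount point here are just guesses on my side:

```shell
# Hypothetical /etc/fstab line for the SATA SSD -- is "discard" needed
# here too, or only for the NVMe drives?
# /dev/sdb1  /mnt/ssdtemp  ext4  defaults,discard  0  2
```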

Thank you

fireboy0526, if it’s not a problem I’ll join your topic, because I have the same I/O waiting problem.
Setup:
CPU: i7-10700K
RAM: 32GB 3600 G.Skill
MB: Gigabyte Z490 UD
System: SSD1 - Kingston HyperX
Temp: Samsung 1TB 980
HDD: 2 * 10TB WD, 1 * 3TB Toshiba

For now I have 2 parallel plots (4 threads / 2400MB each), and after 6% I got a few “High CPU I/O waiting” and “CRITICAL on CPU_IOWAIT” alerts.

  • I see you are running with “-e”, which disables bitfield. You don’t need to do that anymore; there have been many improvements to bitfield plotting, to the point where it’s actually faster than no-bitfield. With bitfield you only need 3408 MiB of RAM with 4 threads.
  • I also see you are using “-2”. You have two 2TB NVMes, so you don’t really need “-2”. I would use the ssdtemp drive as the destination drive for your plotters, and transfer to the HDDs from there.
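Concretely, your first screen line would become something like this sketch: only “-e” is dropped, “-b” is lowered to 3408, “-2” is removed, and “-d” points at ssdtemp instead; everything else is kept from your post.

```shell
# Sketch: same job as before, with bitfield enabled (no -e), 3408 MiB of
# RAM, and the finished plot written to the SATA SSD for later transfer
# to the HDDs.
screen -d -m -S chia1 bash -c 'cd /home/eric/Documents/Chia/chia-blockchain && . ./activate && chia plots create -k 32 -b 3408 -r 4 -u 128 -n 83 -t /mnt/ssd001/temp1 -d /mnt/ssdtemp | tee /home/eric/Documents/Chia/chialogs/chia1_1_.log'
```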
3 Likes

Not at all, feel free to jump on as we seem to have the same problem.

Thank you very much for your suggestions; I’ll be sure to give them a try. It seems that in 1.0.4 they changed the bitfield plotter to be faster and smaller, which should save me some extra RAM and temp space while plotting.

1 Like

Hello again,

After playing around with my settings a bit more, it seems I’m still having this high CPU iowait problem.

https://chiaforum.com/t/i-was-sick-of-dealing-with-iowait-as-a-bottleneck-for-the-last-week/598
However, from the post above, it seems that adding more NVMe drives could possibly solve the problem. Is this true?

Also, monitoring iotop, it seems that kworker is causing all the iowait. Is it phase 1 of plotting that causes the high kworker usage?
https://imgur.com/a/pMcLMc1

Thanks again.

Couldn’t remember how iotop displays its data, so I had to look it up. In your image, the IO column only tells you the percentage of time a process spends waiting on I/O. So that kworker isn’t causing high IO wait; it’s experiencing high IO wait because of all the plotters: 99.9% means it’s waiting on I/O 99.9% of the time. But I’m sure these lines jump around a lot, right? What we can say for certain is that yes, some of your plotters are experiencing I/O wait. Adding another NVMe and raiding them would probably remove the iowait.
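If you want to see which device is actually saturated, rather than which process is waiting, iostat from the sysstat package gives a per-device view; a sketch, with the interval and sample count chosen arbitrarily:

```shell
# Extended device stats every 5 seconds, 3 samples. A drive with %util
# pinned near 100 is the bottleneck; high iowait on its own only means
# the CPUs are sitting idle while waiting on some disk.
iostat -x 5 3
```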

Thank you so much for your reply.

I would just like to confirm on my last comment.

You said that “having another nvme and raiding it would probably remove the iowait”. My question: must we raid it? Personally, I’ve had bad experiences with RAID, which is why I want to avoid it if possible.

You can keep them separate, just split your plotters across them. Raid0 makes it simple to configure plotters.
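If you do decide to try raid0 anyway, the setup is only a few commands with mdadm. A sketch: the device names and mount point are assumptions, and creating the array destroys all data on both drives.

```shell
# Stripe the two NVMes into one raid0 device -- ERASES BOTH DRIVES.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 \
    /dev/nvme0n1 /dev/nvme1n1
sudo mkfs.xfs /dev/md0
sudo mount -t xfs -o discard /dev/md0 /mnt/ssd
```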

3 Likes

I am having the same issue with a slightly slower CPU: after 2 plots in phase 1, or a total of 3-4 plots, it causes high iowait.
I have 3 * 1TB SSDs in raid0 with an XFS filesystem.
I am thinking of keeping them separate: 2 for temp and 1 for temp2.
What do you think?
Thanks

Same here. A lot of these critical alerts:
Using Ubuntu 20.04 on ESXi on a Dell R720XD, with the temp directory on 3 * 1TB SSDs in raid0 through the PERC H710P raid controller, and the virtual disk attached to the VM as a raw disk. Same thing despite completely different hardware.

Hi,

I have the same issue (I/O wait blocking), but not when plotting. I have 8 plots so far, and now I’m trying to sync the full node, and it’s a pain: this iowait is nearly freezing the whole machine. Any chance to get around this? All the plots are on a separate dedicated SSD, if that helps.

If you’re just having issues copying plots around, then try rsync or scp and play with their bandwidth limits. The copy will of course take longer, but it puts less strain on the system.

Example limiting rsync to 1000 KBytes per second:
rsync --bwlimit=1000 /path/to/source /path/to/dest/

Example for scp (note that scp’s -l takes Kbits per second, so 1000 here is roughly 125 KBytes/s):
scp -l 1000 file user@remote:/path/to/dest/

Hi,

for the record, I have solved this I/O issue by creating and mounting a tmpfs as a ramdisk and moving the blockchain database onto it.

Since then I’ve had far fewer I/O issues, and the sync speed has increased.
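Roughly what I did, as a sketch: the 8G size and the symlink approach are my own choices, and the database path is Chia’s default under ~/.chia/mainnet/db.

```shell
# Create an 8 GiB ramdisk (tmpfs lives in RAM and is lost on reboot).
sudo mkdir -p /mnt/ramdisk
sudo mount -t tmpfs -o size=8G tmpfs /mnt/ramdisk

# Copy the blockchain db onto it and leave a symlink behind.
cp -r ~/.chia/mainnet/db /mnt/ramdisk/
mv ~/.chia/mainnet/db ~/.chia/mainnet/db.bak
ln -s /mnt/ramdisk/db ~/.chia/mainnet/db

# IMPORTANT: copy the db back to disk before shutting down,
# or the sync progress is lost.
```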

Tutorial: How to Easily Create RAM Disk on Debian, Ubuntu, Linux Mint, CentOS - LinuxBabe

Thanks