CLI showing CPU 150% but RAM and disk IO fine

Has anyone done any anecdotal experiments with parallel plotting (CLI) where it reports CPU usage of around 150%? Am I being counterproductive going this high? I’m wondering if I should reduce my parallel plots by one to try to get this down to around 95%, even though my memory and disk IO would then be even less utilized than they are now.

Intel Core i7-6700K @ 4GHz (4 cores / 8 threads)
32GB DDR4-3000
2TB NVMe M.2 SSD

Thanks

When it shows > 100% CPU, that means it’s leveraging multiple cores to do the plotting. This is normal.


I run just 1 plot and the CPU still shows 100% sometimes, so I guess it’s normal while plotting.

Thanks. What CPU % do you use or try to target?

I’m on a Broadwell too! It hums along best churning out about 1.6 jobs per core (also with the thread count increased to 4 from the default 2).

That would be about 13 parallel tasks for your 8 threads, which would require 57GB of RAM with the default 128 buckets. You could try doubling the bucket count so everything fits, though I haven’t done that myself (I went the other way, since halving the bucket count makes each job take twice as much memory, which better fills the spare memory I have). Likewise, fitting on your 2TB tmp drive would require careful scheduling/staggering. I manage to squeeze in about 5.5 jobs per TB (with tmp2 set to the final drive) and still have some free space left over, but getting the timing right is hard.
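For back-of-the-envelope planning, here is a minimal sketch of that kind of arithmetic. The per-job figures are my assumptions rather than measurements from this thread (roughly 3.4 GiB of plotting buffer per process with the default 128 buckets, and roughly 240 GiB of peak tmp space per k=32 job); plug in your own numbers.

```python
# Rough parallel-plotting capacity estimate (assumed figures, not measured here).
RAM_GIB = 32            # total system RAM
TMP_DRIVE_TB = 2        # temp drive size (marketing TB)
CORES = 8               # logical cores/threads available
JOBS_PER_CORE = 1.6     # ratio suggested in the post above

RAM_PER_JOB_GIB = 3.4   # assumption: default buffer, 128 buckets
TMP_PER_JOB_GIB = 240   # assumption: peak temp usage of one k=32 job
GIB_PER_TB = 931        # 1 marketing TB is roughly 931 GiB

by_cpu = int(CORES * JOBS_PER_CORE)
by_ram = int(RAM_GIB // RAM_PER_JOB_GIB)
by_tmp = int(TMP_DRIVE_TB * GIB_PER_TB // TMP_PER_JOB_GIB)

# Note: staggered starts let you overcommit tmp space in practice, because the
# jobs don't all hit their peak temp usage at the same time.
print(f"CPU allows ~{by_cpu} jobs, RAM allows ~{by_ram}, tmp (all at peak) ~{by_tmp}")
print(f"Bottleneck without staggering: {min(by_cpu, by_ram, by_tmp)} parallel jobs")
```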

Anyways, thrift shop Broadwell can still win at this! So green :grin:

Timings are abysmal though (18 hours a task for me).

For years we’ve heard that single-core performance has stayed pretty much constant. It turns out that’s not so with Chia. The newest Intel chips plow through a plot in 5 hours even in a NUC, as Jeff Atwood has shown us. It’s the first time in years I’ve felt the difference, and I touch some HPC computers occasionally. I guess everyone’s going for a new rig now, so not so green after all. I think a 64-way Threadripper/EPYC loaded with half a terabyte of RAM and 20TB of SSD could put out 20TB a day, which would explain some of the insane pre-trade XCH pricing we’ve seen. At that rate people could still chase the runaway netspace and have a 50:50 chance at winning a chia; betting on a pump and dump, even four-figure pricing would recoup the hardware and disks. I wonder what a z-architecture mainframe, renowned for IO and crypto coprocessors, could do.

The best you can do with Broadwell is 18 hours per plot? Is that 2c/4gb defaults?

  1. Nehalem (45nm)
  2. Sandy Bridge (32nm)
  3. Ivy Bridge (22nm)
  4. Haswell (22nm, power consumption improvements)
  5. Broadwell (14nm)
  6. Skylake (14nm, arch change!)
  7. Kaby Lake (refresh)
  8. Kaby Lake R (refresh)
  9. Coffee Lake (introduces core i9, 8+ cores)
  10. Cannon/Ice Lake (10nm)
  11. Tiger Lake (10nm, backported arch, L4 cache)

Ideally the sweet spot for getting older stuff would be Skylake, where the big arch change happened.

I’ve been astounded at how fast my i9-9900KS is: 4.5 hours per plot (4c/16GB). Just checked and it is

Total time = 16147.980 seconds. CPU (142.430%) Fri Apr 23 12:29:30 2021

And that’s with nothing special on the disk side… just a regular 970 Pro. The plots running on the 970 Evo are

Total time = 16393.081 seconds. CPU (140.570%) Fri Apr 23 13:03:42 2021

barely any slower… Adding memory does virtually nothing per my tests, whereas adding cores (+2) improves plot time by 10-15%.
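If you want to compare runs like that without eyeballing logs, a few lines of Python can pull the summary lines out of the plotter logs. This is just a sketch: it assumes only the “Total time = … seconds. CPU (…%)” format quoted above, and the `~/plot_logs` directory is a hypothetical location for your log files.

```python
import re
from pathlib import Path

# Matches the plotter summary line quoted above, e.g.
# "Total time = 16147.980 seconds. CPU (142.430%) Fri Apr 23 12:29:30 2021"
LINE = re.compile(r"Total time = ([\d.]+) seconds\. CPU \(([\d.]+)%\)")

log_dir = Path.home() / "plot_logs"   # hypothetical location of your plot logs
for log in sorted(log_dir.glob("*.log")):
    for match in LINE.finditer(log.read_text(errors="ignore")):
        secs, cpu = float(match.group(1)), float(match.group(2))
        print(f"{log.name}: {secs / 3600:.2f} h, CPU {cpu:.0f}%")
```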

CPU percent is based on cores. Only phase 1 is multi-threaded, so you won’t see >100% during phases 2-4. I just set my plots to use the same number of threads as the system has cores and it seems to be fine. There are diminishing returns to multi-threading, so I rarely see >300% for one plot.
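You can watch this per phase yourself by sampling the plotter process with psutil; the per-process percentage is summed across cores, so anything over 100% means more than one core is busy. A minimal sketch, assuming you fill in the PID of a running plotter (the 12345 below is a placeholder and no phase detection is attempted):

```python
import time
import psutil   # third-party: pip install psutil

PLOTTER_PID = 12345          # placeholder: PID of a running plotter process
proc = psutil.Process(PLOTTER_PID)

# cpu_percent() is summed across all cores, so the multi-threaded phase 1 can
# report well over 100%, while the single-threaded later phases hover near 100%.
try:
    while proc.is_running():
        usage = proc.cpu_percent(interval=5)   # sample over a 5-second window
        print(f"{time.strftime('%H:%M:%S')}  CPU {usage:6.1f}%")
except psutil.NoSuchProcess:
    print("plotter exited")
```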


Yup, it took 62437 seconds to claim the latest plot :rofl: like an old British notary.

No, 4 threads per process.

With 2 threads the CPU wasn’t 100% occupied, while plots weren’t coming out much faster with 4 either. I also give each plot more memory (19GB per plot), since it would sit idle otherwise; this doesn’t seem to affect speed at all, the only effect is decreased disk access.
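For reference, those knobs map onto the plotter’s CLI flags. A hedged sketch of how one such job could be launched from Python; the paths are placeholders, and I’m assuming the standard `chia plots create` flags (`-r` threads, `-b` buffer in MiB, `-u` buckets) with values matching the settings described above.

```python
import subprocess

# Hypothetical example: one k=32 job with 4 threads and a ~19 GB buffer.
# All directories below are placeholders for your own temp and farm paths.
cmd = [
    "chia", "plots", "create",
    "-k", "32",
    "-r", "4",                # threads per plotting process
    "-b", "19000",            # sort/plot buffer in MiB
    "-u", "128",              # bucket count (default)
    "-t", "/mnt/nvme/tmp",    # placeholder temp dir
    "-2", "/mnt/hdd/farm",    # placeholder tmp2 (final drive, as mentioned above)
    "-d", "/mnt/hdd/farm",    # placeholder final dir
]
subprocess.run(cmd, check=True)
```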

This is exactly right.

Yeah it’s crazy! But parallelism is crazier!

Per THE spreadsheet of people’s measurements (now obsolete as of Chia 1.1.1), only the massively parallel loads matter.
https://docs.google.com/spreadsheets/u/0/d/14Iw5drdvNJuKTSh6CQpTwnMM5855MQ46/htmlview

Anyone know of updated spreadsheets?

Skylake introduced AVX-512, but it wasn’t a full rollout, I think. Each generation also brings sizeable improvements to the cryptographic instructions. Then again, the Chia guys could’ve compiled without those flags FOR EQUALITY :joy_cat: (I don’t know if they did; so far I’ve only looked at Chia’s Python code, and that was already too much Python for me, this is supposed to be fun!)
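If you’re curious whether your chip even exposes those instructions, on Linux you can check the flags the kernel reports. A quick sketch below; whether the stock Chia plotter builds actually use AVX-512 or the SHA extensions is a separate question I haven’t verified.

```python
# Linux-only sketch: report which SIMD/crypto CPU flags the kernel sees.
# It says nothing about whether the plotter binary was built to use them.
INTERESTING = ["avx2", "avx512f", "sha_ni", "aes"]

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for flag in INTERESTING:
    print(f"{flag:10s} {'yes' if flag in flags else 'no'}")
```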

I will continue to buy at the thrift shop though. I can see a stock of very incompatible (and illegal to buy) Intel Confidential engineering samples with 20 cores for $50, and a suitable dual-socket board for $35, a bit burned in one corner. With a godload of LRDIMM sticks that go cheapest because no one buys LRDIMM at thrift shops, and some very crappy Patriot SATA SSDs that nevertheless appear to carry only a time-limited (and otherwise infinite-TBW) warranty, I can see this putting out 9TB of farming land a day. (Again, how is that for the whole “being green” pitch? From climate science we know we should scale back agriculture a bit; between the marketing and the Python code, Chia doesn’t look like 3 years in the making.)

Disks are the killer cost, and you are exactly right that’s where the Chia shovel sellers should focus, PaaS-ing their NetApp filers out to the Chia-hungry masses. I’ve found 16TB drives for an unbelievable $1300 per 10-piece lot on Alibaba, but I don’t know if I want to get on this rollercoaster of exponential growth curves and phase transitions.

Sorry for writing too many words.


This is great, not too many words at all! I have another friend telling me old Broadwell (5th gen) servers are doing great, whereas I told him not to buy anything older than Skylake (6th gen)… so I am trying to reconcile your statements…

They might be doing great, being servers, because of the server core counts. Skylake would surely be superior too, but might not come with as many cores (also, not every Skylake has AVX-512, though I don’t know for sure whether that matters; I’d err on the side of Kaby Lake).

In the spreadsheet we see a dual Xeon E5-2680 v4 (14 cores each, 28 cores total) taking a respectable place among single-socket Threadrippers and Xeon Golds, despite taking 17.17 hours to finish a single job (but running 44 simultaneously).
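The throughput math there is worth spelling out, since per-plot time barely matters when that many jobs run at once. A quick sketch of the arithmetic, assuming ~101.4 GiB per finished k=32 plot (my figure, not from the spreadsheet):

```python
# Effective throughput of the dual E5-2680 v4 example above.
parallel_jobs = 44
hours_per_plot = 17.17
plot_size_gib = 101.4   # assumption: typical finished k=32 plot size

plots_per_day = parallel_jobs * 24 / hours_per_plot    # ~61 plots/day
tib_per_day = plots_per_day * plot_size_gib / 1024     # ~6 TiB/day
print(f"~{plots_per_day:.0f} plots/day, ~{tib_per_day:.1f} TiB/day")
```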