Don't fall for the Hyper-threading trap

As many will know, Hyper-threading (HT) is a trick to squeeze more performance out of a system. It works quite well on systems that don’t push a CPU too hard and run many small processes and threads, because it presents twice as many logical CPUs as there are physical cores.
But don’t be fooled: those extra “free” logical CPUs are not CPUs at all. They basically look for idle moments in a core’s pipeline and squeeze extra instructions into those “pauses/gaps”. It is way more complicated than that, but this is easier to visualise.

HT is a bad idea when applications need “a full/real CPU”, so for many database applications or other “heavy stuff” HT is turned off in the BIOS because it is counterproductive.

My system is a dual Xeon E5-2640 v4 @ 2.40GHz with 10 cores / 20 threads per CPU. So 20 real cores.
The OS is Ubuntu Server 20.04 LTS.

So I went to see if HT is beneficial or counterproductive in plotting. Here is what I found:
Short version:
Before, I had up to 18 plots in parallel, with plots taking 11 to 15 hours to complete. This varied wildly.
With CPU pinning, I went from 25/26 plots a day to 30/31 a day with the exact same hardware.

Long version:
I kept HT enabled in the BIOS and started using CPU affinity to pin jobs to real CPUs and not let them use “fake” logical CPUs.
I also made sure that jobs don’t cross NUMA zones, as that increases latency because a CPU must then talk to memory which belongs to the other CPU.
When looking at the CPU numbering, my system shows 40 CPUs. To find out which Hyper-threading units (logical CPUs in Windows terminology) are on the same physical core, do:
“cat /sys/devices/system/cpu/cpuX/topology/thread_siblings_list”
where X is the CPU number, starting with 0 and ending at 39 on my system.

"cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list"
0,20   (CPU 1, core 1)
1,21   (CPU 1, core 2)
2,22   (CPU 1, core 3)
3,23   (CPU 1, core 4)
4,24   (CPU 1, core 5)
5,25   (CPU 1, core 6)
6,26   (CPU 1, core 7)
7,27   (CPU 1, core 8)
8,28   (CPU 1, core 9)
9,29   (CPU 1, core 10)
10,30  (CPU 2, core 1)
11,31  (CPU 2, core 2)
12,32  (CPU 2, core 3)
13,33  (CPU 2, core 4)
14,34  (CPU 2, core 5)
15,35  (CPU 2, core 6)
16,36  (CPU 2, core 7)
17,37  (CPU 2, core 8)
18,38  (CPU 2, core 9)
19,39  (CPU 2, core 10)

How to interpret:

CPU 20 is actually the "fake" CPU on physical core 0.
CPU 21 is actually the "fake" CPU on physical core 1.
...
...
CPU 39 is actually the "fake" CPU on physical core 19.
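
The sibling listing above can also be reduced to a "one logical CPU per physical core" list with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes that the first number in each sibling pair is the one to treat as the "real" CPU, as in the listing above.

```python
import glob

def parse_sibling_line(line):
    """Turn an entry such as '0,20' or '0-1' into a sorted list of CPU numbers."""
    cpus = set()
    for part in line.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)

def split_real_and_fake(lines):
    """Return (first sibling of each core, all remaining siblings), both sorted."""
    # Every core is reported once per sibling, so a set removes the duplicates.
    cores = {tuple(parse_sibling_line(l)) for l in lines if l.strip()}
    real = sorted(core[0] for core in cores)
    fake = sorted(cpu for core in cores for cpu in core[1:])
    return real, fake

def read_sibling_lines():
    """Read every cpuN/topology/thread_siblings_list file (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lines = []
    for path in glob.glob(pattern):
        with open(path) as fh:
            lines.append(fh.read())
    return lines

if __name__ == "__main__":
    real, fake = split_real_and_fake(read_sibling_lines())
    print("first sibling per core:", real)
    print("other siblings:", fake)
```

On a machine numbered like mine this should list 0 to 19 as the first siblings and 20 to 39 as the others, but the numbering differs between systems, so check the output rather than assuming it.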

As I want to use only “real” CPUs, without other stuff running on a “fake” CPU and slowing the real CPU’s work down (a core will try to load-balance between both HT units), I need to modify my Swar Plotmanager config so that I have 4 jobs (I have 4 NVMe drives, one for each job) and assign a maximum of 5 real/full CPUs to a job. 4 jobs x 5 CPUs is 20 CPUs.
I use 4 threads for stage 1, allow 1 stage 1 plot per job and 4 plots per job. This way, the 5 CPUs available per job are used to the fullest.

CPU Affinity for plotting jobs:
 NUMA Node 0:
  Job 1:
   0,1,2,3,4
  Job 2:
   5,6,7,8,9
 NUMA Node 1:
  Job 3:
   10,11,12,13,14
  Job 4:
   15,16,17,18,19

As you can see, this also keeps each job within a single NUMA node. In my dual-CPU setup the nodes are:

NUMA node0 CPU(s):               0-9,20-29
NUMA node1 CPU(s):               10-19,30-39

(use the command “lscpu” to find this out on your system)
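
The job layout above can be derived mechanically from the NUMA listing. A rough sketch follows: `jobs_per_node` is a hypothetical helper, not part of Swar Plotmanager, and the node and "real CPU" lists are simply typed in from my `lscpu` output.

```python
def jobs_per_node(node_cpus, real_cpus, cpus_per_job):
    """Split each NUMA node's real CPUs into fixed-size groups, one group per job.

    Leftover CPUs that do not fill a whole group are left unassigned.
    """
    real = set(real_cpus)
    plan = {}
    for node, cpus in node_cpus.items():
        usable = sorted(c for c in cpus if c in real)
        plan[node] = [
            usable[i:i + cpus_per_job]
            for i in range(0, len(usable) - cpus_per_job + 1, cpus_per_job)
        ]
    return plan

# Values copied from the lscpu output above (assumption: CPUs 0-19 are the "real" ones).
node_cpus = {
    0: list(range(0, 10)) + list(range(20, 30)),
    1: list(range(10, 20)) + list(range(30, 40)),
}
real_cpus = range(0, 20)

plan = jobs_per_node(node_cpus, real_cpus, cpus_per_job=5)
for node, groups in plan.items():
    for group in groups:
        print(f"NUMA node {node}: {','.join(map(str, group))}")
```

Because the groups are built per node, no job can end up with CPUs from both sockets, which is the whole point of the exercise.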

Mind you, as I did not disable HT in the BIOS, the other 20 logical CPUs are still around despite being “fake”. They are fine for other things an operating system does, like disk and network I/O.
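
For anyone who wants to try pinning by hand, without a plot manager, the idea can be sketched with Python's `os.sched_setaffinity`, which exists on Linux only. The helper name `pin_current_process` is made up for this example, and it deliberately skips any requested CPU the process is not allowed to use.

```python
import os

def pin_current_process(wanted):
    """Restrict this process to the wanted CPUs that are actually available.

    Returns the affinity set in effect afterwards, or None on platforms
    without os.sched_setaffinity (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
    target = set(wanted) & allowed
    if target:                          # an empty set would raise an error
        os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

Calling `pin_current_process({0, 1, 2, 3, 4})` before launching a plotter from the same script would correspond to Job 1 in the list above, since child processes inherit the affinity mask on Linux. From a shell, `taskset -c 0-4 <command>` does roughly the same thing.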

So instead of 18 plots in parallel (using all 40 logical CPUs / threads) and not using any form of CPU or NUMA affinity, I now have 12 plots in parallel max. But those plots consistently finish in 8 to 9 hours instead of 11 to 15 hours. Due to the way I staggered them, I now get 30 to 31 plots a day instead of 25/26 on average.
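
To see why fewer parallel plots can still mean more plots per day, here is a back-of-the-envelope calculation. The formula is my own simplification: it assumes every slot is refilled the instant a plot finishes, so it only gives a ceiling and ignores staggering and start-up delays.

```python
def plots_per_day(parallel, hours_per_plot):
    """Idealised steady-state throughput: every slot is refilled instantly."""
    return parallel * 24 / hours_per_plot

old = [plots_per_day(18, h) for h in (15, 11)]   # slowest and fastest old plot times
new = [plots_per_day(12, h) for h in (9, 8)]     # slowest and fastest new plot times

print(f"18 parallel, 11-15 h: {old[0]:.1f} to {old[1]:.1f} plots/day at best")
print(f"12 parallel, 8-9 h:   {new[0]:.1f} to {new[1]:.1f} plots/day at best")
```

The ceilings come out at 28.8 to 39.3 plots a day for the old setup and 32.0 to 36.0 for the new one. The measured 25/26 sat well below the old ceiling, while 30/31 is much closer to the new one, which fits the picture of the old setup losing a lot of time to contention.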

Added bonus: the CPUs get less hot. They used to be around 55 degrees C all day long. With the new config, they are around 44 degrees. Fan speeds are lower as a consequence, and this saves a little bit on electricity.

That’s it. That is what I wanted to share with you. Hope you find it useful.
Disclaimer: it took me close to a week of tinkering to get this result. A little bit more here, a little bit less there.
At one point the machine started plotting very consistently and predictably, and keeping in mind these CPUs are quite old, I’m quite pleased with the result.


Nice to see this experiment, as I have been warning people here for a while about hyperthreads not being ‘complete’ threads. You definitely do not want to cross NUMA boundaries, as your cache coherency will tank.

However, I always wondered: if you can keep the hyperthreads for a process tied to their paired core thread, would that not be beneficial? After all, the hyperthread is there to prevent the pipelines from stalling.


“After all, the hyperthread is there to prevent the pipelines from stalling.”

Yes. But I have developed the impression that the plotter does a good job of not leaving too many “gaps” in the pipeline and utilizes a (real) CPU pretty well.

Also, I noticed something when looking at the CPUs’ turbo behaviour. In the past, with 18 plots chugging away at the same time, my two 2400 MHz CPUs rarely went over 2600 MHz (all cores at the same time, as they were all kept busy).
In the new setup, with the “fake/second CPU” doing very little (other stuff but plotting), they turbo consistently higher. All CPUs are between 2700 and 3100 MHz now (this CPU can do 3400 MHz max when only a small number of threads are busy and the rest is not doing much), which of course helps, as each core gets more work done per second at the higher clock. And the temperature overall is consistently lower despite turboing consistently higher.

(I am consistently using the word consistently a lot…)


One thing to keep in mind: this is a fairly old CPU… Broadwell from 2016. Hyperthreading has improved since then.

Much of what you say is true – you definitely don’t want to think of threads as full CPUs because they absolutely are not. They are kinda helpers… I tell people to think of them as a “half” core.


Thank you for sharing this, SteveTheLonelyFarmer, and other replying bros.
I am still not getting the conclusion. Is your improvement in plotting efficiency due to the job management, or to hyperthreading being on/off?

Is hyperthreading good for Chia plotting or not?

I was working on an old Xeon E3-1230 last night, experimenting with plotting on an old mechanical hard drive. With hyperthreading off, CPU load showed a periodic wave. With hyperthreading on, CPU load appeared more erratic, especially at the beginning of stage 1. Right now I don’t have a final result to tell whether hyperthreading on or off is better.

The other observation I had:

Xeon E3-1230, 4C/8T. With hyperthreading off, the computer shows 4 CPUs. I set plot -r 4. CPU load was 45% or so. After I added another -r 4 plot in parallel one hour later, CPU load was 80% or so.

Does that mean the CPU was still underloaded while running 8 threads with hyperthreading off?

I’m no expert, and someone correct me if I’m wrong, but if you are plotting on a mechanical drive you probably won’t see a difference either way, because your bottleneck will be the write speed of the HDD. I have an E3-1240 v3 and I can tell you that hyperthreading helps a lot if you’re running 3 or 4 plots in parallel. I came from a 4690K (4C/4T) and the extra threads really do help. If you had more physical cores then it might be worth core pinning or something similar, but 4 isn’t enough to really play around with.

What about under Windows? How do I set affinity to use only the real processors, not the fake ones?

Can I just disable Hyper-threading? I have dual Xeon 2697 … but I don’t know where the bottlenecks are. CPU is only at about 50% …

Hey bros, last night I configured RAID 0 on two old 7200rpm 500GB SATA hard drives. Performance was boosted. CPU load was high and healthy.

Plotting time is shorter. I got 14 hours on one hard drive before. With two hard drives in RAID 0, it looks like it can be done in 10 hours.

Computer is an old HP Z210 Xeon. Ten years old.

I think the CPU wants to work. We need to give her a fast hard drive.