What exactly are hardware (or hyper-) threads?§
CPU architects use the term thread in an overloaded and slightly different way than OS designers do.
In OS/Systems, a thread is simply a separate flow of control that is maintained in software (using context switching) and is typically part of a process that may contain multiple threads. Threads can be implemented with kernel support (kernel-level threads) or without (user-level threads), or some combination of the two. The OS will multiplex kernel-level threads onto the available CPUs.
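As a concrete illustration of the software notion, here is a minimal Python sketch: each `threading.Thread` below is backed by a kernel-level thread inside one process, and the OS multiplexes these threads onto the available CPUs.

```python
import threading

counter = 0
lock = threading.Lock()

def worker():
    """Each worker runs as a separate flow of control in the same process."""
    global counter
    with lock:              # protect state shared by the threads
        counter += 1

# One process containing multiple kernel-level threads.
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                # wait until the OS has run all of them

print(counter)              # all four threads incremented the counter
```

Whether these four threads actually run in parallel depends on how many hardware execution units the machine provides, which is the subject of the rest of this answer.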
In computer architecture, certain CPU designs provide the ability for the CPU hardware to maintain separate flows of control in hardware. This comes at two levels: first, CMP (chip multiprocessors), in which each hardware control flow is implemented on its own part of a chip, which is referred to as a core. CMP processors are also referred to as multicore processors.
The second level is SMT, or simultaneous multithreading, in which each core supports (typically) two separate control flows. Intel calls this "Hyperthreading." SMT is a much cheaper (from a transistor budget point of view) but less effective feature than CMP.
Computer architects refer to these separate control flows as "threads" (for clarity, I prefer the term "hardware threads" to distinguish them from the software abstraction). An Intel Xeon, for example, provides 2 threads per core; a 16-core CPU thus has 32 hardware threads. A system such as our rlogin nodes, where each machine has 2 sockets, therefore provides 64 independent hardware-supported control flows.
To the OS, such a machine appears to have 64 CPUs available on which to schedule its (software, kernel-level) threads.
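You can ask the OS directly how many such hardware execution contexts it sees. A small sketch (the `os.sched_getaffinity` call is Linux-specific, and the numbers printed depend entirely on the machine you run it on):

```python
import os

# Total hardware threads ("CPUs") the OS can schedule onto.
# sockets x cores x SMT threads, e.g. 2 x 16 x 2 = 64 on an rlogin node.
total = os.cpu_count()

# On Linux: the subset of those CPUs this process may currently run on.
allowed = os.sched_getaffinity(0)

print(f"{total} hardware threads, {len(allowed)} usable by this process")
```

On an unrestricted process the two numbers usually match; tools such as `taskset` or cgroups can shrink the allowed set below the total.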
How are hardware/hyperthreads implemented under the hood?§
Under the hood, hyperthreads share many CPU resources with their peer hosted on the same core. Some hardware resources (for instance, registers) are duplicated, but others (e.g., FPUs, caches) are shared, and the hyperthreads must take turns. This sharing can pay off: if one hyperthread is stalled on a cache miss, the other can use functional units such as the ALUs in the meantime.
An analogy may be two side-by-side restaurants in a strip mall. Unbeknownst to their customers, the two restaurants share one kitchen in the back. They have separate entrances and perhaps separate hosts, and they appear to be completely separate restaurants. A setup like that allows more efficient use of the kitchen: if one restaurant has little traffic, the kitchen capacity can be used for the other. If one restaurant uses, say, the fryer, the other can still use the stove. This works well unless both restaurants become busy and require the same resources.
For many practical applications, hyperthreading provides only a very small speedup, and it is sometimes turned off. But there are workloads for which it is beneficial, particularly when their software threads have complementary resource demands.
Finally, we'll note that different cores on the same CPU also share resources with each other. To extend the analogy, having multiple cores is like having several of these 2-restaurant pairs (each pair sharing one kitchen) in the strip mall: the pairs share, for instance, the parking lot (= memory), which limits how many people can enter and leave the mall as a whole.
OSs, even though they could treat all hyperthreads as fully-fledged CPUs, are typically aware of the actual CPU topology and try to minimize contention. For instance, in a 4-core system with 8 hyperthreads, the OS would first try to place RUNNING software threads onto hyperthreads 0 and 2 (that is, far apart from each other, making it less likely that they contend for resources).
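A program can also influence this placement itself. The sketch below uses Linux's affinity API to pin the process to hardware threads 0 and 2; the idea that 0 and 2 sit on different cores is an assumption based on one common Linux numbering scheme, so check your machine's topology (e.g. with `lscpu`) before relying on it.

```python
import os

original = os.sched_getaffinity(0)   # remember the current allowed CPU set

# On a hypothetical 4-core / 8-hyperthread machine where CPUs 0 and 2
# live on different cores, pinning two busy threads there avoids having
# them contend for one core's shared functional units and caches.
wanted = {0, 2} & original           # only request CPUs we may legally use
if wanted:
    os.sched_setaffinity(0, wanted)  # restrict this process to those CPUs
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, original)  # restore the original affinity
```

In practice you would rarely hard-code CPU numbers like this; topology-aware libraries (or simply trusting the OS scheduler) are usually the better choice.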
When we say CPU/core in shorthand what does that mean?§
In OS/systems, we often use the term CPU/core (or just "core" or just "CPU") quasi-synonymously to refer to the separate execution units the hardware provides, be they actual CPUs, non-hyperthreaded cores, hyperthreads running on one core, or some combination thereof. In general, our software design, particularly with respect to correctness, will not depend on how many independent units the hardware provides, and the software/hardware interface is such that these units appear functionally equivalent to the single-core, single-hardware-thread CPUs at the beginning of the SMP era.