
Components of the CPU

Although we typically view the Central Processing Unit (CPU) as a single component of a larger computer system, the CPU actually comprises several components, including the Control Unit, the ALU, and interfaces to memory and I/O devices. These components are generally organized around the von Neumann Architecture.


Basic Components

At the heart of a modern CPU is the Control Unit, which is responsible for the overall operation of the CPU. This unit still implements the CC (Central Control) as described by von Neumann1, providing central control signals to the other components of the CPU to make them operate. A central CPU clock is connected to the Control Unit, and it is this clock that determines the speed, in cycles per second or Hertz (Hz), at which CPU operations are conducted.

The Arithmetic Logic Unit (ALU) is responsible for performing basic calculations, implementing the CA (Central Arithmetic) part of the von Neumann Architecture. This component implements the calculator function of the computer system, performing (at a minimum) addition, complementing (negation), and shifting. Hardware-based subtraction and other calculation types may be provided, depending on the CPU architecture. Logical operations involving the bits of the input values are also performed. These bitwise operations perform fundamental gate operations (such as AND, OR, and XOR) on the individual bits that make up the input values.
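The ALU operations described above can be sketched with Python's integer operators. This is only an illustration of the operations themselves; a real ALU performs them on fixed-width words in hardware, so the two's-complement negation at the end assumes an 8-bit word for demonstration purposes.

```python
# Bitwise and shift operations, as an ALU would perform them.
a = 0b1100  # 12
b = 0b1010  # 10

print(bin(a & b))   # AND         -> 0b1000  (8)
print(bin(a | b))   # OR          -> 0b1110  (14)
print(bin(a ^ b))   # XOR         -> 0b110   (6)
print(bin(a << 1))  # shift left  -> 0b11000 (24)
print(bin(a >> 1))  # shift right -> 0b110   (6)

# Negation by complementing and adding one (two's complement),
# masked to an assumed 8-bit word width:
width = 8
neg_a = ((~a) + 1) & ((1 << width) - 1)
print(neg_a)  # 244, the 8-bit two's-complement encoding of -12
```

Note that subtraction falls out of these primitives: computing a - b is just adding a to the complement of b, which is why complementing is listed among the ALU's minimum capabilities.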

Importantly, the ALU operates only on integers. For those CPUs that support floating-point operations, a separate Floating-Point Unit (FPU) is provided. It is not necessary for a CPU to include an FPU, since floating-point libraries are available to carry out floating-point computations in software. However, an FPU does provide a significant performance increase when performing floating-point calculations.

Memory Interface

The CPU is connected to the system’s main memory (M in the von Neumann Architecture) via a bus, which is simply a set of circuit board traces providing a shared communications channel between the CPU and the individual memory modules. This bus is essentially a collection of multiple wires shared between the various microchips. Using a combination of clock signals and electrical pulses, the CPU and memory modules communicate over these wires.

Optional dedicated hardware within the CPU may be used to improve memory performance. One such performance-enhancing component is an Address Generation Unit (AGU), which quickly calculates memory addresses in the main memory, removing the need for the ALU to be used each time a memory address calculation is needed. This approach improves performance by enabling the ALU to remain dedicated to the computational workload, instead of having to stop the workload to compute an address.
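The arithmetic an AGU offloads from the ALU is simple but frequent. As a hedged sketch, the form below (base + index × scale + displacement) is the x86-style effective-address calculation; other architectures use different address forms, and the function name here is purely illustrative.

```python
def effective_address(base, index, scale, displacement):
    """Compute base + index*scale + displacement, the kind of
    fixed-form integer arithmetic an AGU evaluates in dedicated
    hardware so the ALU stays free for program computation."""
    return base + index * scale + displacement

# Example: element 5 of an array of 8-byte values starting at 0x1000.
addr = effective_address(0x1000, 5, 8, 0)
print(hex(addr))  # 0x1028
```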

Another way to improve performance is to implement a Memory Management Unit (MMU) inside the CPU. This component provides hardware translation of virtual memory addresses, which are found inside software applications, to physical memory locations. The MMU works by managing the system page table, which maps virtual memory addresses into physical addresses. Further performance improvements can be realized by caching individual page table entries in a Translation Lookaside Buffer (TLB), which is a small region of extremely fast (and expensive) memory that avoids having to search through the page table to find entries for frequently-used virtual memory addresses.
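The translation path described above can be sketched as follows. The page size, page-table contents, and dict-based structures are illustrative assumptions, not any particular architecture's format; the key point is that a TLB hit skips the page-table walk entirely.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

page_table = {0: 7, 1: 3, 2: 9}  # virtual page number -> physical frame number
tlb = {}                         # small cache of recently used translations

def translate(vaddr):
    """Translate a virtual address to a physical address, as an MMU does."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                  # TLB hit: no page-table lookup needed
        frame = tlb[vpn]
    else:                           # TLB miss: consult the page table
        frame = page_table[vpn]
        tlb[vpn] = frame            # cache the entry for future accesses
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # virtual page 1, offset 0x234 -> 0x3234
```

In real hardware the page table is itself stored in main memory, so a TLB miss costs one or more additional memory accesses, which is exactly why caching frequently used entries pays off.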

Input and Output

The final piece of the CPU implements the final two logical parts of the von Neumann Architecture: a way to get input into the CC and CA (or Control Unit and ALU in modern terminology) and a way to record output. In von Neumann’s day, input and output devices were often distinct, since a paper tape reader and a printer (or tape punch) required completely separate hardware. Today, we have hardware devices that can perform both input and output functions. The touch screen on your mobile phone is a good example.

In order to get input into, and output from, the CPU, additional buses are provided. Types and numbers of buses depend on the CPU architecture and model. For example, embedded CPUs may have General Purpose Input/Output (GPIO) pins to which hardware components are directly wired, such as those on the Broadcom ARM CPU powering a Raspberry Pi.2 A common type of I/O interface on desktop CPUs is the PCI Express bus.

For increased performance, it is usually desirable for the I/O devices to read from and write to special regions in the system memory directly, bypassing the CPU. This performance hack is called Direct Memory Access (DMA), and it is an alternative to Programmed I/O (PIO), which requires the CPU to perform every I/O read or write. There are some security concerns with permitting hardware devices to access memory without the involvement of the CPU, since a malicious or malfunctioning device could corrupt data in memory regions that it should not be able to access. An Input-Output Memory Management Unit (IOMMU) helps to mitigate this risk by allowing I/O devices to use virtual addresses for DMA transfers. The IOMMU translates the virtual addresses used by a device to physical addresses in main memory.

Registers and Cache

Although the von Neumann Architecture originally used the main memory for intermediate results of computations, significant performance improvements were realized by putting a small amount of memory inside the CPU itself. The CPU registers are extremely fast memory locations that hold the immediate inputs and outputs of running a single CPU instruction. As of late 2020, the newest generation of 64-bit Intel CPUs contain 16 general-purpose registers that can be accessed by software.3 Each register normally holds a single 32-bit or 64-bit value.

Another way to improve performance is to transfer data from main memory into a cache on the CPU, where it will be readily available for use by the ALU or Control Unit. Different caching strategies exist, but caches are normally split into multiple levels, where the fastest level (L1) is usually the smallest (due to cost), and higher levels become larger but somewhat slower. The L1 cache may also take the form of a split cache, in which the CPU instructions and data are separated. Splitting the L1 cache changes the CPU internal design into a Modified Harvard Architecture, since the machine instructions end up taking a separate path from the data.
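To see how a cache locates data, it helps to look at how an address is split into fields. The sketch below shows the tag / set index / block offset decomposition used by a set-associative cache; the geometry chosen (64-byte lines, 64 sets) resembles a typical L1 data cache but is an assumption for illustration.

```python
LINE_SIZE = 64  # assumed bytes per cache line
NUM_SETS = 64   # assumed number of sets in the cache

def cache_fields(addr):
    """Split an address into (tag, set index, block offset).
    The set index selects which set to search; the tag is compared
    against the tags stored in that set to detect a hit."""
    offset = addr % LINE_SIZE
    set_index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)
    return tag, set_index, offset

tag, set_index, offset = cache_fields(0x12345)
print(tag, set_index, offset)
```

Because consecutive addresses share a line until the offset wraps, programs that access memory sequentially hit the cache far more often than programs that jump around, which is one reason memory access patterns matter for performance.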

Notes and References


  1. Michael D. Godfrey. “First Draft Report on the EDVAC by John von Neumann.” IEEE Annals of the History of Computing 15(4): 27-43, 1993. Article on ResearchGate 

  2. Raspberry Pi 

  3. Intel 64 and IA-32 Architectures Software Developer Manuals 

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.