Updated: Mar 29, 2026 | 11 min read

The Memory Hierarchy: From CPU Registers to Long-Term Storage

Master the computer memory hierarchy. Learn the critical differences between L1/L2/L3 cache, RAM, and SSDs, plus how the MMU handles virtual-to-physical address translation.


In earlier chapters, we saw how the CPU performs operations and how the Control Unit coordinates them. But a CPU alone can’t do much without a place to keep data and instructions. A running program needs numbers to add, addresses to jump to, and instructions to execute, and all of that has to live somewhere: in memory. Not all memory is created equal. Different parts of the computer store data at different speeds and capacities, depending on how quickly the CPU needs to access it.

Registers, Cache, RAM, and Storage

Computers use several different types of memory and storage. Although all of these hold data, they serve very different roles. You can imagine them as a series of “shelves” at different distances from the CPU.

Registers

Continuing our shelf analogy from earlier, registers are the shelves closest to the CPU, so close in fact that they are part of the CPU itself. Registers are tiny, high-speed storage locations used to hold data while instructions are being executed. They store things like ALU inputs, memory addresses, counters, temporary results, and more.

Registers are extremely small, usually holding between 1 and 8 bytes depending on the CPU architecture. The number of registers is also limited, typically 16 to 32 general-purpose registers in modern architectures. Modern CPUs also include special, wider registers (SIMD registers) for processing many values at once, useful for multimedia and scientific computing. When we say a processor is “32-bit” or “64-bit,” we’re usually referring to the size of its general-purpose registers: the amount of data it can work with in a single operation.

Common Register Sizes

  • 8-bit registers: Seen in older or simpler CPUs; hold a single byte.
  • 16-bit registers: Used in early PCs and many microcontrollers.
  • 32-bit registers: Common in older x86 CPUs and ARMv7 processors.
  • 64-bit registers: Found in modern x86-64 and ARM64 CPUs; allow larger numbers and bigger memory addresses.

Common Types of Registers:

  • Program Counter (PC) / Instruction Pointer (IP): Holds the address of the next instruction to fetch.
  • Instruction Register (IR): Stores the instruction currently being decoded or executed.
  • Address Registers: Hold memory addresses used for load/store operations.
  • Stack Pointer (SP): Points to the top of the stack used for function calls and temporary data.
  • Data Registers (DR): Store data loaded from memory or I/O devices.
  • Accumulator (ACC): A special register used for arithmetic; historically very important, still present conceptually.
  • General-Purpose Registers (GPRs): Flexible registers used for calculations and data manipulation (R0, R1, …).
  • Status / Flags Register (SR): Stores condition bits (Zero, Carry, Overflow, etc.) that reflect the result of ALU operations.

See x86-64 Assembly Hello World: The Complete Docker-Based Guide if you want to play around with registers yourself.

Cache

CPU cache sits just outside the core, but still very close. Cache is a small, extremely fast type of memory that stores data the CPU is likely to need in the near future. Its main purpose is to reduce the time the CPU spends waiting for data from slower main memory (RAM).

Think of cache as a “middle shelf”: not as fast as registers, but much faster than RAM and far larger than the CPU’s limited registers.

Modern CPUs use several layers of cache:

  • L1 Cache: The smallest and fastest level, located inside each CPU core. Often split into separate instruction and data caches (L1i and L1d).
  • L2 Cache: Larger but slightly slower than L1. Still per-core.
  • L3 Cache: Much larger and slower than L1/L2, commonly shared across all CPU cores.

The deeper the level, the farther it is from the CPU core, and the slower it gets. However, even the slowest cache is far, far faster than RAM.

Caches work using the principle of locality:

  • Temporal locality – if you use data once, you’ll probably use it again soon.
  • Spatial locality – if you use data at one address, nearby data is likely needed too.

The CPU automatically loads data into the cache based on these patterns, so the next time it needs that data, it’s already nearby.
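The effect of spatial locality is easy to observe in code. Below is a minimal C sketch (the function names and the 256×256 size are arbitrary choices for illustration): both functions compute the same sum, but the row-major version visits consecutive addresses and reuses each cache line, while the column-major version jumps N × sizeof(int) bytes per access and touches a new cache line almost every time.

```c
#include <stddef.h>

#define N 256  /* arbitrary matrix size for illustration */

/* Row-major traversal: consecutive addresses, good spatial locality.
 * Several adjacent ints share one cache line, so most accesses hit. */
long sum_row_major(int m[N][N]) {
    long total = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            total += m[i][j];
    return total;
}

/* Column-major traversal of the same data: each access jumps
 * N * sizeof(int) bytes, so spatial locality is poor. */
long sum_col_major(int m[N][N]) {
    long total = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            total += m[i][j];
    return total;
}
```

For large arrays, the row-major version can be several times faster even though both do exactly the same arithmetic.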

RAM (Main Memory)

If registers are the closest shelves and cache is the middle shelf, then RAM (Random Access Memory) is the large set of shelves farther away but still within quick reach. RAM stores the data and programs that the CPU is currently working on. It is much larger than cache, typically measured in gigabytes, but also much slower.

When you open an application, load a file, or run code, the operating system places that data into RAM so the CPU can access it quickly. If data is not in RAM, the CPU has to fetch it from storage, which is far slower.

RAM has several key characteristics:

  • Volatile: Its contents are lost when the computer powers off.
  • Random access: Any memory location can be accessed directly and in roughly the same amount of time.
  • Shared resource: All programs running on your system compete for space in RAM.
  • Bigger but slower: Access takes tens of nanoseconds (roughly 50 to 100 ns), compared to about a nanosecond for the fastest cache.

Because RAM is much slower than cache, the CPU relies heavily on caching to avoid waiting. A cache “miss” forces the CPU to fetch data from RAM, and this delay can hurt performance significantly.

RAM exists in different forms and technologies:

  • DRAM (Dynamic RAM): The most common type. Needs to be refreshed constantly.
  • SDRAM (Synchronous DRAM): Works in sync with the CPU clock.
  • DDR (Double Data Rate SDRAM): Modern high-speed family of RAM (DDR3, DDR4, DDR5).

You don’t need to know the hardware details to understand its role: RAM is the CPU’s working area, holding the data and code currently in use.

Storage (HDD, SSD, and More)

If registers are the closest shelves and RAM is the workspace, then storage is the large warehouse where data is kept long-term. Storage is where your computer permanently keeps files, programs, photos, operating system data, and everything else, even when the power is off.

Storage is much slower than RAM, but it is also much larger, typically measured in hundreds of gigabytes or even terabytes.

Common types of storage include:

  • HDD (Hard Disk Drive): Uses spinning disks and a moving read/write head. Much slower than SSDs but still common for large, inexpensive storage.
  • SSD (Solid-State Drive): Has no moving parts. Much faster than HDDs, vastly improving boot times, loading times, and responsiveness.
  • NVMe SSD: A newer type of SSD that connects directly to the CPU via PCIe, offering extremely high read/write speeds.

Storage is non-volatile, meaning data remains even when the computer is turned off, unlike RAM.

Whenever the system needs something that’s not in RAM, it loads it from storage. If RAM fills up, the operating system may even use the storage device as “overflow,” in a process called paging or swapping, though this is much slower.

Storage isn’t designed for speed; it’s designed for capacity and long-term permanence.

How Memory Addressing Works

So far, we’ve discussed different types of memory and storage, but how does the CPU locate specific data within all that memory? This is where memory addressing comes in.

Every byte in RAM has a unique identifier, known as a memory address. You can think of RAM as a long row of tiny mailboxes, each with its own number starting from 0. When the CPU needs data, it provides the address of the mailbox where that data lives.

For example, if the CPU wants the byte stored at address 1000, it sends the number 1000 across the address bus. RAM receives this address, looks it up, and returns the correct data byte.

Addresses Are Just Numbers

Memory addresses are simply binary numbers. A CPU with:

  • 32-bit addressing can theoretically access up to 2³² bytes (4 GB) of memory
  • 64-bit addressing can (in theory) access 2⁶⁴ bytes (16 exabytes), far beyond what modern computers actually install

This is one of the big reasons 64-bit CPUs became standard: they can address far more memory than 32-bit systems.
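These limits are just powers of two, which a few lines of C can make concrete (the helper name is our own, for illustration):

```c
#include <stdint.h>

/* Number of distinct byte addresses reachable with a given address
 * width. 2^64 does not fit in a 64-bit integer, so we saturate. */
uint64_t address_space_bytes(unsigned bits) {
    if (bits >= 64)
        return UINT64_MAX;          /* effectively 2^64 - 1 */
    return (uint64_t)1 << bits;     /* 2^bits */
}
```

address_space_bytes(32) is 4,294,967,296, i.e. exactly 4 GB.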

Addressing Words, Not Just Bytes

While every byte has an address, the CPU often works with bigger chunks called words. The word size usually matches the CPU’s register size, which is where “32-bit” and “64-bit” processors get their names:

  • A 32-bit CPU uses 32-bit (4-byte) words
  • A 64-bit CPU uses 64-bit (8-byte) words

Even though memory is byte-addressable, the CPU might fetch a whole word at once. This is because the CPU usually operates on full words at a time, not single bytes.
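You can check the sizes on your own machine with sizeof. A small C sketch (the function names are our own; exact sizes depend on the compiler and architecture):

```c
#include <stddef.h>
#include <stdint.h>

/* A pointer is as wide as a memory address: 8 bytes on a 64-bit CPU,
 * 4 bytes on a 32-bit one. */
size_t pointer_size(void) { return sizeof(void *); }

/* Fixed-width types are the same size everywhere, unlike int/long. */
size_t u32_size(void) { return sizeof(uint32_t); }  /* always 4 */
size_t u64_size(void) { return sizeof(uint64_t); }  /* always 8 */
```

On an x86-64 or ARM64 system, pointer_size() returns 8; on a 32-bit build it returns 4.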

Pointers and Addresses

When a program stores a memory address in a variable, that variable is called a pointer. Instead of holding actual data, it holds the location of data. You’ll encounter pointers in low-level programming (like C or assembly), and understanding memory addresses is key to using them correctly.
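As a concrete illustration, here is the classic C example: swap receives the addresses of two variables, not copies of their values, and modifies the caller’s data through those addresses.

```c
/* Swap two integers through pointers. The parameters hold memory
 * addresses; the * operator dereferences an address to reach the
 * value stored there. */
void swap(int *a, int *b) {
    int tmp = *a;   /* read the value at address a */
    *a = *b;        /* store the value from address b into address a */
    *b = tmp;
}
```

Calling swap(&x, &y) passes the addresses of x and y, which is why the function can change the originals rather than local copies.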

Sequential Instructions

Instructions stored in memory also have addresses. The Program Counter (PC) holds the address of the next instruction, and the CPU increases it as it executes instructions sequentially, unless a jump or branch changes it.

How the CPU Sends and Receives Addresses

The CPU communicates with memory through two main channels:

  • Address bus: carries the memory address (where the data lives)
  • Data bus: carries the actual data, in both directions

When the CPU wants to read memory:

  1. It places an address on the address bus
  2. Memory responds by putting the requested data on the data bus
  3. The CPU reads that data

When writing, the CPU places both the address and the data on the appropriate buses.
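A toy model in C makes the two channels concrete (the 64 KB size and function names are arbitrary choices): the addr parameter plays the role of the address bus, and the byte value plays the role of the data bus.

```c
#include <stdint.h>

#define MEM_SIZE 65536u  /* toy 64 KB memory, 16-bit addresses */

static uint8_t ram[MEM_SIZE];

/* Read cycle: the CPU puts addr on the "address bus" and memory
 * answers with the stored byte on the "data bus". */
uint8_t mem_read(uint16_t addr) {
    return ram[addr];
}

/* Write cycle: the CPU drives both the address and the data. */
void mem_write(uint16_t addr, uint8_t data) {
    ram[addr] = data;
}
```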

Virtual vs. Physical Memory

Modern operating systems do not let programs access physical RAM directly. Instead, every program sees its own private world of memory called virtual memory. When you see an address in a debugger like 0x00400000, that number is not the true physical RAM location. It is a virtual address that must be translated.

This translation is handled by a small hardware unit inside the CPU called the Memory Management Unit (MMU).

Why have virtual memory at all? Virtual memory gives us:

  • Protection: One program cannot read or overwrite another program.
  • Simplicity: Each program sees a nice, clean, continuous block of memory.
  • Flexibility: Programs can use more memory than physically installed (thanks to paging).
  • Sharing: System libraries can be shared between programs without copying them.

How the Mapping Works

To avoid translating billions of individual bytes, the OS divides memory into fixed-size blocks called pages, with a typical size of 4 KB. Both virtual memory and physical memory are divided into these 4 KB units.

So instead of mapping:

Virtual → Physical
Byte 14 → Byte 214920
Byte 15 → Byte 214921
Byte 16 → Byte 214922

The system maps at page granularity:

Virtual Page #12 → Physical Page #203
Virtual Page #13 → Physical Page #8
Virtual Page #14 → (not in RAM — stored on disk)

Each program has its own page table, which is a big list that tells the MMU: “If the CPU accesses virtual page X, it actually lives at physical page Y.”

What the MMU Does on Every Memory Access

Every time the CPU touches memory (load, store, or instruction fetch), the MMU:

  1. Takes the virtual address
  2. Splits it into a page number and an offset inside the page
  3. Looks up the page number in the page table
  4. Finds the corresponding physical page number
  5. Recombines the physical page number and the offset to form the final physical address
  6. Accesses RAM normally using that physical address

This happens billions of times per second and is completely invisible to software. Additionally, to make these translations faster, the CPU stores recently used mappings in a tiny cache called the TLB (Translation Lookaside Buffer), so most address translations never require a full page-table lookup.

Translation Example

Virtual address:

0x00403A10

Break into:

  • Virtual page number = 0x00403
  • Offset = 0xA10 (the location inside that 4 KB page)

MMU looks up the page table and finds:

Virtual page 0x00403 → Physical page 0x1A2C

Final physical address:

(Physical page 0x1A2C × 0x1000) + offset 0xA10 = 0x1A2CA10

The CPU now reads or writes that physical address.

What If a Page Isn’t in RAM? (Page Faults)

If the MMU sees that a virtual page isn’t currently loaded in RAM, the CPU triggers a page fault.

The OS then:

  1. Pauses the program
  2. Loads the page from storage (SSD) into RAM
  3. Updates the page table
  4. Resumes the program

This is how the system gives programs more “memory” than you physically have.

Memory Hierarchy

Now that we’ve seen how memory works at a low level, we can look at how all these different types fit together into the overall structure of a computer’s memory system. All these different kinds of memory (registers, cache, RAM, storage) form what’s known as the memory hierarchy. Each level trades capacity for speed:

  1. Registers: Tiny, fastest, directly inside the CPU
  2. L1/L2/L3 Cache: Small, fast, close to the CPU
  3. RAM: Large, much slower
  4. Storage (SSD/HDD): Huge, much slower
  5. External storage/cloud: Gigantic capacity, extremely slow

As we move down the hierarchy:

  • Speed decreases
  • Size increases
  • Cost per byte decreases

The CPU always tries to work with data from the highest (fastest) level available. If the data isn’t there (a miss), it must be fetched from the next level down, and each step down the hierarchy adds more latency.

This hierarchy is the backbone of computer performance. It allows the CPU to operate at high speed without needing gigabytes of expensive, ultra-fast memory.

Much of computer architecture, from cache design to operating systems, exists to make this hierarchy feel as fast as possible to software.

Series: From Transistor to System: A Friendly Guide to Computer Architecture

7 Chapters