Modern software performance depends heavily on how efficiently memory is accessed, and not just on raw CPU speed. Sequential memory access tends to be much faster than random access, and understanding why requires a deeper look inside the memory subsystem, especially DRAM’s design.
There are two critical factors that influence memory performance:
- Access time per memory access → how long it takes to fetch a single piece of data from memory (this is where DRAM latency plays a role).
- The number of memory accesses → how often your program needs to fetch data, which depends on your choice of data structures, algorithms, data organization, and access patterns. Programs that reuse data or access nearby memory locations require fewer fetches due to CPU caching.
This article focuses solely on the first factor → DRAM access latency → explaining how DRAM is architected and how different access patterns affect its performance; a short sketch below gives a feel for how large that difference can be. By grasping these fundamentals, you’ll be better equipped to write performance-aware code.
(We won’t cover CPU caches or virtual memory in this article — those will be discussed in follow-up articles to complete the full memory access picture.)
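To make both factors concrete, here is a minimal C sketch. It performs the same number of logical reads twice, once in sequential order and once in a shuffled order; the array size, the LCG constants used for shuffling, and the timing approach are illustrative assumptions, and absolute timings will vary by machine. Sequential order lets caches and hardware prefetching absorb most of the traffic, while the shuffled order defeats them, so far more requests reach DRAM.

```c
/* Minimal sketch: the same number of logical reads, two visiting orders.
 * The sequential order benefits from caching and hardware prefetching; the
 * shuffled order defeats them, so far more requests reach DRAM. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define N (1u << 24)   /* 16M ints (~64 MB), larger than typical CPU caches */

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static long long sum_by_index(const int *data, const uint32_t *idx) {
    long long s = 0;
    for (uint32_t i = 0; i < N; i++)
        s += data[idx[i]];          /* one logical read per element either way */
    return s;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    uint32_t *seq = malloc(N * sizeof *seq);
    uint32_t *rnd = malloc(N * sizeof *rnd);
    for (uint32_t i = 0; i < N; i++) { data[i] = (int)i; seq[i] = i; rnd[i] = i; }

    /* Fisher-Yates shuffle (simple LCG) to create a random visiting order. */
    uint64_t state = 42;
    for (uint32_t i = N - 1; i > 0; i--) {
        state = state * 6364136223846793005ULL + 1442695040888963407ULL;
        uint32_t j = (uint32_t)(state % (i + 1));
        uint32_t t = rnd[i]; rnd[i] = rnd[j]; rnd[j] = t;
    }

    double t0 = seconds();
    long long s1 = sum_by_index(data, seq);
    double t1 = seconds();
    long long s2 = sum_by_index(data, rnd);
    double t2 = seconds();

    printf("sequential: %.3f s (sum %lld)\n", t1 - t0, s1);
    printf("shuffled:   %.3f s (sum %lld)\n", t2 - t1, s2);
    free(data); free(seq); free(rnd);
    return 0;
}
```

On typical hardware the shuffled pass is often several times slower, even though both passes read exactly the same data the same number of times.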
Anatomy of DRAM Components
- Channels: DRAM modules are divided into multiple channels. Each channel operates independently and has its own dedicated data bus, allowing true parallel data transfers across channels to the Memory Controller.
- Data Bus per Channel: Within each channel, the data bus is shared and serialized. This means only one data transfer can occur at a time from any bank in the channel to the memory controller, even if multiple banks are being accessed in parallel.
- Ranks: A rank is a group of 8 DRAM chips that work together to move 64 bits of data at once over the 64-bit data bus. Each chip contributes 8 bits, read from the same row and column inside its own copy of the addressed memory bank (Bank 3 in the diagram above). This way, all chips access and send their part of the data at the same time.
- Banks: Each rank has several banks, which are smaller sections of memory. Banks can work independently and in parallel with each other, but inside each bank, memory accesses are serialized.
- Rows and Columns: Each bank is organized as a grid of rows and columns. Every intersection of a row and column selects a small group of memory cells that together hold 1 byte of data, which corresponds to one memory address in this simplified view (the sketch after this list puts illustrative numbers on the whole hierarchy).
- Row Buffer: Each bank has a row buffer that holds the currently active row’s data for faster access. Only one row per bank can be active at a time. That is why memory accesses within a single bank are serialized.
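To make the hierarchy above concrete, here is a small illustrative model in C. The geometry (2 channels, 1 rank per channel, 8 banks per rank, 65,536 rows of 8,192 one-byte columns) is an assumption chosen for round numbers, not a description of any real module.

```c
/* Illustrative model of the DRAM hierarchy described above. The specific
 * numbers are assumptions for the sake of the example; real modules vary. */
#include <stdio.h>
#include <stdint.h>

struct dram_geometry {
    uint32_t channels;         /* independent data buses to the memory controller */
    uint32_t ranks;            /* ranks per channel (8 chips driving the 64-bit bus) */
    uint32_t banks;            /* independent sub-arrays per rank */
    uint32_t rows;             /* rows per bank; only one can be open at a time */
    uint32_t columns;          /* column addresses per row */
    uint32_t bytes_per_column; /* bytes delivered per column access (simplified) */
};

int main(void) {
    struct dram_geometry g = { 2, 1, 8, 65536, 8192, 1 };
    uint64_t row_bytes  = (uint64_t)g.columns * g.bytes_per_column;
    uint64_t bank_bytes = row_bytes * g.rows;
    uint64_t total      = bank_bytes * g.banks * g.ranks * g.channels;
    printf("row (and row buffer) size: %llu bytes\n", (unsigned long long)row_bytes);
    printf("bank size                : %llu MiB\n", (unsigned long long)(bank_bytes >> 20));
    printf("total capacity           : %llu GiB\n", (unsigned long long)(total >> 30));
    return 0;
}
```

Under these assumed numbers, a row (and therefore a row buffer) holds 8 KB, each bank holds 512 MiB, and the whole system holds 8 GiB; real modules differ, but the multiplicative structure is the point.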
How DRAM Access Works Internally
When the CPU requests data from memory, it sends a memory address to the Memory Controller. The Memory Controller then translates that address into the specific channel, bank, row, and column where the data lives.
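As a purely hypothetical illustration of that translation step, the sketch below slices a physical address into column, bank, row, and channel fields. Real memory controllers use their own, often hashed, mappings, so the field order and bit widths here are assumptions rather than how any particular controller behaves.

```c
/* Hypothetical address decoding: split a physical address into channel, bank,
 * row and column fields. Real controllers use their own (often XOR-hashed)
 * mappings; the bit widths here are illustrative only. */
#include <stdio.h>
#include <stdint.h>

#define COLUMN_BITS  13   /* 8192 columns per row   (assumed) */
#define BANK_BITS     3   /* 8 banks                (assumed) */
#define ROW_BITS     16   /* 65536 rows per bank    (assumed) */
#define CHANNEL_BITS  1   /* 2 channels             (assumed) */

struct dram_address {
    uint32_t channel, row, bank, column;
};

static struct dram_address decode(uint64_t phys) {
    struct dram_address a;
    a.column  = phys & ((1u << COLUMN_BITS) - 1);   phys >>= COLUMN_BITS;
    a.bank    = phys & ((1u << BANK_BITS) - 1);     phys >>= BANK_BITS;
    a.row     = phys & ((1u << ROW_BITS) - 1);      phys >>= ROW_BITS;
    a.channel = phys & ((1u << CHANNEL_BITS) - 1);
    return a;
}

int main(void) {
    uint64_t addr = 0x12345678;
    struct dram_address a = decode(addr);
    printf("addr 0x%llx -> channel %u, bank %u, row %u, column %u\n",
           (unsigned long long)addr, a.channel, a.bank, a.row, a.column);
    return 0;
}
```

Putting the column bits in the lowest positions means that consecutive addresses land in the same row of the same bank, which is exactly the property that makes sequential access row-buffer friendly in the cases below.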
Note: Each bank in the memory can have only one row active (open) at a time. There are three possible cases when accessing the requested row:
- Row Buffer Conflict (~75 nanoseconds total): If a different row (let’s call it Row X’) is currently active in the bank, the memory controller must first wait briefly (around 30 nanoseconds) before it can close (precharge) that active row (about 15 nanoseconds), and then open (activate) the requested Row X (another 15 nanoseconds). This is the slowest case.
- Row Buffer Hit (~15 nanoseconds total): If the requested row X is already active, the controller skips the closing and opening steps. This means the access is much faster since no extra delays are involved.
- Row Buffer Miss (~30 nanoseconds total): If no row is currently active in the bank, the memory controller simply activates the requested row X (taking about 15 nanoseconds).
After the correct row is active, the memory controller reads the requested column Y (which takes about 15 nanoseconds). Finally, the bank returns the requested data to the memory controller, which then sends it back to the CPU.
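These three cases can be captured in a tiny back-of-the-envelope model. It uses only the rough figures above (15 ns column read, 15 ns activate, 15 ns precharge, and a ~30 ns wait before a conflicting row can be closed); real DDR timing parameters are more detailed and vary by module.

```c
/* Back-of-the-envelope model of the three row buffer cases, using the
 * article's rough timings. Real DDR timings differ from module to module. */
#include <stdio.h>

#define NO_OPEN_ROW -1

/* Returns the estimated access latency in nanoseconds and updates the
 * bank's open-row state, mimicking the memory controller's decision. */
static int access_latency_ns(int *open_row, int requested_row) {
    int ns;
    if (*open_row == requested_row) {
        ns = 15;                 /* row buffer hit: column read only */
    } else if (*open_row == NO_OPEN_ROW) {
        ns = 15 + 15;            /* row buffer miss: activate + column read */
    } else {
        ns = 30 + 15 + 15 + 15;  /* conflict: wait + precharge + activate + read */
    }
    *open_row = requested_row;   /* the requested row is now the open one */
    return ns;
}

int main(void) {
    int open_row = NO_OPEN_ROW;
    printf("first access to row 7 : %d ns\n", access_latency_ns(&open_row, 7));
    printf("second access to row 7: %d ns\n", access_latency_ns(&open_row, 7));
    printf("access to row 42      : %d ns\n", access_latency_ns(&open_row, 42));
    return 0;
}
```

Running it prints 30 ns for the first access (miss), 15 ns for a repeated access to the same row (hit), and 75 ns for a switch to another row (conflict), matching the cases above.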
DRAM Bottlenecks and Their Impact on Memory Access
- Shared Data Bus Bottleneck: As mentioned earlier, only one data transfer can happen at a time per channel, even if multiple banks are active. This bottleneck generally becomes significant during bursty or highly parallel traffic, when many banks try to send data simultaneously over the channel’s shared data bus.
- Latency impact: Each data transfer occupies the bus for about 15 nanoseconds in this simplified model (roughly the column access time, tCAS).
- Sequential access: This bottleneck is typically less noticeable for bursty sequential memory accesses because modern CPUs use caches (L1, L2, L3), virtual memory page caching, and hardware prefetching. These mechanisms significantly reduce the number of DRAM accesses required and enable smoother, continuous data transfers over the shared bus.
- Random access: In bursty random access patterns, memory requests tend to target many banks unpredictably and in parallel, often leading to bus contention, queuing, and delays once the bus saturates. Moreover, caching and prefetching don’t work well for random accesses because the requests lack predictable patterns, so the system can’t reduce the number of DRAM accesses, which makes the bus bottleneck even more severe for these patterns (see the sketch after this list).
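The arithmetic behind this serialization is simple. Using only the simplified figures from this article (an 8-byte transfer over the 64-bit bus, occupying the channel for roughly 15 ns), the sketch below shows that bus time within one channel grows linearly with the number of transfers, no matter how many banks prepared their data in parallel, while a second channel halves it. Real DDR modules burst larger chunks per command and reach far higher throughput; only the shape of the constraint matters here.

```c
/* Serialized-bus arithmetic under this article's simplified model: each
 * transfer moves 8 bytes (64-bit bus) and occupies the channel's bus for
 * ~15 ns, so transfers within one channel add up linearly no matter how
 * many banks prepared their data in parallel. Figures are illustrative. */
#include <stdio.h>

int main(void) {
    const double transfer_ns = 15.0;       /* bus time per transfer (assumed) */
    const double bytes_per_transfer = 8.0; /* 64-bit data bus */
    const int transfers = 1000;            /* pending transfers across many banks */

    printf("per-channel throughput: ~%.2f GB/s\n",
           bytes_per_transfer / transfer_ns);
    for (int channels = 1; channels <= 2; channels++)
        printf("%d channel(s): %d transfers take %.0f ns\n",
               channels, transfers,
               (double)transfers / channels * transfer_ns);
    return 0;
}
```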
- Bank Access Bottlenecks:
As noted earlier, banks operate in parallel, but within each bank, memory accesses occur sequentially. Additionally, only one row can be active in a bank at any given time. Accessing a different row requires the bank to first precharge (close) the currently active row before activating the new one → a process that incurs significant delay.
- Latency impact:
- Row buffer hit (same row active): ~15 ns
- Row buffer miss or conflict (activate, optionally preceded by a precharge): ~30–75 ns
- Sequential access: When memory accesses tend to stay within the same row of a bank for some time, the chances of hitting an already active row (row buffer hit) are high. This minimizes costly precharge and activate cycles, resulting in faster access.
- Random access: Memory requests spread unpredictably across different rows and banks, causing frequent row buffer misses and conflicts and forcing the bank to precharge and activate rows repeatedly. This serialization significantly increases latency due to longer row-switching delays (the sketch below puts rough numbers on the gap).
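Putting the per-access figures above together gives a rough feel for the gap within a single bank. The sketch assumes the best case for the same-row pattern (every access after the first is a row buffer hit) and the worst case for the different-row pattern (every access conflicts with the previously open row); real workloads fall somewhere in between.

```c
/* Rough comparison using the article's simplified timings: 1000 accesses
 * that stay inside one row vs 1000 accesses that each land in a different
 * row of the same bank. */
#include <stdio.h>

int main(void) {
    const int accesses    = 1000;
    const int hit_ns      = 15;  /* row buffer hit: column read only          */
    const int conflict_ns = 75;  /* wait + precharge + activate + column read */
    const int miss_ns     = 30;  /* first access: activate + column read      */

    /* Same-row pattern: the first access opens the row, the rest hit it.     */
    long long same_row_ns  = miss_ns + (long long)(accesses - 1) * hit_ns;

    /* Different-row pattern: every later access conflicts with the open row. */
    long long scattered_ns = miss_ns + (long long)(accesses - 1) * conflict_ns;

    printf("same-row accesses     : %lld ns total\n", same_row_ns);
    printf("different-row accesses: %lld ns total\n", scattered_ns);
    printf("slowdown              : %.1fx\n",
           (double)scattered_ns / (double)same_row_ns);
    return 0;
}
```

Under these assumptions the different-row pattern comes out roughly five times slower, before any bus contention is even counted.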
Conclusion: Comparing Sequential and Random Memory Access
Sequential memory accesses are generally faster because they align well with hardware optimizations such as CPU caches, virtual memory page caching, and hardware prefetching, even during burst transfers. These mechanisms reduce both the number of DRAM accesses and the time per access by taking advantage of predictable patterns and row buffer hits.
On the other hand, random accesses often suffer from row buffer conflicts, serialized access within banks, and data bus contention, especially during bursty or highly parallel traffic. Caching and prefetching are less effective here, so random patterns end up substantially slower in practice.
However, random access isn’t inherently bad → when combined with smart data structures and algorithms that reduce the total number of memory fetches (like B-trees or hash maps), the smaller number of fetches can outweigh the higher cost per access, leading to efficient overall performance.