The Design of Cost-Effective Stride-Prefetching for Modern Processors

Hassan Al-Sukhni, James Holt, Daniel A. Connors, Mike Snyder, Matt Smittle, Brian Grayson
4th Workshop on Memory Performance Issues (WMPI-2006), February 2006.
Data prefetching of regular access patterns is an effective mechanism for hiding memory latency in modern microprocessors. However, to be included in an architecture design, a prefetching system must be cost-effective and have little impact on the microarchitecture. For example, many proposed prefetching systems use the full program counter (PC) to help detect patterns with arbitrary strides, but such systems are impractical and cost-prohibitive to implement. To avoid relying on the entire PC, this paper combines other instruction attributes with a small subset of the PC bits to detect regularity in program data accesses. Detection is driven by a finite state machine that resolves data stream allocation, maintains prefetch priorities, and manages prefetch run-ahead. The experimental results suggest that as few as 4 bits of the PC suffice to achieve prefetching effectiveness within 1% of that obtained with the full PC.
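The abstract does not describe the authors' finite state machine in detail, so the following is only a minimal sketch of the general idea, assuming a conventional PC-indexed stride table (in the style of a reference prediction table) that keeps just the low `pc_bits` bits of the PC. The class `StridePrefetcher`, the 2-bit confidence counter, the `threshold` and `degree` (run-ahead) parameters, and the `extra_attr` argument (standing in for "other instruction attributes", combined by XOR) are all hypothetical choices for illustration, not the paper's design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0  # hypothetical 2-bit saturating counter (0..3)


class StridePrefetcher:
    """Toy stride detector indexed by only the low `pc_bits` bits of the PC."""

    def __init__(self, pc_bits: int = 4, threshold: int = 2, degree: int = 2):
        self.mask = (1 << pc_bits) - 1
        self.threshold = threshold  # confidence needed before issuing prefetches
        self.degree = degree        # run-ahead: how many strides ahead to prefetch
        self.table: Dict[int, StrideEntry] = {}

    def index(self, pc: int, extra_attr: int = 0) -> int:
        # Truncated PC, optionally mixed with a hypothetical extra attribute.
        # No tag is kept, so PCs that differ only in high bits alias.
        return (pc ^ extra_attr) & self.mask

    def access(self, pc: int, addr: int, extra_attr: int = 0) -> List[int]:
        """Record a load and return the addresses to prefetch (possibly none)."""
        idx = self.index(pc, extra_attr)
        entry = self.table.get(idx)
        if entry is None:
            # First access for this index: allocate an entry, no prediction yet.
            self.table[idx] = StrideEntry(last_addr=addr)
            return []
        new_stride = addr - entry.last_addr
        if new_stride == entry.stride and new_stride != 0:
            # Stride confirmed: gain confidence (saturating at 3).
            entry.confidence = min(entry.confidence + 1, 3)
        else:
            # Stride changed: retrain on the new stride.
            entry.stride = new_stride
            entry.confidence = 0
        entry.last_addr = addr
        if entry.confidence >= self.threshold:
            return [addr + k * entry.stride for k in range(1, self.degree + 1)]
        return []


if __name__ == "__main__":
    pf = StridePrefetcher(pc_bits=4)
    for a in range(0, 320, 64):
        print(hex(0x400), a, pf.access(0x400, a))
```

Running the loop above, the load at PC `0x400` walking addresses 0, 64, 128, 192, 256 produces no prefetches until the 64-byte stride has been confirmed twice, after which each access requests the next two addresses (`[256, 320]`, then `[320, 384]`). With `pc_bits=4`, PCs such as `0x400` and `0x1400` map to the same entry; this is the aliasing cost that truncating the PC introduces, which the paper reports can be kept small when other instruction attributes are also used.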
