Understanding Cache Interference
Alex Settle.
Ph.D. Dissertation, Department of Electrical and Computer Engineering, University of Colorado.
November 2006.

System performance is increasingly coupled to cache hierarchy design as chip multiprocessors
(CMPs) increase in core count across generations. Higher core counts require large last-level cache
(LLC) capacities to limit costly off-chip memory bandwidth demand and the inherent bottleneck of memory
requests from multiple active cores. Currently there are two schools of thought in CMP cache
design: shared versus private last-level caches. At the center of the issue is that CMP systems serve
two different kinds of workloads: the throughput of multiple independent single-threaded applications and the
high-performance demands of parallel multi-threaded applications. Consequently, maximizing
CMP performance in each of these domains requires opposing design concepts. As a
result, it is necessary to investigate the behavior of both shared and private LLC design models, as
well as an adaptive LLC approach that works for multiple workloads.
This paper proposes a scalable CMP cache hierarchy design that shields applications from
inter-process interference while offering a generous per-core last-level cache capacity and low round-trip
memory latencies. A combination of parallel and serial data mining applications from the NU-MineBench
suite, along with scientific workloads from the SPEC OMP suite, is used to evaluate the cache
hierarchy performance. The results show that the scalable CMP cache hierarchy decreases the average
memory latency of the parallel workloads by 45% relative to the private cache configuration and by an
average of 15% relative to the shared cache. In addition, the memory bandwidth is 25% lower than the
private cache bandwidth for parallel applications and 30% lower for the serial workloads.