A Dynamically Reconfigurable Cache for Multithreaded Processors
Alex Settle, Daniel A. Connors, Enric Gibert, Antonio Gonzalez.
Journal of Embedded Computing: Special Issue on Single-Chip Multi-core Architectures.
December 2005.
|
To leverage growing transistor budgets, designers are moving toward
chip multi-processors (CMPs), which place multiple processor cores
on a single die. These systems typically include
multi-threaded support, such as simultaneous multithreading (SMT) and
coarse-grain multithreading (CGMT), on each of the processor cores to
enable cost-effective high-throughput execution. Such architectures
are expected in the embedded domain, although their adoption requires
that a number of constraints unique to embedded systems be addressed.
Specifically, issues in the cache and memory system must be adequately
resolved to eliminate the interference between co-active application
threads in the system. Traditionally, the CMP cache hierarchy can
either be shared across the cores or duplicated for each one. The
decision to offer fully shared or fully distributed cache hierarchies
is a design constraint that is driven by both power consumption and
the chip area required for each of the supported cores. At the very
least, though, cache hierarchies support sharing for the individual
hardware contexts that run on each core. To date, most design
techniques for improving multithreaded processor execution have
focused on raising resource utilization through instruction
scheduling and novel pipeline concepts. However, when independent
applications share the cache memory systems, severe performance
penalties can result depending on the characteristics of the
co-scheduled jobs. This penalty can be a major barrier to leveraging
multi-core and multithreaded architectures for the embedded systems
domain since the interference of co-active applications can compromise
the expected system characteristics (e.g., missed real-time
deadlines). In particular, co-scheduled applications compete for
cache resources and combine to create a collective set of memory
requests that cannot be adequately supplied through the use of a
traditional cache system designed for a single-thread processor.
To resolve these issues for the embedded computing domain, adaptable
hardware-based cache allocation systems are needed to balance the
resource demands of each application and improve the overall
throughput of the collective workload.
For several different workloads of two co-scheduled applications,
experimental results demonstrate speedups of up to 1.47X over a
fully shared two-level cache hierarchy and an average speedup of
1.10X over the leading cache partitioning model. Overall, by
dynamically managing cache storage for multiple application threads
at runtime, sizable performance gains are achieved, giving chip
designers the opportunity to maintain high performance as cache size
budgets become a concern in the CMP design space.