Characterizing the Use of Program Vulnerability Factors for Studying Transient Fault Tolerance in Multi-core Architectures
Robert Kost, Daniel Connors, Sudeep Pasricha.
Proceedings of the 2009 International Conference on Dependable Systems and Networks (DSN) Workshop on Compiler and Architectural Techniques for Application Reliability and Security (CATARS)
June 2009.
|
Semiconductor transient faults (soft errors) are a critical design
concern in the reliability of computer systems. Most recent
architecture research is focused on using performance models to
provide Architecture Vulnerability Factor (AVF) estimates of processor
reliability rather than deploying detailed fault injection into
hardware RTL models. While AVF analysis supports the investigation of
new fault-tolerant architecture techniques, it largely ignores program
execution characteristics when determining periods of soft error
susceptibility. The primary problem with AVF is that
software periods of vulnerability substantially differ from
micro-architecture periods of vulnerability. As research trends
increasingly call for selectively enabling software-based transient
fault-tolerant mechanisms, run-time and off-line experimental
techniques must be guided equally by program behavior and hardware.
To address issues with AVF as well as the efficiency of fault
injection studies, we examine elements of Program Vulnerability Factor
(PVF) in the context of multi-core architectures. PVF was previously
introduced to capture program behavior in the form of memory/register
vulnerability; here we explore static and profile-based techniques for
extending that work. By leveraging PVF, we explore
some initial contributions to the area of computer architecture
research. First, we demonstrate that a more efficient fault injection
campaign can be constructed and the outcome of fault injections in
application execution can be accurately predicted. Second, compiler
optimizations can be applied to better understand how the compiler
affects fault susceptibility and program behavior. Finally,
we motivate the need for developing a PVF metric for program
data that is communicated between cores.