Resilience Articulation Point (RAP): Cross-layer dependability modeling for nanometer system-on-chip resilience

Bibliographic Details
Published in:Microelectronics and reliability 2014-06, Vol.54 (6-7), p.1066-1074
Main Authors: Herkersdorf, Andreas, Aliee, Hananeh, Engel, Michael, Glaß, Michael, Gimmler-Dumont, Christina, Henkel, Jörg, Kleeberger, Veit B., Kochte, Michael A., Kühn, Johannes M., Mueller-Gritschneder, Daniel, Nassif, Sani R., Rauchfuss, Holm, Rosenstiel, Wolfgang, Schlichtmann, Ulf, Shafique, Muhammad, Tahoori, Mehdi B., Teich, Jürgen, Wehn, Norbert, Weis, Christian, Wunderlich, Hans-Joachim
Format: Article
Language:English
Description
Summary:
• Our RAP model enables systematic CMOS fault abstraction and error propagation.
• RAP assumes that physically induced faults eventually manifest as bit flips.
• Higher layer error models can be derived from probabilistic bit flip functions.
• SoC designers can optimize system resilience across multiple abstraction levels.
• Real-world case studies with SRAM soft errors in a CPU cache and a MIMO detector are shown.

The Resilience Articulation Point (RAP) model aims to provide researchers and developers with a probabilistic fault abstraction and error propagation framework covering all hardware/software layers of a System on Chip. RAP assumes that physically induced faults at the technology or CMOS device layer will eventually manifest themselves as one or more bit flips. When probabilistic error functions for specific fault origins are known at the bit or signal level, knowledge about the unit of design and its environment allows the bit-related error functions to be transformed into characteristic higher layer representations, such as error functions for data words, Finite State Machine (FSM) states, macro-interfaces or software variables. Thus, design concerns at higher abstraction layers can be investigated without having to consider the full details of the lower design levels. This paper introduces the ideas of RAP using examples of radiation-induced soft errors in SRAM cells, voltage variations and sequential CMOS logic. It shows by example how probabilistic bit flips are systematically abstracted and propagated towards higher abstraction levels, up to the application software layer, and how RAP can be used to parameterize architecture-level resilience methods.
ISSN:0026-2714, 1872-941X
DOI:10.1016/j.microrel.2013.12.012
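
The summary describes lifting bit-level error functions to higher-layer representations such as data words. A minimal sketch of that idea follows, under a strong simplifying assumption that is not taken from the record: every bit of a word flips independently with the same probability. The function names and the example numbers are hypothetical illustrations, not the authors' model or notation.

```python
from math import comb


def k_flip_probability(p_bit: float, width: int, k: int) -> float:
    """Probability that exactly k of `width` bits flip.

    Assumes independent, identically distributed bit flips (binomial model).
    This is an illustrative assumption, not a statement of the RAP model.
    """
    return comb(width, k) * p_bit ** k * (1.0 - p_bit) ** (width - k)


def word_error_probability(p_bit: float, width: int) -> float:
    """Probability that at least one bit of the word flips."""
    return 1.0 - (1.0 - p_bit) ** width


def uncorrectable_probability(p_bit: float, width: int, correctable: int = 1) -> float:
    """Probability that more than `correctable` bits flip.

    With correctable=1 this approximates a word protected by a
    single-error-correcting code (hypothetical protection scheme).
    """
    covered = sum(k_flip_probability(p_bit, width, k) for k in range(correctable + 1))
    return 1.0 - covered


if __name__ == "__main__":
    p = 1e-6      # hypothetical per-bit flip probability
    width = 32    # hypothetical word width in bits
    print("word error probability:      ", word_error_probability(p, width))
    print("uncorrectable (1-bit ECC):   ", uncorrectable_probability(p, width, 1))
```

Comparing the two printed values gives a rough feel for how an architecture-level method such as single-error correction changes the word-level error function. Correlated multi-bit upsets would need a different bit-level model than the binomial one assumed here.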