Analyzing and Reducing Silent Data Corruptions Caused By Soft-Errors

SESSION: Doctoral Showcase - Dissertation Research Showcase

EVENT TYPE: Doctoral Showcase

TIME: 10:30AM - 10:45AM


Presenter(s): Siva Kumar Sastry Hari


Hardware reliability is becoming a growing challenge as technology scales. Silent Data Corruptions (SDCs) from soft-errors pose a major threat in the commodity systems space. Hence, significantly reducing the user-visible SDC rate is crucial for low-cost in-field reliability solutions. This thesis proposes a program-centric approach that identifies the application locations that cause SDCs and converts those SDCs to detections using low-cost program-level error detectors. We developed Relyzer to obtain a detailed application resiliency profile by systematically analyzing all application fault-sites without performing time-consuming fault injections on all of them. Relyzer employs novel fault pruning techniques that lower the evaluation time by 99.78% for our workloads. Using Relyzer, we obtained and analyzed the comprehensive list of SDC-causing instructions. We then developed program-level error detectors that, on average, provide a much lower-cost alternative to a state-of-the-art solution for all SDC rate targets. Overall, we provide practical and flexible choice points on the performance vs. reliability trade-off curves.
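To make the terms in the abstract concrete, the following is a minimal sketch of how a single-bit fault at one fault-site can end up masked, detected, or as an SDC. It is a toy model and not Relyzer or the thesis's detectors: the workload (`find_max`), the sample data, the fault-site encoding `(iteration, bit)`, and the bounds-check detector are all assumptions made for illustration, and the sweep injects at every site exhaustively rather than pruning any of them.

```python
from collections import Counter
from typing import Optional, Tuple

# Hypothetical input data for the toy workload (not from the thesis).
DATA = [3, 9, 4, 1, 7, 2, 8, 5]


def flip_bit(value: int, bit: int) -> int:
    """Model a single-bit soft error by flipping one bit of an integer."""
    return value ^ (1 << bit)


def find_max(data, fault: Optional[Tuple[int, int]] = None) -> Tuple[int, bool]:
    """Toy workload: return (max of data, detected flag).

    fault = (iteration, bit) flips one bit of the loop index in that
    iteration only, which stands in for a transient register fault.
    """
    best = data[0]
    for i in range(len(data)):
        idx = i
        if fault is not None and fault[0] == i:
            idx = flip_bit(idx, fault[1])
        # Assumed program-level detector: the index must stay in bounds.
        if not 0 <= idx < len(data):
            return best, True
        best = max(best, data[idx])
    return best, False


def classify(data, fault: Tuple[int, int]) -> str:
    """Compare a faulty run against the fault-free (golden) run."""
    golden, _ = find_max(data)
    result, detected = find_max(data, fault)
    if detected:
        return "detected"
    return "masked" if result == golden else "sdc"


def sweep(data, bits: int = 5) -> Counter:
    """Inject at every (iteration, bit) fault-site and tally the outcomes."""
    return Counter(
        classify(data, (it, b)) for it in range(len(data)) for b in range(bits)
    )


if __name__ == "__main__":
    print(sweep(DATA))
```

In this sketch, a flip that sends the index out of range is caught by the check, a flip that lands on another valid element usually leaves the maximum unchanged (masked), and only flips that skip the largest element silently change the output. The exhaustive sweep is the expensive step that, according to the abstract, Relyzer's pruning techniques avoid for most fault-sites.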

Chair/Presenter Details:

Yong Chen (Chair) - Texas Tech University

Siva Kumar Sastry Hari - University of Illinois at Urbana-Champaign

