## **Reliability Engineering and Fault Tolerant Computing**

9<sup>th</sup>-13<sup>th</sup> January 2017, Department of CSE, Indian Institute of Technology Patna

## **Overview**

This course aims at a thorough understanding of the motivations, strengths, limitations, costs and effectiveness of the key hardware fault tolerance approaches that have emerged over the past few decades. The first part of the course will review the basics of reliability and fault tolerance schemes, including their quantitative evaluation, to facilitate a full understanding of the advanced material to follow. The latter third will focus on discussions of a range of actual fault tolerant architectures that have been deployed in applications as diverse as space shuttle flight control, engine control on commercial aircraft, control of nuclear reactors, on-line electronic stock trading exchanges, and mainframe enterprise systems.

The course will be presented by Professor Adit Singh, who is an IEEE Fellow and a leading expert on electronics system test, reliability and fault tolerance, having worked in this field for nearly 40 years. At various times, he has served as a consultant to most of the major semiconductor companies, and holds international patents in the test field that have been licensed to industry. He has also served as Chair (2007-11) of the IEEE Test Technology Technical Council. Importantly, Dr. Singh has taught dozens of very popular short courses and tutorials worldwide, at conferences and in-house for industry, on system reliability and fault tolerance.

Objectives: The primary objectives of the course are to:

1. Provide a strong fundamental understanding of reliability and fault tolerance: the various kinds of threats to electronic systems; defects, faults, and errors; stochastic modeling of failure/hazard rates and operational system lifetimes; redundancy for fault tolerance; TMR and other static (masking) redundancy approaches; active and hybrid redundancy; time redundancy against intermittent errors; check-pointing, rollback and recovery; examples of redundancy in real systems; reliability modeling of redundant systems.
2. Discuss in depth some real fault tolerant architectures: the evolution of the Reliability, Availability and Serviceability (RAS) features in IBM mainframes; the Non-Stop architecture from Tandem/Compaq/HP running the NASDAQ electronic stock exchange; the quintuple-redundant flight control system in the NASA space shuttles; quad-redundant engine controllers on GE aircraft engines, etc.

| Modules | The course shall consist of 5 modules, each of approximately two hours' duration as appropriate for the topic, adding up to a total of 10 hours of lectures. <ol><li>Introduction: Reliability modeling, understanding failures - the threat to system reliability</li><li>Basic redundancy and fault tolerant approaches</li><li>Fault tolerance on chip for yield enhancement and power management</li><li>Fault Tolerant Architectures I: Commercial systems</li><li>Fault Tolerant Architectures II: Aerospace and life critical systems</li></ol> The number of participants for the course will be limited to fifty. |
|------------|------------|
| You Should Attend If | <ul><li>you are a mechanical, electrical, electronics or computer engineer or research scientist interested in reliable systems</li><li>you are a student or faculty member from an academic institution interested in learning how to build reliable systems or subsystems, or in fault tolerant computing</li></ul> |
| Fees | The participation fees for taking the course are as follows: Participants from abroad: US \$250; Industry/Research Organizations: Rs. 2500; Academic Institutions: Rs. 1000. The above fees include all instructional materials, computer use for tutorials and |

## **Teaching Faculty**

Prof. Adit D. Singh received an undergraduate degree from the Indian Institute of Technology (IIT) Kanpur (1976), and the M.S. (1978) and Ph.D. (1982) from Virginia Tech, all in Electrical Engineering. Since September 2002, he has served as James B. Davis Distinguished Professor of Electrical and Computer Engineering at Auburn University, where he directs the VLSI Design and Test Laboratory. Before joining Auburn in 1991, he was Associate, and earlier Assistant, Professor of Electrical and Computer Engineering at the University of Massachusetts in Amherst, and a full-time Instructor at Virginia Tech (1978-82). He has also held visiting positions during sabbaticals at major universities, most recently in 2012 serving as "Guest Professor" at the University of Freiburg, Germany.
His research program has received extensive support from the US National Science Foundation and private industry, and also from international agencies such as the Max Planck Society of Germany, the Fulbright Foundation, the Ministry of Science and Technology in India, and the National Science Council of Taiwan.

Dr. Singh's technical interests span all aspects of VLSI technology, in particular integrated circuit test, reliability and fault tolerance. He is particularly recognized for his pioneering contributions to statistical methods in test and adaptive testing. He has published over two hundred research papers, served as a consultant to many of the largest semiconductor companies around the world, and holds international patents that have been licensed to industry. He has held leadership roles as General Chair/Co-Chair/Program Chair for dozens of international VLSI design and test conferences, including co-founding the annual India-based International Conference on VLSI Design (with Professor Vishwani Agrawal) in 1990-91. Most recently he was Program Chair of the 2014 International Conference on VLSI Design, Co-Chair of the 2014-16 Workshop on Reliability Aware Design, and is the Program Chair for the 2015 Asian Test Symposium. He currently also serves on the editorial boards of IEEE Design and Test Magazine and the Journal of Electronic Testing: Theory and Applications (JETTA), and on the Steering and Program Committees of many of the major IEEE international test and design automation conferences.

Dr. Singh is also a very popular lecturer. In addition to the dozens of talks and seminars he has presented around the world on his research, he is regularly invited by conferences and industry to conduct short courses on cutting-edge technical topics in his specialty.
Over the years, he has conducted almost 100 such courses, ranging from half a day to three days in length, in over a dozen different countries, and in-house for many major companies (IBM, Texas Instruments, AMD, National Semiconductor, NXP, Advantest, etc.).

Dr. Singh has received numerous research and teaching awards. He was elected Fellow of the IEEE in 2002 for "contributions to defect based testing and test optimization in VLSI circuits". He is a Golden Core member of the IEEE Computer Society.

For more information, visit: http://www.eng.auburn.edu/~adsingh/

## **Course Coordinators / Host Faculty**

Dr. Jimson Mathew and Dr. Arijit Mondal

Department of Computer Science and Engineering, Indian Institute of Technology Patna, India.

Email: {jimson, arijit}@iitp.ac.in