Resources
Textbook Draft
| Chapter | Link | Name |
|---|---|---|
| 1 | Classical Dependability Techniques & Modern Computing Systems: Where and how do they meet? | |
| 2 | Hardware and Software Error Detection and Example Applications | |
| 3 | Processor Level Detection and Recovery | |
| 4 | Data Analysis | |
| 5 | Software Detection | |
| 6 | Reliable Networked and Distributed Systems | |
| 7 | Checkpointing and Rollback Error Recovery | |
| 8 | Checkpointing Large-Scale Systems | |
| 9 | Internals of Fault Injection Techniques | |
| 10 | Safeguarding Current Technologies | |
Probability Review
| Item | Link | Name |
|---|---|---|
| 1 | Link | Prof. Hajek's course notes for ECE313 |
| 2 | Link | Prof. Iyer's lecture slides for ECE313 |
Other recommended texts
- D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems - Design and Evaluation, Digital Press, 1998, 3rd edition.
- M. Singhal and N. G. Shivaratri, Advanced Concepts in Operating Systems, McGraw-Hill, 1994.
- D. K. Pradhan, ed., Fault Tolerant Computer System Design, New Jersey: Prentice-Hall, 1996.
- B. W. Johnson, Design and Analysis of Fault-Tolerant Digital Systems, Addison-Wesley, 1989.
- M. R. Lyu, ed., McGraw-Hill Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
- M. R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, 1995.
- K. P. Birman, Building Secure and Reliable Network Applications, Manning, 1996.
- K. S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, John Wiley & Sons, 2nd edition, 2002.
- P. Jalote, Fault Tolerance in Distributed Systems, Prentice-Hall, Inc., 1994.
- M. Shooman, Probabilistic Reliability: An Engineering Approach, McGraw-Hill, 1968.