Apr 21, 2026  
2023-2024 Graduate Catalog [ARCHIVED CATALOG]


CPSC (ECE) 6740 - Fault Tolerance and Reliability in High-Performance Computing

3 Credits (3 Contact Hours)
Survey of current fault tolerance and reliability issues on high-performance computing (HPC) systems. Topics include taxonomy of failures and errors, checkpoint-restart, fault injection techniques, soft error detection schemes, and lossy compression. Students are expected to have completed coursework comparable to ECE 3220 or ECE 3290 before enrolling in this course. It is recommended that students also have completed coursework comparable to ECE 4730/6730 before enrolling. May also be offered as ECE 6740.


