Apr 21, 2026
CPSC (ECE) 6740 - Fault Tolerance and Reliability in High-Performance Computing
3 Credits (3 Contact Hours)
Survey of current fault tolerance and reliability issues on high-performance computing (HPC) systems. Topics include taxonomy of failures and errors, checkpoint-restart, fault injection techniques, soft error detection schemes, and lossy compression. Students are expected to have completed coursework comparable to ECE 3220 or ECE 3290 before enrolling in this course. It is recommended that students also have completed coursework comparable to ECE 4730/6730 before enrolling. May also be offered as ECE 6740.