Nov 21, 2024
ECE 4740 - Fault Tolerance and Reliability in High-Performance Computing 3 Credits (3 Contact Hours) Survey of current fault tolerance and reliability issues in high-performance computing (HPC) systems. Topics include taxonomy of failures and errors, checkpoint-restart, fault injection techniques, soft error detection schemes, and lossy compression. Preq: ECE 3220 or ECE 3290. ECE 4730 is recommended, but not required.
This 4000-level course has a 6000-level counterpart. Students should refer to the Graduate Announcements for the 6000-level description and requirements.