Apr 23, 2024  
2020-2021 Undergraduate Catalog [ARCHIVED CATALOG]

ECE 4740 - Fault Tolerance and Reliability in High-Performance Computing

3 Credits (3 Contact Hours)
Survey of current fault tolerance and reliability issues on high-performance computing (HPC) systems. Topics include taxonomy of failures and errors, checkpoint-restart, fault injection techniques, soft error detection schemes, and lossy compression. Preq: ECE 3220 or ECE 3290. ECE 4730 is recommended, but not required.

This 4000-level course has a 6000-level counterpart. Students should refer to the Graduate Announcements for the 6000-level description and requirements.


