
CS 598 XU - Reliability of Cloud-Scale Sys

Last offered Spring 2022

Official Description

Subject offerings of new and developing areas of knowledge in computer science intended to augment the existing curriculum. See Class Schedule or departmental course information for topics and prerequisites. Course Information: May be repeated in the same or separate terms if topics vary.

Section Description

The purpose of this course is to teach the principles and practices of reliability engineering in modern "cloud-scale" systems, and expose students to the research of software and system reliability. We will look at how large-scale systems fail in the real world, and we will study the state-of-the-art reliability techniques and practices, including those widely adopted in industry and new ideas proposed by academia. We will be going over the following topics:

* Availability
* Hardware faults
* Software defects
* Misconfigurations
* Operation mistakes
* Network disruptions
* Bug detection
* Software testing
* Failure testing and chaos engineering
* Load tests and drain tests
* Monitoring
* Recovery
* Tracing
* Diagnosis

This is a research-oriented seminar course with a major course project. Students will get familiar with the technical results as well as with the process of doing research in software testing, analysis, and analytics. The aim is to help students start research in th

Related Faculty

Title | Section | CRN | Type | Hours | Times | Days | Location | Instructor
Reliability of Cloud-Scale Sys | XU | 60198 | S7 | 4 | 1530 - 1645 | T R | 2233 Everitt Laboratory | Tianyin Xu