Administrivia


About the Class

This course will use real data from several domains (including resiliency and healthcare science) to instill in students the following data-science expertise:

  • Data management
  • Feature engineering by identifying key data characteristics
  • Problem formulation, machine learning models and validation
  • Design, construction, and assessment of end-to-end application workflows by using data-driven insights
  • Generating application domain insights that can improve the understanding of the domain problem

Through this course, students will develop expertise and the ability to form intuitions that apply more broadly. This expertise will prepare students for data science engineering roles.

Course Description

Many modern application domains require engineers and domain experts to work together in the design and analysis of heterogeneous datasets, often with the objective of automating decision making (sometimes referred to as actionable intelligence). Extracting the right level of knowledge from these datasets to generate actionable intelligence is a compelling problem.

The proposed course addresses this problem by providing students with an opportunity to build analysis workflows that use data management, feature engineering, and supervised and unsupervised learning to derive real-world insights. In this course, students will have an opportunity to work on real-world applications while interacting with domain experts.

This course instills the skill sets required for constructing end-to-end, real-world analysis workflows through lectures, labs, and mini-projects that allow students to derive domain insights.

The course will use real-world examples (measurement logs from supercomputers, data from large clinical trials) to teach data management, feature engineering, supervised/unsupervised learning, and testing and validation techniques. Students will gain hands-on implementation experience by completing three mini-projects that require them to analyze substantial amounts of high-fidelity, real-world measurement data obtained from different application domains, such as computer system design and healthcare. While each workflow is end-to-end, students will go deeper into the methods as they progress through the projects. Moreover, students will learn to create quantifiable, domain-specific metrics of interest (cost models) and to use techniques that quantitatively assess their results.

Students will develop the expertise and an ability to form intuitions which are applicable in other domains.

This course will feature several guest lectures from domain experts who will demonstrate how innovative data-analysis techniques have been transformative and have generated significant societal impact.

Course Objectives and Learning Outcomes

After completing this course, students should:

  • Understand how to handle multimodal data from a broad spectrum of areas, ranging from autonomous systems to healthcare, with a particular focus on the application of probabilistic graphical models (PGMs) that incorporate other ML models such as neural networks
  • Recognize which model to use, and when, depending on data availability and project objectives
  • Be able to implement models on real-world examples using real data
  • Derive insights from data by combining model solutions with domain knowledge
  • Have experience working on real-world applications while interacting with domain experts
  • Have experience in building interpretable machine learning models for impactful societal applications

Prerequisites

Basic probability and basic computer programming skills are essential. The prerequisites are ECE 313 or CS 361 (Statistics and Probability) and exposure to the basics of a scripting language such as Python. Knowledge of operating systems (e.g., ECE 391), or an equivalent course, is beneficial.

Timeline

The course comprises 45 hours of instruction (30 hours of classroom lectures, quizzes, and presentations, plus 15 hours of hands-on data analytics lab) over 15 weeks in the fall semester.

Lecture Electronics Policy

During lectures and data analytics labs, cell phones and other non-class use of electronics are NOT allowed. If, due to unforeseen circumstances, a student needs access to her/his cell phone, she/he shall inform the instructor at the beginning of the lecture and sit so that no students behind her/him are disturbed (typically furthest from the board).

Attendance Policy

Attendance at all lectures and data analytics labs is required. There will be in-class assignments, and class participation is graded. Students are advised to contact both the TAs and the instructor via a private post on Piazza (before the beginning of the lecture) if they will miss a lecture due to unforeseen circumstances. The instructor and TAs reserve the right to take class attendance. Class attendance includes data analytics lab hours. Students can miss no more than one group activity. Note that group activities are only tentatively scheduled.

DRES Accommodations

DRES requirements must be reported to the instructor/TAs by the end of the first week.

Evaluation

We will compute the final grade for undergraduate students (Table 1) and graduate students (Table 2) using the following weights:

Table 1. Undergraduate Students

Activity                   Grade   Details
Mini-Projects 1, 2, 3      50%     15%, 15%, and 20%, respectively
Midterm and Final          30%
Class Participation        10%     May include quizzes
Homework                   10%

Table 2. Graduate Students

Activity                                                   Grade   Details
Mini-Projects 1, 2, 3                                      30%     5%, 10%, and 15%, respectively
Midterm and Final                                          25%
Final Project: Semester-Long (4-credit-hour students only) 25%
Class Participation                                        10%     May include quizzes
Homework                                                   10%

  • 3 credit hours for undergraduates; 4 credit hours for graduate students, with self-proposed extensions
    to two of the three mini-projects, approved by the instructor(s).
  • Full credit for submissions on time.
  • Late submission policy: 10% is deducted for every day late, prorated (up to 3 days maximum); no credit is given after that.
  • Groups policy: Students will form groups of three for the projects early in week 2; otherwise, the TAs will form the groups for you.
  • While we encourage discussions, submitting identical material is not allowed and will incur appropriate penalties
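The weights in Tables 1 and 2 and the late-submission rule can be combined into a short Python sketch. It is illustrative only, not official grading code: the component names are hypothetical, scores are assumed to be on a 0-100 scale, and two readings of the policy are assumptions, namely that the 10% deduction is taken from the earned score and that "prorated" means partial days are charged linearly.

```python
import math

# Hypothetical component names; weights come from Table 1 and Table 2.
UNDERGRAD_WEIGHTS = {
    "mini_project_1": 0.15,
    "mini_project_2": 0.15,
    "mini_project_3": 0.20,
    "midterm_and_final": 0.30,
    "participation": 0.10,
    "homework": 0.10,
}

GRAD_WEIGHTS = {
    "mini_project_1": 0.05,
    "mini_project_2": 0.10,
    "mini_project_3": 0.15,
    "midterm_and_final": 0.25,
    "final_project": 0.25,
    "participation": 0.10,
    "homework": 0.10,
}

MAX_LATE_DAYS = 3
PENALTY_PER_DAY = 0.10


def late_adjusted(score: float, days_late: float) -> float:
    """Apply the late policy to one submission's score.

    Assumption: the 10%-per-day deduction is taken from the earned score,
    and partial days are prorated linearly.
    """
    if days_late <= 0:
        return score  # on time: full credit
    if days_late > MAX_LATE_DAYS:
        return 0.0  # past the 3-day window: no credit
    return score * (1.0 - PENALTY_PER_DAY * days_late)


def final_grade(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 100%")
    return sum(weight * scores[name] for name, weight in weights.items())
```

Under these assumptions, a mini-project scored 90 and submitted a day and a half late would count as 90 × (1 - 0.15) = 76.5, while the same work submitted four days late would count as 0.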

Mini-Projects

Students work in groups and follow detailed instructions to build end-to-end workflows that solve real-world data science problems. The three mini-projects are in high-social-impact domains ranging from autonomous vehicle safety to health analytics.

Graduate Project

(one additional credit requirement for the graduate students; approximately 25-30% of the overall graduate credit)

Graduate students work, in close collaboration with the instructor, on a semester-long data-science team project of their choice. The project covers problem formulation, finding a dataset, data pre-processing and preliminary analysis, model building and validation, and interpretation of results.

In-class Activities

In-class activities are sessions held during lecture in which students work in groups to solve problems while the instructor/TAs help resolve any questions. These activities are designed to deepen understanding of the course material and its application to problem solving.

Total Hours

  • Lectures: 28 * 1.50 hours = 42 hours
  • Discussions: 5 * 1 hour = 5 hours (optional; slides available on Piazza)
  • Final exam: 1 * 3 hours = 3 hours
  • Professor Office hours: 14 * 1 hour = 14 hours (additional hours available via appointment)
  • TA Office hours: 14 * 2 hours = 28 hours (additional hours by appointment)