Administrivia


About the Class

This course uses real data from several domains (including resiliency and healthcare science) to instill in students the following data-science expertise:

  • Data management
  • Feature engineering by identifying key data characteristics
  • Problem formulation, machine learning models and validation
  • Design, construction, and assessment of end-to-end application workflows by using data-driven insights
  • Generating application domain insights that can improve the understanding of the domain problem

Through this course, students will develop expertise and the ability to form intuitions that apply more broadly. Such expertise will prepare students for data science engineering roles.

Course Description

Many modern application domains require engineers and domain experts to work together on the design and analysis of heterogeneous datasets, often with the objective of automating decision making (sometimes referred to as actionable intelligence). Extracting the right level of knowledge to generate actionable intelligence from these datasets is a compelling problem. This course addresses the problem by giving students an opportunity to build analysis workflows that use data management, feature engineering, and supervised and unsupervised learning to derive real-world insights. Students will work on real-world applications while interacting with domain experts.

This course instills the skills required to construct end-to-end, real-world analysis workflows through lectures, labs, and mini-projects that allow students to derive domain insights. The course uses real-world examples (measurement logs from supercomputers, data from large clinical trials) to teach data management, feature engineering, supervised and unsupervised learning, and testing and validation techniques. Students will gain hands-on implementation experience by completing three mini-projects that require them to analyze substantial amounts of high-fidelity, real-world measurement data from application domains such as computer system design and healthcare. While each workflow is end-to-end, students will deepen their understanding of the methods as they progress through the projects. Moreover, students will learn to create quantifiable, domain-specific metrics of interest (cost models) and to use techniques that quantitatively assess their results. Students will develop expertise and the ability to form intuitions that are applicable in other domains.

This course will feature several guest lectures from domain experts, who will demonstrate how innovative data-analysis techniques have been transformative and, as a result, have generated significant societal impact.

Prerequisites

Basic probability and basic computer programming skills are essential: ECE 313 or CS 361 (Statistics and Probability), plus exposure to the basics of a scripting language (such as Python). Knowledge of operating systems (e.g., ECE 391 or an equivalent course) is beneficial.

Timeline

There are ~37 hours of required lectures (classroom lectures, quizzes, and in-class activities) and ~15 hours of optional discussion sections over 15 weeks in the spring semester.

Piazza

The instructors will use Piazza to communicate important course announcements and logistical updates. Students may also ask questions on Piazza regarding course policy or assignments. To remain informed, all students in the course should enroll in the “ECE/CS 498DS Data Science and Analytics” course on Piazza. Please use professional language when posting on Piazza.

Electronics Policy

During lectures and discussion sections, cell phones and other non-class electronics should NOT be used. If, due to unforeseen circumstances, a student needs access to their cell phone, they should inform the instructor at the beginning of the lecture and sit where they will not distract other students (typically furthest from the board).

However, students SHOULD bring their laptops to each lecture in case of a quiz. These laptops should only be taken out and used during the time of the quiz. More details about these quizzes can be found in the “In-Class Quizzes” section below.

Attendance Policy

Student attendance in all lectures is required. Class participation, which includes in-class activities and quizzes, will be evaluated. If a student must miss a lecture due to unforeseen circumstances, they should inform the TAs and the instructor via a private Piazza post made before the start of the missed lecture. The instructor and TAs reserve the right to take class attendance.

DRES Accommodations

DRES requirements must be reported to the instructor/TAs by the end of the first week (1/24/2020).

Course Components

Discussion Sections

Optional discussion sections will be held every Friday from 4-5 PM. Although not strictly required, students are highly encouraged to attend. During these weekly sessions, either (i) extensions to that week’s lecture material may be discussed, or (ii) at least two TAs will be available for general office hours. Refer to Piazza for up-to-date announcements for each week’s discussion section.

In-Class Activities

In-class activities are sessions organized during lectures in which students solve questions in groups while the instructor and TAs help resolve any questions that arise. They are designed to deepen understanding of the course material and its application to problem solving. There will be six in-class activities throughout the semester, and the lowest individual score will be dropped.

In-Class Quizzes

In-class quizzes are short quizzes that test students’ understanding of recently taught concepts and encourage class attendance. These quizzes will be administered during the final 5 minutes of lecture. The quiz dates are unannounced, and the quiz questions will only be released in class. Students are expected to bring laptops to class and submit answers through Compass. There will be six in-class quizzes throughout the semester, and the lowest individual quiz score will be dropped. Note that there will be no quiz during the week of the engineering career fair.
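The drop-the-lowest rule for quizzes and in-class activities can be illustrated with a short sketch. The function name is hypothetical, and averaging the remaining scores is an assumption; the syllabus does not state how the kept scores are combined.

```python
def average_with_lowest_dropped(scores):
    """Drop the single lowest score and average the rest.

    Assumption: the remaining scores are combined with a plain average.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to drop one")
    kept = sorted(scores)[1:]  # discard the lowest score only
    return sum(kept) / len(kept)


# Six hypothetical quiz scores; the missed quiz (0) is the one dropped.
quiz_scores = [0, 80, 90, 100, 70, 60]
quiz_average = average_with_lowest_dropped(quiz_scores)
```

Under this assumption, a single missed quiz or activity does not lower the average.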

Homework

Throughout the semester, there will be 6 homework assignments. These assignments/worksheets should be completed within 1 week of release, and will be related to the content of the lectures from around the time of release. Note that homework 0 will assess the students’ understanding of basic probability concepts, and homework 1 will test their understanding of basic programming.

Mini-Projects

Mini-projects enable students to work in groups to build end-to-end workflows that solve real-world data science problems. During the semester, there will be three mini-projects in high social impact domains ranging from autonomous vehicle safety to health analytics.

Final Project

Students enrolled for 4 credit hours work, in close collaboration with the instructor, on a semester-long data-science team project of their choice. The project covers problem formulation, finding a dataset, pre-processing the data, preliminary analysis, model building and validation, and interpretation of results. Note that 3-credit students will not complete a final project. Proposals for the final project will be collected during the fourth week of the semester.

Evaluation

Final course grades will be computed with the weights shown in the following table:

Activity                                | Grade | Details
Mini-Projects 1, 2, 3                   | 45%   | MP1: 10%, MP2: 15%, MP3: 20%
Midterm and Final                       | 35%   | Midterm: 15%, Final: 20%
Final Project (graduate students only)  | 30%   |
Class Participation                     | 10%   |
Homework                                | 10%   |

Credit policy

There will be two categories of students enrolled in the course: students taking the course for 3 credit hours and students taking the course for 4 credit hours.

  • To earn the extra credit hour, 4 credit hour students will complete a final project in addition to all the other assignments that the 3 credit hour students complete.
  • Currently, only graduate students are permitted to register for the 4 credit hour section of the course. Undergraduate students will need to request a special override from the professor in order to register for the 4-credit hour section.
  • Because the final project adds 30%, the weights in the table above sum to 130% for 4-credit hour students; their scores will be normalized from 130% to 100%.
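As a rough illustration of the normalization, the sketch below computes an overall grade from the table weights. The names `BASE_WEIGHTS`, `FINAL_PROJECT_WEIGHT`, and `course_grade` are hypothetical, and it assumes that normalizing means dividing the weighted sum by the total weight (100 for 3-credit students, 130 for 4-credit students); the instructors may apply a different scheme.

```python
# Weights from the evaluation table, in percent.
BASE_WEIGHTS = {
    "mini_projects": 45,
    "exams": 35,
    "class_participation": 10,
    "homework": 10,
}
FINAL_PROJECT_WEIGHT = 30  # applies to 4-credit students only


def course_grade(scores, four_credit=False):
    """Return the overall grade in percent.

    `scores` maps each component name to a score between 0 and 100.
    Assumption: normalization divides the weighted sum by the total weight.
    """
    weights = dict(BASE_WEIGHTS)
    if four_credit:
        weights["final_project"] = FINAL_PROJECT_WEIGHT
    total_weight = sum(weights.values())  # 100 or 130
    weighted_sum = sum(weights[name] * scores[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical 4-credit student: 80 on every shared component, 100 on the project.
example_scores = {
    "mini_projects": 80,
    "exams": 80,
    "class_participation": 80,
    "homework": 80,
    "final_project": 100,
}
example_grade = course_grade(example_scores, four_credit=True)
```

Under this assumption, each component keeps its relative weight, and a perfect score in every component still yields 100%.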

Late submission policy

Late submissions lose 10% per day, prorated for partial days, for up to 3 days. Submissions more than 3 days late receive no credit.
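The late penalty can be sketched as follows. The function name is hypothetical, and two details are assumptions rather than stated policy: lateness is prorated by the hour, and the 10% per day is deducted as a fraction of the earned score.

```python
def late_adjusted_score(score, hours_late):
    """Apply a prorated 10%-per-day penalty; zero credit beyond 3 days.

    Assumptions: proration is hourly, and the penalty is a fraction of
    the earned score rather than of the maximum possible points.
    """
    if hours_late <= 0:
        return score  # submitted on time
    days_late = hours_late / 24
    if days_late > 3:
        return 0.0  # past the 3-day window
    return score * (1 - 0.10 * days_late)


# A submission 12 hours late loses half of one day's penalty.
half_day_late = late_adjusted_score(100, 12)
```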

Groups policy

Students will form groups of 3 for the projects early in week 2; otherwise, the TAs will assign groups.

While we encourage discussion, submitting identical material is not allowed and will incur appropriate penalties.

Class participation

The 10% class participation grade comes from (i) in-class activities, (ii) in-class quizzes, and (iii) participation in lectures/office hours/discussion sections/Piazza. The instructor and the TAs reserve the right to track attendance and participation in any of these outlets.