Data Science
DISCOVERY
An open-access project from The University of Illinois with a mission to create the most valuable Data Science resource available, offered as a public good.

Professor-Guided Lessons and Practice Problems

Learn Data Science with Professors Wade Fagen-Ulmschneider and Karle Flanagan through lessons that contain written explanations, example worksheets, practice questions, and more!

MicroProjects: Real Data Science in Under an Hour

Guided, detailed projects that provide a "micro" exploration of a new dataset. Each MicroProject is designed to give you a real data science experience in Python in under an hour!

Guides for Common Data Science Techniques

Short, solution-focused examples of common tasks in Data Science. We create several new guides each week, so there is constantly something new!

Datasets for Your Own Projects

Clean, documented, and relevant datasets for Data Science. As we use a new dataset in any of our courses or research, we add the dataset here for you to use!

Learn Data Science!

Module 1: Basics of Data Science with Python

"Basics of Data Science with Python" provides a strong introduction of the field of Data Science. You will understand best practices in designing good, great, and ideal experiments, use Python to load data into DataFrame, and manipulate DataFrames in Python to explore subsets of data.

Module 2: Exploratory Data Analysis

"Exploratory Data Analysis" teaches about the tools and techniques to begin to do exploratory data analysis on real-world datasets. You will learn several methods of analyzing statistical properties of the data and how to calculate and apply these properties using Python. Finally, you will create simple data visualizations showing an overview of the data.

Module 3: Simulation and Distributions

"Simulation and Distributions" provides an exploration into the world of computer simulations. Beginning with simulating simple events, like rolling a dice where the expected outcome is known, you gradually build increasingly complex simulations. You will find many simulations result in common distributions, such as the Normal Distribution, which you will learn has many interesting properties all its own.

Module 4: Prediction and Probability

"Prediction and Probability" begins with a deep-dive into probability and using probabilities to make informed predictions on future events. You will complete dozens of problems on basic probability, explore how to describe dependent probabilistic events, and use Python to make predictions under uncertainty.

Module 5: Polling, Confidence Intervals, and the Normal Distribution

"Polling, Confidence Intervals, and the Normal Distribution" starts with an exploration of different sampling techniques. You will learn how bias and sampling variability can affect the results of surveys. From that, you know how to use expectation and inference as a way to make predictions and decisions under uncertainty.

Module 6: Towards Machine Learning

"Towards Machine Learning" applies all of the foundational knowledge applied in the previous modules to using modern techniques to help computers discover common similarities in data and to predict future outcomes based on previously-seen events. Completion of this and all other modules provides you with the ability to advance to dedicated machine learning courses.