Mighty MatPlotLib

Due: Feb 20, 23:59

Learning Objectives

  • Introduce Python visualization with matplotlib
  • Practice common plot shapes (dotplot, line plot, bar chart)
  • Review list manipulations and paired list indexing
  • Explore fundamentals of math evaluation and numpy

Submission Instructions

Using the PrairieLearn workspace, test and save your solutions to the following exercises. (You are welcome to download the files and work on them locally, but they must be re-uploaded or copied over to the workspace for submission.)

Assignment Description

Matplotlib is a library designed to bring MATLAB-style plotting functionality to Python. Over time it has grown into a huge library that supports all kinds of visualizations and integrates with many data science libraries such as numpy and pandas; other packages, such as scikit-learn, are built on top of it. In this assignment, we will cover a ‘core’ set of useful visualization types and observe how other packages such as numpy are integrated into the workflow.

Warning: Debugging this assignment will be tricky because the autograder cannot display images. A number of test images have been provided for you to compare against, but if you are unsure where your code is going wrong, you are strongly encouraged to view the output of your plots directly and compare it against the reference images in the workspace. If you are still unsure what is going wrong, get help from course staff in office hours, on Discord or Piazza, or email the professor directly!

makeLinePlot()

# INPUT:
# fname, the filename for a PNG to be written
# xList, a list of numeric values representing the x-coordinates for a list of points
# yList, a list of numeric values representing the y-coordinates for a list of points
# linetype, an integer value representing one of four possible line styles
# 0 -- a solid black line
# 1 -- a dashed blue line
# 2 -- a solid red line with x's at data points
# 3 -- a dashed green line with circles at data points
# OUTPUT:
# None, instead, your outputted line graph should be saved to <fname>.
def makeLinePlot(fname, xList, yList, linetype):

Line plots are a staple of publications and being able to plot different styles of lines will make future experiments much easier – including our first mini-project!

For the following exercise, you will be writing a series of hardcoded formats that a user can pick between to plot their input data:

  • Customize the line you add to the graph based on the value of linetype
    • linetype is an integer
    • if it is 0, make a solid black line
    • if it is 1, make a dashed blue line
    • if it is 2, make a solid red line with x style markers
    • if it is 3, make a dashed green line with circles at data points
  • Save the generated figure at fname
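
One way to think about the four styles is as matplotlib format strings, which pack marker, line style, and color into a short code. The following is a minimal sketch rather than a reference solution: the LINE_FORMATS table and the plot_with_style helper are hypothetical names introduced here, and the Agg backend is assumed so the figure can be saved without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment with no display
import matplotlib.pyplot as plt

# Hypothetical lookup table: linetype -> format string "[marker][line][color]".
# '-' is solid and '--' is dashed; 'x' and 'o' are markers;
# 'k', 'b', 'r', 'g' are black, blue, red, green.
LINE_FORMATS = {
    0: "-k",
    1: "--b",
    2: "x-r",
    3: "o--g",
}

def plot_with_style(fname, xs, ys, linetype):
    """Plot xs against ys in the chosen style and save the figure to fname."""
    fig, ax = plt.subplots()
    lines = ax.plot(xs, ys, LINE_FORMATS[linetype])
    fig.savefig(fname)
    plt.close(fig)  # release the figure so repeated calls do not accumulate
    return lines[0]
```

A dictionary keeps the style choices in one place; a chain of if/elif branches on linetype would work equally well.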

makeDotPlot()

# INPUT:
# fname, the filename for a PNG to be written
# Each of the following four lists is identically sized:
# xList, a list of numeric values representing the x-coordinates for a list of points
# yList, a list of numeric values representing the y-coordinates for a list of points
# shape, a list of strings representing the shape the point should be
# The shape should be a 'c', 's', or 't' (the only allowable values)
# Each shape is a patch that uses the provided coordinates and color with the following fixed parameters:
# 'c': Create a Circle patch with radius 0.05
# 's': Create a Rectangle patch with both width and height 0.1 
# 't': Create a RegularPolygon patch with 3 vertices and a radius of 0.07
# color, a list of colors representing the color of the point
# The color will always be a matplotlib interpretable color string as a single character
# OUTPUT:
# None, instead a PNG is saved to file at fname containing the contents of the dot plot
def makeDotPlot(fname, xList, yList, shape, color):

Representing data as a set of 2D points is a common way of visualizing raw data, clustered data, and principal component analyses (among many other use cases). Here we will practice plotting specific paired points while reviewing the fundamentals of list indexing. Your task is to plot each individual point from four paired lists, each of which gives one part of the point’s description.

Note that while the colors will be given as matplotlib single-letter codes, the shapes must be hardcoded according to the above descriptions, each using a different patch function.
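
As a rough illustration of how the three patch classes fit together, here is a sketch under a few assumptions: make_patch and plot_points are hypothetical helper names, the Agg backend is assumed, and the handout does not say whether the square should be centered on the point (Rectangle anchors at its lower-left corner).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment with no display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, Rectangle, RegularPolygon

def make_patch(x, y, shape, color):
    """Return the patch for one point (hypothetical helper, not lab skeleton code)."""
    if shape == "c":
        return Circle((x, y), radius=0.05, color=color)
    if shape == "s":
        # Rectangle treats (x, y) as its lower-left corner, not its center.
        return Rectangle((x, y), 0.1, 0.1, color=color)
    if shape == "t":
        return RegularPolygon((x, y), numVertices=3, radius=0.07, color=color)
    raise ValueError(f"unexpected shape code: {shape!r}")

def plot_points(fname, xs, ys, shapes, colors):
    """Draw one patch per point, save the figure, and return the patch count."""
    fig, ax = plt.subplots()
    # Index i selects the matching entry from each of the four paired lists.
    for i in range(len(xs)):
        ax.add_patch(make_patch(xs[i], ys[i], shapes[i], colors[i]))
    ax.autoscale_view()  # fit the axes around the patches that were added
    count = len(ax.patches)
    fig.savefig(fname)
    plt.close(fig)
    return count
```

The index-based loop mirrors the paired-list indexing this exercise reviews; zip(xs, ys, shapes, colors) is an equivalent alternative.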

randomHistogram()

# INPUT:
# fname, a string containing the filename for a PNG to be written
# maxInt, an integer containing the maximum value of the integer being generated
# trials, an integer containing the total number of random numbers being generated
# seed, an optional value for setting the random seed (Default value is None)
# OUTPUT:
# None, instead create a histogram that plots the count for each number from 0 to maxInt
# Every histogram should be colored blue and use a bar width of rwidth=0.8 (a hist kwarg)
# There should be exactly as many bars as the number of possible outputs from the random trials
def randomHistogram(fname, maxInt, trials, seed=None):

Bar charts and histograms are another very useful tool in your visualization toolkit. Here we will visualize the distribution of randint() directly by using a histogram to track the total counts of different values! What happens as the number of trials grows? What does this tell you about the probability distribution of random?

For the following exercise:

  • Generate random numbers trials number of times and store them as a list
  • The format of the histogram is always fixed:
    • the color should be blue
    • the width of each bar should be 0.8
    • there should be exactly one bar for every possible number (Hint: the numbers are 0 to maxInt inclusive)
  • Save the generated figure at fname
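
The "one bar per possible number" requirement comes down to the choice of bin edges. The sketch below assumes the numbers come from Python's random module (the handout does not say whether random or numpy is expected) and that a headless Agg backend is in use; sketch_histogram is a hypothetical name.

```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment with no display
import matplotlib.pyplot as plt

def sketch_histogram(fname, max_int, trials, seed=None):
    """Histogram of `trials` random integers in [0, max_int]; returns the bin counts."""
    rng = random.Random(seed)  # private generator; seed=None leaves it unseeded
    values = [rng.randint(0, max_int) for _ in range(trials)]  # inclusive at both ends

    fig, ax = plt.subplots()
    # Edges 0, 1, ..., max_int + 1 produce exactly max_int + 1 bins,
    # one for each possible value.
    edges = list(range(max_int + 2))
    counts, _, _ = ax.hist(values, bins=edges, color="blue", rwidth=0.8)
    fig.savefig(fname)
    plt.close(fig)
    return counts
```

Passing explicit edges rather than a bin count avoids the case where a value that never occurs leaves the histogram with too few bars.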

Handling Missing Data

The next pair of functions deals with a common data science problem: the data you are studying is incomplete in some way! Here we will explore one solution to this problem (imputing the missing values from context) using temperature measurements over time.

Problem Statement

You are given a list of floating-point numbers representing the average daily temperatures over a period of time. However, some values are missing, and each missing value is represented by the placeholder 200.

Note: 200 was chosen as an arbitrarily high number that is too high to be a real reading; it has no other significance, and you can assume that all numbers in the list other than 200 will be between 0 and 100.

Your task is to replace each 200 value following these rules:

  1. If the 200 value is surrounded by two valid temperatures, replace it with the average of the previous and next day’s temperatures.

  2. If the 200 value appears at the beginning of the list, replace it with the next day’s temperature.

  3. If the 200 value appears at the end of the list, replace it with the previous day’s temperature.

  4. You can assume that there are no consecutive 200 values in the list.

Once you have replaced the placeholder values in the list, make a line plot of the resulting temperatures.

Rather than just writing one function to handle the full imputation and plotting – and trying to deal with debugging both simultaneously – we will break down the complex function into two key parts and build each separately:

imputeTemp()

# INPUT:
# A list of temperatures, where each element is either a floating-point number or 200 (meaning it should be replaced).
# OUTPUT:
# A list of temperatures, where every '200' value is replaced according to the provided replacement logic.
def imputeTemp(temperatures):

Given a list of temperatures with potential placeholder values, build a new list which contains the original values when they are present and the imputed values when we encounter a placeholder.
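
The replacement rules can be sketched as a single pass over the list. This is an illustration rather than the expected solution: impute_sketch and MISSING are hypothetical names, and the list is assumed to contain at least two elements, since the handout does not define what should happen to a lone placeholder.

```python
MISSING = 200  # placeholder value from the problem statement

def impute_sketch(temperatures):
    """Return a new list with each placeholder replaced using its neighbours."""
    last = len(temperatures) - 1
    result = []
    for i, temp in enumerate(temperatures):
        if temp != MISSING:
            result.append(temp)                    # keep real readings as they are
        elif i == 0:
            result.append(temperatures[i + 1])     # rule 2: copy the next day
        elif i == last:
            result.append(temperatures[i - 1])     # rule 3: copy the previous day
        else:
            # Rule 1: both neighbours are real readings, because rule 4
            # guarantees placeholders are never consecutive.
            result.append((temperatures[i - 1] + temperatures[i + 1]) / 2)
    return result

print(impute_sketch([200, 60.0, 200, 70.0, 200]))  # [60.0, 60.0, 65.0, 70.0, 70.0]
```

Reading neighbours from the original list while appending to a new one keeps the input unmodified, which matches the instruction to build a new list.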

missingTemp()

# INPUT:
# A list of temperatures, where each element is either a floating-point number or 200 (meaning it should be replaced).
# fname, a string containing the filename for a PNG to be written
# OUTPUT:
# None, instead, your outputted line graph should be saved to <fname>.
def missingTemp(temperatures, fname):

Given a list of temperatures with potential placeholder values, plot a 2D line with the x-coordinates being the integer index of each item in the imputed temperature list.

TIP: To make a line plot of one dimensional data, we can re-use our knowledge of one-dimensional lists – notably list(), range() and len().

More directly, we can treat the input dataset as the y-values and create an equivalent xList! First get the length of the input list. Then create a new list with values from 0 to len(data)-1. An example of how this can be done in Python is below:

data = [5, 8, 10, 7, 4] # our 'ylist'
new_xlist = list(range(len(data)))

evaluateExpression()

# INPUT:
# equation, a string input built using numbers, the variable (as letter) x, and standard math operations.
# xList, a list of numeric values representing the x-coordinates for a list of points
# OUTPUT:
# An equal sized list yList consisting of the values of the input equation for each input value to x.
def evaluateExpression(equation, xList):

Being able to evaluate expressions using Python’s eval function is very handy. It allows us to evaluate an equation stored as a string against global or local variables. In this instance, we want to evaluate an arbitrary list of input x-values against any input equation and output an equally sized list of y-values.
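
To make the role of the variable dictionary concrete, here is a small sketch. The name evaluate_sketch is hypothetical, and the equation is assumed to use only x together with Python operators and built-ins.

```python
def evaluate_sketch(equation, x_values):
    """Evaluate the equation string once per x value and collect the results."""
    y_values = []
    for x in x_values:
        # The dictionary acts as the namespace for the expression: inside the
        # string, the name x refers to the current value from x_values.
        # eval runs arbitrary code, so it should only see trusted input.
        y_values.append(eval(equation, {"x": x}))
    return y_values

print(evaluate_sketch("x**2 + 1", [0, 1, 2]))  # [1, 2, 5]
```

Passing an explicit dictionary means the expression never depends on a variable named x that happens to exist elsewhere in the program.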

animateExpression()

# INPUT:
# fname, the filename for a PNG to be written
# equation, a string input built using numbers, the variable (as letter) x, and standard math operations.
# xList, a list of numeric values representing the x-coordinates for a list of points
# OUTPUT:
# None, instead a PNG is saved that has a frame for each value in xList
def animateExpression(fname, equation, xList):

For this last exercise, we will combine your evaluateExpression() function with an animation, allowing you to visualize any mathematical equation with a single variable x! As we want to be able to see this animation in full without a constantly changing coordinate space, you will be required to set the dimensions of your plot to be fixed – something you have seen but not had to implement until now!

As the FuncAnimation() code is more complex than most of your previous Python experiences, a large amount of the code skeleton has been provided for you rather than asking you to code it from scratch. Finish the function by completing the following steps:

  1. When you are ready to begin, remove the ‘return’ line at the top of the function! This is only there to prevent this incomplete function from crashing the autograder.

  2. Define the fixed boundaries of the plot. The function takes as input an xList and has a fixed set of x-dimensions based on the input. However, there is no corresponding yList (or fixed y-dimensions). Use what you have learned in this lab to create this list and to set the ylim dimensions.

  3. Create the update() function. The update function should add one new data point in every step and is called automatically when FuncAnimation() is run. It is up to you to finish the logic, making sure to set line’s data appropriately. Note that the latter part of this process (return line,) has already been provided!

Note: To be clear, you do not need to change the start or end of the code, which creates the plots, sets the first frame, runs the FuncAnimation(), and saves the output in the appropriate format and location.
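
For orientation only, the general FuncAnimation pattern described in the steps above might look like the sketch below. It is not the provided skeleton: make_update and build_animation are hypothetical names, the y-values are assumed to be computed already, saving is omitted, and the Agg backend is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless environment with no display
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def make_update(line, xs, ys):
    """Build the per-frame callback that FuncAnimation calls automatically."""
    def update(frame):
        # Frame i shows points 0..i, so the curve grows by one point per step.
        line.set_data(xs[: frame + 1], ys[: frame + 1])
        return (line,)
    return update

def build_animation(xs, ys):
    """Create a figure with fixed axes and an animation revealing one point per frame."""
    fig, ax = plt.subplots()
    # Fixing both limits up front keeps the coordinate space from changing.
    ax.set_xlim(min(xs), max(xs))
    ax.set_ylim(min(ys), max(ys))
    (line,) = ax.plot([], [])  # start empty; update() fills the line in
    anim = FuncAnimation(fig, make_update(line, xs, ys), frames=len(xs))
    return fig, anim, line
```

Taking the limits from the full x and y lists before any frame is drawn is what keeps the axes still while the line grows.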