lab_fileio

Fearless File IO

Due: Feb 13, 11:59 PM

Learning Objectives

  • Introduce how to read and write to files in Python
  • Use string formatting to create custom file formats
  • Practice using functions inside other functions

Submission Instructions

Using the PrairieLearn workspace, test and save your solutions to the following exercises. (You are welcome to download the files and work on them locally, but they must be re-uploaded or copied back into the workspace for submission.)

Assignment Description

Reading input files and writing precisely formatted output files is an essential skill in data science exploration. This lab walks you through writing your own readers and writers for files in any format. You will then use these techniques to solve a number of short programming exercises.

Part 1: File IO in Python

The lecture portion of this lab covers how to open a file in Python, the common file IO modes, and how to read from and write to an open file. You are encouraged to follow along with the starter Python notebook and complete the single problem at the end as proof of your work.
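As a quick reference for the modes mentioned above, here is a minimal sketch of writing, appending, and reading. It assumes a throwaway file in a temporary directory (the name demo.txt is only for illustration). The `with` statement closes each file automatically when its block ends.

```python
import os
import tempfile

# Use a throwaway directory so the demo does not clutter your workspace.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (or erases an existing one) and opens it for writing.
with open(path, "w") as f:
    f.write("first line\n")

# "a" opens the file for appending: new text goes after what is already there.
with open(path, "a") as f:
    f.write("second line")

# "r" (the default mode) opens the file for reading.
with open(path, "r") as f:
    contents = f.read()

print(repr(contents))  # 'first line\nsecond line'
```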

firstFile()

# INPUT:
# fname, a string storing the name of the file to be written
# OUTPUT:
# None
# Instead the function should create a file with the input file name
# The first line of the file must be:'Hello World'
# The second line must be: 'CS 277'
# There should NOT be a third line (even a blank line)!
def firstFile(fname) -> None:

Printing “Hello World!” is a very common first programming exercise across disciplines. However, since we never wrote it together, let’s do the equivalent now while writing files!

Complete the function so that it produces a file containing the following text:

Hello World
CS 277

HINT: Although the content in the file will be fixed, make sure not to hardcode the filename! The autograder will test multiple files.
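One way to keep a file from ending in an extra blank line is to put newline characters only between lines. The sketch below uses a hypothetical write_lines() helper with different text, so it illustrates the idea rather than the graded solution. It assumes, as the spec comment above suggests, that a trailing newline would count against you.

```python
import os
import tempfile

def write_lines(fname, lines):
    """Hypothetical helper: write each string in `lines` on its own line.

    Joining with a newline places newline characters only between items,
    so the file does not end with an extra (blank) line.
    """
    with open(fname, "w") as f:
        f.write("\n".join(lines))

# The filename comes in as a parameter, so any name works.
out = os.path.join(tempfile.mkdtemp(), "greeting.txt")
write_lines(out, ["Good morning", "Data Science"])

with open(out) as f:
    text = f.read()
print(repr(text))  # 'Good morning\nData Science'
```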

Part 2: Coding Exercises

The exercises in this lab are designed to review string formatting, functions, and file IO. You may find it helpful to complete the functions in the listed order, as several of them build on each other.

list_of_lines()

# INPUT:
# file, a string containing the relative path of the file being processed
# OUTPUT:
# A list containing the complete collection of substrings formed by splitting on line breaks.
# NOTE:
# The output list should contain each line (even empty lines) in the order they are read (top to bottom)
# To ensure full credit, you should strip whitespace from both sides of each line.
def list_of_lines(file) -> list[str]:

Almost any file storing data of interest can be processed as some combination of ‘read each line’ and ‘do something to each line’. In the function below, write code that takes an arbitrary file, reads its contents, and returns a list of strings in the same order as the input lines (one string per line).

The only formatting you need to handle in this function is removing the whitespace from both ends of each line. As a reminder, whitespace includes spaces, tabs, and newline characters.

Note: The workbook contains several text files for you to use as well as the output for these text files as worked examples.
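A detail worth checking against the workbook’s worked examples is how a trailing newline at the end of the file is handled. The sketch below, using hypothetical sample contents, contrasts iterating over the file object with splitting the whole text on "\n". Which behavior the autograder expects is not stated here, so compare both against the provided outputs.

```python
import os
import tempfile

# A small sample file (hypothetical contents) with padding, a blank line,
# and a trailing newline.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("  alpha \n\n\tbeta\n")

# Iterating over a file object yields one line at a time, newline included;
# strip() then removes whitespace from both ends.
with open(path) as f:
    stripped = [line.strip() for line in f]
print(stripped)  # ['alpha', '', 'beta']

# Splitting the whole text on "\n" treats the final newline as a separator,
# so it produces one extra empty string at the end.
with open(path) as f:
    pieces = [piece.strip() for piece in f.read().split("\n")]
print(pieces)  # ['alpha', '', 'beta', '']
```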

list_of_largest()

# INPUT:
# file, a string containing the relative path of the file being processed
# The file stores a collection of grades as space separated numbers
# There will be at least one number on every line
# OUTPUT:
# A list of integers storing the largest number on each line of the file, in file order.
def list_of_largest(file) -> list[int]:

Brad’s back again with another data structure disaster! As it turns out, every time Brad manually grades a student’s work, he absentmindedly writes the grade down in a single text file, separating numbers with a single space and starting a new line every time he grades a new assignment for that student.

The problem is that Brad changes his mind frequently, so there may be more than one number on each line! Don’t worry – Brad has decided that the only reasonable fix is to assume his most lenient (highest) number is the correct one.

Your job is to write a function that takes in an arbitrary filename and returns a list of integers storing the most lenient grade in each row. The order of the list matters! The largest number in the first row of the file should be stored at index 0, the largest number in the second row at index 1, and so on.

Note: The workbook contains several text files for you to use as well as the output for these text files as worked examples. The input files will always be of the same format (space-separated numbers across many lines).

Warning: You can compare strings using ‘>’ and ‘<’, but the comparison is lexicographic (character by character), not numeric. For example, '9' > '10' evaluates to True!
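To see the warning in action, the sketch below takes one hypothetical line from a grades file, splits it, and compares the pieces both as strings and as integers. Only the integer version finds the true maximum.

```python
# Strings compare character by character (lexicographically), not by value.
print("9" > "10")            # True: '9' sorts after '1'
print(int("9") > int("10"))  # False once both are converted to integers

# A hypothetical line from a grades file.
line = "85 9 100\n"

# split() breaks on whitespace (and ignores the trailing newline).
tokens = line.split()                      # ['85', '9', '100']
numbers = [int(token) for token in tokens]

print(max(tokens))   # 9   (wrong: string comparison)
print(max(numbers))  # 100 (right: integer comparison)
```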

Hint: If you are struggling, try to break the problem down into sub-problems. If you can’t figure it out, take a look at the lab slides on the web page for some secret hints!

false_matrix()

# INPUT:
# file, a string containing the relative path of the file being processed
# row, an integer storing the index of the line being processed
# col, an integer storing the index of the value in the line being processed
# OUTPUT:
# The string at position (row, col), where each line of the file is split on spaces
def false_matrix(file, row, col) -> str:

As we proceed into a discussion of lists, we will eventually introduce the concept of a 2-D matrix. As a lead-up to that discussion, consider how a file with many lines and many values per line can be treated as a matrix.

Given as input a file name and two integers (row and col), return the string located at that exact position in an input file of space-separated strings. Both row and col use 0-based indexing, so row=1 and col=0 means you should look up the second line and return its first string.

You can always assume that the row and column values are valid in the input file.

Note: The workbook contains several text files for you to use as well as the output for these text files as worked examples.
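The “file as a matrix” idea can be sketched as a list of rows, where each row is the list of tokens on one line. Here a hypothetical string stands in for the contents you would get from f.read(). Note that split() with no argument treats any run of whitespace as a single separator, which is an assumption about how tidy the input is.

```python
# Hypothetical file contents, e.g. the string returned by f.read().
text = "red green blue\ncat dog\nx y z w\n"

# One inner list per line; split() separates tokens on whitespace.
grid = [line.split() for line in text.splitlines()]

# Index rows first, then columns; both start at 0.
print(grid[0])     # ['red', 'green', 'blue']
print(grid[1][0])  # cat
print(grid[2][3])  # w
```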

zipper_string()

# INPUT:
# s1, the string being 'zippered' into positions 0, 2, 4, 6, etc...
# s2, the string being 'zippered' into positions 1, 3, 5, 7, etc...
# OUTPUT:
# A single string consisting of alternating characters from s1 and s2 separated by spaces
def zipper_string(s1, s2) -> str:

Demonstrate your mastery of string parsing with the following exercise! Given two input strings, create a space-separated string of alternating characters from each string.

You can assume that the strings are always the same length, so you don’t have to handle the case where one string runs out of characters before the other.

Your final output string should have no extra whitespace: no space before the first character or after the last one.

# s1 = "apple", s2="abbey"
'a a p b p b l e e y'

# s1 = "sweep", s2="boast"
's b w o e a e s p t'

# s1 = "lunar", s2="laser"
'l l u a n s a e r r'

# s1 = "ABCDEFG", s2="0123456"
'A 0 B 1 C 2 D 3 E 4 F 5 G 6'
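A common way to end up with unwanted whitespace is to append a separator after every character. The sketch below, on a single hypothetical word, shows that pitfall next to str.join, which only places the separator between items. It also shows zip, which pairs up items at the same index from two sequences.

```python
word = "code"

# Building the string with += leaves a stray space after the last character.
spaced = ""
for ch in word:
    spaced += ch + " "
print(repr(spaced))  # 'c o d e '

# str.join puts the separator only between items, so no cleanup is needed.
print(repr(" ".join(word)))  # 'c o d e'

# zip pairs up items at the same index from two sequences.
print(list(zip("ab", "xy")))  # [('a', 'x'), ('b', 'y')]
```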

zipper_file()

# INPUT:
# file, a string containing the relative path of the file being written
# zstrings, a list of strings to be zippered together in order
# OUTPUT:
# None
# Instead write to the target file each pair of strings as a zippered line
def zipper_file(file, zstrings) -> None:

Given a filename and a list of strings, create a file in which each line is the zippered string formed from one consecutive pair of strings: the first line zippers the strings at indices 0 and 1, the second line zippers indices 2 and 3, and so on.

You may assume that the input list is always an even length (so every string has a pair).

# zstrings = ["ab", "cd", "ef", "gh", "ij", "kl"]
'''
a c b d
e g f h
i k j l
'''

# zstrings = ["apple", "table", "house", "grass", "stone", "bliss", "glove", "links", "chair", "dance"]
'''
a t p a p b l l e e
h g o r u a s s e s
s b t l o i n s e s
g l l i o n v k e s
c d h a a n i c r e
'''
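Walking through a list two items at a time is the core of this exercise. The sketch below shows two equivalent ways to form consecutive pairs, then writes one line per pair. The “first-second” line format and the pairs.txt filename are hypothetical, chosen only to keep the example separate from the zipper format.

```python
import os
import tempfile

items = ["ab", "cd", "ef", "gh"]

# Option 1: walk the even indices and pair each item with the next one.
pairs_by_index = []
for i in range(0, len(items), 2):
    pairs_by_index.append((items[i], items[i + 1]))

# Option 2: slicing with a step of 2 gives the even- and odd-indexed items.
pairs_by_slice = list(zip(items[0::2], items[1::2]))

print(pairs_by_index)                    # [('ab', 'cd'), ('ef', 'gh')]
print(pairs_by_index == pairs_by_slice)  # True

# Write one line per pair, using a hypothetical "first-second" format.
path = os.path.join(tempfile.mkdtemp(), "pairs.txt")
with open(path, "w") as f:
    f.write("\n".join(a + "-" + b for a, b in pairs_by_slice))

with open(path) as f:
    written = f.read()
print(repr(written))  # 'ab-cd\nef-gh'
```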

reformat_date()

# INPUT:
# datestring, a string storing a date in the "MM/DD/YYYY" format
# OUTPUT:
# A string in the "<Month> DD, YYYY" format.
def reformat_date(datestring) -> str:

Brad’s back with more class infrastructure problems that he needs you to solve! In this case, Brad has been sitting on a massive list of dates corresponding to office hour visits, but he’s completely forgotten how to read the datestring format “MM/DD/YYYY”!

Remind Brad which month corresponds to which number (01 to 12) by taking as input a single date string in “MM/DD/YYYY” format and converting it into the form “Month DD, YYYY”.

For the exact spelling of each month, see the table below:

MM Month
01 January
02 February
03 March
04 April
05 May
06 June
07 July
08 August
09 September
10 October
11 November
12 December

Note: You should not test for the validity of dates. Brad [the autograder] lives in a world where every month has 31 days. :)

Make sure your output exactly matches the required format, as in the following examples. In particular, the day must remain a two-digit string, leading zero included!

# datestring = "01/12/1988"
"January 12, 1988"

# datestring = "03/01/1188"
"March 01, 1188"

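The split-then-look-up technique can be sketched on a different, hypothetical conversion (“YYYY-MM-DD” to “DD Month YYYY”), so it illustrates the idea without being the graded function. The month number is converted to an integer only to index the list; the day is kept as a string, so its leading zero survives.

```python
# Month names in calendar order; index 0 is January.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def iso_to_long(iso):
    """Hypothetical example: turn 'YYYY-MM-DD' into 'DD Month YYYY'."""
    year, month, day = iso.split("-")  # all three pieces stay strings
    name = MONTHS[int(month) - 1]      # '07' -> 7 -> index 6 -> 'July'
    return f"{day} {name} {year}"      # day keeps its leading zero

print(iso_to_long("2024-07-04"))  # 04 July 2024
```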
calendar_reformat()

# INPUT:
# file, a string storing the input file name
# datelist, a list storing a collection of strings in the "MM/DD/YYYY" format
# OUTPUT:
# None
# Instead, write every date to the file named by file, one per line, in the "Month DD, YYYY" format
def calendar_reformat(file, datelist) -> None:

As one final example of good function usage, given a file name and a list of strings in the “MM/DD/YYYY” format, write a file containing each date on its own line in the “Month DD, YYYY” format. Reusing reformat_date() for each date is a natural fit here.

# datelist = ['12/30/2329', '11/27/2622', '08/08/1930']
'''
December 30, 2329
November 27, 2622
August 08, 1930

'''
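The lab’s objective of using functions inside other functions fits this last exercise: a per-item converter can be called from the file writer. A minimal sketch of that pattern, using hypothetical shout() and write_shouted() helpers in place of the date functions. It writes no newline after the last item; whether the autograder expects one after the final date is not settled by the example above, so check the workbook’s worked examples.

```python
import os
import tempfile

def shout(word):
    """Hypothetical per-item helper: uppercase a word and add '!'."""
    return word.upper() + "!"

def write_shouted(fname, words):
    """Hypothetical file writer that reuses shout() on every item."""
    with open(fname, "w") as f:
        f.write("\n".join(shout(w) for w in words))

path = os.path.join(tempfile.mkdtemp(), "shouted.txt")
write_shouted(path, ["hi", "bye"])

with open(path) as f:
    content = f.read()
print(content)
```

Keeping the conversion in its own function means it can be tested on a single item before any file is written.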