This is an archived copy of a previous semester's site.
Please see the current semester's site.
This MP is all about getting comfortable with memory allocation, pointers, and character encodings in C. You will complete several different sets of functions to build up your C programming skills.
In your CS 340 directory, merge the initial starting files with the following commands:
git fetch release
git merge release/mp1 --allow-unrelated-histories -m "Merging release repository"
Throughout this MP, you will be working with emojis. You will begin with some simple functions and then advance to building a small, expandable emoji translator.
In emoji.c, you will implement five functions (plus any helper functions you choose to add) that work with UTF-8 encoded strings.
In emoji_favorite, we just want to know an emoji you like.
There are many ways to specify byte sequences in C code. For example, 💙 is the emoji U+1F499. However, C does not understand the formal U+1F499 notation directly. Instead, you need to specify it either by:
- Encoding the character into its UTF-8 bytes by hand: 0xF0 0x9F 0x92 0x99.
- Setting each byte individually, either in hexadecimal form (ex: str[0] = 0xF0) or in decimal form (ex: str[0] = 240).
- Using a character escape of the form '\x##'. Even though the escape is four characters long in your source code, when read by C it will only be one byte in memory. The first byte is set with str[0] = '\xF0'.
- Using a string with hexadecimal escape codes (ex: str = "\xF0\x9F\x92\x99"). Each escape code \xhh, where h is a hexadecimal digit, specifies one byte, so this string contains four bytes. (Note: "\xF0\x9F\x92\x99"[1] is the byte 0x9F = 159 (or -97 as a signed char), not 'x' = 0x78 = 120.)
- Using a universal character escape, \u or \U (ex: str = "\U0001F499"). Note: the \u escape sequence takes 4 hexadecimal digits, i.e. \uhhhh, and the \U escape sequence takes 8 hexadecimal digits, i.e. \Uhhhhhhhh, where h is a hexadecimal digit.
- Writing the emoji directly in the string (ex: str = "💙").
You will want to use different forms at different times.
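As a rough illustration of the forms above, the following sketch builds U+1F499 byte by byte and compares it with the string-escape spellings. The helper names blue_heart_bytes and blue_heart_forms_agree are made up for this example and are not part of the MP's starter code. It also assumes your compiler's execution character set is UTF-8 (the default for gcc and clang), which is what makes the \U form expand to the same four bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in the starter code): writes the four UTF-8
 * bytes of U+1F499 plus a terminating NUL into buf (needs 5 bytes). */
void blue_heart_bytes(unsigned char *buf) {
    assert(buf != NULL);
    buf[0] = 0xF0;    /* hexadecimal literal */
    buf[1] = 159;     /* decimal literal, the same value as 0x9F */
    buf[2] = '\x92';  /* character escape: one byte in memory */
    buf[3] = 0x99;
    buf[4] = '\0';
}

/* Returns 1 if every spelling yields the same bytes, 0 otherwise.
 * Assumption: the execution character set is UTF-8. */
int blue_heart_forms_agree(void) {
    unsigned char manual[5];
    blue_heart_bytes(manual);

    const char *hex_escapes = "\xF0\x9F\x92\x99"; /* one byte per \xhh */
    const char *universal = "\U0001F499";         /* 8 hex digits after \U */

    return strcmp((const char *)manual, hex_escapes) == 0
        && strcmp(hex_escapes, universal) == 0;
}
```

Calling blue_heart_forms_agree() from your own main is a quick way to convince yourself that the notations are interchangeable on your machine.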
In emoji_count, you will count the emojis in a provided string.
For the purpose of this MP, we will consider an emoji to be anything in the inclusive range U+1F000 - U+1FAFF. (There are some invalid characters in this range and a few early emojis outside of this range, but we want to keep it simple. Feel free to be more accurate: we will only test your code on real emoji within this range, so your solution won't break if you program a more correct solution.)
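One possible building block for the range check is sketched below: decode a 4-byte UTF-8 sequence into its code point, then compare it against U+1F000 and U+1FAFF. The names decode_utf8_4 and is_mp_emoji are hypothetical, and the sketch deliberately leaves the counting loop to you. It does not reject overlong encodings, which is harmless here because any such value falls below the emoji range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decodes the 4-byte UTF-8 sequence starting at s.
 * Returns the code point, or 0 if s does not start such a sequence.
 * Safe on NUL-terminated strings: a NUL is never a continuation byte,
 * so the loop stops before reading past the terminator. */
unsigned long decode_utf8_4(const unsigned char *s) {
    assert(s != NULL);
    if ((s[0] & 0xF8) != 0xF0) {
        return 0;                       /* lead byte must be 11110xxx */
    }
    for (int i = 1; i < 4; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;                   /* continuation must be 10xxxxxx */
        }
    }
    return ((unsigned long)(s[0] & 0x07) << 18)
         | ((unsigned long)(s[1] & 0x3F) << 12)
         | ((unsigned long)(s[2] & 0x3F) << 6)
         |  (unsigned long)(s[3] & 0x3F);
}

/* Returns 1 if s starts with a code point in U+1F000..U+1FAFF, else 0. */
int is_mp_emoji(const unsigned char *s) {
    unsigned long cp = decode_utf8_4(s);
    return cp >= 0x1F000UL && cp <= 0x1FAFFUL;
}
```

For U+1F499 the payload bits are 0x00, 0x1F, 0x12 and 0x19, which recombine to (0x1F << 12) | (0x12 << 6) | 0x19 = 0x1F499.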
In emoji_random_alloc, you will generate a random emoji each time the function is called! The official manual page for rand outlines how to create a random integer in C, which you'll need to use for your emoji. Make sure that you return a different emoji each time emoji_random_alloc is called.
Just like in emoji_count, you can assume the inclusive range U+1F000 - U+1FAFF or be more specific.
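A minimal sketch of the two pieces involved, under some assumptions: random_code_point picks a value in the inclusive range with rand, and encode_utf8_4_alloc turns a code point into a heap-allocated, NUL-terminated UTF-8 string that the caller must free. Both names are hypothetical and are not the required emoji_random_alloc signature. The modulo approach has a slight bias, which should not matter for this purpose.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: encodes cp (U+10000..U+10FFFF) as four UTF-8
 * bytes in a malloc'd, NUL-terminated buffer. The caller must free()
 * the result. Returns NULL if the allocation fails. */
unsigned char *encode_utf8_4_alloc(unsigned long cp) {
    assert(cp >= 0x10000UL && cp <= 0x10FFFFUL);
    unsigned char *out = malloc(5);          /* 4 bytes + '\0' */
    if (out == NULL) {
        return NULL;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    out[4] = '\0';
    return out;
}

/* Hypothetical helper: a code point in the inclusive range
 * U+1F000..U+1FAFF, chosen with rand(). */
unsigned long random_code_point(void) {
    const unsigned long span = 0x1FAFFUL - 0x1F000UL + 1;
    return 0x1F000UL + (unsigned long)rand() % span;
}
```

Note that rand produces the same sequence on every run unless it is seeded (for example, once at program start with srand), so whether and where to seed is a design choice left to your implementation.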
In emoji_invertChar, you will be inverting an emoji. You are provided with a string in this function. Invert only the first character of this string (which may be up to 4 bytes long), and only if it is an emoji. The inversion is semantic: invert the meaning of the emoji (NOT flipping or inverting the bits). You must invert 😊 (U+1F60A) to some sort of sad face (your choice!). In addition to 😊, you need to invert at least five other emojis of your choice (and it's okay to do more than just five).
In emoji_invertAll, you will invert all the characters in the string using your emoji_invertChar function.
In emoji_invertFile_alloc, you will read the contents of the file with the provided name, invert all the emojis in that file, and return the inverted string.
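The file-reading half of this task might look roughly like the sketch below. The name read_file_alloc is hypothetical, and the approach (seek to the end, ask for the position, then read everything in one call) assumes a regular file on a POSIX-like system. The returned buffer is NUL-terminated and must be released by the caller with free.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file into a malloc'd,
 * NUL-terminated buffer. Returns NULL if the file cannot be opened,
 * measured, or allocated for. The caller must free() the result. */
unsigned char *read_file_alloc(const char *fileName) {
    assert(fileName != NULL);
    FILE *fp = fopen(fileName, "rb");
    if (fp == NULL) {
        return NULL;
    }
    if (fseek(fp, 0, SEEK_END) != 0) {
        fclose(fp);
        return NULL;
    }
    long size = ftell(fp);               /* file length in bytes */
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    unsigned char *buf = malloc((size_t)size + 1);  /* +1 for '\0' */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }
    size_t n = fread(buf, 1, (size_t)size, fp);
    fclose(fp);                          /* close on every path */
    buf[n] = '\0';
    return buf;
}
```

Closing the file and freeing the buffer on every path matters here, because both a leaked FILE and a leaked buffer show up in a valgrind report.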
For the second part of this MP you will create an emoji translation utility. Unlike emoji_invertChar, these translations might change the length of the string, meaning you'll need to implement some kind of data structure to support the translation.
In emoji-translate.c, you will find four functions:
- void emoji_init(emoji_t *emoji) is called to initialize an emoji object.
- void emoji_add_translation(emoji_t *emoji, const unsigned char *source, const unsigned char *translation) adds a translation to the emoji object. For example, 💎 might translate to diamond.
- const unsigned char *emoji_translate_file_alloc(emoji_t *emoji, const char *fileName) must translate the contents of the file specified by fileName using all translation rules added so far. When matching rules, always choose the longest matching rule based on the length of the source.
  - If 💎 => diamond has been added: "💎s are forever!" would translate to "diamonds are forever!".
  - If 💎 => diamond and 💎💎💎 => Bejeweled have been added, "💎💎💎 with 💎s" would translate to "Bejeweled with diamonds", not "diamonddiamonddiamond with diamonds", since the rule using three emoji is longer than the one with one emoji.
- void emoji_destroy(emoji_t *emoji) is called to destroy the emoji object and any memory allocated by your emoji library.
emoji_t emoji;
emoji_init(&emoji);

emoji_add_translation(&emoji, "🧡", "heart");

// The file on disk contains: "I 🧡💙 Illinois!"
unsigned char *translation = emoji_translate_file_alloc(&emoji, "tests/txt/simple.txt");

// Translation Output: "I heart💙 Illinois!"
printf("%s\n", translation);

emoji_destroy(&emoji);
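The longest-match rule can be sketched as a scan over the stored rules that keeps the longest source matching at the current position. The rule_t struct and longest_match function are assumptions made for this illustration; how emoji_t actually stores its rules is up to you. It uses plain char strings for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule storage for illustration only; the real layout
 * of emoji_t is your design decision. */
typedef struct {
    const char *source;       /* bytes to look for */
    const char *translation;  /* bytes to emit instead */
} rule_t;

/* Returns the index of the rule with the longest source that is a
 * prefix of text, or -1 if no rule matches at this position. */
int longest_match(const rule_t *rules, size_t count, const char *text) {
    assert(rules != NULL && text != NULL);
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(rules[i].source);
        /* Only compare when this rule could beat the current best. */
        if (len > best_len && strncmp(text, rules[i].source, len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

A translator built on this would call longest_match at each position of the input, append the translation and skip strlen(source) bytes on a match, and otherwise copy one byte through unchanged. A linear scan is the simplest choice; a trie or sorted table would be faster with many rules.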
For a program to run correctly and consistently across many machines, the program must have no memory errors. Since every CPU is slightly different, a memory error may result in a program running perfectly fine on one system (perhaps the system initialized the heap memory to 0x00 for you) while it may segfault on another system (perhaps the other system has random/garbage values in the heap memory).
To test for memory correctness, we use a commonly used tool called valgrind. The magic of valgrind is that it will track every memory access your program makes and report:
- memory that was allocated (via malloc) but not released back to the system (via free),
- memory that was accessed after it had been released (that is, already free'd),
You will earn 67% of the credit for the MP for completing the MP. The other 33% of your grade can only be earned if your program passes all of the test cases and runs valgrind clean. To run valgrind clean, you must:
- Compile your code with make test,
- Run valgrind --leak-check=full --show-leak-kinds=all ./test,
- See the output All heap blocks were freed -- no leaks are possible, showing you free'd all memory before exiting, AND
- See the output ERROR SUMMARY: 0 errors from 0 contexts, showing you made no memory errors.
Until you run valgrind clean, you should expect differences between the GitHub Action grader and your local run.
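As a small illustration of the kind of code valgrind checks, the sketch below copies a string onto the heap. The helper name copy_string_alloc is hypothetical. Two details are what keep a program like this valgrind clean: allocating one extra byte for the terminating NUL, and having the caller free the result exactly once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL if the
 * allocation fails. The caller must free() the result. */
char *copy_string_alloc(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    /* len + 1 leaves room for '\0'. Allocating only len bytes would
     * make the copy below write one byte past the block, which is
     * the kind of invalid write valgrind reports. */
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, src, len + 1);
    return copy;
}
```

Forgetting the matching free in the caller would not crash the program, but it would turn the leak summary from "no leaks are possible" into a report of lost bytes.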
Sadly, valgrind support on macOS is limited. However, Docker provides a Linux environment on macOS:
- Build the docker image:
docker build -t cs340 .
- Run make clean to clear any old compiled code that may have been compiled outside of docker:
docker run --rm -it -v "`pwd`":/mp1 cs340 "make clean"
- Run make test to compile the test case inside of docker:
docker run --rm -it -v "`pwd`":/mp1 cs340 "make test"
- Run valgrind ./test to run the test case with valgrind inside of docker:
docker run --rm -it -v "`pwd`":/mp1 cs340 "valgrind ./test"
Alternatively, you can copy your code to your course VM and run valgrind there.
In your solution, you must only modify the following files. Modifications of other files may break things:
emoji.c
emoji.h
emoji-translate.c
emoji-translate.h
- Run make test.
- Run ./test and everything should pass!
When you are finished working on the MP, you can run a local copy of the same test suite that is used for grading. To run the test suite:
- Run make test.
- Run valgrind ./test and everything should pass and run valgrind clean!
Once you have locally passed all the tests, you will need to submit and grade your code. First commit using the standard git commands:
git add -A
git commit -m "MP submission"
git push
The initial grading is done via a manual GitHub Action. You MUST complete this step before the deadline to earn any points for the MP:
- Open the Action tab.
- Select mp1 autograding.
- Click the Run Workflow button (located on the blue bar).
- Click Run Workflow.
Autograding is given 2/3 weight for this MP.
Valgrind is given 1/3 weight for this MP.