MP 1: Emojis
Overview
MP1 in CS 340 is all about getting comfortable with memory allocation, pointers, and character encodings in C. You will complete several different sets of functions to build up your C programming skills.
- You will explore some key differences between C and C++ programs.
- You will create data structures in C.
- You will work at the bit- and byte-level to manipulate multi-byte UTF-8 characters in C-strings.
- You will read files and manipulate data from files in memory.
- You will share with us your favorite emoji!
Initial Files
In your CS 340 directory, merge the initial starting files with the following commands:
git fetch release
git merge release/mp1 --allow-unrelated-histories -m "Merging release repository"
Machine Problem
Throughout this MP, you will be working with emojis! You will begin with some simple functions and then advance to building a small, expandable emoji translator!
Part 1: Simple Emoji Functions
In emoji.c, you will find the six functions listed below, where you will combine everything you know to work with UTF-8 encoded strings.
- In emoji_favorite, we just want to know your favorite emoji. There are many ways to specify byte sequences in C code. For example, 💙 is the emoji U+1F499. However, C does not understand the U+ code point notation directly. Instead, you need to specify it in one of the following ways:
  - Using the values for each byte. For 💙 (U+1F499), the UTF-8 byte sequence is 0xF0 0x9F 0x92 0x99.
  - The bytes can be set one by one as numbers in hex form (ex: str[0] = 0xF0) or in decimal form (ex: str[0] = 240).
  - The bytes can be set one by one as characters, with the character escape '\x##'. Even though the escape is four characters long in the source code, C reads it as a single byte in memory. The first byte is set with str[0] = '\xF0'.
  - A string can be created using multiple individual character escapes (ex: str = "\xF0\x9F\x92\x99"). Even though the example string appears to be 16 characters long, the string contains only four bytes (plus the terminating NUL byte).
  - A string can be created using the character escape \u or \U (ex: str = "\U0001F499"). Note: the \u escape sequence takes 4 hexadecimal digits (\uhhhh) and the \U escape sequence takes 8 hexadecimal digits (\Uhhhhhhhh), where each h is a hexadecimal digit.
  - A string can be created using the emoji itself (ex: str = "💙").

  You will want to use different forms at different times.
- In emoji_count, you will count the emojis in a provided string. For the purpose of this MP, we will consider an emoji to be anything in the inclusive range U+1F000 - U+1FAFF. (There are some invalid characters in this range and a few early emojis outside of this range, but we want to keep it simple. Feel free to be more accurate: we will only test your code on real emoji within this range, so your solution won't break if you program a more correct solution.)
- In emoji_random_alloc, you will generate a random emoji each time the function is called! The C documentation for rand outlines how to create a random integer in C, which you'll need to use for your emoji. Make sure that you return a different emoji each time emoji_random_alloc is called. Just like in emoji_count, you can assume the inclusive range U+1F000 - U+1FAFF or be more specific.
- In emoji_invertChar, you will be inverting an emoji. You are provided with a string in this function. You have to invert only the first character of this string (which may be up to 4 bytes), and only if it is an emoji. The inversion is semantic, so you should invert the meaning of the emoji (NOT flip or "invert" the bits). You will have to invert "😊" (U+1F60A) to some sort of sad face (your choice!). In addition to "😊", you need to invert at least five other emojis of your choice (and it's okay to do more than just five).
- In emoji_invertAll, you will invert all the characters in the string using your emoji_invertChar function.
- In emoji_invertFile_alloc, you will read in the contents of the file with the provided file name, invert all the emojis in that file, and return the inverted string.
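The byte-level work described in the list above can be sketched as follows. This is only an illustrative sketch, not the required implementation: the helper names (is_continuation, utf8_decode, in_emoji_range, count_in_emoji_range, utf8_encode4) are hypothetical and do not appear in the provided files. It assumes the input is a NUL-terminated string and does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* True when b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
int is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: decode the UTF-8 character starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed
 * (1 to 4), or 0 at the end of the string. A malformed byte is reported
 * as U+FFFD and consumes one byte, so the caller always makes progress. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp) {
    if (s[0] == '\0') return 0;
    if (s[0] < 0x80) {                       /* 0xxxxxxx: ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && is_continuation(s[1])) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && is_continuation(s[1]) &&
        is_continuation(s[2])) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && is_continuation(s[1]) &&
        is_continuation(s[2]) && is_continuation(s[3])) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    *cp = 0xFFFD;                            /* malformed input */
    return 1;
}

/* The MP's simplified definition of an emoji: U+1F000 to U+1FAFF. */
int in_emoji_range(uint32_t cp) {
    return cp >= 0x1F000 && cp <= 0x1FAFF;
}

/* Count the characters of s that fall inside the emoji range. */
size_t count_in_emoji_range(const unsigned char *s) {
    size_t count = 0;
    uint32_t cp;
    size_t len;
    while ((len = utf8_decode(s, &cp)) > 0) {
        if (in_emoji_range(cp)) count++;
        s += len;
    }
    return count;
}

/* Encode a code point in U+10000..U+10FFFF as four UTF-8 bytes followed
 * by a NUL terminator. The caller provides a buffer of at least 5 bytes. */
void utf8_encode4(uint32_t cp, unsigned char out[5]) {
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    out[4] = '\0';
}
```

For example, decoding the bytes 0xF0 0x9F 0x92 0x99 with utf8_decode yields the code point 0x1F499, and passing 0x1F499 to utf8_encode4 produces the same four bytes again. Working on code points rather than raw bytes keeps the range check a simple numeric comparison.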
Part 2: An Emoji Translator
Sometimes, we may write articles (or entire books) with emojis! For the second part of this MP, you will begin to work with basic C data structures to create an "emoji translation" utility.
In emoji-translate.c, you will find four functions:
- void emoji_init(emoji_t *emoji) is called to initialize an emoji object.
- void emoji_add_translation(emoji_t *emoji, const unsigned char *source, const unsigned char *translation) adds a translation to the emoji object. For example, "😊" might translate to "happy".
- const unsigned char *emoji_translate_file_alloc(emoji_t *emoji, const char *fileName) must translate the contents of the file specified by fileName using all translation rules added so far. When matching rules, always choose the longest matching rule based on the length of the source.
  - For example, if only the rule "😊" => "happy" has been added, "I am 😊!" would translate to "I am happy!".
  - If the rules "😊" => "happy" and "😊😊" => "very happy" have both been added, "I am 😊😊!" would translate to "I am very happy!" since the rule using two faces is longer than the rule using one face.
- void emoji_destroy(emoji_t *emoji) is called to destroy the emoji object and any memory allocated by your emoji library.
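The longest-match rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the rule_t struct and the longest_match function are hypothetical names, and the layout of emoji_t is up to you, so the sketch simply assumes the rules are stored in a plain array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule storage: one source string and its translation. */
typedef struct {
    const char *source;
    const char *translation;
} rule_t;

/* Return the rule whose source matches the beginning of text and has the
 * greatest length in bytes, or NULL when no rule matches at this position. */
const rule_t *longest_match(const rule_t *rules, size_t count,
                            const char *text) {
    const rule_t *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(rules[i].source);
        /* Only consider rules strictly longer than the best match so far;
         * this also skips rules with an empty source. */
        if (len > best_len && strncmp(text, rules[i].source, len) == 0) {
            best = &rules[i];
            best_len = len;
        }
    }
    return best;
}
```

A translator could call a function like this at each position in the file contents: when it returns a rule, emit the translation and skip strlen(source) bytes; otherwise copy one byte through unchanged. Comparing whole byte sequences with strncmp avoids any need to decode the UTF-8 characters during matching.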
Example Usage
emoji_t emoji;
emoji_init(&emoji);
emoji_add_translation(&emoji, (const unsigned char *) "🧡", (const unsigned char *) "heart");
// The file on disk contains: "I 🧡💙 Illinois!"
const unsigned char *translation = emoji_translate_file_alloc(&emoji, "tests/txt/simple.txt");
// Translation Output: "I heart💙 Illinois!"
printf("%s\n", translation);
emoji_destroy(&emoji);
Part 3: Memory Correctness
For full credit, your MP must run "valgrind clean". This means that you must:
- Compile all test cases on the command line using make test,
- Run all test cases using valgrind --leak-check=full --show-leak-kinds=all ./test,
- This must report the following output: All heap blocks were freed -- no leaks are possible
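As an illustration of the kind of code that stays valgrind clean, here is a small sketch of an allocating helper that releases every resource on every path. The function name read_file_alloc is hypothetical and not part of the provided files. The sketch assumes the _alloc suffix means the caller owns the returned buffer and must free it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a NUL-terminated heap copy of the file's
 * contents, or NULL on failure. The caller must free() the result. */
char *read_file_alloc(const char *path) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    char *buf = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);   /* +1 for the terminator */
            if (buf != NULL) {
                size_t n = fread(buf, 1, (size_t)size, fp);
                buf[n] = '\0';
            }
        }
    }
    fclose(fp);   /* the file is closed on every path after a successful open */
    return buf;
}
```

Each call that returns a non-NULL pointer needs exactly one matching free in the caller. Forgetting the free, or returning early without fclose, is the sort of mistake valgrind reports as a leak or as a still-reachable block.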
macOS Specific Information
Sadly, valgrind does not work on macOS. However, it does work in Docker running on a Mac:
# Build a light-weight docker:
docker build -t cs340 .
# Run make clean, make, and run valgrind:
docker run --rm -it -v `pwd`:/mp1 cs340 "make clean"
docker run --rm -it -v `pwd`:/mp1 cs340 "make"
docker run --rm -it -v `pwd`:/mp1 cs340 "valgrind ./test"
Modifiable Files
In your solution, you must only modify the following files. Modifications of other files may break things:
emoji.c
emoji.h
emoji-translate.c
emoji-translate.h
Testing Your Program
- To compile the test suite, run make test.
- To run your code, run ./test and everything should pass!
Submit
When you have completed your program, double-check that all three parts run without errors and give the results you expect. When you are ready, submit the code via the following git commands:
git add -A
git commit -m "MP submission"
git push
You can verify your code was successfully submitted by viewing your git repo on github.com.