MP 1: Emojis

Due Date: Completed and turned in via git before September 6, 2022 at 11:59pm
Points: MP 1 is worth 40 points
Semester-Long Details: Programming Environment and MP Policy

Overview

MP1 in CS 340 is all about getting comfortable with memory allocation, pointers, and character encodings in C. You will complete several different sets of functions to build up your C programming skills.

  • You will explore some key differences between C and C++ programs.
  • You will create data structures in C.
  • You will work at the bit- and byte-level to manipulate multi-byte UTF-8 characters in C-strings.
  • You will read files and manipulate data from files in memory.
  • You will share with us your favorite emoji! 🎉

Initial Files

In your CS 340 directory, merge the initial starting files with the following commands:

git fetch release
git merge release/mp1 --allow-unrelated-histories -m "Merging release repository"

Machine Problem

Throughout this MP, you will be working with emojis! You will begin with some simple functions and then advance to building a small, expandable emoji translator!

Part 1: Simple Emoji Functions 🎉

In emoji.c, you will find six functions where you will combine everything you know to work with UTF-8 encoded strings.

  • In emoji_favorite, we just want to know your favorite emoji.

    There are many ways to specify byte sequences in C code. For example, 💙 is the emoji U+1F499. However, C does not understand the U+ code point notation directly. Instead, you need to specify it in one of the following ways:

    • Using the values for each byte. For 💙 (U+1F499), the UTF-8 byte sequence is 0xF0 0x9F 0x92 0x99.
    • The bytes can be set one by one as numbers in hex form (ex: str[0] = 0xF0) or in decimal form (ex: str[0] = 240).
    • The bytes can be set one by one as characters, with the character escape '\x##'. Even though the escape is written with four characters in the source code, it is stored as only one byte in memory. The first byte is set with str[0] = '\xF0'.
    • A string can be created using multiple individual character escapes (ex: str = "\xF0\x9F\x92\x99"). Even though the example literal takes 16 characters to write, the string contains only four bytes (plus the terminating NUL byte).
    • A string can be created using the character escape \u or \U (ex: str = "\U0001F499"). Note: the \u escape sequence takes 4 hexadecimal digits (\uhhhh) and the \U escape sequence takes 8 hexadecimal digits (\Uhhhhhhhh), where each h is a hexadecimal digit. Code points above U+FFFF, which includes the emoji range used in this MP, need the \U form.
    • A string can be created using the emoji itself (ex: str = "💙").

    You will want to use different forms at different times.
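
As a rough illustration of the forms listed above, the sketch below builds the bytes of 💙 (U+1F499) three ways. The function names are hypothetical and not part of the MP's API, and the \U form assumes the compiler's execution character set is UTF-8 (the default for gcc and clang).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the MP's API): fills `out` with the
 * UTF-8 bytes of U+1F499 one byte at a time, then NUL-terminates it.
 * `out` must have room for at least 5 bytes. */
void blue_heart_bytes(unsigned char *out) {
    assert(out != NULL);
    out[0] = 0xF0;                   /* number in hex form */
    out[1] = 159;                    /* number in decimal form (0x9F) */
    out[2] = (unsigned char)'\x92';  /* character escape: one byte in memory */
    out[3] = (unsigned char)'\x99';
    out[4] = '\0';
}

/* The same four bytes written as a string of \x escapes. */
const char *blue_heart_hex(void) {
    return "\xF0\x9F\x92\x99";
}

/* The same four bytes written with the \U escape
 * (assumes a UTF-8 execution character set). */
const char *blue_heart_universal(void) {
    return "\U0001F499";
}
```

All three produce the same four bytes followed by a NUL, so strlen reports 4 for each string.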

  • In emoji_count, you will count the emojis in a provided string.

    For the purpose of this MP, we will consider an emoji to be anything in the inclusive range U+1F000 - U+1FAFF. (There are some invalid characters in this range and a few early emojis outside of this range, but we want to keep it simple. Feel free to be more accurate: we will only test your code on real emoji within this range, so your solution won’t break if you program a more correct solution.)
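
One possible building block for the range check is a function that turns a 4-byte UTF-8 sequence back into its code point. The sketch below is only an illustration: decode4 and in_emoji_range are hypothetical names, and it checks only the bit patterns of the lead and continuation bytes, not every rule of well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes the 4-byte UTF-8 sequence starting at s
 * and returns its code point, or 0 if s does not look like the start of
 * a 4-byte sequence. Safe on NUL-terminated strings because it stops at
 * the first byte that is not a continuation byte. */
uint32_t decode4(const unsigned char *s) {
    assert(s != NULL);
    if ((s[0] & 0xF8) != 0xF0) return 0;        /* lead byte: 11110xxx */
    for (size_t i = 1; i < 4; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;    /* continuation: 10xxxxxx */
    }
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6)  |
            (uint32_t)(s[3] & 0x3F);
}

/* The MP's simplified definition of an emoji: U+1F000..U+1FAFF inclusive. */
int in_emoji_range(uint32_t cp) {
    return cp >= 0x1F000 && cp <= 0x1FAFF;
}
```

For the bytes 0xF0 0x9F 0x92 0x99, the payload bits combine to 0x1F000 + 0x480 + 0x19 = 0x1F499, which falls inside the range.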

  • In emoji_random_alloc, you will generate a random emoji each time the function is called! The C documentation for rand outlines how to create a random integer in C, which you’ll need to use for your emoji. Make sure that you return a different emoji each time emoji_random_alloc is called.

    Just like in emoji_count, you can assume the inclusive range U+1F000 - U+1FAFF or be more specific.
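
Generating an emoji goes in the opposite direction: pick a code point, then encode it as four UTF-8 bytes. The following is a minimal sketch under stated assumptions; encode4_alloc and random_emoji_code_point are hypothetical names, and the modulo step has a slight bias that should not matter here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated 4-byte UTF-8
 * encoding of cp, which must lie in U+10000..U+10FFFF.
 * Returns NULL if allocation fails. The caller frees the result. */
unsigned char *encode4_alloc(uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    unsigned char *s = malloc(5);
    if (s == NULL) return NULL;
    s[0] = (unsigned char)(0xF0 | (cp >> 18));           /* 11110xxx */
    s[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));  /* 10xxxxxx */
    s[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));   /* 10xxxxxx */
    s[3] = (unsigned char)(0x80 | (cp & 0x3F));          /* 10xxxxxx */
    s[4] = '\0';
    return s;
}

/* Picks a code point in the inclusive range U+1F000..U+1FAFF using rand(). */
uint32_t random_emoji_code_point(void) {
    uint32_t span = 0x1FAFF - 0x1F000 + 1;
    return 0x1F000 + (uint32_t)rand() % span;
}
```

Note that rand produces the same sequence on every run unless srand is called with a varying seed, and srand should be called once rather than before every rand call.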

  • In emoji_invertChar, you will be inverting an emoji. You are provided with a string in this function. You have to invert only the first character of this string (which may be up to 4 bytes), and only if it is an emoji. The inversion is semantic, so you should invert the meaning of the emoji (NOT flip or “invert” the bits). You will have to invert “😊” (U+1F60A) to some sort of sad face (your choice!). In addition to “😊”, you need to invert at least five other emojis of your choice (and it’s okay to do more than just five).

  • In emoji_invertAll, you will invert all the characters in the string using your emoji_invertChar function.

  • In emoji_invertFile_alloc, you will read in the contents of a provided file name and invert all the emojis in that file, and return the inverted string.
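
Reading a whole file into memory is the first step of emoji_invertFile_alloc. One common approach, sketched below with a hypothetical helper name, is to seek to the end to learn the size, allocate one extra byte for the NUL terminator, and read the contents in a single call. It assumes a regular, seekable file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file into a malloc'd,
 * NUL-terminated buffer. Returns NULL if the file cannot be opened,
 * measured, or allocated. The caller frees the result. */
unsigned char *read_file_alloc(const char *fileName) {
    assert(fileName != NULL);
    FILE *f = fopen(fileName, "rb");
    if (f == NULL) return NULL;

    unsigned char *buf = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            buf = malloc((size_t)size + 1);
            if (buf != NULL) {
                /* fread may return fewer bytes than requested;
                 * terminate after what was actually read. */
                size_t n = fread(buf, 1, (size_t)size, f);
                buf[n] = '\0';
            }
        }
    }
    fclose(f);
    return buf;
}
```

Closing the file on every path and freeing the returned buffer in the caller both matter for the valgrind requirement in Part 3.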

Part 2: An Emoji Translator

Sometimes, we may write articles (or entire books) with emojis! For the second part of this MP, you will begin to work with basic C data structures to create an “emoji translation” utility.

In emoji-translate.c, you will find four functions:

  • void emoji_init(emoji_t *emoji) is called to initialize an emoji object.

  • void emoji_add_translation(emoji_t *emoji, const unsigned char *source, const unsigned char *translation) adds a translation to the emoji object. For example, “😊” might translate to “happy”.

  • const unsigned char *emoji_translate_file_alloc(emoji_t *emoji, const char *fileName) must translate the contents of the file specified by fileName using all translation rules added so far. When matching rules, always choose the longest matching rule based on the length of the source.

    • For example, if only the rule “😊” => “happy” has been added, "I am 😊!" would translate to "I am happy!".
    • If the rules “😊” => “happy” and “😊😊” => “very happy” have both been added, "I am 😊😊!" would translate to "I am very happy!" since the rule using two faces is longer than the rule using one face.
  • void emoji_destroy(emoji_t *emoji) is called to destroy the emoji object and any memory allocated by your emoji library.
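
The longest-match rule for emoji_translate_file_alloc can be sketched as a scan over all rules at the current position in the text. The rule_t type and longest_match function below are hypothetical; how emoji_t actually stores its rules is up to you.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule type; the real emoji_t layout is your design choice. */
typedef struct {
    const char *source;       /* bytes to look for */
    const char *translation;  /* bytes to emit instead */
} rule_t;

/* Returns the index of the rule whose source is the longest prefix of
 * text, or -1 if no rule matches at this position. */
int longest_match(const rule_t *rules, size_t count, const char *text) {
    assert(text != NULL);
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(rules[i].source);
        /* Only consider rules strictly longer than the best so far. */
        if (len > best_len && strncmp(text, rules[i].source, len) == 0) {
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

A translator built on this would call longest_match at each position, append the translation and skip the source length on a match, and otherwise copy one byte and advance.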

Example Usage

emoji_t emoji;
emoji_init(&emoji);

emoji_add_translation(&emoji, "🧡", "heart");

// The file on disk contains: "I 🧡💙 Illinois!"
unsigned char *translation = emoji_translate_file_alloc(&emoji, "tests/txt/simple.txt");

// Translation Output: "I heart💙 Illinois!"
printf("%s\n", translation);

// Assumed from the _alloc suffix: the caller frees the returned string.
free(translation);

emoji_destroy(&emoji);

Part 3: Memory Correctness

For full credit, your MP must run “valgrind clean”. This means that you must:

  • Compile all test cases on the command line using make test,
  • Run all test cases using valgrind --leak-check=full --show-leak-kinds=all ./test,
  • Confirm that valgrind reports the following output: All heap blocks were freed -- no leaks are possible

macOS Specific Information

Sadly, valgrind does not work on macOS. However, it does work in Docker running on a Mac:
# Build a light-weight docker:
docker build -t cs340  .

# Run make clean, make, and run valgrind:
docker run --rm -it -v `pwd`:/mp1 cs340 "make clean"
docker run --rm -it -v `pwd`:/mp1 cs340 "make"
docker run --rm -it -v `pwd`:/mp1 cs340 "valgrind ./test"

Modifiable Files

In your solution, you must only modify the following files. Modifications of other files may break things:

  • emoji.c
  • emoji.h
  • emoji-translate.c
  • emoji-translate.h

Testing Your Program

  • To compile the test suite, run make test.
  • To run your code, run ./test and everything should pass! 🎉

Submit

When you have completed your program, double-check that all three parts run without errors and give the results you expect. When you are ready, submit the code via the following git commands:

git add -A
git commit -m "MP submission"
git push

You can verify your code was successfully submitted by viewing your git repo on github.com.