Machine Lab 4 (Extra Credit): Strings

Overview

Implement string literals. A string begins with a double quote ("), followed by a (possibly empty) sequence of printable characters and escape sequences, followed by a closing double quote (").

Background

A printable character is one that would appear on an old-fashioned QWERTY typewriter keyboard, including the space. Here, we must exclude " and \ because they are given special meaning, as described below. The printable characters are therefore the 52 uppercase and lowercase letters, the ten digits, the space, and 30 of the 32 special characters (all except " and \).
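As a rough sketch of this definition, assuming the printable characters are exactly the ASCII codes 32 through 126 (which matches the 52 + 10 + 1 + 32 count above), the check could be written as a plain OCaml predicate. The names is_printable and printable_count are hypothetical and not part of the lab's starter code; in an ocamllex file the same set would more likely be written as a character class.

```ocaml
(* Hypothetical helper: true when c may appear unescaped inside a string.
   Assumes "printable" means ASCII codes 32..126, minus '"' and '\\'. *)
let is_printable (c : char) : bool =
  let code = Char.code c in
  code >= 32 && code <= 126 && c <> '"' && c <> '\\'

(* Count how many of the 256 character codes pass the test. *)
let printable_count : int =
  let n = ref 0 in
  for code = 0 to 255 do
    if is_printable (Char.chr code) then incr n
  done;
  !n
```

Under that assumption the count comes to 93, that is, 52 letters, 10 digits, the space, and 30 special characters.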

Note that a string cannot contain an unescaped quote (") because that character ends the string. It can, however, contain the two-character sequence representing an escaped quote (\"). More generally, \ begins an escape sequence.

Specifically, you must recognize the following two-character sequences that represent escaped characters:

\\
\'
\"
\t
\n
\r
\b
\space (where space is the space character).

Each such two-character sequence must be converted into the corresponding single character. For example, the two-character sequence \t (whose first character is \, ASCII code 92) must become the single tab character '\t' (ASCII code 9).
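One possible way to express this conversion, shown here only as a sketch, is a function from the character that follows the backslash to the character the escape stands for. The name unescape is hypothetical, and the sketch assumes that a backslash followed by a space simply yields a space.

```ocaml
(* Hypothetical helper: given the character after the backslash, returns
   the single character the two-character escape represents, or None if
   the character is not one of the eight escapes listed above. *)
let unescape (c : char) : char option =
  match c with
  | '\\' -> Some '\\'
  | '\'' -> Some '\''
  | '"'  -> Some '"'
  | 't'  -> Some '\t'
  | 'n'  -> Some '\n'
  | 'r'  -> Some '\r'
  | 'b'  -> Some '\b'
  | ' '  -> Some ' '    (* assumption: backslash-space stands for a space *)
  | _    -> None
```

Returning an option keeps the illegal cases explicit, so the caller decides how to report a backslash followed by an unsupported character.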

Additionally, you must handle the following escape sequence:

\ddd

This sequence denotes a character by its ASCII integer value: ddd stands for three decimal digits giving an integer value between 0 and 255. Your job is to map the integer to the single character with that code. For example, the escape sequence \100 becomes the character 'd' (ASCII code 100).
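The mapping from digits to a character can be sketched with the two standard functions int_of_string and char_of_int. The helper name char_of_decimal_escape is hypothetical, and the sketch assumes its argument contains only the decimal digits that followed the backslash.

```ocaml
(* Hypothetical helper: ddd is the digit text that followed the backslash.
   Assumes ddd contains only decimal digits; int_of_string raises Failure
   on anything else. Values above 255 are rejected with None. *)
let char_of_decimal_escape (ddd : string) : char option =
  let n = int_of_string ddd in
  if n >= 0 && n <= 255 then Some (char_of_int n) else None
```

With this sketch, "100" gives Some 'd' and "001" gives Some '\001', while "256" gives None.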

Lastly, to allow splitting long string literals across lines, the sequence \ followed by a newline, followed by any number of spaces and tabs at the beginning of the next line, is ignored inside string literals.
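To illustrate the intended effect of this rule, here is a small plain-OCaml sketch that finds where the string resumes after a line break. The function skip_indent is hypothetical; in an ocamllex lexer the same behavior would more naturally be a single pattern matching the backslash, the newline, and the following spaces and tabs.

```ocaml
(* Hypothetical helper: s is the raw input and i is the index of the first
   character after the backslash-newline pair. Returns the index of the
   first character that is neither a space nor a tab. *)
let rec skip_indent (s : string) (i : int) : int =
  if i < String.length s && (s.[i] = ' ' || s.[i] = '\t')
  then skip_indent s (i + 1)
  else i

(* Raw input: a b \ newline space tab space space c d *)
let raw : string = "ab\\\n \t  cd"

(* Index 4 is the first character of the second line. *)
let resume_at : int = skip_indent raw 4
```

Here resume_at is 8, the index of 'c', so the resulting string content would be "abcd".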

The escape character \ cannot legally precede any other character in a string.

You will probably find it easiest to create a new entry point, possibly taking an argument, to handle the parsing of strings.

You will also find the following function useful:

String.make : int -> char -> string

where String.make n c creates the string consisting of n copies of c. In particular, String.make 1 c converts c from a character to a string.

You may also use char_of_int : int -> char and int_of_string : string -> int.
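As a small illustration of how String.make can help when building up a string one character at a time, consider the following sketch. The names append_char and built are hypothetical; the idea is simply that each recognized character is turned into a one-character string and concatenated onto what has been collected so far.

```ocaml
(* Hypothetical helper: appends one character to the string built so far. *)
let append_char (acc : string) (c : char) : string =
  acc ^ String.make 1 c

(* Building "hi!" one character at a time, starting from the empty string. *)
let built : string =
  List.fold_left append_char "" ['h'; 'i'; '!']
```

An accumulated string like this is one candidate for the argument passed to a string-handling entry point.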

Testing Your Code

Note that if you test your solution in OCaml, the test input is itself an OCaml string literal, so every " and \ that the lexer should see must be written with an extra backslash in front of it.
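The following sketch shows the effect of that extra layer of escaping. The name lexer_input is hypothetical; the literal is what one would type in OCaml so that the lexer receives the four characters quote, backslash, t, quote.

```ocaml
(* OCaml processes its own escapes first, so the lexer receives four
   characters: a double quote, a backslash, the letter t, a double quote. *)
let lexer_input : string = "\"\\t\""

let input_length : int = String.length lexer_input
```

The lexer is then expected to turn those four characters into a STRING token whose content is a single tab character.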

For example:

# get_all_tokens "\"some string\"";;
- : Ml4common.token list = [STRING "some string"]

# get_all_tokens "\" she said, \\\"hello\\\"\"";;
- : Ml4common.token list = [STRING " she said, \"hello\""]

# get_all_tokens "\" \\100 \\001 \"";;
- : Ml4common.token list = [STRING " d \001 "]

# get_all_tokens "\"a line \\n starts here; indent \\t starts here next string\" \"starts here\"";;
- : Ml4common.token list = [STRING "a line \n starts here; indent \t starts here next string"; STRING "starts here"]