Change Log

2024-10-11 13:00 CDT: Updated mp5.zip with several new files. These should remove some SIGILL and related failures. It also changes how part 3 is tested, from the old ./tester "[part=3]" to the new python3 -m pytest.
2024-10-11 13:22 CDT: Updated mp5.zip with revised tests/test-part2.cpp that uses sockets instead of pipes. In principle either one will work, but they are handled by different parts of the kernel so some Docker instances might be happier with one or the other. If this one does not work, you are welcome to try the old tests/test-part2.cpp instead.

The Hyper-Text Transfer Protocol (HTTP), defined in RFC 2616 and later specifications, is the fundamental protocol for transferring data on the World Wide Web. It is an application-level protocol whose request methods, status codes, and headers make it useful for many different applications.

HTTP is a Client-Server protocol, where the Client sends a request to the Server, which processes the request and returns an appropriate response. In this MP, you will:

Dive into the HTTP protocol to understand the technical design of HTTP request and response packets.
Write an HTTP server that responds to HTTP requests (from web browsers like Chrome, command line utilities like curl, and anything else that speaks HTTP).
Parse HTTP headers into key-value pairs.
Have a foundational understanding of how libraries parse HTTP requests for building web services in the future.

As with previous specifications we have referenced, many informal summaries and examples are available online; for example, Wikipedia and the Mozilla Developer Network both discuss HTTP in more readable ways than the official specification.

ABNF

The HTTP specification, and many other internet specifications as well, use a special metasyntax called Augmented Backus–Naur Form or ABNF to describe how strings are formatted. For example, on page 30 we see

generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]

This means “A generic-message consists of a start-line, then 0 or more ‘message-header CRLF’ pairs, then another CRLF, then optionally a message body.” We can look up each of those terms; for example on the same page we see

start-line      = Request-Line | Status-Line

which means “A start-line is either a Request-Line or a Status-Line.” On page 15 we see

CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
CRLF           = CR LF

which means that each CRLF is what we’d write in C as "\r\n" (or equivalently "\x0d\x0a").

And so on. To fully understand generic-message we’d look up message-header, message-body, Request-Line, Status-Line, and any other symbol we found by doing so.
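As a rough illustration of how the generic-message grammar maps onto code, the sketch below looks for the lone CRLF that follows the last header line. Because every line before it also ends in CRLF, that boundary appears in the byte stream as "\r\n\r\n". The function name find_header_end is hypothetical and is not part of the provided http.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first byte after the blank line ("\r\n\r\n")
 * that ends the start-line and headers, or -1 if it is not in buf yet.
 * Hypothetical helper for illustration only. */
long find_header_end(const char *buf, size_t len) {
    assert(buf != NULL);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
            return (long)(i + 4);
        }
    }
    return -1;
}
```

Everything before the returned offset is the start-line and headers; everything from that offset on is the optional message-body.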

ABNF has its own specification, RFC 5234 (the notation used in RFC 2616 is a closely related variant derived from the one in RFC 822), and associated gentler introductions if you’d like to know more.

2 Implementation

To help you organize the parts of building a web server, we’ve split up the implementation into three parts:

Parsing an HTTP request packet for the request and headers (as a string, without needing to worry about the socket code). This will include all of http.c except httprequest_read.
Reading an HTTP request over a socket (the httprequest_read function).
Building a web server using your httprequest code that can be used with Chrome or other web browsers.

2.1 Webserver overview

HTTP sends messages over a socket (or, for some of our tests, other file-like objects). Messages have 3 parts:

A start line that explains what type and version of message it is
Zero or more header lines, each a key-value pair
Optionally, a message body (also called a payload)

A web server (which is what you’re making in this MP) needs to receive and parse HTTP messages of the various Request types and construct and send HTTP messages of the various Response types.

We’ve split these tasks into several functions. In order of calling, they are

Either server.c’s main (which is complete and needs no changes from you) or a tester file establishes a connection to a client.
Either server.c’s client_thread (which you’ll write in part 3) or a tester file manages that connection and calls httprequest_read with the open connection as an argument.
http.c’s httprequest_read (which you’ll write in part 2) assembles an HTTP message from one or more packets, parses it, and stores it in an HTTPRequest structure defined in http.h by:
1. using http.c’s httprequest_parse_headers to parse the start line and header lines; and
2. using the parsed headers to read the message body.
This information gets stored in the HTTPRequest structure, which you may add to if you wish. Because you may add to it, getter functions are used to access its key details rather than direct field access, and a destructor function is used to deallocate any resources it has used.
server.c’s client_thread uses the parsed request message to decide what response message to send.

The details of each of these pieces (e.g. how you know that you’ve passed the end of the headers) are defined in RFC 2616.

2.2 Part 1: Parsing an HTTP request

Implement the API provided in http.{c,h}, excluding httprequest_read. The main function in this part is httprequest_parse_headers, which must populate the provided HTTPRequest *req data structure with the contents from char *buffer.

The other functions include:

httprequest_get_action, to get the action verb (also sometimes called method; ex: GET) from the HTTP request,
httprequest_get_path, to get the path (also sometimes called a URI; ex: /) from the HTTP request,
httprequest_get_header, to get a value for a specific header (ex: Host -> localhost), and
httprequest_destroy, to free any memory stored by an HTTPRequest struct

While working on this part:

You can (and will have to) add to the HTTPRequest struct in http.h (possibly also creating other structs in there, too).
You must populate the action, path, version fields in HTTPRequest while parsing the packet as part of httprequest_parse_headers.
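One possible way to approach the parsing is sketched below, under stated assumptions. The StartLine struct and the functions parse_start_line and split_header are hypothetical names for illustration; the real HTTPRequest in http.h and the required httprequest_parse_headers signature may differ, and the fixed-size fields are a simplification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the three start-line fields; the real
 * HTTPRequest in http.h may be laid out differently. */
typedef struct {
    char action[16];
    char path[256];
    char version[16];
} StartLine;

/* Parses "ACTION PATH VERSION" (without the trailing CRLF).
 * Returns 0 on success, -1 if fewer than three fields were found. */
int parse_start_line(const char *line, StartLine *out) {
    assert(line != NULL && out != NULL);
    /* Width limits keep sscanf from overflowing the fixed-size fields. */
    int n = sscanf(line, "%15s %255s %15s",
                   out->action, out->path, out->version);
    return n == 3 ? 0 : -1;
}

/* Splits "Key: value" in place: overwrites the first colon with '\0' and
 * returns a pointer to the value with leading spaces and tabs skipped.
 * Returns NULL if the line has no colon. */
char *split_header(char *line) {
    assert(line != NULL);
    char *colon = strchr(line, ':');
    if (colon == NULL) return NULL;
    *colon = '\0';
    char *value = colon + 1;
    while (*value == ' ' || *value == '\t') value++;
    return value;
}
```

After split_header returns, the original buffer holds the key and the returned pointer holds the value, so "Host: localhost" yields the pair Host -> localhost without copying.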

2.2.1 Testing Part 1

We have provided a simple test suite to test the correctness of your parsing logic:

In your terminal, type make tester to compile the test suite.
Run ./tester "[part=1]" to run the tests that have been tagged with [part=1] (covering this portion of the MP).

2.3 Part 2: Reading an HTTP request from a socket

Complete httprequest_read. This function is called with a file descriptor from which you must read the contents of the request.

You should use your httprequest_parse_headers to parse the headers of the request.
The Content-Length header is a special HTTP header that tells you how many bytes of payload follow the headers, which is what you need to know to read the payload of the request.
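Header values arrive as text, so the Content-Length value has to be converted to a number before it can be used as a byte count. A minimal sketch follows, assuming the value has already been extracted as a NUL-terminated string; parse_content_length is a hypothetical helper name, not part of the provided API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Converts a Content-Length header value to a byte count.
 * Returns -1 if the value is missing, is not a plain decimal number,
 * or does not fit in a long. Hypothetical helper for illustration. */
long parse_content_length(const char *value) {
    if (value == NULL || *value < '0' || *value > '9') return -1;
    errno = 0;
    char *end;
    long n = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0') return -1;
    assert(n >= 0);  /* first character was a digit, so no sign was parsed */
    return n;
}
```

A request without a Content-Length header can then be treated as having no payload to read, which is how this sketch's -1 for a NULL value would typically be handled.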

We recommend using read not recv when reading from the socket.

read/recv are slow

Each call to read/recv requires suspending your code, switching to the kernel, copying data from the socket to your code’s memory, and then resuming your code: quite a lot of work. Because of this, it is best to read many bytes at a time to minimize how many of these calls you make. This is especially true of large payloads: once you know the message body is 123456 bytes or whatever, you should call read/recv with a buffer large enough for all of it so the kernel can deliver as much as possible in one go instead of in many small calls. A single call may still return fewer bytes than you asked for, so check the return value and ask again for whatever remains.
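A read call can return fewer bytes than requested, for example when the payload arrives in several packets. The sketch below shows one common way to combine large requests with a retry loop; read_fully is a hypothetical helper name and is not part of the provided code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads until `want` bytes have arrived, the stream ends, or an error
 * occurs. Returns the number of bytes stored in buf, or -1 on error.
 * Hypothetical helper for illustration. */
ssize_t read_fully(int fd, char *buf, size_t want) {
    assert(buf != NULL);
    size_t got = 0;
    while (got < want) {
        /* Always ask for everything still missing, so the kernel can
         * hand over as much as it has in a single call. */
        ssize_t n = read(fd, buf + got, want - got);
        if (n == 0) break;                 /* end of stream */
        if (n < 0) {
            if (errno == EINTR) continue;  /* interrupted: retry */
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

When everything is already available the loop body runs once, so the common case still costs a single system call.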

2.3.1 Testing Part 2

We have provided a simple test suite to test the correctness of your reading and parsing logic. The first five tests are identical to part 1, except that they’re now delivered via the sockfd file descriptor instead of as a string. The final test checks whether your code can read the payload of a request.

In your terminal, type make tester to compile the test suite.
Run ./tester "[part=2]" to run the tests that have been tagged with [part=2] (covering this portion of the MP).

Some tests may crash (rather than fail normally) if your code does not work properly. This is because some tests read memory that you might not be setting. They will stop crashing when your code is correct.

2.4 Part 3: Building a Web Server

We have provided a partial threaded socket-based web server in server.c. You need to finish implementing the client_thread function.

Our code has a main thread in the function main and spawns worker threads to run client_thread. The main thread is fully complete (no edits needed) and does the following:

Accepts a port number as a command-line argument
Asks the OS to let it listen for incoming messages by asking for a socket, then binding it to that port and asking the OS to listen for incoming connections
Repeatedly
1. accept an incoming connection, which opens a new socket to talk over
2. create a worker thread to speak with the client over that connection
3. detach the worker thread, meaning ignore its final results and let it retire when its job is done

You will write the client_thread function in server.c:

You must read an HTTP request from the fd (use your httprequest_read).
You must create an HTTP response to respond to the request.
- If the requested path is /, you should process that request as if the path is /index.html.
- If the file does not exist in your static directory (excluding the /), you must respond with a 404 Not Found response.
- If the file requested does exist, you will respond with a 200 OK packet and:
  - Return the contents of the file as the payload,
  - If the file name ends in .png, the Content-Type header must be set to image/png.
  - If the file name ends in .html, the Content-Type header must be set to text/html.
You can (and probably should, to make it easier for you) close(fd) after responding to the request. This will ensure the browser opens a new socket (and you will have a new thread) when making another request. (You can continue to re-use this same socket and keep the connection alive, but this is not required.)

Web clients may send several HTTP requests over the same connection. They may also send one, wait for the reply without closing the connection, then send the next. Make sure you process each HTTP request as soon as you have the entire request, not waiting for the connection to close.
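The response rules listed above can be sketched as three small helpers. All three function names are hypothetical, and the Content-Length response header is an assumption of this sketch: it is standard HTTP and lets the client know where the payload ends, but it is not one of the requirements listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Applies the rule that "/" is served as "/index.html". */
const char *effective_path(const char *path) {
    assert(path != NULL);
    return strcmp(path, "/") == 0 ? "/index.html" : path;
}

/* Maps a file name to the Content-Type values named in this part.
 * Returns NULL for any other extension; what to send then is left open. */
const char *content_type_for(const char *path) {
    assert(path != NULL);
    const char *dot = strrchr(path, '.');
    if (dot == NULL) return NULL;
    if (strcmp(dot, ".png") == 0) return "image/png";
    if (strcmp(dot, ".html") == 0) return "text/html";
    return NULL;
}

/* Writes the status line and headers, ending with the blank line, into
 * out. Returns the number of bytes written, or -1 if out is too small.
 * The payload itself would be sent separately after these bytes. */
int format_response_head(char *out, size_t cap, int status,
                         const char *reason, const char *content_type,
                         size_t body_len) {
    int n = snprintf(out, cap,
                     "HTTP/1.1 %d %s\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     status, reason, content_type, body_len);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

For a file that exists, client_thread could call format_response_head with 200 and "OK", write the result to the connection, and then write the file contents; for a missing file the same helper can produce the 404 Not Found head.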

2.4.1 Testing Part 3

You will test Part 3 using your favorite web browser.

Compile your server with make
Launch your server using ./staticserver 34000 (or any other port number)
Visit http://localhost:34000/ (or http://fa24-cs340-###.cs.illinois.edu:34000/ if running on VM number ###)
- …and http://localhost:34000/microbe.html
- …and http://localhost:34000/340.png

If you can see the pages and images, you just made your first static web server!

There are a few tests that can be run with python3 -m pytest to verify it works by code – but that’s not really the point of this part.