MP5
HTTP Server
Due dates may be found on the schedule.

Change Log

2024-10-11 13:00 CDT
Updated mp5.zip with several new files. These should remove some SIGILL and related failures. It also changes how part 3 is tested, from the old ./tester "[part=3]" to the new python3 -m pytest.
2024-10-11 13:22 CDT
Updated mp5.zip with revised tests/test-part2.cpp that uses sockets instead of pipes. In principle either one will work, but they are handled by different parts of the kernel so some Docker instances might be happier with one or the other. If this one does not work, you are welcome to try the old tests/test-part2.cpp instead.

The Hypertext Transfer Protocol (HTTP), defined in RFC 2616 and later specifications, is the fundamental protocol for transferring data on the World Wide Web. It is an application-level protocol whose request methods, status codes, and headers make it useful for many different applications.

HTTP is a client-server protocol: the client sends a request to the server, which processes the request and returns an appropriate response. In this MP, you will:

As with previous specifications we have referenced, many informal summaries and examples are available online; for example, Wikipedia and the Mozilla Developer Network both discuss HTTP in more readable ways than the official specification.

ABNF

The HTTP specification, and many other internet specifications as well, use a special metasyntax called Augmented Backus–Naur Form or ABNF to describe how strings are formatted. For example, on page 30 we see

generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]

This means “A generic-message consists of a start-line, then 0 or more ‘message-header CRLF’ pairs, then another CRLF, then optionally a message body.” We can look up each of those terms; for example, on the same page we see

start-line      = Request-Line | Status-Line

which means “A start-line is either a Request-Line or a Status-Line.” On page 15 we see

CR             = <US-ASCII CR, carriage return (13)>
LF             = <US-ASCII LF, linefeed (10)>
CRLF           = CR LF

which means that each CRLF is what we’d write in C as "\r\n" (or equivalently "\x0d\x0a").

And so on. To fully understand generic-message we’d look up message-header, message-body, Request-Line, Status-Line, and any other symbol we found by doing so.
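To make the generic-message rule concrete, here is a small sketch in C. The example message and the helper name count_header_lines are ours, chosen for illustration only; they are not part of mp5.zip. The helper walks the string one CRLF-terminated line at a time and stops at the empty line that the rule's lone CRLF produces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* An example generic-message written as a C string (illustrative only). */
const char EXAMPLE_MESSAGE[] =
    "GET /index.html HTTP/1.1\r\n"  /* start-line (here, a Request-Line) */
    "Host: localhost\r\n"           /* message-header CRLF */
    "Content-Length: 5\r\n"         /* message-header CRLF */
    "\r\n"                          /* the lone CRLF that ends the headers */
    "hello";                        /* message-body */

/* Hypothetical helper: counts the message-header lines between the
   start-line and the empty line. Returns -1 if either the start-line's
   CRLF or the terminating empty line is missing. */
int count_header_lines(const char *msg) {
    assert(msg != NULL);
    const char *line = strstr(msg, "\r\n");   /* end of the start-line */
    if (line == NULL) return -1;
    line += 2;                                /* first header line, if any */
    int count = 0;
    for (;;) {
        const char *end = strstr(line, "\r\n");
        if (end == NULL) return -1;           /* ran out before the empty line */
        if (end == line) return count;        /* empty line: headers are over */
        count++;
        line = end + 2;                       /* step past this line's CRLF */
    }
}
```

Calling count_header_lines(EXAMPLE_MESSAGE) walks over the two header lines and stops at the empty line, leaving the body untouched.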

ABNF has its own specification, RFC 5234, and associated gentler introductions if you’d like to know more. (RFC 2616 itself describes its notation as similar to the augmented BNF used by RFC 822.)

1 Initial Files

mp5.zip contains initial files. You will modify

2 Implementation

To help you organize the parts of building a web server, we’ve split up the implementation into three parts:

  1. Parsing an HTTP request packet for the request and headers (as a string, without needing to worry about the socket code). This will include all of http.c except httprequest_read.
  2. Reading an HTTP request over a socket (the httprequest_read function).
  3. Building a web server using your httprequest code that can be used with Chrome or other web browsers.

2.1 Webserver overview

HTTP sends messages over a socket (or, for some of our tests, other file-like objects). Messages have 3 parts:

  1. A start line that explains what type and version of message it is
  2. Zero or more header lines, each a key-value pair
  3. Optionally, a message body (also called a payload)
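As an illustration of the “key-value pair” shape of a header line, the following sketch splits one line at its first colon. The function name split_header_line and its buffer-based interface are assumptions made for this example; the functions you implement in http.c have their own signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one header line (already stripped of its
   trailing CRLF) into a key and a value. Returns 0 on success, or -1 if
   the line has no colon, has an empty key, or does not fit the buffers. */
int split_header_line(const char *line, char *key, size_t key_cap,
                      char *value, size_t value_cap) {
    assert(line != NULL && key != NULL && value != NULL);
    const char *colon = strchr(line, ':');      /* first colon ends the key */
    if (colon == NULL) return -1;
    size_t key_len = (size_t)(colon - line);
    const char *v = colon + 1;
    while (*v == ' ' || *v == '\t') v++;        /* skip whitespace before the value */
    size_t value_len = strlen(v);
    if (key_len == 0 || key_len >= key_cap || value_len >= value_cap) return -1;
    memcpy(key, line, key_len);
    key[key_len] = '\0';
    memcpy(value, v, value_len + 1);            /* +1 copies the terminating NUL */
    return 0;
}
```

Splitting at the first colon matters because values may themselves contain colons, as in Host: localhost:8080.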

A web server (which is what you’re making in this MP) needs to receive and parse HTTP messages of the various Request types and construct and send HTTP messages of the various Response types.

We’ve split these tasks into several functions. In order of calling, they are

  1. Either server.c’s main (which is complete and needs no changes from you) or a tester file establishes a connection to a client.

  2. Either server.c’s client_thread (which you’ll write in part 3) or a tester file manages that connection and calls httprequest_read with the open connection as an argument.

  3. http.c’s httprequest_read (which you’ll write in part 2) assembles an HTTP message from one or more packets, parses it, and stores it in an HTTPRequest structure defined in http.h by:

    1. using http.c’s httprequest_parse_headers to parse the start line and header lines; and
    2. using the parsed headers to read the message body.

    This information gets stored in the HTTPRequest structure, which you may add to if you wish. Because you may add to it, getter functions are used to access its key details rather than using direct field access, and a destructor function is used to deallocate any resources it has used.

  4. server.c’s client_thread uses the parsed request message to decide what response message to send.

The details of each of these pieces (e.g. how you know that you’ve passed the end of the headers) are defined in RFC 2616.
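For the specific example of knowing when you have passed the end of the headers: the generic-message rule says the headers end with an empty line, so the bytes CR LF CR LF mark the boundary. A minimal sketch follows, assuming the bytes received so far sit in a buffer of known length; the name headers_end_offset is ours, not part of the provided API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first byte after the
   "\r\n\r\n" that ends the headers, or -1 if that sequence has not
   arrived yet. Works on raw bytes with an explicit length, so a body
   containing NUL bytes is not a problem. */
long headers_end_offset(const char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0) {
            return (long)(i + 4);   /* the body, if any, starts here */
        }
    }
    return -1;                      /* need more bytes */
}
```

A return of -1 means the caller should gather more bytes and check again; any bytes past the returned offset already belong to the message body.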

2.2 Part 1: Parsing an HTTP request

Implement the API provided in http.{c,h}, excluding httprequest_read. The main function in this part is httprequest_parse_headers, which must populate the provided HTTPRequest *req data structure with the contents from char *buffer.
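As a rough sketch of the first step of such parsing, the code below splits a Request-Line (Method SP Request-URI SP HTTP-Version CRLF in RFC 2616) into its three parts. The RequestLineSketch struct and both function names are hypothetical stand-ins; the real HTTPRequest in http.h has its own fields, and you will likely want dynamically sized strings rather than the fixed buffers used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for illustration; not the HTTPRequest from http.h. */
typedef struct {
    char method[16];
    char path[256];
    char version[16];
} RequestLineSketch;

/* Copies the bytes in [begin, end) into dst as a C string.
   Returns -1 if the token is empty or does not fit. */
static int copy_token(char *dst, size_t cap, const char *begin, const char *end) {
    size_t len = (size_t)(end - begin);
    if (len == 0 || len >= cap) return -1;
    memcpy(dst, begin, len);
    dst[len] = '\0';
    return 0;
}

/* Parses only the first line of buffer, so header lines that follow
   cannot leak into the result. Returns 0 on success, -1 if malformed. */
int parse_request_line(const char *buffer, RequestLineSketch *out) {
    assert(buffer != NULL && out != NULL);
    const char *eol = strstr(buffer, "\r\n");
    if (eol == NULL) return -1;
    const char *sp1 = memchr(buffer, ' ', (size_t)(eol - buffer));
    if (sp1 == NULL) return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL) return -1;
    if (copy_token(out->method, sizeof out->method, buffer, sp1) != 0) return -1;
    if (copy_token(out->path, sizeof out->path, sp1 + 1, sp2) != 0) return -1;
    if (copy_token(out->version, sizeof out->version, sp2 + 1, eol) != 0) return -1;
    return 0;
}
```

Searching for the spaces only up to the first CRLF is a deliberate choice: a whitespace-skipping scanner such as sscanf with %s would happily read past the end of a short Request-Line into the next header.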

The other functions include:

While working on this part:

2.2.1 Testing Part 1

We have provided a simple test suite to test the correctness of your parsing logic:

  • In your terminal, type make tester to compile the test suite.
  • Run ./tester "[part=1]" to run the tests that have been tagged with [part=1] (covering this portion of the MP).

2.3 Part 2: Reading an HTTP request from a socket

Complete httprequest_read. This function is called with a file descriptor from which you must read the contents of the request.

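We recommend using read, not recv, when reading from the socket: read also works on the non-socket file descriptors that some of our tests use, while recv does not.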

read/recv are slow

Each call to read/recv requires suspending your code, switching to the kernel, copying data from the socket to your code’s memory, and then resuming your code: quite a lot of work. Because of this, it is best to read many bytes at a time to minimize how many of these calls you make. This is especially true of large payloads: once you know the message body is, say, 123456 bytes, you should call read/recv once with that full buffer to let the kernel handle it all in one go instead of in many separate calls.
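One caveat when applying this advice: read may return fewer bytes than you asked for, for example when the rest of the payload has not arrived yet. A sketch of one way to handle that is to request everything that is still missing on every call and loop only when the kernel delivers less. The name read_fully is ours and is not part of the provided API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: tries to read exactly `want` bytes from fd into buf.
   Each call to read asks for all remaining bytes, so in the common case
   only one or a few system calls are needed. Returns the number of bytes
   actually read (less than `want` only if the peer closed early), or -1
   on a read error. */
ssize_t read_fully(int fd, char *buf, size_t want) {
    assert(buf != NULL || want == 0);
    size_t got = 0;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n > 0) {
            got += (size_t)n;          /* partial read: keep going */
        } else if (n == 0) {
            break;                     /* end of stream before `want` bytes */
        } else if (errno != EINTR) {
            return -1;                 /* real error; EINTR just retries */
        }
    }
    return (ssize_t)got;
}
```

With a body length of 123456, this would be called as read_fully(fd, body, 123456) on a buffer at least that large.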

2.3.1 Testing Part 2

We have provided a simple test suite to test the correctness of your parsing logic. The first five tests are identical to part 1, except that they’re now delivered via the sockfd file descriptor instead of as a string. The final test checks whether your code can read the payload of a request.

  • In your terminal, type make tester to compile the test suite.
  • Run ./tester "[part=2]" to run the tests that have been tagged with [part=2] (covering this portion of the MP).

Some tests may crash (rather than fail normally) if your code does not work properly. This is because some tests read memory that you might not be setting. They will stop crashing when your code is correct.

2.4 Part 3: Building a Web Server

We have provided a partial threaded socket-based web server in server.c. You need to finish implementing the client_thread function.

Our code has a main thread in the function main and spawns worker threads to run client_thread. The main thread is fully complete (no edits needed) and does the following:

  1. Accepts a port number as a command-line argument
  2. Asks the OS to let it listen for incoming messages by asking for a socket, then binding it to that port and asking the OS to listen for incoming connections
  3. Repeatedly
    1. accept an incoming connection, which opens a new socket to talk over
    2. create a worker thread to speak with the client over that connection
    3. detach the worker thread, meaning ignore its final results and let it retire when its job is done
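You do not need to write any of this, but the steps above can be sketched roughly as follows. This is our own illustration using the standard sockets and pthreads calls, not the actual contents of server.c; open_listener, accept_and_detach, and worker_sketch are hypothetical names, and the worker here only closes the connection where the real client_thread would handle requests.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step 2: ask for a socket, bind it to the port, and start listening.
   Returns the listening descriptor, or -1 on failure. */
int open_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 || listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Stand-in for client_thread: takes ownership of the connection. */
static void *worker_sketch(void *arg) {
    int client_fd = *(int *)arg;
    free(arg);
    close(client_fd);   /* a real worker would read and answer requests first */
    return NULL;
}

/* Step 3, one iteration: accept a connection, create a worker for it,
   and detach the worker. Returns 0 on success, -1 on failure. */
int accept_and_detach(int listen_fd) {
    assert(listen_fd >= 0);
    int *client_fd = malloc(sizeof *client_fd);   /* heap copy outlives this call */
    if (client_fd == NULL) return -1;
    *client_fd = accept(listen_fd, NULL, NULL);
    if (*client_fd < 0) { free(client_fd); return -1; }
    pthread_t tid;
    if (pthread_create(&tid, NULL, worker_sketch, client_fd) != 0) {
        close(*client_fd);
        free(client_fd);
        return -1;
    }
    pthread_detach(tid);   /* nobody will join it; it cleans up on its own */
    return 0;
}
```

A main function in this style would call open_listener once and then call accept_and_detach in an endless loop.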

You will write the client_thread function in server.c:

Web clients may send several HTTP requests over the same connection. They may also send one, wait for the reply without closing the connection, then send the next. Make sure you process each HTTP request as soon as you have the entire request, not waiting for the connection to close.
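The same concern applies in the other direction: when the connection stays open, the client cannot use “the connection closed” to find the end of your response, so the response must say how long its body is. One common way is a Content-Length header. The following sketch assembles such a response in a caller-supplied buffer; build_response and its parameters are illustrative assumptions, not a required interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a status line, two headers, the empty line,
   and the body into out. The body is copied with memcpy, so it may contain
   binary data such as an image. Returns the total number of bytes to send,
   or -1 if the response does not fit in cap bytes. */
int build_response(char *out, size_t cap, int status, const char *reason,
                   const char *content_type, const char *body, size_t body_len) {
    assert(out != NULL && reason != NULL && content_type != NULL);
    assert(body != NULL || body_len == 0);
    int head = snprintf(out, cap,
                        "HTTP/1.1 %d %s\r\n"
                        "Content-Type: %s\r\n"
                        "Content-Length: %zu\r\n"
                        "\r\n",
                        status, reason, content_type, body_len);
    /* snprintf reports the length it wanted; >= cap means it was truncated. */
    if (head < 0 || (size_t)head >= cap) return -1;
    if (body_len > cap - (size_t)head) return -1;
    if (body_len > 0) memcpy(out + head, body, body_len);
    return head + (int)body_len;
}
```

The returned length, not strlen, is what should be passed to write or send, since the body may contain zero bytes and the buffer is not NUL-terminated after the body is copied in.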

2.4.1 Testing Part 3

You will test Part 3 using your favorite web browser.

If you can see the pages and images, you just made your first static web server!

There are a few tests that can be run with python3 -m pytest to verify it works by code – but that’s not really the point of this part.

3 Submission and Grading

Your code must have no valgrind errors or warnings. You can check this with valgrind ./tester.

Once everything is working you can try running all the tests in summary form through make test or bash tester.sh (both do the same thing). This is also what is run when you submit your code.