Project 6 sysstatd web service FAQ

Project 6 FAQ

This FAQ answers questions students have had in past semesters related to project 6.

Do we really need to use the textbook author's RIO package?
The RIO package provides an I/O layer on top of sockets with two core pieces of functionality: handling short reads and buffering read data. If you don't use the RIO package, you will have to handle short reads yourselves for your implementation to be correct. Buffering, though commonly done as a performance optimization that reduces the number of read() system calls, is not necessary for correctness.
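If you do handle short reads yourselves, the core is a loop that keeps calling read() until the requested number of bytes has arrived, the peer closes the connection, or an error occurs. The following is a minimal sketch of that idea, not RIO's code; the name read_fully is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of RIO): read exactly n bytes from fd
 * unless end of file or an error intervenes.
 * Returns the number of bytes read (less than n only at end of file),
 * or -1 on error with errno set by read().
 */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    char *p = buf;
    size_t left = n;

    while (left > 0) {
        ssize_t r = read(fd, p, left);
        if (r < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        if (r == 0)
            break;              /* peer closed the connection (EOF) */
        p += r;                 /* short read: advance and keep going */
        left -= (size_t)r;
    }
    return (ssize_t)(n - left);
}
```

For request headers the length isn't known up front, so a line-oriented reader needs the same loop structure with a different stopping condition (for instance, having seen the end of a line).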
How can I verify that my server responds correctly to HTTP requests?
Run it under strace. You may wish to set '-s 1024' to be able to examine all data written/read.

How can I verify that my server handles persistent connections correctly?

A simple trick is to use 'curl -v' and specify the same URL twice, as in:

$ curl -v http://cs3214.cs.vt.edu:9011/loadavg http://cs3214.cs.vt.edu:9011/loadavg
* About to connect() to cs3214.cs.vt.edu port 9011
*   Trying 128.173.41.123... connected
* Connected to cs3214.cs.vt.edu (128.173.41.123) port 9011
...
Connection #0 to host cs3214.cs.vt.edu left intact
* Re-using existing connection! (#0) with host cs3214.cs.vt.edu
...
Connection #0 to host cs3214.cs.vt.edu left intact
* Closing connection #0

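Look for the 'Re-using existing connection!' line in the output: it shows that curl sent the second request over the same TCP connection.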

How do I use valgrind/strace when running under the test harness?
Your server may fail when executing the tests of the test harness in sequence. One way to debug such problems is by using wrappers for valgrind and strace. To do this, create files swrapper and vwrapper like so:
```
::::::::::::::
swrapper
::::::::::::::
#!/bin/sh
#
# invoke strace, passing along any arguments
strace -s 1024 -e network,read,write -ff -o stracelog ./sysstatd "$@"

::::::::::::::
vwrapper
::::::::::::::
#!/bin/sh
#
# invoke valgrind, passing along any arguments
valgrind ./sysstatd "$@"
```
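Make those scripts executable (chmod +x ?wrapper), then run the test harness via server_unit_test.py -s ./vwrapper -o output to see valgrind's output in 'output'. Use -s ./swrapper in the same way to run under strace; because of the -ff -o stracelog options, strace writes its traces to files named stracelog.<pid>.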
Does the body of an HTTP response need to be terminated with \r\n?
No. CRLF is used only to separate header lines (and to end the header). The body consists of arbitrary content, which isn't necessarily line-oriented at all.
curl indicates that persistent connections work, but we still fail the test.
A possible reason is that you append additional bytes beyond the number you announced in Content-Length. curl skips whitespace when attempting to read a response. If the additional bytes you send are whitespace (such as \r\n), then curl will hide the problem.
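One way to avoid this mismatch is to compute Content-Length from the body you actually send and to append nothing after it. The sketch below assumes the body is a NUL-terminated string; the helper name format_response and the choice of headers are illustrative assumptions, not requirements of the assignment.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a complete 200 response into out.
 * Returns the total number of bytes to send, or -1 if out is too small.
 */
int format_response(char *out, size_t outsize,
                    const char *content_type, const char *body)
{
    size_t body_len = strlen(body);
    int n = snprintf(out, outsize,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n"          /* empty line ends the header */
                     "%s",           /* body: nothing appended after it */
                     content_type, body_len, body);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                   /* formatting failed or was truncated */
    return n;
}
```

The return value is the exact number of bytes to write to the socket; writing more than that (for example, an extra \r\n) is the mistake described above.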
Do we need to use select() (or poll())?
To meet the basic requirements of the assignment, no. The only possible use I could see is if you wanted to implement a time-out feature without issuing a blocking read() call. In this case, you would call select() before read(), giving a read set that consists only of the file descriptor you're intending to read from. If select() times out without the file descriptor having become readable, you treat it as timeout. Otherwise, you perform a single read() call on the file descriptor which you know won't block.

An alternative is to use a timer and a signal, as shown here, although select() is likely the simpler solution.
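The select()-before-read() approach might look like the following sketch. The helper name read_with_timeout and the -2 return code for a timeout are assumptions made for illustration.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then issue a single read().
 * Returns the read() result (0 means the peer closed the connection),
 * -1 on error, or -2 if the timeout expired first.
 * fd must be smaller than FD_SETSIZE.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t n, int timeout_sec)
{
    fd_set rset;
    struct timeval tv;

    FD_ZERO(&rset);
    FD_SET(fd, &rset);          /* read set contains only this fd */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    int ready = select(fd + 1, &rset, NULL, NULL, &tv);
    if (ready < 0)
        return -1;              /* select() failed, errno is set */
    if (ready == 0)
        return -2;              /* timed out: nothing to read */

    return read(fd, buf, n);    /* fd is readable, so this won't block */
}
```

Note that the single read() may still be a short read; the timeout only bounds how long you wait for the first byte to arrive.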
If our server is supposed to handle multiple clients simultaneously, do we need to bind() one (or multiple) sockets to multiple ports?
No. TCP uses a quadruple (src ip, src port, dst ip, dst port) to identify each connection, so different connections can go to the same dst port as long as the src ip or src port are different. You will obtain a socket that refers to a specific connection (i.e., client with a specific src ip/src port pair) as the return value from the accept() call.
In relay mode, do we need to call bind/listen/accept after we've connected to the relay server?
No. You don't need to (and must not) call anything. After connecting, simply send your prefix, then start handling HTTP/1.1 requests on that same socket.

A TCP connection is bidirectional, and subsequent to the handshake that establishes a connection, the TCP protocol has in fact no memory of who established it (the TCP client) and who accepted it (the TCP server). It's like picking up a phone (or calling someone back on a phone) - once the phone call is in progress, both parties can speak and assume whichever roles are suited to their conversation.
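In code, relay mode therefore involves only socket(), connect(), and a write of the prefix. The sketch below assumes the prefix is sent as the raw bytes of a string; the exact prefix format comes from the assignment handout and is not shown here, and the name connect_to_relay is hypothetical.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to the relay server and send the prefix.
 * Returns a connected socket on which HTTP/1.1 requests will then arrive,
 * or -1 on failure. Note: no bind(), listen(), or accept() anywhere.
 */
int connect_to_relay(const char *host, const char *port, const char *prefix)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;    /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;                      /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    /* Send the prefix, looping in case write() accepts only part of it. */
    size_t len = strlen(prefix), off = 0;
    while (off < len) {
        ssize_t w = write(fd, prefix + off, len - off);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        off += (size_t)w;
    }
    return fd;
}
```

The returned descriptor can then be handed to the same request-handling code that otherwise receives descriptors from accept().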
In relay mode, how do we bind to the (random) port assigned by the NAT gateway (hn1.cs.vt.edu)?
As discussed in the previous question, you don't call bind() at all. When you connect() to the relay server, the OS assigns a port to your connection on the machine from which you're connecting. (It does this for all TCP clients that don't call bind() - the common case.) As the connection is established, the NAT gateway eavesdrops on the connection and records this port in a map alongside the (random) port it assigned for the outside to use. When the outside peer (in our case, the relay server) replies, the gateway rewrites (translates) the packet, reinserting the original port number before forwarding it, so that the translated packet can be dispatched to the proper connection when it arrives back at the machine where your server runs.
curl claims to speak HTTP/1.1, but does not send a Connection: close header!
The use of the 'Connection: close' header to announce that no more requests/responses will be sent on an existing TCP connection is a courtesy, but not a requirement in HTTP/1.1. Both client and server are free to close the connection instead. In fact, such closing is needed when a client and/or server opportunistically keeps an HTTP/1.1 connection open only to later realize that nothing more needs to be requested (in the client case) or that the connection needs to be closed, perhaps to avoid running out of file descriptors or ports (in the server case). For details, read Section 8 of RFC 2616, particularly 8.1.4 Practical Considerations.
Why do I get a SIGPIPE signal/why does writing to a socket fail with errno=EPIPE?
According to write(2):
```
       EPIPE  fd  is  connected to a pipe or socket whose reading end is closed.  When this happens
              the writing process will also receive a SIGPIPE  signal.   (Thus,  the  write  return
              value is seen only if the program catches, blocks or ignores this signal.)
```
This means the client has closed its file descriptor and would be unable to read() the data sent through this connection. Note that due to buffering and the internals of the TCP protocol, there may be a delay before EPIPE is returned. When it is, you should stop trying to send data on that connection and close the fd.
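One common way to see the EPIPE return value, rather than having SIGPIPE terminate the server, is to ignore the signal at startup and check write()'s result. The following is a sketch under that assumption; the helper names ignore_sigpipe and write_all are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Call once at startup: with SIGPIPE ignored, write() returns -1 with
 * errno set to EPIPE instead of the signal terminating the process. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/*
 * Hypothetical helper: write all n bytes to fd.
 * Returns 0 on success, or -1 on error with errno set by write();
 * if errno is EPIPE the caller should stop sending and close fd.
 */
int write_all(int fd, const void *buf, size_t n)
{
    const char *p = buf;

    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* includes EPIPE */
        }
        p += w;                 /* partial write: advance and continue */
        n -= (size_t)w;
    }
    return 0;
}
```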
If we use our threadpool implementation from project 5, how should we manage the futures returned?
The future abstraction assumes that a concurrent activity produces a result that is used later in the program, or that the program's progress depends on that activity having completed.

In a network server, this assumption isn't usually true, except for special applications such as perhaps an internal monitoring/diagnostics system that would record the time it took to serve a client. On the contrary, one is usually interested in making sure that all resources allocated while serving one client are deallocated. The fewer such resources, the simpler this job.

For this reason, consider extending the thread pool interface such that tasks can be submitted to the pool that don't return futures on which the user would have to call future_free(). Instead, the corresponding internal objects are deallocated by the thread pool as soon as the task has been processed. Note the analogy to pthread_detach()/pthread_join().
How does the query string syntax work?
If a web page contains a form that uses this HTML:
```
<form method="get" action="http://cs3214.cs.vt.edu:9011/loadavg">
                    <input name="nocallback" value="false" type="hidden"></input>
    Enter callback: <input name="callback"></input>
                    <input type="submit"></input>
</form>
```
then when the user clicks 'Submit', the browser will bundle up the form data as a series of argument=value pairs, separated by &, that appear in the query string (after the ?) of the request. Try it out. For more information, see here and here. If additional arguments are provided, servers usually ignore them. This provides for greater robustness when the assumptions made in the HTML form page and on the server have diverged. For instance, users may have created HTML pages before a particular field was deprecated.
Which functions can I use to parse the query string?
strtok_r() is one function you can use to split the query string into its parts, as shown in queryparsing.c.
The test harness times out, even though I can start my server just fine on the command line.
The test harness will create a TCP connection to your server, but will close it without sending anything (no HTTP request), just to test whether your server started successfully. The tests start only if this connection was successful. If your server crashes because a client closes a connection without sending anything - which it mustn't - you may experience this situation for all tests. Recall that read() returning 0 signals that the connection has been closed by the peer.