Introduction
HTTP is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally, how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer. For each request from client to server, there is a sequence of four steps:
- The client opens a TCP connection to the server on port 80, by default; other ports may be specified in the URL.
- The client sends a message to the server requesting the resource at a specified path. The request includes a header, and optionally (depending on the nature of the request) a blank line followed by data for the request.
- The server sends a response to the client. The response begins with a status line containing a response code, followed by a header full of metadata, a blank line, and the requested document or an error message.
- The server closes the connection.
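The four steps can be sketched with a plain java.net.Socket. This is a minimal sketch that assumes an HTTP 1.0 server, which closes the connection after responding. The class RawHttpGet, its methods, and the host example.com are illustrative only, and running main requires network access.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the four-step HTTP 1.0 exchange over a raw socket.
final class RawHttpGet {

    // Step 2: compose the request. Each line ends with CRLF, and an extra
    // CRLF (a blank line) ends the header. A GET request has no body.
    static String requestFor(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    static String fetch(String host, int port, String path) throws IOException {
        // Step 1: open a TCP connection (port 80 unless the URL says otherwise).
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(requestFor(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Step 3: read the status line, header, blank line, and body.
            // An HTTP 1.0 server ends the response by closing the connection
            // (step 4), so reading until end of stream collects all of it.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                response.write(buffer, 0, n);
            }
            // ISO-8859-1 maps every byte to one char, so no bytes are lost.
            return response.toString(StandardCharsets.ISO_8859_1.name());
        } // try-with-resources closes our end of the socket
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch("example.com", 80, "/"));
    }
}
```

In real programs the URL and HttpURLConnection classes carry out these steps for you. The sketch only makes the message framing visible.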
This is the basic HTTP 1.0 procedure. In HTTP 1.1 and later, multiple requests and responses can be sent in series over a single TCP connection. That is, steps 2 and 3 can repeat multiple times in between steps 1 and 4. Furthermore, in HTTP 1.1, requests and responses can be sent in multiple chunks, so a sender can begin transmitting a message before it knows the message's total length. This is more scalable. Each request and response has the same basic form: a first line (the request line or the status line), an HTTP header containing metadata, a blank line, and then an optional message body. A typical client request looks something like this:
GET /index.html HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0
Host: en.wikipedia.org
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
GET requests like this one do not contain a message body, so the request ends with a blank line. The first line is called the request line, and includes a method, a path to a resource, and the version of HTTP. The method specifies the operation being requested. The GET method asks the server to return a representation of a resource. /index.html is the path to the resource requested from the server. HTTP/1.1 is the version of the protocol that the client understands.
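Splitting a request line into its three parts is straightforward, because the parts are separated by single spaces and the path cannot contain a raw space. The class RequestLine below is a hypothetical helper for illustration, not part of the JDK; it rejects anything that does not have exactly three parts ending in an HTTP/ version.

```java
// Hypothetical helper that splits an HTTP request line into method, path, and version.
final class RequestLine {
    final String method;   // e.g. GET
    final String path;     // e.g. /index.html
    final String version;  // e.g. HTTP/1.1

    private RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    static RequestLine parse(String line) {
        // Exactly three space-separated parts are expected.
        String[] parts = line.split(" ");
        if (parts.length != 3 || !parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.1");
        System.out.println(r.method + " | " + r.path + " | " + r.version);
    }
}
```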
In HTTP 1.0 the request line is all that is required (HTTP 1.1 also requires a Host field), but a client request usually includes other information as well in a header. Each line takes the following form:
Keyword: Value
Keywords are not case sensitive. Values sometimes are and sometimes aren’t. Both keywords and values should be ASCII only. If a value is too long, you can add a space or tab to the beginning of the next line and continue it, although this line folding is deprecated and new software should not generate it. Lines in the header are terminated by a carriage return/linefeed pair. The first keyword in this example is User-Agent, which lets the server know what browser is being used and allows it to send files optimized for the particular browser type. The following line says that the request comes from version 2.4 of the Lynx browser:
User-Agent: Lynx/2.4 libwww/2.1.4
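The header rules above (case-insensitive keywords, CRLF-terminated lines, and continuation lines that start with a space or tab) can be sketched as a small parser. HeaderParser is a hypothetical name. The sketch stores fields in a TreeMap ordered by String.CASE_INSENSITIVE_ORDER so that lookups ignore case. It also joins repeated fields with commas, a simplification that does not suit Set-Cookie.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: parse a CRLF-delimited header block into a case-insensitive map.
final class HeaderParser {

    static Map<String, String> parse(String headerBlock) {
        // "host", "Host", and "HOST" all name the same field.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String lastName = null;
        for (String line : headerBlock.split("\r\n")) {
            if (line.isEmpty()) {
                break; // the blank line ends the header
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded continuation line: append it to the previous field's value.
                if (lastName == null) {
                    throw new IllegalArgumentException("Continuation with no field: " + line);
                }
                headers.put(lastName, headers.get(lastName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header line: " + line);
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Repeated fields are joined with ", " (not correct for Set-Cookie).
            headers.merge(name, value, (a, b) -> a + ", " + b);
            lastName = name;
        }
        return headers;
    }
}
```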
All but the oldest first-generation browsers also include a Host field specifying the server’s name, which allows web servers to distinguish between different named hosts served from the same IP address:
Host: www.cafeaulait.org
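A server that hosts several named sites on one IP address can select the site from the Host field, as sketched below. VirtualHosts, its method, and the document-root paths are hypothetical. The sketch lowercases the host name, because host names are case insensitive, and strips an optional port such as :8080.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of name-based virtual hosting keyed on the Host field.
final class VirtualHosts {
    // Maps a lowercase host name to that site's document root.
    private final Map<String, String> rootsByHost;

    VirtualHosts(Map<String, String> rootsByHost) {
        this.rootsByHost = rootsByHost;
    }

    Optional<String> documentRootFor(String hostHeader) {
        if (hostHeader == null) {
            return Optional.empty(); // very old clients send no Host field
        }
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        // Strip an optional port, but leave bracketed IPv6 literals such as [::1] intact.
        int colon = host.lastIndexOf(':');
        if (colon >= 0 && !host.endsWith("]")) {
            host = host.substring(0, colon);
        }
        return Optional.ofNullable(rootsByHost.get(host));
    }

    public static void main(String[] args) {
        VirtualHosts hosts = new VirtualHosts(Map.of(
            "www.cafeaulait.org", "/var/www/cafeaulait",
            "www.example.org", "/var/www/example"));
        System.out.println(hosts.documentRootFor("www.cafeaulait.org:80"));
    }
}
```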
The last keyword in this example is Accept, which tells the server the types of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME media types, corresponding to HTML documents, plain text, and JPEG and GIF images:
Accept: text/html, text/plain, image/gif, image/jpeg
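The Accept field in the earlier Firefox request also includes quality values such as q=0.9, and a media range with no q parameter counts as q=1. The sketch below, using a hypothetical class AcceptParser, orders media ranges by preference. It ignores all parameters other than q and does not rank specific types ahead of wildcards at the same q.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: parse an Accept header into media ranges ordered by q-value.
final class AcceptParser {

    static final class MediaRange {
        final String type;
        final double q;

        MediaRange(String type, double q) {
            this.type = type;
            this.q = q;
        }

        @Override
        public String toString() {
            return type + ";q=" + q;
        }
    }

    static List<MediaRange> parse(String accept) {
        List<MediaRange> ranges = new ArrayList<>();
        for (String element : accept.split(",")) {
            String[] params = element.split(";");
            String type = params[0].trim();
            if (type.isEmpty()) {
                continue;
            }
            double q = 1.0; // default when no q parameter is present
            for (int i = 1; i < params.length; i++) {
                String param = params[i].trim().toLowerCase(Locale.ROOT);
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2));
                }
            }
            ranges.add(new MediaRange(type, q));
        }
        // List.sort is stable, so ranges with equal q keep their header order.
        ranges.sort(Comparator.comparingDouble((MediaRange r) -> r.q).reversed());
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(parse("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"));
    }
}
```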
MIME types are classified at two levels: a type and a subtype. The type shows very generally what kind of data is contained: is it a picture, text, or movie? The subtype identifies the specific type of data: GIF image, JPEG image, TIFF image. For example, HTML’s content type is text/html; the type is text, and the subtype is html. The content type for a JPEG image is image/jpeg; the type is image, and the subtype is jpeg. Eight top-level types have traditionally been used (IANA has since registered a few more, such as font/*):
- text/* for human-readable words
- image/* for pictures
- model/* for 3D models such as VRML files
- audio/* for sound
- video/* for moving pictures, possibly including sound
- application/* for binary data
- message/* for protocol-specific envelopes such as email messages and HTTP responses
- multipart/* for containers of multiple documents and resources
Each of these has many different subtypes. The most current list of registered MIME types is available from http://www.iana.org/assignments/media-types/. In addition, nonstandard custom types and subtypes can be freely defined as long as they begin with x-, although RFC 6648 now discourages the x- convention. For example, Flash files were commonly assigned the type application/x-shockwave-flash. Finally, the request is terminated with a blank line, that is, two carriage return/linefeed pairs in a row: \r\n\r\n.
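Splitting a content type into its type and subtype takes only a few lines. MimeType is a hypothetical helper class. It drops parameters such as charset, lowercases the result because type names are case insensitive, and checks only against the eight top-level types listed above, so it is not a substitute for the IANA registry.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: split a MIME content type into type and subtype.
final class MimeType {
    // Only the eight top-level types listed in the text; IANA's registry is the authority.
    private static final Set<String> LISTED_TOP_LEVEL = new HashSet<>(Arrays.asList(
        "text", "image", "model", "audio", "video", "application", "message", "multipart"));

    final String type;
    final String subtype;

    private MimeType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    static MimeType parse(String contentType) {
        // Drop parameters such as "; charset=ISO-8859-1" and normalize case.
        String essence = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = essence.indexOf('/');
        if (slash <= 0 || slash == essence.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + contentType);
        }
        return new MimeType(essence.substring(0, slash), essence.substring(slash + 1));
    }

    boolean hasListedTopLevelType() {
        return LISTED_TOP_LEVEL.contains(type);
    }

    // Nonstandard types and subtypes conventionally begin with "x-".
    boolean isNonstandard() {
        return type.startsWith("x-") || subtype.startsWith("x-");
    }
}
```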
Once the server sees that blank line, it begins sending its response to the client over the same connection. The response begins with a status line, followed by a header describing the response using the same “name: value” syntax as the request header, a blank line, and the requested resource. A typical successful response looks something like this:
HTTP/1.1 200 OK
Date: Sun, 21 Apr 2013 15:12:46 GMT
Server: Apache
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-length: 115

<html>
<head>
<title>
A Sample HTML file
</title>
</head>
<body>
The rest of the document goes here
</body>
</html>
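The server side of a single exchange can be sketched as follows: read lines until the blank line that ends the request header, then write a status line, header fields, a blank line, and the body. SimpleResponder and its fixed document are hypothetical. The sketch omits the Date and Server fields, and because BufferedReader may read ahead, it would need changes to handle request bodies. Content-Length counts bytes, not characters; for this sample document with lines joined by linefeeds, the count is 115, matching the response above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a server answering one request with a fixed HTML document.
final class SimpleResponder {

    // The sample document, lines separated by a single linefeed.
    static final String BODY = String.join("\n",
        "<html>", "<head>", "<title>", "A Sample HTML file", "</title>", "</head>",
        "<body>", "The rest of the document goes here", "</body>", "</html>");

    static void respond(InputStream rawIn, OutputStream rawOut) throws IOException {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(rawIn, StandardCharsets.US_ASCII));
        // Consume the request line and header fields up to the blank line.
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            // A real server would parse the request here.
        }
        byte[] body = BODY.getBytes(StandardCharsets.ISO_8859_1);
        String header = "HTTP/1.1 200 OK\r\n"
            + "Connection: close\r\n"
            + "Content-Type: text/html; charset=ISO-8859-1\r\n"
            + "Content-Length: " + body.length + "\r\n" // bytes, not chars
            + "\r\n";                                    // blank line ends the header
        rawOut.write(header.getBytes(StandardCharsets.US_ASCII));
        rawOut.write(body);
        rawOut.flush();
    }

    // Convenience wrapper for trying the exchange without a socket.
    static String respondTo(String request) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            respond(new ByteArrayInputStream(request.getBytes(StandardCharsets.US_ASCII)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

With a real connection, you would call respond(socket.getInputStream(), socket.getOutputStream()) for each accepted socket.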
HTTP Status Codes
Code and Messages | Meaning | HttpURLConnection constant |
---|---|---|
1XX | Informational | N/A |
100 Continue | The server is prepared to accept the request body and the client should send it; allows clients to ask whether the server will accept a request before they send a large amount of data as part of the request. | N/A |
101 Switching Protocols | The server accepts the client’s request in the Upgrade header field to change the application protocol (e.g., from HTTP to WebSockets). | N/A |
2XX Successful | Request succeeded. | |
200 OK | The most common response code. If the request method was GET or POST, the requested data is contained in the response along with the usual headers. If the request method was HEAD, only the header information is included. | HTTP_OK |
201 Created | The server has created a new resource, whose URL is normally given in the Location header of the response. The client may now load that URL. This code is typically sent in response to POST or PUT requests. | HTTP_CREATED |
202 Accepted | This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete, so no response can be returned. However, the server should return an HTML page that explains the situation to the user and provide an estimate of when the request is likely to be completed, and, ideally, a link to a status monitor of some kind. | HTTP_ACCEPTED |
203 Non-authoritative Information | The resource representation was returned from a caching proxy or other local source and is not guaranteed to be up to date. | HTTP_NOT_AUTHORITATIVE |
204 No Content | The server has successfully processed the request but has no information to send back to the client. It is common in response to PUT and DELETE requests; when a form-processing program returns it after accepting data, the user sees no response at all. | HTTP_NO_CONTENT |
205 Reset Content | The server has successfully processed the request but has no information to send back to the client. Furthermore, the client should reset the document view, for example by clearing the form that sent the request. | HTTP_RESET |
206 Partial Content | The server has returned the part of the resource the client requested using the byte range extension to HTTP, rather than the whole document. | HTTP_PARTIAL |
226 IM Used | Response is delta encoded. | N/A |
3XX Redirection | Relocation and redirection. | |
300 Multiple Choices | The server is providing a list of different representations (e.g., PostScript and PDF) for the requested document. | HTTP_MULT_CHOICE |
301 Moved Permanently | The resource has moved to a new URL. The client should automatically load the resource at this URL and update any bookmarks that point to the old URL. | HTTP_MOVED_PERM |
302 Found (Moved Temporarily in HTTP 1.0) | The resource is temporarily at a new URL, and its location may change again, so the client should keep using the original URL and bookmarks should not be updated. Sometimes used by proxies that require the user to log in locally before accessing the Web. | HTTP_MOVED_TEMP |
303 See Other | Generally used in response to a POST form request, this code indicates that the user should retrieve a resource from a different URL using GET. | HTTP_SEE_OTHER |
304 Not Modified | The client sent a conditional request, for example with an If-Modified-Since header asking for the document only if it has changed since a given date, and the document has not changed. In this case, the client should load the document from its cache. | HTTP_NOT_MODIFIED |
305 Use Proxy | The Location header field contains the address of a proxy that will serve the response. Now deprecated for security reasons. | HTTP_USE_PROXY |
307 Temporary Redirect | Similar to 302 but without allowing the HTTP method to change. | N/A |
308 Permanent Redirect | Similar to 301 but without allowing the HTTP method to change. | N/A |
4XX | Client error. | |
400 Bad Request | The client request to the server used improper syntax. This is rather unusual in normal web browsing but more common when debugging custom clients. | HTTP_BAD_REQUEST |
401 Unauthorized | Authorization, generally a username and password, is required to access this page. Either a username and password have not yet been presented or the username and password are invalid. | HTTP_UNAUTHORIZED |
422 Unprocessable Entity | The content type of the request body is recognized, and the body is syntactically correct, but nonetheless the server can’t process it. | N/A |
424 Failed Dependency | Request failed as a result of the failure of a previous request. | N/A |
426 Upgrade Required | The client is using a version of the HTTP protocol that is too old or insecure. | N/A |
428 Precondition Required | The server requires the request to be conditional, typically by supplying an If-Match header. | N/A |
429 Too Many Requests | The client is being rate limited and should slow down. | N/A |
431 Request Header Fields Too Large | Either the header as a whole is too large, or one particular header field is too large. | N/A |
451 Unavailable For Legal Reasons | The server is prohibited by law from servicing the request. | N/A |
5XX | Server error. | |
500 Internal Server Error | An unexpected condition occurred that the server does not know how to handle. | HTTP_INTERNAL_ERROR, HTTP_SERVER_ERROR (deprecated) |
501 Not Implemented | The server does not have a feature that is needed to fulfill this request. A server that cannot handle PUT requests might send this response to a client that tried to PUT form data to it. | HTTP_NOT_IMPLEMENTED |
502 Bad Gateway | This code is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request. | HTTP_BAD_GATEWAY |
503 Service Unavailable | The server is temporarily unable to handle the request, perhaps due to overloading or maintenance. | HTTP_UNAVAILABLE |
504 Gateway Timeout | The proxy server did not receive a response from the upstream server within a reasonable amount of time, so it can’t send the desired response to the client. | HTTP_GATEWAY_TIMEOUT |
505 HTTP Version Not Supported | The server does not support the version of HTTP the client is using (e.g., a newer version than the server implements). | HTTP_VERSION |
507 Insufficient Storage | The server does not have enough space to store the supplied request entity; typically used for POST or PUT. | N/A |
511 Network Authentication Required | The client needs to authenticate to gain network access (e.g., on a hotel wireless network). | N/A |
Regardless of version, a response code from 100 to 199 always indicates an informational response, 200 to 299 always indicates success, 300 to 399 always indicates redirection, 400 to 499 always indicates a client error, and 500 to 599 indicates a server error.
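Because the first digit alone determines the class, a client can handle even a code it does not specifically recognize. StatusClass is a hypothetical helper. The HttpURLConnection constants it uses are real, but as the table shows, they do not cover every code.

```java
import java.net.HttpURLConnection;

// Hypothetical sketch: classify a status code by its first digit.
final class StatusClass {

    static String classify(int code) {
        if (code < 100 || code > 599) {
            throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
        // Works even for codes the client does not specifically recognize.
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Success";
            case 3:  return "Redirection";
            case 4:  return "Client error";
            default: return "Server error";
        }
    }

    // Specific codes can be compared against HttpURLConnection's named constants.
    static boolean isNotModified(int code) {
        return code == HttpURLConnection.HTTP_NOT_MODIFIED;
    }
}
```

With a live connection, the int passed to these methods would come from HttpURLConnection.getResponseCode().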
Keep-Alive
HTTP 1.0 opens a new connection for each request. In practice, the time taken to open and close all the connections in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. This is even more problematic for encrypted HTTPS connections using SSL or TLS, because the handshake to set up a secure socket is substantially more work than setting up a regular socket.
In HTTP 1.1 and later, the server doesn’t have to close the socket after it sends its response. It can leave it open and wait for a new request from the client on the same socket. Multiple requests and responses can be sent in series over a single TCP connection. However, the lockstep pattern of a client request followed by a server response remains the same. HTTP 1.1 connections are persistent by default, and either side can send Connection: close to end one. An HTTP 1.0 client indicates that it’s willing to reuse a socket by including a Connection field in the HTTP request header with the value Keep-Alive:
Connection: Keep-Alive
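The decision of whether a connection stays open after a response can be sketched as follows. Under HTTP 1.1 a connection persists unless a close token appears, while under HTTP 1.0 it closes unless keep-alive is requested. ConnectionPolicy is a hypothetical helper. It looks at a single Connection field and ignores the finer details of HTTP 1.0 keep-alive negotiation.

```java
import java.util.Locale;

// Hypothetical sketch of the persistence rule for HTTP 1.0 and 1.1 connections.
final class ConnectionPolicy {

    static boolean isPersistent(String httpVersion, String connectionHeader) {
        String tokens = connectionHeader == null ? "" : connectionHeader.toLowerCase(Locale.ROOT);
        boolean close = false;
        boolean keepAlive = false;
        // The Connection field is a comma-separated, case-insensitive token list.
        for (String token : tokens.split(",")) {
            String t = token.trim();
            if (t.equals("close")) {
                close = true;
            } else if (t.equals("keep-alive")) {
                keepAlive = true;
            }
        }
        if (close) {
            return false;
        }
        if ("HTTP/1.0".equals(httpVersion)) {
            return keepAlive; // HTTP 1.0: opt in with Keep-Alive
        }
        return true; // HTTP 1.1 and later: persistent by default
    }
}
```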
The URL class transparently supports HTTP Keep-Alive unless explicitly turned off. That is, it will reuse a socket if you connect to the same server again before the server has closed the connection. You can control Java’s use of HTTP Keep-Alive with several system properties:
- Set http.keepAlive to true or false to enable or disable HTTP Keep-Alive. (It is enabled by default.)
- Set http.maxConnections to the number of sockets you’re willing to hold open at one time. The default is 5.
- Set http.keepAlive.remainingData to true to let Java clean up after abandoned connections (Java 6 or later). It is false by default.
- Set sun.net.http.errorstream.enableBuffering to true to attempt to buffer the relatively short error streams from 400- and 500-level responses, so the connection can be freed up for reuse sooner. It is false by default.
- Set sun.net.http.errorstream.bufferSize to the number of bytes to use for buffering error streams. The default is 4,096 bytes.
- Set sun.net.http.errorstream.timeout to the number of milliseconds before timing out a read from the error stream. It is 300 milliseconds by default.
The defaults are reasonable, except that you probably do want to set sun.net.http.errorstream.enableBuffering to true unless you want to read the error streams from failed requests.
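These properties can be set from code, but only before the first HTTP connection is made, because the JDK typically reads them when its HTTP implementation classes are first loaded. Passing them as -D options on the java command line, as in java -Dsun.net.http.errorstream.enableBuffering=true MyApp, avoids that ordering problem (MyApp is a placeholder). The class KeepAliveConfig and the specific values below are illustrative assumptions, not recommendations.

```java
// Hypothetical sketch: configure Java's HTTP Keep-Alive behavior via system properties.
final class KeepAliveConfig {

    static void configure() {
        System.setProperty("http.keepAlive", "true");      // the default, shown for clarity
        System.setProperty("http.maxConnections", "10");   // illustrative value; default is 5
        System.setProperty("sun.net.http.errorstream.enableBuffering", "true");
        System.setProperty("sun.net.http.errorstream.bufferSize", "8192"); // illustrative
        System.setProperty("sun.net.http.errorstream.timeout", "300");     // milliseconds
    }

    public static void main(String[] args) {
        configure();
        // Open URL connections only after this point so the settings take effect.
    }
}
```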