The Protocol

Table of Contents

Introduction
HTTP Status Codes
Keep-Alive

Introduction

HTTP is the standard protocol for communication between web browsers and web servers. HTTP specifies how a client and server establish a connection, how the client requests data from the server, how the server responds to that request, and finally, how the connection is closed. HTTP connections use the TCP/IP protocol for data transfer. For each request from client to server, there is a sequence of four steps:

1. The client opens a TCP connection to the server on port 80, by default; other ports may be specified in the URL.
2. The client sends a message to the server requesting the resource at a specified path. The request includes a header, and optionally (depending on the nature of the request) a blank line followed by data for the request.
3. The server sends a response to the client. The response begins with a response code, followed by a header full of metadata, a blank line, and the requested document or an error message.
4. The server closes the connection.
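
The four steps above can be sketched in Java with a plain socket. This is a minimal illustration, not a production client: the class name SimpleHttp10Exchange and its methods are hypothetical, the request carries only a Host header, and the response is decoded as ISO-8859-1 regardless of what the server declares.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class SimpleHttp10Exchange {

    // Builds the text sent in step 2: request line, one header, blank line.
    static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
    }

    // Reads bytes until end of stream, i.e. until the server closes the connection.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    static String fetch(String host, int port, String path) throws IOException {
        // Step 1: open a TCP connection (port 80 is the HTTP default).
        try (Socket socket = new Socket(host, port)) {
            // Step 2: send the request.
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Step 3: read the whole response.
            return readAll(socket.getInputStream());
        } // Step 4: the server closes its end; try-with-resources closes ours.
    }
}
```

Calling SimpleHttp10Exchange.fetch("en.wikipedia.org", 80, "/index.html") would return the raw response text, status line and header included, assuming the host is reachable.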

This is the basic HTTP 1.0 procedure. In HTTP 1.1 and later, multiple requests and responses can be sent in series over a single TCP connection. That is, steps 2 and 3 can repeat multiple times in between steps 1 and 4. Furthermore, in HTTP 1.1, requests and responses can be sent in multiple chunks. Reusing one connection avoids the cost of setting up a new one for every request, and chunking lets a sender begin transmitting before it knows the total length of the message, so both changes make the protocol more scalable. Each request and response has the same basic form: a start line (the request line or the status line), an HTTP header containing metadata, a blank line, and then a message body. A typical client request looks something like this:

GET /index.html HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:20.0) Gecko/20100101 Firefox/20.0
Host: en.wikipedia.org
Connection: keep-alive
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

GET requests like this one do not contain a message body, so the request ends with a blank line. The first line is called the request line, and includes a method, a path to a resource, and the version of HTTP. The method specifies the operation being requested. The GET method asks the server to return a representation of a resource. /index.html is the path to the resource requested from the server. HTTP/1.1 is the version of the protocol that the client understands.
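
As a rough illustration of the three parts of a request line, the following Java sketch splits one into method, path, and version. The RequestLine class is hypothetical, and it assumes the parts are separated by whitespace, as in the example above.

```java
// Hypothetical value class for the first line of an HTTP request.
final class RequestLine {
    final String method;   // e.g. GET
    final String path;     // e.g. /index.html
    final String version;  // e.g. HTTP/1.1

    private RequestLine(String method, String path, String version) {
        this.method = method;
        this.path = path;
        this.version = version;
    }

    // Splits "METHOD path VERSION" on whitespace and rejects anything else.
    static RequestLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }
}
```

For instance, RequestLine.parse("GET /index.html HTTP/1.1") yields the method GET, the path /index.html, and the version HTTP/1.1.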

Although the request line is all that is required, a client request usually includes other information as well in a header. Each line takes the following form:

Keyword: Value

Keywords are not case sensitive. Whether a value is case sensitive depends on the particular header field. Both keywords and values should be ASCII only. If a value is too long, you can add a space or tab to the beginning of the next line and continue it, though current HTTP specifications deprecate this line folding, so it is best avoided in new code. Lines in the header are terminated by a carriage-return linefeed pair. The first keyword in this example is User-Agent, which lets the server know what browser is being used and allows it to send files optimized for the particular browser type. The following line says that the request comes from version 2.4 of the Lynx browser:

User-Agent: Lynx/2.4 libwww/2.1.4
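
One way to model the "Keyword: Value" syntax in Java is a map whose keys compare without regard to case. The sketch below is an assumption-laden simplification: HeaderParser is a hypothetical name, continuation lines are not handled, and a repeated keyword simply overwrites the earlier value.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical parser for a block of header lines separated by CR LF.
final class HeaderParser {

    static Map<String, String> parse(String headerBlock) {
        // Case-insensitive ordering makes "Host", "host", and "HOST" the same key.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerBlock.split("\r\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Keyword: Value" line; skipped in this sketch
            }
            String keyword = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(keyword, value);
        }
        return headers;
    }
}
```

With this sketch, looking up "user-agent" and "User-Agent" in the returned map gives the same value, while the value itself is stored exactly as it was sent.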

All but the oldest first-generation browsers also include a Host field specifying the server’s name, which allows web servers to distinguish between different named hosts served from the same IP address:

Host: www.cafeaulait.org

The last keyword in this example is Accept, which tells the server the types of data the client can handle (though servers often ignore this). For example, the following line says that the client can handle four MIME media types, corresponding to HTML documents, plain text, and JPEG and GIF images:

Accept: text/html, text/plain, image/gif, image/jpeg

MIME types are classified at two levels: a type and a subtype. The type shows very generally what kind of data is contained: is it a picture, text, or movie? The subtype identifies the specific type of data: GIF image, JPEG image, TIFF image. For example, HTML’s content type is text/html; the type is text, and the subtype is html. The content type for a JPEG image is image/jpeg; the type is image, and the subtype is jpeg. Eight top-level types have been defined:

text/* for human-readable words
image/* for pictures
model/* for 3D models such as VRML files
audio/* for sound
video/* for moving pictures, possibly including sound
application/* for binary data
message/* for protocol-specific envelopes such as email messages and HTTP responses
multipart/* for containers of multiple documents and resources

Each of these has many different subtypes. The most current list of registered MIME types is available from http://www.iana.org/assignments/media-types/. In addition, nonstandard custom types and subtypes can be freely defined as long as they begin with x-. For example, Flash files are commonly assigned the type application/x-shockwave-flash. Finally, the request is terminated with a blank line—that is, two carriage return/linefeed pairs, \r\n\r\n.
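
The type/subtype structure of MIME media types described above can be sketched in Java as follows. MediaType here is a hypothetical class written for illustration, not a library API; it assumes any parameters (such as q=0.9) follow a semicolon and simply discards them.

```java
import java.util.Locale;

// Hypothetical two-level representation of a MIME media type.
final class MediaType {
    final String type;     // e.g. text, image, application
    final String subtype;  // e.g. html, jpeg, x-shockwave-flash

    private MediaType(String type, String subtype) {
        this.type = type;
        this.subtype = subtype;
    }

    static MediaType parse(String value) {
        // Drop parameters after the first semicolon, if any.
        int semicolon = value.indexOf(';');
        String essence = (semicolon >= 0 ? value.substring(0, semicolon) : value).trim();
        int slash = essence.indexOf('/');
        if (slash <= 0 || slash == essence.length() - 1) {
            throw new IllegalArgumentException("Not a type/subtype pair: " + value);
        }
        return new MediaType(
                essence.substring(0, slash).toLowerCase(Locale.ROOT),
                essence.substring(slash + 1).toLowerCase(Locale.ROOT));
    }

    // True for custom types or subtypes that use the x- prefix.
    boolean isNonstandard() {
        return type.startsWith("x-") || subtype.startsWith("x-");
    }
}
```

Parsing image/jpeg gives the type image and the subtype jpeg, and application/x-shockwave-flash is reported as nonstandard because of its x- subtype.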

Once the server sees that blank line, it begins sending its response to the client over the same connection. The response begins with a status line, followed by a header describing the response using the same “name: value” syntax as the request header, a blank line, and the requested resource. A typical successful response looks something like this:

HTTP/1.1 200 OK
Date: Sun, 21 Apr 2013 15:12:46 GMT
Server: Apache
Connection: close
Content-Type: text/html; charset=ISO-8859-1
Content-length: 115

<html>
   <head>
      <title>
         A Sample HTML file
      </title>
   </head>
   <body>
      The rest of the document goes here
   </body>
</html>
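
To make the layout of such a response concrete, here is a small Java sketch that pulls apart the status line and locates the body after the blank line. StatusLine and bodyOf are hypothetical names, and the code assumes the raw response is already available as a string with CR LF line endings.

```java
// Hypothetical parser for the first line of an HTTP response.
final class StatusLine {
    final String version;  // e.g. HTTP/1.1
    final int code;        // e.g. 200
    final String reason;   // e.g. OK (may be empty)

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    // Splits "VERSION code reason phrase"; the reason may contain spaces.
    static StatusLine parse(String line) {
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLine(parts[0], code, reason);
    }

    // Returns everything after the first blank line, or "" if there is none.
    static String bodyOf(String rawResponse) {
        int blank = rawResponse.indexOf("\r\n\r\n");
        return blank < 0 ? "" : rawResponse.substring(blank + 4);
    }
}
```

Applied to the first line of the response above, parse returns the version HTTP/1.1, the code 200, and the reason OK.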

HTTP Status Codes

Code and Messages	Meaning	HttpURLConnection constant
1XX Informational	Informational response.	N/A
100 Continue	The server is prepared to accept the request body and the client should send it; allows clients to ask whether the server will accept a request before they send a large amount of data as part of the request.	N/A
101 Switching Protocols	The server accepts the client’s request in the Upgrade header field to change the application protocol (e.g., from HTTP to WebSockets.)	N/A
2XX Successful	Request succeeded.	N/A
200 OK	The most common response code. If the request method was GET or POST, the requested data is contained in the response along with the usual headers. If the request method was HEAD, only the header information is included.	HTTP_OK
201 Created	The server has created a new resource, whose URL is normally given in the Location header of the response. The client can then load that URL. This code is most often sent in response to POST requests, though a PUT that creates a resource can produce it too.	HTTP_CREATED
202 Accepted	This rather uncommon response indicates that a request (generally from POST) is being processed, but the processing is not yet complete, so no response can be returned. However, the server should return an HTML page that explains the situation to the user and provide an estimate of when the request is likely to be completed, and, ideally, a link to a status monitor of some kind.	HTTP_ACCEPTED
203 Non-authoritative Information	The resource representation was returned from a caching proxy or other local source and is not guaranteed to be up to date.	HTTP_NOT_AUTHORITATIVE
204 No Content	The server has successfully processed the request but has no information to send back to the client. This is normally the result of a poorly written form-processing program on the server that accepts data but does not return a response to the user.	HTTP_NO_CONTENT
205 Reset Content	The server has successfully processed the request but has no information to send back to the client. Furthermore, the client should clear the form to which the request is sent.	HTTP_RESET
206 Partial Content	The server has returned the part of the resource the client requested using the byte range extension to HTTP, rather than the whole document.	HTTP_PARTIAL
226 IM Used	Response is delta encoded.	N/A
3XX Redirection	Relocation and redirection.	N/A
300 Multiple Choices	The server is providing a list of different representations (e.g., PostScript and PDF) for the requested document.	HTTP_MULT_CHOICE
301 Moved Permanently	The resource has moved to a new URL. The client should automatically load the resource at this URL and update any bookmarks that point to the old URL.	HTTP_MOVED_PERM
302 Moved Temporarily	The resource is at a new URL temporarily, but its location will change again in the foreseeable future; therefore, bookmarks should not be updated. Sometimes used by proxies that require the user to log in locally before accessing the Web.	HTTP_MOVED_TEMP
303 See Other	Generally used in response to a POST form request, this code indicates that the user should retrieve a resource from a different URL using GET.	HTTP_SEE_OTHER
304 Not Modified	The If-Modified-Since header indicates that the client wants the document only if it has been recently updated. This status code is returned if the document has not been updated. In this case, the client should load the document from its cache.	HTTP_NOT_MODIFIED
305 Use Proxy	The Location header field contains the address of a proxy that will serve the response.	HTTP_USE_PROXY
307 Temporary Redirect	Similar to 302 but without allowing the HTTP method to change.	N/A
308 Permanent Redirect	Similar to 301 but without allowing the HTTP method to change.	N/A
4XX Client Error	The client made an error in its request.	N/A
400 Bad Request	The client request to the server used improper syntax. This is rather unusual in normal web browsing but more common when debugging custom clients.	HTTP_BAD_REQUEST
401 Unauthorized	Authorization, generally a username and password, is required to access this page. Either a username and password have not yet been presented or the username and password are invalid.	HTTP_UNAUTHORIZED
422 Unprocessable Entity	The content type of the request body is recognized, and the body is syntactically correct, but nonetheless the server can’t process it.	N/A
424 Failed Dependency	Request failed as a result of the failure of a previous request.	N/A
426 Upgrade Required	The client is using too old or too insecure a version of the HTTP protocol.	N/A
428 Precondition Required	Request must supply an If-Match header.	N/A
429 Too Many Requests	The client is being rate limited and should slow down.	N/A
431 Request Header Fields Too Large	Either the header as a whole is too large, or one particular header field is too large.	N/A
451 Unavailable For Legal Reasons	Experimental; the server is prohibited by law from servicing the request.	N/A
5XX Server Error	The server failed while handling the request.	N/A
500 Internal Server Error	An unexpected condition occurred that the server does not know how to handle.	HTTP_INTERNAL_ERROR (HTTP_SERVER_ERROR is a deprecated constant with the same value)
501 Not Implemented	The server does not have a feature that is needed to fulfill this request. A server that cannot handle PUT requests might send this response to a client that tried to PUT form data to it.	HTTP_NOT_IMPLEMENTED
502 Bad Gateway	This code is applicable only to servers that act as proxies or gateways. It indicates that the proxy received an invalid response from a server it was connecting to in an effort to fulfill the request.	HTTP_BAD_GATEWAY
503 Service Unavailable	The server is temporarily unable to handle the request, perhaps due to overloading or maintenance.	HTTP_UNAVAILABLE
504 Gateway Timeout	The proxy server did not receive a response from the upstream server within a reasonable amount of time, so it can’t send the desired response to the client.	HTTP_GATEWAY_TIMEOUT
505 HTTP Version Not Supported	The server does not support the version of HTTP the client is using (e.g., a newer major version than the server implements).	HTTP_VERSION
507 Insufficient Storage	Server does not have enough space to store the supplied request entity; typically used for POST or PUT.	N/A
511 Network Authentication Required	The client needs to authenticate to gain network access (e.g., on a hotel wireless network).	N/A

Regardless of version, a response code from 100 to 199 always indicates an informational response, 200 to 299 always indicates success, 300 to 399 always indicates redirection, 400 to 499 always indicates a client error, and 500 to 599 indicates a server error.
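
Because the class of a response is determined entirely by its first digit, a client can classify any code, including ones it has never seen, with integer division. The following Java sketch shows the idea; StatusClass and the returned labels are illustrative choices, not part of any standard API.

```java
// Hypothetical classifier that maps a status code to its general class.
final class StatusClass {

    static String of(int code) {
        switch (code / 100) {   // the hundreds digit selects the class
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "unknown"; // outside 100 to 599
        }
    }
}
```

A client written this way can treat an unfamiliar code such as 429 as a client error without needing a dedicated case for it.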

Keep-Alive

HTTP 1.0 opens a new connection for each request. In practice, the time taken to open and close all the connections in a typical web session can outweigh the time taken to transmit the data, especially for sessions with many small documents. This is even more problematic for encrypted HTTPS connections using SSL or TLS, because the handshake to set up a secure socket is substantially more work than setting up a regular socket.

In HTTP 1.1 and later, the server doesn’t have to close the socket after it sends its response. It can leave it open and wait for a new request from the client on the same socket. Multiple requests and responses can be sent in series over a single TCP connection. However, the lockstep pattern of a client request followed by a server response remains the same. A client indicates that it’s willing to reuse a socket by including a Connection field in the HTTP request header with the value Keep-Alive:

Connection: Keep-Alive

The URL class transparently supports HTTP Keep-Alive unless explicitly turned off. That is, it will reuse a socket if you connect to the same server again before the server has closed the connection. You can control Java’s use of HTTP Keep-Alive with several system properties:

Set http.keepAlive to true or false to enable or disable HTTP Keep-Alive. (It is enabled by default.)
Set http.maxConnections to the number of sockets you’re willing to hold open at one time. The default is 5.
Set http.keepAlive.remainingData to true to let Java clean up after abandoned connections (Java 6 or later). It is false by default.
Set sun.net.http.errorstream.enableBuffering to true to attempt to buffer the relatively short error streams from 400- and 500-level responses, so the connection can be freed up for reuse sooner. It is false by default.
Set sun.net.http.errorstream.bufferSize to the number of bytes to use for buffering error streams. The default is 4,096 bytes.
Set sun.net.http.errorstream.timeout to the number of milliseconds before timing out a read from the error stream. It is 300 milliseconds by default.

The defaults are reasonable, except that you probably do want to set sun.net.http.errorstream.enableBuffering to true unless you want to read the error streams from failed requests.
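
These properties can be set on the command line with -D or programmatically. The Java sketch below takes the programmatic route; KeepAliveSettings and its apply method are hypothetical, and the values passed in are illustrative rather than recommendations. It is safest to assume the properties are read only once, so set them before the program opens its first HTTP connection.

```java
// Hypothetical helper that sets a few of the Keep-Alive system properties.
final class KeepAliveSettings {

    static void apply(boolean keepAlive, int maxConnections, boolean bufferErrorStreams) {
        // Enable or disable connection reuse altogether.
        System.setProperty("http.keepAlive", Boolean.toString(keepAlive));
        // Upper bound on idle sockets held open at one time.
        System.setProperty("http.maxConnections", Integer.toString(maxConnections));
        // Buffer short error bodies so the socket can be reused sooner.
        System.setProperty("sun.net.http.errorstream.enableBuffering",
                Boolean.toString(bufferErrorStreams));
    }
}
```

For example, KeepAliveSettings.apply(true, 10, true) keeps Keep-Alive on, raises the socket limit to 10, and turns on error-stream buffering. The command-line equivalent for a single property would be an option such as -Dhttp.maxConnections=10.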
