The URI Class

Introduction

A URI is a generalization of a URL that includes not only Uniform Resource Locators but also Uniform Resource Names (URNs). Most URIs used in practice are URLs, but most specifications and standards such as XML are defined in terms of URIs. In Java, URIs are represented by the java.net.URI class. This class differs from the java.net.URL class in three important ways:

  • The URI class is purely about identification of resources and parsing of URIs. It provides no methods to retrieve a representation of the resource identified by its URI.
  • The URI class is more conformant to the relevant specifications than the URL class.
  • A URI object can represent a relative URI. The URL class absolutizes all URIs before storing them.

In brief, a URL object is a representation of an application layer protocol for network retrieval, whereas a URI object is purely for string parsing and manipulation. The URI class has no network retrieval capabilities. The URL class has some string parsing methods, such as getFile() and getRef(), but many of these are broken and don’t always behave exactly as the relevant specifications say they should. Normally, you should use the URL class when you want to download the content at a URL and the URI class when you want to use the URL for identification rather than retrieval, for instance, to represent an XML namespace. When you need to do both, you may convert from a URI to a URL with the toURL() method, and from a URL to a URI using the toURI() method.
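The conversion in both directions can be sketched as follows. This is a minimal illustration, not a definitive recipe: the class name UriUrlConversion, the helper roundTrip(), and the example address are assumptions made for this example. No network connection is opened at any point.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriUrlConversion {
    // Converts a string to a URI, then to a URL, then back to a URI,
    // and returns the string form of the final URI.
    static String roundTrip(String text) {
        try {
            URI uri = new URI(text);   // parsing only
            URL url = uri.toURL();     // needs an absolute URI with a supported protocol
            URI back = url.toURI();    // back to a URI for identification purposes
            return back.toString();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException("Not convertible: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("http://www.example.com/index.html"));
    }
}
```

Note that toURL() throws an IllegalArgumentException when the URI is relative, so a relative URI would need to be resolved against a base first.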

Constructing a URI

URIs are built from strings. You can either pass the entire URI to the constructor in a single string, or the individual pieces:

public URI(String uri) throws URISyntaxException

public URI(String scheme, String schemeSpecificPart, String fragment) throws URISyntaxException

public URI(String scheme, String host, String path, String fragment) throws URISyntaxException

public URI(String scheme, String authority, String path, String query, String fragment) throws URISyntaxException

public URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment) throws URISyntaxException

URI(String uri)

This constructor creates a new URI object from any convenient string. If the string does not follow URI syntax rules, the constructor throws a URISyntaxException.

URI voice = new URI("tel:+1-800-9988-9938");
URI web = new URI("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc");
URI book = new URI("urn:isbn:1-565-92870-9");
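Because the single-string constructor declares a checked URISyntaxException, callers usually wrap it. The following sketch shows one way to test whether a string parses; the class name ParseCheck and the helper isValid() are hypothetical names chosen for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ParseCheck {
    // Returns true if the string parses as a URI reference, false otherwise.
    static boolean isValid(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("urn:isbn:1-565-92870-9"));      // opaque URI
        System.out.println(isValid("images/logo.png"));             // relative reference
        System.out.println(isValid("http://www.example.com/a b"));  // unescaped space
    }
}
```

The last string is rejected because this constructor does not escape illegal characters such as the space on your behalf.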

URI(String scheme, String schemeSpecificPart, String fragment)

This constructor, which takes a scheme-specific part, is mostly used for nonhierarchical URIs. The scheme is the URI’s protocol, such as http, urn, tel, and so forth. It must begin with a letter and be composed exclusively of ASCII letters, digits, and the three punctuation characters +, -, and . (plus sign, hyphen, and period). Passing null for this argument omits the scheme, thus creating a relative URI. For example:

URI absolute = new URI("http", "//www.ibiblio.org" , null);
URI relative = new URI(null, "/javafaq/index.shtml", "today");
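To see what strings this constructor assembles, the following sketch prints the result for an absolute, a relative, and an opaque URI. The class name SchemeSpecificDemo and the helper build() are assumptions introduced for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SchemeSpecificDemo {
    // Builds a URI from a scheme, a scheme-specific part, and a fragment,
    // and returns its string form.
    static String build(String scheme, String schemeSpecificPart, String fragment) {
        try {
            return new URI(scheme, schemeSpecificPart, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http", "//www.ibiblio.org", null));
        System.out.println(build(null, "/javafaq/index.shtml", "today"));
        System.out.println(build("urn", "isbn:1-565-92870-9", null));
    }
}
```

With these inputs the three calls should yield http://www.ibiblio.org, /javafaq/index.shtml#today, and urn:isbn:1-565-92870-9 respectively.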

URI(String scheme, String host, String path, String fragment)

This constructor is used for hierarchical URIs such as http and ftp URLs. The host and path together (separated by a /) form the scheme-specific part for this URI. For example:

URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "today");

This produces the URI http://www.ibiblio.org/javafaq/index.html#today.

URI(String scheme, String authority, String path, String query, String fragment)

This constructor is basically the same as the previous one, except that it takes a complete authority rather than just a host and adds a query string. For example:

URI today = new URI("http", "www.ibiblio.org", "/javafaq/index.html", "referrer=cnet&date=2014-02-23", "today");

As usual, any unescapable syntax errors cause a URISyntaxException to be thrown and null can be passed to omit any of the arguments.
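The difference between errors the constructor can escape and errors it cannot is easiest to see side by side. In this sketch, the class name HierarchicalConstructors, the helpers fromHost() and withQuery(), and the www.example.com inputs are hypothetical; the helpers return the exception's reason as text instead of propagating it.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HierarchicalConstructors {
    // Four-argument form: scheme, host, path, fragment.
    static String fromHost(String scheme, String host, String path, String fragment) {
        try {
            return new URI(scheme, host, path, fragment).toString();
        } catch (URISyntaxException e) {
            return "error: " + e.getReason();
        }
    }

    // Five-argument form: scheme, authority, path, query, fragment.
    static String withQuery(String scheme, String authority, String path,
                            String query, String fragment) {
        try {
            return new URI(scheme, authority, path, query, fragment).toString();
        } catch (URISyntaxException e) {
            return "error: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        // An illegal character in the path is escaped rather than rejected.
        System.out.println(fromHost("http", "www.example.com", "/my file.html", null));
        // A relative path combined with a scheme cannot be repaired by escaping.
        System.out.println(fromHost("http", "www.example.com", "no/leading/slash", null));
        System.out.println(withQuery("http", "www.ibiblio.org", "/javafaq/index.html",
                "referrer=cnet&date=2014-02-23", "today"));
    }
}
```

The first call should produce http://www.example.com/my%20file.html, while the second reports an error because a URI with a scheme requires a path that begins with a slash.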

URI(String scheme, String userInfo, String host, int port, String path, String query, String fragment)

This constructor is the master hierarchical URI constructor that the previous two invoke. It divides the authority into separate user info, host, and port parts, each of which has its own syntax rules.

URI styles = new URI("ftp", "anonymous:elharo@ibiblio.org", "ftp.oreilly.com", 21, "/pub/stylesheet", null, null);

However, the resulting URI still has to follow all the usual rules for URIs; and again null can be passed for any argument to omit it from the result.
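As a rough check that the seven-argument constructor stores each authority part separately, the following sketch rebuilds the ftp example and reads the parts back. The class name MasterConstructorDemo and the helper parts() are assumptions made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class MasterConstructorDemo {
    // Builds a URI from its seven components and reports the decoded parts.
    static String parts(String scheme, String userInfo, String host, int port,
                        String path, String query, String fragment) {
        try {
            URI u = new URI(scheme, userInfo, host, port, path, query, fragment);
            return u.getUserInfo() + " | " + u.getHost() + " | " + u.getPort()
                    + " | " + u.getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parts("ftp", "anonymous:elharo@ibiblio.org",
                "ftp.oreilly.com", 21, "/pub/stylesheet", null, null));
        // Passing -1 as the port omits it, just as null omits a string part.
        System.out.println(parts("http", null, "www.example.com", -1, "/", null, null));
    }
}
```

For the int port argument, which cannot be null, -1 plays the role of "omitted".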

The Parts of the URI

A URI reference has up to three parts: a scheme, a scheme-specific part, and a fragment identifier. The general format is:

scheme:scheme-specific-part#fragment

If the scheme is omitted, the URI reference is relative. If the fragment identifier is omitted, the URI reference is a pure URI. The URI class has getter methods that return these three parts of each URI object. The getRawFoo() methods return the encoded forms of the parts of the URI, while the equivalent getFoo() methods first decode any percent-escaped characters and then return the decoded part:

public String getScheme()

public String getSchemeSpecificPart()

public String getRawSchemeSpecificPart()

public String getFragment()

public String getRawFragment()

These methods all return null if the particular URI object does not have the relevant component: for example, a relative URI without a scheme or an http URI without a fragment identifier.

A URI that has a scheme is an absolute URI. A URI without a scheme is relative. The isAbsolute() method returns true if the URI is absolute, false if it’s relative:

public boolean isAbsolute()
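The three top-level parts and the absolute/relative distinction can be inspected with a short sketch like the one below. The class name TopLevelParts and the helper describe() are hypothetical. URI.create() is used here as a convenience; it behaves like the single-string constructor but throws an unchecked IllegalArgumentException on bad input.

```java
import java.net.URI;

class TopLevelParts {
    // Summarizes the scheme, scheme-specific part, fragment, and absoluteness of a URI.
    static String describe(String text) {
        URI u = URI.create(text);
        return u.getScheme() + " | " + u.getSchemeSpecificPart() + " | "
                + u.getFragment() + " | absolute=" + u.isAbsolute();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.xml.com/pub/a/2003/09/17/stax.html#id=_hbc"));
        System.out.println(describe("/javafaq/index.shtml#today"));
        System.out.println(describe("urn:isbn:1-565-92870-9"));
    }
}
```

Missing parts show up as the text "null" in this output because string concatenation renders a null reference that way.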

The details of the scheme-specific part vary depending on the type of the scheme. For example, in a tel URL, the scheme-specific part has the syntax of a telephone number. However, in many useful URIs, including the very common file and http URLs, the scheme-specific part has a particular hierarchical format divided into an authority, a path, and a query string. The authority is further divided into user info, host, and port. The isOpaque() method returns false if the URI is hierarchical, true if it’s not hierarchical—that is, if it’s opaque:

public boolean isOpaque()

If the URI is opaque, all you can get is the scheme, scheme-specific part, and fragment identifier. However, if the URI is hierarchical, there are getter methods for all the different parts of a hierarchical URI:

public String getAuthority()
public String getFragment()
public String getHost()
public String getPath()
public int getPort()
public String getQuery()
public String getUserInfo()
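A typical use of isOpaque() is to decide which getters are worth calling. The sketch below does that; the class name HierarchicalParts, the helper describe(), and the ftp.example.com address are assumptions for illustration only.

```java
import java.net.URI;

class HierarchicalParts {
    // For opaque URIs only the scheme-specific part is meaningful;
    // for hierarchical URIs the individual components are reported.
    static String describe(String text) {
        URI u = URI.create(text);
        if (u.isOpaque()) {
            return "opaque: " + u.getSchemeSpecificPart();
        }
        return "userInfo=" + u.getUserInfo() + ", host=" + u.getHost()
                + ", port=" + u.getPort() + ", path=" + u.getPath()
                + ", query=" + u.getQuery();
    }

    public static void main(String[] args) {
        System.out.println(describe("ftp://anonymous@ftp.example.com:21/pub/readme.txt"));
        System.out.println(describe("http://www.example.com/search?q=uri"));
        System.out.println(describe("tel:+1-800-9988-9938"));
    }
}
```

Because getPort() returns an int, an absent port is reported as -1 rather than null.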

These methods all return the decoded parts; in other words, percent escapes, such as %3C, are changed into the characters they represent, such as <. If you want the raw, encoded parts of the URI, there are five parallel getRawFoo() methods:

public String getRawAuthority()
public String getRawFragment()
public String getRawPath()
public String getRawQuery()
public String getRawUserInfo()
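The difference between the raw and decoded getters is easiest to see on a URI that contains percent escapes. In this sketch the class name RawVersusDecoded, its two helpers, and the example URI are hypothetical.

```java
import java.net.URI;

class RawVersusDecoded {
    // Returns the raw path followed by the decoded path.
    static String paths(String text) {
        URI u = URI.create(text);
        return u.getRawPath() + " -> " + u.getPath();
    }

    // Returns the raw query followed by the decoded query.
    static String queries(String text) {
        URI u = URI.create(text);
        return u.getRawQuery() + " -> " + u.getQuery();
    }

    public static void main(String[] args) {
        String text = "http://www.example.com/a%20b/%3Ctag%3E?q=%C3%A9";
        System.out.println(paths(text));    // escapes kept, then decoded
        System.out.println(queries(text));  // %C3%A9 is the UTF-8 encoding of e-acute
    }
}
```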

Remember the URI class differs from the URI specification in that non-ASCII characters such as é and ü are never percent escaped in the first place, and thus will still be present in the strings returned by the getRawFoo() methods unless the strings originally used to construct the URI object were encoded.

For various technical reasons that don’t have a lot of practical impact, Java can’t always initially detect syntax errors in the authority component. The immediate symptom of this failing is normally an inability to return the individual parts of the authority: port, host, and user info. In this event, you can call parseServerAuthority() to force the authority to be reparsed:

public URI parseServerAuthority() throws URISyntaxException

The original URI does not change (URI objects are immutable), but the URI returned will have separate authority parts for user info, host, and port. If the authority cannot be parsed, a URISyntaxException is thrown.
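One case where this shows up is a host name containing an underscore, which is not legal in a server-based authority. The sketch below assumes such a host (my_host.example.com, a made-up name); the class name AuthorityCheck and the helper hostOf() are likewise hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class AuthorityCheck {
    // Returns the host if the authority parses as user info, host, and port;
    // returns null if parseServerAuthority() rejects it.
    static String hostOf(String text) {
        URI u = URI.create(text);
        try {
            return u.parseServerAuthority().getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        URI odd = URI.create("http://my_host.example.com/");
        System.out.println(odd.getAuthority());  // the authority string is still available
        System.out.println(odd.getHost());       // but no separate host was parsed
        System.out.println(hostOf("http://my_host.example.com/"));
        System.out.println(hostOf("http://www.example.com:8080/"));
    }
}
```

Calling parseServerAuthority() right after construction is a reasonable way to fail early when a well-formed host is required.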

Resolving Relative URIs

The URI class has three methods for converting back and forth between relative and absolute URIs:

public URI resolve(URI uri)
public URI resolve(String uri)
public URI relativize(URI uri)

The resolve() methods compare the uri argument to this URI and use it to construct a new URI object that wraps an absolute URI. For example, consider these three lines of code:

URI absolute = new URI("http://www.example.com/");
URI relative = new URI("images/logo.png");
URI resolved = absolute.resolve(relative);

After they’ve executed, resolved contains the absolute URI http://www.example.com/images/logo.png. If the invoking URI does not contain an absolute URI itself, the resolve() method resolves as much of the URI as it can and returns a new relative URI object as a result. For example, take these two statements:

URI top = new URI("javafaq/books/");
URI resolved = top.resolve("jnp3/examples/07/index.html");

After they’ve executed, resolved now contains the relative URI javafaq/books/jnp3/examples/07/index.html with no scheme or authority.

It’s also possible to reverse this procedure; that is, to go from an absolute URI to a relative one. The relativize() method creates a new URI object from the uri argument that is relative to the invoking URI. The argument is not changed. For example:

URI absolute = new URI("http://www.example.com/images/logo.png");
URI top = new URI("http://www.example.com/");
URI relative = top.relativize(absolute);

The URI object relative now contains the relative URI images/logo.png.
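The resolve() and relativize() calls described in this section can be gathered into one runnable sketch. The class name ResolveDemo and its two helpers are hypothetical, and the ../ case and the www.example.org case are additions not taken from the text above.

```java
import java.net.URI;

class ResolveDemo {
    // Resolves a reference against a base and returns the result as a string.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    // Expresses target relative to base where possible.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://www.example.com/", "images/logo.png"));
        System.out.println(resolve("javafaq/books/", "jnp3/examples/07/index.html"));
        // Dot segments are removed during resolution.
        System.out.println(resolve("http://www.example.com/a/b/", "../c.png"));
        System.out.println(relativize("http://www.example.com/",
                "http://www.example.com/images/logo.png"));
        // A target on a different host cannot be relativized and comes back unchanged.
        System.out.println(relativize("http://www.example.com/",
                "http://www.example.org/images/logo.png"));
    }
}
```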

Equality and Comparison

URIs are tested for equality pretty much as you’d expect. It’s not quite direct string comparison. Equal URIs must both be hierarchical or both be opaque. The scheme and authority parts are compared without considering case. That is, http and HTTP are the same scheme, and www.example.com is the same authority as www.EXAMPLE.com. The rest of the URI is case sensitive, except for hexadecimal digits used to escape illegal characters. Escapes are not decoded before comparing, so http://www.example.com/A and http://www.example.com/%41 are unequal URIs.

The hashCode() method is consistent with equals(). Equal URIs have the same hash code, and unequal URIs are fairly unlikely to share the same hash code.

URI implements Comparable, and thus URIs can be ordered. The ordering is based on string comparison of the individual parts, in this sequence:

  1. If the schemes are different, the schemes are compared, without considering case.
  2. Otherwise, if the schemes are the same, a hierarchical URI is considered to be less than an opaque URI with the same scheme.
  3. If both URIs are opaque URIs, they’re ordered according to their scheme-specific parts.
  4. If both the scheme and the opaque scheme-specific parts are equal, the URIs are compared by their fragments.
  5. If both URIs are hierarchical, they’re ordered according to their authority components, which are themselves ordered according to user info, host, and port, in that order. Hosts are case insensitive.
  6. If the schemes and the authorities are equal, the path is used to distinguish them.
  7. If the paths are also equal, the query strings are compared.
  8. If the query strings are equal, the fragments are compared.

URIs are not comparable to any type except themselves. Comparing a URI to anything except another URI causes a ClassCastException.
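The equality and ordering rules of this section can be exercised with a small sketch. The class name UriOrdering and the helpers same() and sorted() are assumptions made for this example, as are the a.example.com and b.example.com hosts.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UriOrdering {
    // True if the two strings parse to equal URI objects.
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // Parses the strings, sorts them by URI's natural ordering,
    // and returns the string forms in sorted order.
    static List<String> sorted(String... texts) {
        List<URI> uris = new ArrayList<>();
        for (String text : texts) {
            uris.add(URI.create(text));
        }
        Collections.sort(uris);
        List<String> result = new ArrayList<>();
        for (URI uri : uris) {
            result.add(uri.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(same("http://www.example.com/A", "HTTP://www.EXAMPLE.com/A"));
        System.out.println(same("http://www.example.com/A", "http://www.example.com/%41"));
        System.out.println(same("http://www.example.com/%3c", "http://www.example.com/%3C"));
        System.out.println(sorted("urn:isbn:1-565-92870-9",
                "http://b.example.com/", "http://a.example.com/"));
    }
}
```

The sorted list places both http URIs before the urn URI because schemes are compared first, and orders the two http URIs by host.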

String Representation

Two methods convert URI objects to strings, toString() and toASCIIString():

public String toString()
public String toASCIIString()

The toString() method returns an unencoded string form of the URI (i.e., non-ASCII characters like é are not percent escaped). Therefore, the result of calling this method is not guaranteed to be a syntactically correct URI, though it is in fact a syntactically correct IRI. This form is sometimes useful for display to human beings, but usually not for retrieval.

The toASCIIString() method returns an encoded string form of the URI. Characters like é and \ are always percent escaped whether or not they were originally escaped. This is the string form of the URI you should use most of the time. Even if the form returned by toString() is more legible for humans, they may still copy and paste it into areas that are not expecting an illegal URI. toASCIIString() always returns a syntactically correct URI.
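The two string forms differ only when the URI contains characters outside ASCII. The sketch below uses a hypothetical address containing é, written as the Java escape \u00e9 so the source file stays ASCII; the class name StringForms and its helpers are assumptions for illustration.

```java
import java.net.URI;

class StringForms {
    // Returns the display form, which keeps non-ASCII characters as they are.
    static String display(String text) {
        return URI.create(text).toString();
    }

    // Returns the ASCII-only form, with non-ASCII characters percent escaped as UTF-8.
    static String ascii(String text) {
        return URI.create(text).toASCIIString();
    }

    public static void main(String[] args) {
        String text = "http://www.example.com/caf\u00e9";
        System.out.println(display(text));
        System.out.println(ascii(text));
    }
}
```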
