uri.java
From "This is a resource based on j2me embedde" · Java code · 1,579 lines total · page 1 of 5
/*
 * @(#)URI.java 1.41 06/10/10
 *
 * Copyright 1990-2008 Sun Microsystems, Inc. All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License version
 * 2 only, as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License version 2 for more details (a copy is
 * included at /legal/license.txt).
 *
 * You should have received a copy of the GNU General Public License
 * version 2 along with this work; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
 * 02110-1301 USA
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa
 * Clara, CA 95054 or visit www.sun.com if you need additional
 * information or have any questions.
 */

package java.net;

import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UnsupportedEncodingException;
import sun.text.Normalizer;
import java.lang.Character;             // for javadoc
import java.lang.NullPointerException;  // for javadoc

/**
 * Represents a Uniform Resource Identifier (URI) reference.
 *
 * <p> An instance of this class represents a URI reference as defined by <a
 * href="http://www.ietf.org/rfc/rfc2396.txt"><i>RFC 2396: Uniform
 * Resource Identifiers (URI): Generic Syntax</i></a>, amended by <a
 * href="http://www.ietf.org/rfc/rfc2732.txt"><i>RFC 2732: Format for
 * Literal IPv6 Addresses in URLs</i></a> and with the minor deviations noted
 * below.
 * This class provides constructors for creating URI instances from
 * their components or by parsing their string forms, methods for accessing the
 * various components of an instance, and methods for normalizing, resolving,
 * and relativizing URI instances. Instances of this class are immutable.
 *
 *
 * <h4> URI syntax and components </h4>
 *
 * At the highest level a URI reference (hereinafter simply "URI") in string
 * form has the syntax
 *
 * <blockquote>
 * [<i>scheme</i><tt><b>:</b></tt>]<i>scheme-specific-part</i>[<tt><b>#</b></tt><i>fragment</i>]
 * </blockquote>
 *
 * where square brackets [...] delineate optional components and the characters
 * <tt><b>:</b></tt> and <tt><b>#</b></tt> stand for themselves.
 *
 * <p> An <i>absolute</i> URI specifies a scheme; a URI that is not absolute is
 * said to be <i>relative</i>. URIs are also classified according to whether
 * they are <i>opaque</i> or <i>hierarchical</i>.
 *
 * <p> An <i>opaque</i> URI is an absolute URI whose scheme-specific part does
 * not begin with a slash character (<tt>'/'</tt>). Opaque URIs are not
 * subject to further parsing. Some examples of opaque URIs are:
 *
 * <blockquote><table cellpadding=0 cellspacing=0 summary="layout">
 * <tr><td><tt>mailto:java-net@java.sun.com</tt></td></tr>
 * <tr><td><tt>news:comp.lang.java</tt></td></tr>
 * <tr><td><tt>urn:isbn:096139210x</tt></td></tr>
 * </table></blockquote>
 *
 * <p> A <i>hierarchical</i> URI is either an absolute URI whose
 * scheme-specific part begins with a slash character, or a relative URI, that
 * is, a URI that does not specify a scheme.
 * Some examples of hierarchical URIs are:
 *
 * <blockquote>
 * <tt>http://java.sun.com/j2se/1.3/</tt><br>
 * <tt>docs/guide/collections/designfaq.html#28</tt><br>
 * <tt>../../../demo/jfc/SwingSet2/src/SwingSet2.java</tt><br>
 * <tt>file:///~/calendar</tt>
 * </blockquote>
 *
 * <p> A hierarchical URI is subject to further parsing according to the syntax
 *
 * <blockquote>
 * [<i>scheme</i><tt><b>:</b></tt>][<tt><b>//</b></tt><i>authority</i>][<i>path</i>][<tt><b>?</b></tt><i>query</i>][<tt><b>#</b></tt><i>fragment</i>]
 * </blockquote>
 *
 * where the characters <tt><b>:</b></tt>, <tt><b>/</b></tt>,
 * <tt><b>?</b></tt>, and <tt><b>#</b></tt> stand for themselves. The
 * scheme-specific part of a hierarchical URI consists of the characters
 * between the scheme and fragment components.
 *
 * <p> The authority component of a hierarchical URI is, if specified, either
 * <i>server-based</i> or <i>registry-based</i>. A server-based authority
 * parses according to the familiar syntax
 *
 * <blockquote>
 * [<i>user-info</i><tt><b>@</b></tt>]<i>host</i>[<tt><b>:</b></tt><i>port</i>]
 * </blockquote>
 *
 * where the characters <tt><b>@</b></tt> and <tt><b>:</b></tt> stand for
 * themselves. Nearly all URI schemes currently in use are server-based. An
 * authority component that does not parse in this way is considered to be
 * registry-based.
 *
 * <p> The path component of a hierarchical URI is itself said to be absolute
 * if it begins with a slash character (<tt>'/'</tt>); otherwise it is
 * relative. The path of a hierarchical URI that is either absolute or
 * specifies an authority is always absolute.
 *
 * <p> All told, then, a URI instance has the following nine components:
 *
 * <blockquote><table summary="Describes the components of a URI:scheme,scheme-specific-part,authority,user-info,host,port,path,query,fragment">
 * <tr><th><i>Component</i></th><th><i>Type</i></th></tr>
 * <tr><td>scheme</td><td><tt>String</tt></td></tr>
 * <tr><td>scheme-specific-part </td><td><tt>String</tt></td></tr>
 * <tr><td>authority</td><td><tt>String</tt></td></tr>
 * <tr><td>user-info</td><td><tt>String</tt></td></tr>
 * <tr><td>host</td><td><tt>String</tt></td></tr>
 * <tr><td>port</td><td><tt>int</tt></td></tr>
 * <tr><td>path</td><td><tt>String</tt></td></tr>
 * <tr><td>query</td><td><tt>String</tt></td></tr>
 * <tr><td>fragment</td><td><tt>String</tt></td></tr>
 * </table></blockquote>
 *
 * In a given instance any particular component is either <i>undefined</i> or
 * <i>defined</i> with a distinct value. Undefined string components are
 * represented by <tt>null</tt>, while undefined integer components are
 * represented by <tt>-1</tt>. A string component may be defined to have the
 * empty string as its value; this is not equivalent to that component being
 * undefined.
 *
 * <p> Whether a particular component is or is not defined in an instance
 * depends upon the type of the URI being represented. An absolute URI has a
 * scheme component. An opaque URI has a scheme, a scheme-specific part, and
 * possibly a fragment, but has no other components. A hierarchical URI always
 * has a path (though it may be empty) and a scheme-specific-part (which at
 * least contains the path), and may have any of the other components. If the
 * authority component is present and is server-based then the host component
 * will be defined and the user-information and port components may be defined.
 *
 *
 * <h4> Operations on URI instances </h4>
 *
 * The key operations supported by this class are those of
 * <i>normalization</i>, <i>resolution</i>, and <i>relativization</i>.
 *
 * <p> <i>Normalization</i> is the process of removing unnecessary <tt>"."</tt>
 * and <tt>".."</tt> segments from the path component of a hierarchical URI.
 * Each <tt>"."</tt> segment is simply removed. A <tt>".."</tt> segment is
 * removed only if it is preceded by a non-<tt>".."</tt> segment.
 * Normalization has no effect upon opaque URIs.
 *
 * <p> <i>Resolution</i> is the process of resolving one URI against another,
 * <i>base</i> URI. The resulting URI is constructed from components of both
 * URIs in the manner specified by RFC 2396, taking components from the
 * base URI for those not specified in the original. For hierarchical URIs,
 * the path of the original is resolved against the path of the base and then
 * normalized. The result, for example, of resolving
 *
 * <blockquote>
 * <tt>docs/guide/collections/designfaq.html#28 </tt>(1)
 * </blockquote>
 *
 * against the base URI <tt>http://java.sun.com/j2se/1.3/</tt> is the result
 * URI
 *
 * <blockquote>
 * <tt>http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28</tt>
 * </blockquote>
 *
 * Resolving the relative URI
 *
 * <blockquote>
 * <tt>../../../demo/jfc/SwingSet2/src/SwingSet2.java </tt>(2)
 * </blockquote>
 *
 * against this result yields, in turn,
 *
 * <blockquote>
 * <tt>http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java</tt>
 * </blockquote>
 *
 * Resolution of both absolute and relative URIs, and of both absolute and
 * relative paths in the case of hierarchical URIs, is supported. Resolving
 * the URI <tt>file:///~/calendar</tt> against any other URI simply yields the
 * original URI, since it is absolute.
 * Resolving the relative URI (2) above
 * against the relative base URI (1) yields the normalized, but still relative,
 * URI
 *
 * <blockquote>
 * <tt>demo/jfc/SwingSet2/src/SwingSet2.java</tt>
 * </blockquote>
 *
 * <p> <i>Relativization</i>, finally, is the inverse of resolution: For any
 * two normalized URIs <i>u</i> and <i>v</i>,
 *
 * <blockquote>
 * <i>u</i><tt>.relativize(</tt><i>u</i><tt>.resolve(</tt><i>v</i><tt>)).equals(</tt><i>v</i><tt>)</tt> and<br>
 * <i>u</i><tt>.resolve(</tt><i>u</i><tt>.relativize(</tt><i>v</i><tt>)).equals(</tt><i>v</i><tt>)</tt> .<br>
 * </blockquote>
 *
 * This operation is often useful when constructing a document containing URIs
 * that must be made relative to the base URI of the document wherever
 * possible. For example, relativizing the URI
 *
 * <blockquote>
 * <tt>http://java.sun.com/j2se/1.3/docs/guide/index.html</tt>
 * </blockquote>
 *
 * against the base URI
 *
 * <blockquote>
 * <tt>http://java.sun.com/j2se/1.3</tt>
 * </blockquote>
 *
 * yields the relative URI <tt>docs/guide/index.html</tt>.
 *
 *
 * <h4> Character categories </h4>
 *
 * RFC 2396 specifies precisely which characters are permitted in the
 * various components of a URI reference.
 * The following categories, most of
 * which are taken from that specification, are used below to describe these
 * constraints:
 *
 * <blockquote><table cellspacing=2 summary="Describes categories alpha,digit,alphanum,unreserved,punct,reserved,escaped,and other">
 *   <tr><th valign=top><i>alpha</i></th>
 *       <td>The US-ASCII alphabetic characters,
 *        <tt>'A'</tt> through <tt>'Z'</tt>
 *        and <tt>'a'</tt> through <tt>'z'</tt></td></tr>
 *   <tr><th valign=top><i>digit</i></th>
 *       <td>The US-ASCII decimal digit characters,
 *       <tt>'0'</tt> through <tt>'9'</tt></td></tr>
 *   <tr><th valign=top><i>alphanum</i></th>
 *       <td>All <i>alpha</i> and <i>digit</i> characters</td></tr>
 *   <tr><th valign=top><i>unreserved</i> </th>
 *       <td>All <i>alphanum</i> characters together with those in the string
 *        <tt>"_-!.~'()*"</tt></td></tr>
 *   <tr><th valign=top><i>punct</i></th>
 *       <td>The characters in the string <tt>",;:$&+="</tt></td></tr>
 *   <tr><th valign=top><i>reserved</i></th>
 *       <td>All <i>punct</i> characters together with those in the string
 *        <tt>"?/[]@"</tt></td></tr>
 *   <tr><th valign=top><i>escaped</i></th>
 *       <td>Escaped octets, that is, triplets consisting of the percent
 *           character (<tt>'%'</tt>) followed by two hexadecimal digits
 *           (<tt>'0'</tt>-<tt>'9'</tt>, <tt>'A'</tt>-<tt>'F'</tt>, and
 *           <tt>'a'</tt>-<tt>'f'</tt>)</td></tr>
 *   <tr><th valign=top><i>other</i></th>
 *       <td>The Unicode characters that are not in the US-ASCII character set,
 *           are not control characters (according to the {@link
 *           java.lang.Character#isISOControl(char) Character.isISOControl}
 *           method), and are not space characters (according to the {@link
 *           java.lang.Character#isSpaceChar(char) Character.isSpaceChar}
 *           method) (<i><b>Deviation from RFC 2396</b>, which is
 *           limited to US-ASCII</i>)</td></tr>
 * </table></blockquote>
 *
 * <p><a name="legal-chars"></a> The set of all legal URI characters consists of
 * the <i>unreserved</i>, <i>reserved</i>, <i>escaped</i>, and <i>other</i>
 * characters.
 *
 *
 * <h4> Escaped octets, quotation, encoding, and decoding </h4>
 *
 * RFC 2396 allows escaped octets to appear in the user-info, path, query, and
 * fragment components. Escaping serves two purposes in URIs:
 *
 * <ul>
 *
 *   <li><p> To <i>encode</i> non-US-ASCII characters when a URI is required to
 *   conform strictly to RFC 2396 by not containing any <i>other</i>
 *   characters. </p></li>
 *
 *   <li><p> To <i>quote</i> characters that are otherwise illegal in a
 *   component. The user-info, path, query, and fragment components differ
 *   slightly in terms of which characters are considered legal and illegal.
 *   </p></li>
 *
 * </ul>
 *
 * These purposes are served in this class by three related operations:
 *
 * <ul>
 *
 *   <li><p><a name="encode"></a> A character is <i>encoded</i> by replacing it
 *   with the sequence of escaped octets that represent that character in the
 *   UTF-8 character set. The Euro currency symbol (<tt>'\u20AC'</tt>),
 *   for example, is encoded as <tt>"%E2%82%AC"</tt>. <i>(<b>Deviation from
 *   RFC 2396</b>, which does not specify any particular character
 *   set.)</i> </p></li>
 *
 *   <li><p><a name="quote"></a> An illegal character is <i>quoted</i> simply by
 *   encoding it. The space character, for example, is quoted by replacing it
 *   with <tt>"%20"</tt>. UTF-8 contains US-ASCII, hence for US-ASCII
 *   characters this transformation has exactly the effect required by
 *   RFC 2396. </p></li>
 *
 *   <li><p><a name="decode"> A sequence of escaped octets is <i>decoded</i> by
 *   replacing it with the sequence of characters that it represents in the
 *   UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the
 *   effect of de-quoting any quoted US-ASCII characters as well as that of
 *   decoding any encoded non-US-ASCII characters. If a <a
 *   href="http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/CharsetDecoder.html#ce">decoding error</a> occurs
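The class comment on this page describes components, normalization, resolution, relativization, and escaping in prose. The following is a minimal sketch of those operations, assuming the standard Java SE `java.net.URI` behaves as that comment describes (the J2ME port listed here may differ). The class name `UriOperationsSketch`, its helper methods, and the `example.com` host are hypothetical and exist only for illustration; the `java.sun.com` URIs are the ones used in the comment.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical demo class; only java.net.URI itself comes from the JDK. */
class UriOperationsSketch {

    /** Resolves ref against base and returns the string form of the result. */
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(URI.create(ref)).toString();
    }

    /** Relativizes target against base (the inverse of resolution). */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    /** Removes "." segments, and ".." segments preceded by a non-".." segment. */
    static String normalize(String uri) {
        return URI.create(uri).normalize().toString();
    }

    /**
     * Builds a URI from components, which quotes illegal characters such as
     * the space, then encodes "other" (non-US-ASCII) characters as UTF-8
     * escaped octets via toASCIIString().
     */
    static String toAscii(String scheme, String host, String path)
            throws URISyntaxException {
        return new URI(scheme, host, path, null).toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Opaque URI: absolute, scheme-specific part does not start with '/'.
        URI opaque = URI.create("mailto:java-net@java.sun.com");
        System.out.println(opaque.isOpaque());              // true
        System.out.println(opaque.getSchemeSpecificPart()); // java-net@java.sun.com

        // Hierarchical URI: undefined components are null (strings) or -1 (port).
        URI base = URI.create("http://java.sun.com/j2se/1.3/");
        System.out.println(base.getHost());  // java.sun.com
        System.out.println(base.getPort());  // -1
        System.out.println(base.getQuery()); // null

        // Resolution examples (1) and (2) from the class comment.
        String first = resolve("http://java.sun.com/j2se/1.3/",
                "docs/guide/collections/designfaq.html#28");
        System.out.println(first);
        System.out.println(resolve(first,
                "../../../demo/jfc/SwingSet2/src/SwingSet2.java"));

        // Relativization example from the class comment.
        System.out.println(relativize("http://java.sun.com/j2se/1.3",
                "http://java.sun.com/j2se/1.3/docs/guide/index.html"));

        // Normalization; the example.com URI is made up for this sketch.
        System.out.println(normalize("http://example.com/a/./b/../c"));

        // Quoting and encoding: space becomes %20, the Euro sign %E2%82%AC.
        System.out.println(toAscii("http", "example.com", "/a b/\u20AC"));
    }
}
```

The helpers return strings only so that results are easy to compare; real code would normally keep the `URI` instances, which are immutable and therefore safe to share.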