/* URI.java -- An URI class
   Copyright (C) 2002, 2004, 2005, 2006 Free Software Foundation, Inc.

This file is part of GNU Classpath.

GNU Classpath is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Classpath is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Classpath; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
02110-1301 USA.

Linking this library statically or dynamically with other modules is
making a combined work based on this library. Thus, the terms and
conditions of the GNU General Public License cover the whole
combination.

As a special exception, the copyright holders of this library give you
permission to link this library with independent modules to produce an
executable, regardless of the license terms of these independent
modules, and to copy and distribute the resulting executable under
terms of your choice, provided that you also meet, for each linked
independent module, the terms and conditions of the license of that
module. An independent module is a module which is not derived from
or based on this library. If you modify this library, you may extend
this exception to your version of the library, but you are not
obligated to do so. If you do not wish to do so, delete this
exception statement from your version.
*/

package java.net;

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * <p>
 * A URI instance represents a URI as defined by
 * <a href="http://www.ietf.org/rfc/rfc3986.txt">RFC3986</a>,
 * with some deviations.
 * </p>
 * <p>
 * At its highest level, a URI consists of:
 * </p>
 * <p>
 * <code>[<em>scheme</em><strong>:</strong>]<em>scheme-specific-part</em>
 * [<strong>#</strong><em>fragment</em>]</code>
 * </p>
 * <p>
 * where <strong>#</strong> and <strong>:</strong> are literal characters,
 * and those parts enclosed in square brackets are optional.
 * </p>
 * <p>
 * There are two main types of URI. An <em>opaque</em> URI is one
 * which consists only of the above three parts, and is not further
 * defined. An example of such a URI would be a <em>mailto:</em> URI.
 * In contrast, <em>hierarchical</em> URIs give further definition
 * to the scheme-specific part, so as to represent some part of a
 * hierarchical structure. Their scheme-specific part has the form:
 * </p>
 * <p>
 * <code>[<strong>//</strong><em>authority</em>][<em>path</em>]
 * [<strong>?</strong><em>query</em>]</code>
 * </p>
 * <p>
 * with <strong>/</strong> and <strong>?</strong> being literal characters.
 * When server-based, the authority section is further subdivided into:
 * </p>
 * <p>
 * <code>[<em>user-info</em><strong>@</strong>]<em>host</em>
 * [<strong>:</strong><em>port</em>]</code>
 * </p>
 * <p>
 * with <strong>@</strong> and <strong>:</strong> as literal characters.
 * Authority sections that are not server-based are said to be registry-based.
 * </p>
 * <p>
 * Hierarchical URIs can be either relative or absolute. Absolute URIs
 * always specify a scheme, while relative URIs don't. Opaque URIs are
 * always absolute.
 * </p>
 * <p>
 * Each part of the URI may have one of three states: undefined, empty
 * or containing some content.
 * The former two of these are represented
 * by <code>null</code> and the empty string in Java, respectively.
 * The scheme-specific part may never be undefined. It follows from
 * this that the path sub-part may not be undefined either, so as to
 * ensure the former.
 * </p>
 * <h2>Character Escaping and Quoting</h2>
 * <p>
 * The characters that can be used within a valid URI are restricted.
 * There are two main classes of characters which can't be used as is
 * within the URI:
 * </p>
 * <ol>
 * <li><strong>Characters outside the US-ASCII character set</strong>.
 * These have to be <strong>escaped</strong> in order to create
 * an RFC-compliant URI; this means replacing each byte of the character's
 * UTF-8 encoding with its hexadecimal value, preceded by a `%'.</li>
 * <li><strong>Illegal characters</strong> (e.g. space characters,
 * control characters) are quoted, which results in them being encoded
 * in the same way as non-US-ASCII characters.</li>
 * </ol>
 * <p>
 * The set of valid characters differs depending on the section of the URI:
 * </p>
 * <ul>
 * <li><strong>Scheme</strong>: Must consist of alphanumerics, `-', `.'
 * or `+'.</li>
 * <li><strong>Authority</strong>: Composed of the username, host, port, `@'
 * and `:'.</li>
 * <li><strong>Username</strong>: Allows unreserved or percent-encoded
 * characters, sub-delimiters and `:'.</li>
 * <li><strong>Host</strong>: Allows unreserved or percent-encoded
 * characters, sub-delimiters and square brackets (`[' and `]') for IPv6
 * addresses.</li>
 * <li><strong>Port</strong>: Digits only.</li>
 * <li><strong>Path</strong>: Allows the path characters and `/'.</li>
 * <li><strong>Query</strong>: Allows the path characters, `?' and `/'.</li>
 * <li><strong>Fragment</strong>: Allows the path characters, `?' and `/'.</li>
 * </ul>
 * <p>
 * These definitions reference the following sets of characters:
 * </p>
 * <ul>
 * <li><strong>Unreserved characters</strong>: The alphanumerics plus
 * `-', `.', `_', and `~'.</li>
 * <li><strong>Sub-delimiters</strong>: `!', `$', `&', `(', `)', `*',
 * `+', `,', `;', `=' and the single-quote itself.</li>
 * <li><strong>Path characters</strong>: Unreserved and percent-encoded
 * characters and the sub-delimiters along with `@' and `:'.</li>
 * </ul>
 * <p>
 * The constructors and accessor methods allow the use and retrieval of
 * URI components which contain non-US-ASCII characters directly.
 * They are only escaped when the <code>toASCIIString()</code> method
 * is used. In contrast, illegal characters are always quoted, with the
 * exception of the return values of the non-raw accessors.
 * </p>
 *
 * @author Ito Kazumitsu (ito.kazumitsu@hitachi-cable.co.jp)
 * @author Dalibor Topic (robilad@kaffe.org)
 * @author Michael Koch (konqueror@gmx.de)
 * @author Andrew John Hughes (gnu_andrew@member.fsf.org)
 * @since 1.4
 */
public final class URI
  implements Comparable, Serializable
{
  /**
   * For serialization compatibility.
   */
  static final long serialVersionUID = -6052424284110960213L;

  /**
   * Regular expression for parsing URIs.
   *
   * Taken from RFC 2396, Appendix B.
   * This expression doesn't parse IPv6 addresses.
   */
  private static final String URI_REGEXP =
    "^(([^:/?#]+):)?((//([^/?#]*))?([^?#]*)(\\?([^#]*))?)?(#(.*))?";

  /**
   * Regular expression for parsing the authority segment.
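   * <p>
   * As an illustration, matching this expression against the authority
   * <code>user@example.com:8080</code> yields the following groups:
   * </p>
   * <pre>
   *   group 2 (AUTHORITY_USERINFO_GROUP) = "user"
   *   group 3 (AUTHORITY_HOST_GROUP)     = "example.com"
   *   group 5 (AUTHORITY_PORT_GROUP)     = "8080"
   * </pre>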
   */
  private static final String AUTHORITY_REGEXP =
    "(([^?#]*)@)?([^?#:]*)(:([0-9]*))?";

  /**
   * Valid characters (taken from rfc2396/3986)
   */
  private static final String RFC2396_DIGIT = "0123456789";
  private static final String RFC2396_LOWALPHA = "abcdefghijklmnopqrstuvwxyz";
  private static final String RFC2396_UPALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
  private static final String RFC2396_ALPHA =
    RFC2396_LOWALPHA + RFC2396_UPALPHA;
  private static final String RFC2396_ALPHANUM = RFC2396_DIGIT + RFC2396_ALPHA;
  private static final String RFC3986_UNRESERVED = RFC2396_ALPHANUM + "-._~";
  private static final String RFC3986_SUBDELIMS = "!$&'()*+,;=";
  private static final String RFC3986_REG_NAME =
    RFC3986_UNRESERVED + RFC3986_SUBDELIMS + "%";
  private static final String RFC3986_PCHAR = RFC3986_UNRESERVED +
    RFC3986_SUBDELIMS + ":@%";
  private static final String RFC3986_SEGMENT = RFC3986_PCHAR;
  private static final String RFC3986_PATH_SEGMENTS = RFC3986_SEGMENT + "/";
  private static final String RFC3986_SSP = RFC3986_PCHAR + "?/";
  private static final String RFC3986_HOST = RFC3986_REG_NAME + "[]";
  private static final String RFC3986_USERINFO = RFC3986_REG_NAME + ":";

  /**
   * Index of scheme component in parsed URI.
   */
  private static final int SCHEME_GROUP = 2;

  /**
   * Index of scheme-specific-part in parsed URI.
   */
  private static final int SCHEME_SPEC_PART_GROUP = 3;

  /**
   * Index of authority component in parsed URI.
   */
  private static final int AUTHORITY_GROUP = 5;

  /**
   * Index of path component in parsed URI.
   */
  private static final int PATH_GROUP = 6;

  /**
   * Index of query component in parsed URI.
   */
  private static final int QUERY_GROUP = 8;

  /**
   * Index of fragment component in parsed URI.
   */
  private static final int FRAGMENT_GROUP = 10;

  /**
   * Index of userinfo component in parsed authority section.
   */
  private static final int AUTHORITY_USERINFO_GROUP = 2;

  /**
   * Index of host component in parsed authority section.
   */
  private static final int AUTHORITY_HOST_GROUP = 3;

  /**
   * Index of port component in parsed authority section.
   */
  private static final int AUTHORITY_PORT_GROUP = 5;

  /**
   * The compiled version of the URI regular expression.
   */
  private static final Pattern URI_PATTERN;

  /**
   * The compiled version of the authority regular expression.
   */
  private static final Pattern AUTHORITY_PATTERN;

  /**
   * The set of valid hexadecimal characters.
   */
  private static final String HEX = "0123456789ABCDEF";

  private transient String scheme;
  private transient String rawSchemeSpecificPart;
  private transient String schemeSpecificPart;
  private transient String rawAuthority;
  private transient String authority;
  private transient String rawUserInfo;
  private transient String userInfo;
  private transient String rawHost;
  private transient String host;
  private transient int port = -1;
  private transient String rawPath;
  private transient String path;
  private transient String rawQuery;
  private transient String query;
  private transient String rawFragment;
  private transient String fragment;
  private String string;

  /**
   * Static initializer to pre-compile the regular expressions.
   */
  static
  {
    URI_PATTERN = Pattern.compile(URI_REGEXP);
    AUTHORITY_PATTERN = Pattern.compile(AUTHORITY_REGEXP);
  }

  private void readObject(ObjectInputStream is)
    throws ClassNotFoundException, IOException
  {
    this.string = (String) is.readObject();
    try
      {
        parseURI(this.string);
      }
    catch (URISyntaxException x)
      {
        // Should not happen.
        throw new RuntimeException(x);
      }
  }

  private void writeObject(ObjectOutputStream os) throws IOException
  {
    if (string == null)
      string = toString();
    os.writeObject(string);
  }

  /**
   * <p>
   * Returns the string content of the specified group of the supplied
   * matcher.
   * The returned value is modified according to the following:
   * </p>
   * <ul>
   * <li>If the resulting string has a length greater than 0, then
   * that string is returned.</li>
   * <li>If the group is unmatched or matched a zero-length string, then
   * the content of the preceding group is considered. If this is also
   * unmatched or empty, then <code>null</code> is returned to indicate
   * an undefined value. Otherwise, the value is truly the empty string
   * and this is the returned value.</li>
   * </ul>
   * <p>
   * This method is used for matching against all parts of the URI
   * that may be either undefined or empty (i.e. all those but the
   * scheme-specific part and the path). In each case, the preceding
   * group is the content of the original group, along with some
   * additional distinguishing feature. For example, the preceding
   * group for the query includes the preceding question mark,
   * while that of the fragment includes the hash symbol. The presence
   * of these features makes it possible to distinguish an undefined
   * value from an empty one.
   * The scheme differs in that it will never return an empty string;
   * the delimiter follows the scheme rather than preceding it, so
   * it becomes part of the following section. The same is true
   * of the user information.
   * </p>
   *
   * @param match the matcher, which contains the results of the URI
   * matched against the URI regular expression.
   * @param group the index of the group whose content is returned.
   * @return either the matched content, <code>null</code> for undefined
   * values, or an empty string for a URI part with empty content.
   */
  private static String getURIGroup(Matcher match, int group)
  {
    String matched = match.group(group);
    if (matched == null || matched.length() == 0)
      {
        String prevMatched = match.group(group - 1);
        if (prevMatched == null || prevMatched.length() == 0)
          return null;
        else
          return "";
      }
    return matched;
  }

  /**
   * Sets fields of this URI by parsing the given string.
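   * <p>
   * For example, matching <code>URI_REGEXP</code> against
   * <code>http://user@example.com:8080/a/b?q=1#top</code> gives the
   * following groups:
   * </p>
   * <pre>
   *   SCHEME_GROUP           (2)  = "http"
   *   SCHEME_SPEC_PART_GROUP (3)  = "//user@example.com:8080/a/b?q=1"
   *   AUTHORITY_GROUP        (5)  = "user@example.com:8080"
   *   PATH_GROUP             (6)  = "/a/b"
   *   QUERY_GROUP            (8)  = "q=1"
   *   FRAGMENT_GROUP         (10) = "top"
   * </pre>
   * <p>
   * For <code>http://example.com/a?</code>, group 8 is empty but group 7
   * is <code>"?"</code>, so <code>getURIGroup(matcher, QUERY_GROUP)</code>
   * returns the empty string. For <code>http://example.com/a</code>, both
   * groups are unmatched and it returns <code>null</code>.
   * </p>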
   *
   * @param str The string to parse
   *
   * @exception URISyntaxException If the given string violates RFC 2396
   */
  private void parseURI(String str) throws URISyntaxException
  {
    Matcher matcher = URI_PATTERN.matcher(str);

    if (matcher.matches())
      {
        scheme = getURIGroup(matcher, SCHEME_GROUP);
        rawSchemeSpecificPart = matcher.group(SCHEME_SPEC_PART_GROUP);
        schemeSpecificPart = unquote(rawSchemeSpecificPart);
        if (!isOpaque())
          {
            rawAuthority = getURIGroup(matcher, AUTHORITY_GROUP);
            rawPath = matcher.group(PATH_GROUP);
            rawQuery = getURIGroup(matcher, QUERY_GROUP);
          }
        rawFragment = getURIGroup(matcher, FRAGMENT_GROUP);
      }
    else
      throw new URISyntaxException(str,
                                   "doesn't match URI regular expression");
    parseServerAuthority();

    // We must eagerly unquote the parts, because this is the only time
    // we may throw an exception.
    authority = unquote(rawAuthority);
    userInfo = unquote(rawUserInfo);
    host = unquote(rawHost);
    path = unquote(rawPath);
    query = unquote(rawQuery);
    fragment = unquote(rawFragment);
  }

  /**
   * Unquotes characters escaped as "%" followed by two hexadecimal
   * digits, decoding the resulting bytes as UTF-8.
   *
   * @param str The string to unquote or null.
   *
   * @return The unquoted string or null if str was null.
   *
   * @exception URISyntaxException If the given string contains invalid
   * escape sequences.
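   * Examples are a truncated escape such as <code>"%4"</code> or an
   * escape with non-hexadecimal digits such as <code>"%zz"</code>.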
   */
  private static String unquote(String str) throws URISyntaxException
  {
    if (str == null)
      return null;
    byte[] buf = new byte[str.length()];
    int pos = 0;
    for (int i = 0; i < str.length(); i++)
      {
        char c = str.charAt(i);
        if (c == '%')
          {
            if (i + 2 >= str.length())
              throw new URISyntaxException(str, "Invalid quoted character");
            int hi = Character.digit(str.charAt(++i), 16);
            int lo = Character.digit(str.charAt(++i), 16);
            if (lo < 0 || hi < 0)
              throw new URISyntaxException(str, "Invalid quoted character");
            buf[pos++] = (byte) (hi * 16 + lo);
          }
        else
          buf[pos++] = (byte) c;
      }
    try
      {
        return new String(buf, 0, pos, "utf-8");
      }
    catch (java.io.UnsupportedEncodingException x2)
      {
        throw (Error) new InternalError().initCause(x2);
      }
  }

  /**
   * Quote characters illegal in URIs in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 byte.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quote(String str)
  {
    return quote(str, RFC3986_SSP);
  }

  /**
   * Quote characters illegal in URI authorities in given string.
   *
   * Replace illegal characters by encoding their UTF-8
   * representation as "%" + hex code for each resulting
   * UTF-8 byte.
   *
   * @param str The string to quote
   *
   * @return The quoted string.
   */
  private static String quoteAuthority(String str)
  {
    // Technically, we should be using RFC2396_AUTHORITY, but
    // it contains no additional characters.
    return quote(str, RFC3986_REG_NAME);
  }

  /**
   * Quotes the characters in the supplied string that are not part of
   * the specified set of legal characters.
   *
   * @param str the string to quote
   * @param legalCharacters the set of legal characters
   *
   * @return the quoted string.