📄 rfc2396.txt
字号:
3.2. Authority Component Many URI schemes include a top hierarchical element for a naming authority, such that the namespace defined by the remainder of the URI is governed by that authority. This authority component is typically defined by an Internet-based server or a scheme-specific registry of naming authorities. authority = server | reg_name The authority component is preceded by a double slash "//" and is terminated by the next slash "/", question-mark "?", or by the end of the URI. Within the authority component, the characters ";", ":", "@", "?", and "/" are reserved.Berners-Lee, et. al. Standards Track [Page 12]RFC 2396 URI Generic Syntax August 1998 An authority component is not required for a URI scheme to make use of relative references. A base URI without an authority component implies that any relative reference will also be without an authority component.3.2.1. Registry-based Naming Authority The structure of a registry-based naming authority is specific to the URI scheme, but constrained to the allowed characters for an authority component. reg_name = 1*( unreserved | escaped | "$" | "," | ";" | ":" | "@" | "&" | "=" | "+" )3.2.2. Server-based Naming Authority URL schemes that involve the direct use of an IP-based protocol to a specified server on the Internet use a common syntax for the server component of the URI's scheme-specific data: <userinfo>@<host>:<port> where <userinfo> may consist of a user name and, optionally, scheme- specific information about how to gain authorization to access the server. The parts "<userinfo>@" and ":<port>" may be omitted. server = [ [ userinfo "@" ] hostport ] The user information, if present, is followed by a commercial at-sign "@". userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," ) Some URL schemes use the format "user:password" in the userinfo field. This practice is NOT RECOMMENDED, because the passing of authentication information in clear text (such as URI) has proven to be a security risk in almost every case where it has been used. The host is a domain name of a network host, or its IPv4 address as a set of four decimal digit groups separated by ".". Literal IPv6 addresses are not supported. hostport = host [ ":" port ] host = hostname | IPv4address hostname = *( domainlabel "." ) toplabel [ "." ] domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum toplabel = alpha | alpha *( alphanum | "-" ) alphanumBerners-Lee, et. al. Standards Track [Page 13]RFC 2396 URI Generic Syntax August 1998 IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit port = *digit Hostnames take the form described in Section 3 of [RFC1034] and Section 2.1 of [RFC1123]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost domain label of a fully qualified domain name will never start with a digit, thus syntactically distinguishing domain names from IPv4 addresses, and may be followed by a single "." if it is necessary to distinguish between the complete domain name and any local domain. To actually be "Uniform" as a resource locator, a URL hostname should be a fully qualified domain name. In practice, however, the host component may be a local domain literal. Note: A suitable representation for including a literal IPv6 address as the host part of a URL is desired, but has not yet been determined or implemented in practice. The port is the network port number for the server. Most schemes designate protocols that have a default port number. Another port number may optionally be supplied, in decimal, separated from the host by a colon. If the port is omitted, the default port number is assumed.3.3. Path Component The path component contains data, specific to the authority (or the scheme if there is no authority component), identifying the resource within the scope of that scheme and authority. path = [ abs_path | opaque_part ] path_segments = segment *( "/" segment ) segment = *pchar *( ";" param ) param = *pchar pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | "," The path may consist of a sequence of path segments separated by a single slash "/" character. Within a path segment, the characters "/", ";", "=", and "?" are reserved. Each path segment may include a sequence of parameters, indicated by the semicolon ";" character. The parameters are not significant to the parsing of relative references.Berners-Lee, et. al. Standards Track [Page 14]RFC 2396 URI Generic Syntax August 19983.4. Query Component The query component is a string of information to be interpreted by the resource. query = *uric Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.4. URI References The term "URI-reference" is used here to denote the common usage of a resource identifier. A URI reference may be absolute or relative, and may have additional information attached in the form of a fragment identifier. However, "the URI" that results from such a reference includes only the absolute URI after the fragment identifier (if any) is removed and after any relative URI is resolved to its absolute form. Although it is possible to limit the discussion of URI syntax and semantics to that of the absolute result, most usage of URI is within general URI references, and it is impossible to obtain the URI from such a reference without also parsing the fragment and resolving the relative form. URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ] The syntax for relative URI is a shortened form of that for absolute URI, where some prefix of the URI is missing and certain path components ("." and "..") have a special meaning when, and only when, interpreting a relative path. The relative URI syntax is defined in Section 5.4.1. Fragment Identifier When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch ("#") character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI. fragment = *uric The semantics of a fragment identifier is a property of the data resulting from a retrieval action, regardless of the type of URI used in the reference. Therefore, the format and interpretation of fragment identifiers is dependent on the media type [RFC2046] of the retrieval result. The character restrictions described in Section 2Berners-Lee, et. al. Standards Track [Page 15]RFC 2396 URI Generic Syntax August 1998 for URI also apply to the fragment in a URI-reference. Individual media types may define additional restrictions or structure within the fragment for specifying different types of "partial views" that can be identified within that media type. A fragment identifier is only meaningful when a URI reference is intended for retrieval and the result of that retrieval is a document for which the identified fragment is consistently defined.4.2. Same-document References A URI reference that does not contain a URI is a reference to the current document. In other words, an empty URI reference within a document is interpreted as a reference to the start of that document, and a reference containing only a fragment identifier is a reference to the identified fragment of that document. Traversal of such a reference should not result in an additional retrieval action. However, if the URI reference occurs in a context that is always intended to result in a new request, as in the case of HTML's FORM element, then an empty URI reference represents the base URI of the current document and should be replaced by that URI when transformed into a request.4.3. Parsing a URI Reference A URI reference is typically parsed according to the four main components and fragment identifier in order to determine what components are present and whether the reference is relative or absolute. The individual components are then parsed for their subparts and, if not opaque, to verify their validity. Although the BNF defines what is allowed in each component, it is ambiguous in terms of differentiating between an authority component and a path component that begins with two slash characters. The greedy algorithm is used for disambiguation: the left-most matching rule soaks up as much of the URI reference string as it is capable of matching. In other words, the authority component wins. Readers familiar with regular expressions should see Appendix B for a concrete parsing example and test oracle.5. Relative URI References It is often the case that a group or "tree" of documents has been constructed to serve a common purpose; the vast majority of URI in these documents point to resources within the tree rather thanBerners-Lee, et. al. Standards Track [Page 16]RFC 2396 URI Generic Syntax August 1998 outside of it. Similarly, documents located at a particular site are much more likely to refer to other resources at that site than to resources at remote sites. Relative addressing of URI allows document trees to be partially independent of their location and access scheme. For instance, it is possible for a single set of hypertext documents to be simultaneously accessible and traversable via each of the "file", "http", and "ftp" schemes if the documents refer to each other using relative URI. Furthermore, such document trees can be moved, as a whole, without changing any of the relative references. Experience within the WWW has demonstrated that the ability to perform relative referencing is necessary for the long-term usability of embedded URI. The syntax for relative URI takes advantage of the <hier_part> syntax of <absoluteURI> (Section 3) in order to express a reference that is relative to the namespace of another hierarchical URI. relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ] A relative reference beginning with two slash characters is termed a network-path reference, as defined by <net_path> in Section 3. Such references are rarely used. A relative reference beginning with a single slash character is termed an absolute-path reference, as defined by <abs_path> in Section 3. A relative reference that does not begin with a scheme name or a slash character is termed a relative-path reference. rel_path = rel_segment [ abs_path ] rel_segment = 1*( unreserved | escaped | ";" | "@" | "&" | "=" | "+" | "$" | "," ) Within a relative-path reference, the complete path segments "." and ".." have special meanings: "the current hierarchy level" and "the level above this hierarchy level", respectively. Although this is very similar to their use within Unix-based filesystems to indicate directory levels, these path components are only considered special when resolving a relative-path reference to its absolute form (Section 5.2). Authors should be aware that a path segment which contains a colon character cannot be used as the first segment of a relative URI path (e.g., "this:that"), because it would be mistaken for a scheme name.Berners-Lee, et. al. Standards Track [Page 17]RFC 2396 URI Generic Syntax August 1998 It is therefore necessary to precede such segments with other segments (e.g., "./this:that") in order for them to be referenced as a relative path. It is not necessary for all URI within a given scheme to be restricted to the <hier_part> syntax, since the hierarchical properties of that syntax are only necessary when relative URI are used within a particular document. Documents can only make use of relative URI when their base URI fits within the <hier_part> syntax. It is assumed that any document which contains a relative reference will also have a base URI that obeys the syntax. In other words, relative URI cannot be used within a document that has an unsuitable base URI. Some URI schemes do not allow a hierarchical syntax matching the <hier_part> syntax, and thus cannot use relative references.5.1. Establishing a Base URI The term "relative URI" implies that there exists some absolute "base URI" against which the relative reference is applied. Indeed, the base URI is necessary to define the semantics of any relative URI reference; without it, a relative reference is meaningless. In order for relative URI to be usable within a document, the base URI of that document must be known to the parser.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -