rfc2396.txt

来自「RFC 的详细文档!」· 文本 代码 · 共 1,584 行 · 第 1/5 页

TXT
1,584
字号

3.1. Scheme Component

   Just as there are many different methods of access to resources,
   there are a variety of schemes for identifying such resources.  The
   URI syntax consists of a sequence of components separated by reserved
   characters, with the first component defining the semantics for the
   remainder of the URI string.

   Scheme names consist of a sequence of characters beginning with a
   lower case letter and followed by any combination of lower case
   letters, digits, plus ("+"), period ("."), or hyphen ("-").  For
   resiliency, programs interpreting URI should treat upper case letters
   as equivalent to lower case in scheme names (e.g., allow "HTTP" as
   well as "http").

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )

   Relative URI references are distinguished from absolute URI in that
   they do not begin with a scheme name.  Instead, the scheme is
   inherited from the base URI, as described in Section 5.2.

3.2. Authority Component

   Many URI schemes include a top hierarchical element for a naming
   authority, such that the namespace defined by the remainder of the
   URI is governed by that authority.  This authority component is
   typically defined by an Internet-based server or a scheme-specific
   registry of naming authorities.

      authority     = server | reg_name

   The authority component is preceded by a double slash "//" and is
   terminated by the next slash "/", question-mark "?", or by the end of
   the URI.  Within the authority component, the characters ";", ":",
   "@", "?", and "/" are reserved.



Berners-Lee, et. al.        Standards Track                    [Page 12]

RFC 2396                   URI Generic Syntax                August 1998


   An authority component is not required for a URI scheme to make use
   of relative references.  A base URI without an authority component
   implies that any relative reference will also be without an authority
   component.

3.2.1. Registry-based Naming Authority

   The structure of a registry-based naming authority is specific to the
   URI scheme, but constrained to the allowed characters for an
   authority component.

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )

3.2.2. Server-based Naming Authority

   URL schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:

      <userinfo>@<host>:<port>

   where <userinfo> may consist of a user name and, optionally, scheme-
   specific information about how to gain authorization to access the
   server.  The parts "<userinfo>@" and ":<port>" may be omitted.

      server        = [ [ userinfo "@" ] hostport ]

   The user information, if present, is followed by a commercial at-sign
   "@".

      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )

   Some URL schemes use the format "user:password" in the userinfo
   field. This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text (such as URI) has proven to
   be a security risk in almost every case where it has been used.

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum



Berners-Lee, et. al.        Standards Track                    [Page 13]

RFC 2396                   URI Generic Syntax                August 1998


      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.  The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.
   To actually be "Uniform" as a resource locator, a URL hostname should
   be a fully qualified domain name.  In practice, however, the host
   component may be a local domain literal.

      Note: A suitable representation for including a literal IPv6
      address as the host part of a URL is desired, but has not yet been
      determined or implemented in practice.

   The port is the network port number for the server.  Most schemes
   designate protocols that have a default port number.  Another port
   number may optionally be supplied, in decimal, separated from the
   host by a colon.  If the port is omitted, the default port number is
   assumed.

3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative
   references.





Berners-Lee, et. al.        Standards Track                    [Page 14]

RFC 2396                   URI Generic Syntax                August 1998


3.4. Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric

   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.

4. URI References

   The term "URI-reference" is used here to denote the common usage of a
   resource identifier.  A URI reference may be absolute or relative,
   and may have additional information attached in the form of a
   fragment identifier.  However, "the URI" that results from such a
   reference includes only the absolute URI after the fragment
   identifier (if any) is removed and after any relative URI is resolved
   to its absolute form.  Although it is possible to limit the
   discussion of URI syntax and semantics to that of the absolute
   result, most usage of URI is within general URI references, and it is
   impossible to obtain the URI from such a reference without also
   parsing the fragment and resolving the relative form.

      URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

   The syntax for relative URI is a shortened form of that for absolute
   URI, where some prefix of the URI is missing and certain path
   components ("." and "..") have a special meaning when, and only when,
   interpreting a relative path.  The relative URI syntax is defined in
   Section 5.

4.1. Fragment Identifier

   When a URI reference is used to perform a retrieval action on the
   identified resource, the optional fragment identifier, separated from
   the URI by a crosshatch ("#") character, consists of additional
   reference information to be interpreted by the user agent after the
   retrieval action has been successfully completed.  As such, it is not
   part of a URI, but is often used in conjunction with a URI.

      fragment      = *uric

   The semantics of a fragment identifier is a property of the data
   resulting from a retrieval action, regardless of the type of URI used
   in the reference.  Therefore, the format and interpretation of
   fragment identifiers is dependent on the media type [RFC2046] of the
   retrieval result.  The character restrictions described in Section 2



Berners-Lee, et. al.        Standards Track                    [Page 15]

RFC 2396                   URI Generic Syntax                August 1998


   for URI also apply to the fragment in a URI-reference.  Individual
   media types may define additional restrictions or structure within
   the fragment for specifying different types of "partial views" that
   can be identified within that media type.

   A fragment identifier is only meaningful when a URI reference is
   intended for retrieval and the result of that retrieval is a document
   for which the identified fragment is consistently defined.

4.2. Same-document References

   A URI reference that does not contain a URI is a reference to the
   current document.  In other words, an empty URI reference within a
   document is interpreted as a reference to the start of that document,
   and a reference containing only a fragment identifier is a reference
   to the identified fragment of that document.  Traversal of such a
   reference should not result in an additional retrieval action.
   However, if the URI reference occurs in a context that is always
   intended to result in a new request, as in the case of HTML's FORM
   element, then an empty URI reference represents the base URI of the
   current document and should be replaced by that URI when transformed
   into a request.

4.3. Parsing a URI Reference

   A URI reference is typically parsed according to the four main
   components and fragment identifier in order to determine what
   components are present and whether the reference is relative or
   absolute.  The individual components are then parsed for their
   subparts and, if not opaque, to verify their validity.

   Although the BNF defines what is allowed in each component, it is
   ambiguous in terms of differentiating between an authority component
   and a path component that begins with two slash characters.  The
   greedy algorithm is used for disambiguation: the left-most matching
   rule soaks up as much of the URI reference string as it is capable of
   matching.  In other words, the authority component wins.

   Readers familiar with regular expressions should see Appendix B for a
   concrete parsing example and test oracle.

5. Relative URI References

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URI in
   these documents point to resources within the tree rather than





Berners-Lee, et. al.        Standards Track                    [Page 16]

RFC 2396                   URI Generic Syntax                August 1998


   outside of it.  Similarly, documents located at a particular site are
   much more likely to refer to other resources at that site than to
   resources at remote sites.

   Relative addressing of URI allows document trees to be partially
   independent of their location and access scheme.  For instance, it is
   possible for a single set of hypertext documents to be simultaneously
   accessible and traversable via each of the "file", "http", and "ftp"
   schemes if the documents refer to each other using relative URI.
   Furthermore, such document trees can be moved, as a whole, without
   changing any of the relative references.  Experience within the WWW
   has demonstrated that the ability to perform relative referencing is
   necessary for the long-term usability of embedded URI.

   The syntax for relative URI takes advantage of the <hier_part> syntax
   of <absoluteURI> (Section 3) in order to express a reference that is
   relative to the namespace of another hierarchical URI.

      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

   A relative reference beginning with two slash characters is termed a
   network-path reference, as defined by <net_path> in Section 3.  Such
   references are rarely used.

   A relative reference beginning with a single slash character is
   termed an absolute-path reference, as defined by <abs_path> in
   Section 3.

   A relative reference that does not begin with a scheme name or a
   slash character is termed a relative-path reference.

      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )

   Within a relative-path reference, the complete path segments "." and
   ".." have special meanings: "the current hierarchy level" and "the
   level above this hierarchy level", respectively.  Although this is
   very similar to their use within Unix-based filesystems to indicate
   directory levels, these path components are only considered special
   when resolving a relative-path reference to its absolute form
   (Section 5.2).

   Authors should be aware that a path segment which contains a colon
   character cannot be used as the first segment of a relative URI path
   (e.g., "this:that"), because it would be mistaken for a scheme name.


⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?