rfc2396.txt

来自「RFC 的详细文档!」· 文本 代码 · 共 1,584 行 · 第 1/5 页

TXT
1,584
字号


Berners-Lee, et. al.        Standards Track                    [Page 17]

RFC 2396                   URI Generic Syntax                August 1998


   It is therefore necessary to precede such segments with other
   segments (e.g., "./this:that") in order for them to be referenced as
   a relative path.

   It is not necessary for all URI within a given scheme to be
   restricted to the <hier_part> syntax, since the hierarchical
   properties of that syntax are only necessary when relative URI are
   used within a particular document.  Documents can only make use of
   relative URI when their base URI fits within the <hier_part> syntax.
   It is assumed that any document which contains a relative reference
   will also have a base URI that obeys the syntax.  In other words,
   relative URI cannot be used within a document that has an unsuitable
   base URI.

   Some URI schemes do not allow a hierarchical syntax matching the
   <hier_part> syntax, and thus cannot use relative references.

5.1. Establishing a Base URI

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.  Indeed, the
   base URI is necessary to define the semantics of any relative URI
   reference; without it, a relative reference is meaningless.  In order
   for relative URI to be usable within a document, the base URI of that
   document must be known to the parser.

   The base URI of a document can be established in one of four ways,
   listed below in order of precedence.  The order of precedence can be
   thought of in terms of layers, where the innermost defined base URI
   has the highest precedence.  This can be visualized graphically as:

      .----------------------------------------------------------.
      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |  .----------------------------------.  |  |  |  |
      |  |  |  |  |       <relative_reference>       |  |  |  |  |
      |  |  |  |  `----------------------------------'  |  |  |  |
      |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
      |  |  |         (message, document, or none).        |  |  |
      |  |  `----------------------------------------------'  |  |
      |  | (5.1.3) URI used to retrieve the entity            |  |
      |  `----------------------------------------------------'  |
      | (5.1.4) Default Base URI is application-dependent        |
      `----------------------------------------------------------'



Berners-Lee, et. al.        Standards Track                    [Page 18]

RFC 2396                   URI Generic Syntax                August 1998


5.1.1. Base URI within Document Content

   Within certain document media types, the base URI of the document can
   be embedded within the content itself such that it can be readily
   obtained by a parser.  This can be useful for descriptive documents,
   such as tables of content, which may be transmitted to others through
   protocols other than their usual retrieval context (e.g., E-Mail or
   USENET news).

   It is beyond the scope of this document to specify how, for each
   media type, the base URI can be embedded.  It is assumed that user
   agents manipulating such media types will be able to obtain the
   appropriate syntax from that media type's specification.  An example
   of how the base URI can be embedded in the Hypertext Markup Language
   (HTML) [RFC1866] is provided in Appendix D.

   A mechanism for embedding the base URI within MIME container types
   (e.g., the message and multipart types) is defined by MHTML
   [RFC2110].  Protocols that do not use the MIME message header syntax,
   but which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for defining the base
   URI as part of a message.

5.1.2. Base URI from the Encapsulating Entity

   If no base URI is embedded, the base URI of a document is defined by
   the document's retrieval context.  For a document that is enclosed
   within another entity (such as a message or another document), the
   retrieval context is that entity; thus, the default base URI of the
   document is the base URI of the entity in which the document is
   encapsulated.

5.1.3. Base URI from the Retrieval URI

   If no base URI is embedded and the document is not encapsulated
   within some other entity (e.g., the top level of a composite entity),
   then, if a URI was used to retrieve the base document, that URI shall
   be considered the base URI.  Note that if the retrieval was the
   result of a redirected request, the last URI used (i.e., that which
   resulted in the actual retrieval of the document) is the base URI.

5.1.4. Default Base URI

   If none of the conditions described in Sections 5.1.1--5.1.3 apply,
   then the base URI is defined by the context of the application.
   Since this definition is necessarily application-dependent, failing





Berners-Lee, et. al.        Standards Track                    [Page 19]

RFC 2396                   URI Generic Syntax                August 1998


   to define the base URI using one of the other methods may result in
   the same content being interpreted differently by different types of
   application.

   It is the responsibility of the distributor(s) of a document
   containing relative URI to ensure that the base URI for that document
   can be established.  It must be emphasized that relative URI cannot
   be used reliably in situations where the document's base URI is not
   well-defined.

5.2. Resolving Relative References to Absolute Form

   This section describes an example algorithm for resolving URI
   references that might be relative to a given base URI.

   The base URI is established according to the rules of Section 5.1 and
   parsed into the four main components as described in Section 3.  Note
   that only the scheme component is required to be present in the base
   URI; the other components may be empty or undefined.  A component is
   undefined if its preceding separator does not appear in the URI
   reference; the path component is never undefined, though it may be
   empty.  The base URI's query component is not used by the resolution
   algorithm and may be discarded.

   For each URI reference, the following steps are performed in order:

   1) The URI reference is parsed into the potential four components and
      fragment identifier, as described in Section 4.3.

   2) If the path component is empty and the scheme, authority, and
      query components are undefined, then it is a reference to the
      current document and we are done.  Otherwise, the reference URI's
      query and fragment components are defined as found (or not found)
      within the URI reference and not inherited from the base URI.

   3) If the scheme component is defined, indicating that the reference
      starts with a scheme name, then the reference is interpreted as an
      absolute URI and we are done.  Otherwise, the reference URI's
      scheme is inherited from the base URI's scheme component.

      Due to a loophole in prior specifications [RFC1630], some parsers
      allow the scheme name to be present in a relative URI if it is the
      same as the base URI scheme.  Unfortunately, this can conflict
      with the correct parsing of non-hierarchical URI.  For backwards
      compatibility, an implementation may work around such references
      by removing the scheme if it matches that of the base URI and the
      scheme is known to always use the <hier_part> syntax.  The parser




Berners-Lee, et. al.        Standards Track                    [Page 20]

RFC 2396                   URI Generic Syntax                August 1998


      can then continue with the steps below for the remainder of the
      reference components.  Validating parsers should mark such a
      misformed relative reference as an error.

   4) If the authority component is defined, then the reference is a
      network-path and we skip to step 7.  Otherwise, the reference
      URI's authority is inherited from the base URI's authority
      component, which will also be undefined if the URI scheme does not
      use an authority component.

   5) If the path component begins with a slash character ("/"), then
      the reference is an absolute-path and we skip to step 7.

   6) If this step is reached, then we are resolving a relative-path
      reference.  The relative path needs to be merged with the base
      URI's path.  Although there are many ways to do this, we will
      describe a simple method using a separate string buffer.

      a) All but the last segment of the base URI's path component is
         copied to the buffer.  In other words, any characters after the
         last (right-most) slash character, if any, are excluded.

      b) The reference's path component is appended to the buffer
         string.

      c) All occurrences of "./", where "." is a complete path segment,
         are removed from the buffer string.

      d) If the buffer string ends with "." as a complete path segment,
         that "." is removed.

      e) All occurrences of "<segment>/../", where <segment> is a
         complete path segment not equal to "..", are removed from the
         buffer string.  Removal of these path segments is performed
         iteratively, removing the leftmost matching pattern on each
         iteration, until no matching pattern remains.

      f) If the buffer string ends with "<segment>/..", where <segment>
         is a complete path segment not equal to "..", that
         "<segment>/.." is removed.

      g) If the resulting buffer string still begins with one or more
         complete path segments of "..", then the reference is
         considered to be in error.  Implementations may handle this
         error by retaining these components in the resolved path (i.e.,
         treating them as part of the final URI), by removing them from
         the resolved path (i.e., discarding relative levels above the
         root), or by avoiding traversal of the reference.



Berners-Lee, et. al.        Standards Track                    [Page 21]

RFC 2396                   URI Generic Syntax                August 1998


      h) The remaining buffer string is the reference URI's new path
         component.

   7) The resulting URI components, including any inherited from the
      base URI, are recombined to give the absolute form of the URI
      reference.  Using pseudocode, this would be

         result = ""

         if scheme is defined then
             append scheme to result
             append ":" to result

         if authority is defined then
             append "//" to result
             append authority to result

         append path to result

         if query is defined then
             append "?" to result
             append query to result

         if fragment is defined then
             append "#" to result
             append fragment to result

         return result

      Note that we must be careful to preserve the distinction between a
      component that is undefined, meaning that its separator was not
      present in the reference, and a component that is empty, meaning
      that the separator was present and was immediately followed by the
      next component separator or the end of the reference.

   The above algorithm is intended to provide an example by which the
   output of implementations can be tested -- implementation of the
   algorithm itself is not required.  For example, some systems may find
   it more efficient to implement step 6 as a pair of segment stacks
   being merged, rather than as a series of string pattern replacements.

      Note: Some WWW client applications will fail to separate the
      reference's query component from its path component before merging
      the base and reference paths in step 6 above.  This may result in
      a loss of information if the query component contains the strings
      "/../" or "/./".

   Resolution examples are provided in Appendix C.



Berners-Lee, et. al.        Standards Track                    [Page 22]

RFC 2396                   URI Generic Syntax                August 1998


6. URI Normalization and Equivalence

   In many cases, different URI strings may actually identify the
   identical resource. For example, the host names used in URL are
   actually case insensitive, and the URL <http://www.XEROX.com> is
   equivalent to <http://www.xerox.com>. In general, the rules for
   equivalence and definition of a normal form, if any, are scheme
   dependent. When a scheme uses elements of the common syntax, it will
   also use the common syntax equivalence rules, namely that the scheme
   and hostname are case insensitive and a URL with an explicit ":port",
   where the port is the default for the scheme, is equivalent to one
   where the port is elided.

7. Security Considerations

   A URI does not in itself pose a security threat.  Users should beware
   that there is no general guarantee that a URL, which at one time
   located a given resource, will continue to do so.  Nor is there any
   guarantee that a URL will not locate a different resource at some
   later point in time, due to the lack of any constraint on how a given
   authority apportions its namespace.  Such a guarantee can only be
   obtained from the person(s) controlling that namespace and the
   resource in question.  A specific URI scheme may include additional
   semantics, such as name persistence, if those semantics are required
   of all naming authorities for that scheme.

   It is sometimes possible to construct a URL such that an attempt to
   perform a seemingly harmless, idempotent operation, such as the
   retrieval of an entity associated with the resource, will in fact
   cause a possibly damaging remote operation to occur.  The unsafe URL

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?