Berners-Lee, et. al. Standards Track [Page 17]
RFC 2396 URI Generic Syntax August 1998
It is therefore necessary to precede such segments with other
segments (e.g., "./this:that") in order for them to be referenced as
a relative path.
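The effect of the leading "./" can be seen by splitting a reference
into components. The following is a minimal sketch, assuming a
component-splitting regular expression in the style of the one given
in Appendix B; the helper name scheme_and_path is hypothetical and
used only for illustration.

```python
import re

# Component-splitting expression in the style of Appendix B.
_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def scheme_and_path(reference):
    """Return (scheme, path); scheme is None when it is undefined."""
    m = _URI_RE.match(reference)
    return m.group(2), m.group(5)


# A first segment containing a colon is read as a scheme name.
print(scheme_and_path("this:that"))
# Preceding it with "./" keeps the whole string in the path component.
print(scheme_and_path("./this:that"))
```

Under this sketch, "this:that" yields the scheme "this" and the path
"that", whereas "./this:that" has no scheme and keeps "./this:that"
as its path.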
It is not necessary for all URI within a given scheme to be
restricted to the <hier_part> syntax, since the hierarchical
properties of that syntax are only necessary when relative URI are
used within a particular document. Documents can only make use of
relative URI when their base URI fits within the <hier_part> syntax.
It is assumed that any document which contains a relative reference
will also have a base URI that obeys the syntax. In other words,
relative URI cannot be used within a document that has an unsuitable
base URI.
Some URI schemes do not allow a hierarchical syntax matching the
<hier_part> syntax, and thus cannot use relative references.

5.1. Establishing a Base URI

The term "relative URI" implies that there exists some absolute "base
URI" against which the relative reference is applied. Indeed, the
base URI is necessary to define the semantics of any relative URI
reference; without it, a relative reference is meaningless. In order
for relative URI to be usable within a document, the base URI of that
document must be known to the parser.
The base URI of a document can be established in one of four ways,
listed below in order of precedence. The order of precedence can be
thought of in terms of layers, where the innermost defined base URI
has the highest precedence. This can be visualized graphically as:
.----------------------------------------------------------.
|  .----------------------------------------------------.  |
|  |  .----------------------------------------------.  |  |
|  |  |  .----------------------------------------.  |  |  |
|  |  |  |  .----------------------------------.  |  |  |  |
|  |  |  |  |       <relative_reference>       |  |  |  |  |
|  |  |  |  `----------------------------------'  |  |  |  |
|  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
|  |  |  |         document's content             |  |  |  |
|  |  |  `----------------------------------------'  |  |  |
|  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
|  |  |         (message, document, or none).        |  |  |
|  |  `----------------------------------------------'  |  |
|  | (5.1.3) URI used to retrieve the entity            |  |
|  `----------------------------------------------------'  |
| (5.1.4) Default Base URI is application-dependent        |
`----------------------------------------------------------'
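The layering above amounts to a first-match lookup from the innermost
layer outward. The following is a minimal sketch of that idea; the
function name and its four parameters are assumptions made for
illustration, and how each candidate is actually obtained depends on
the media type, protocol, and application.

```python
from typing import Optional


def establish_base_uri(
    embedded: Optional[str],             # 5.1.1: found in the document content
    encapsulating: Optional[str],        # 5.1.2: base URI of the enclosing entity
    retrieval: Optional[str],            # 5.1.3: URI used to retrieve the entity
    application_default: Optional[str],  # 5.1.4: application-dependent default
) -> Optional[str]:
    """Return the highest-precedence base URI that is defined, if any."""
    for candidate in (embedded, encapsulating, retrieval, application_default):
        if candidate is not None:
            return candidate
    return None  # no base URI: relative references cannot be resolved
```

A caller would pass None for every layer that does not apply, so a
document with no embedded base URI and no encapsulating entity falls
through to its retrieval URI.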

5.1.1. Base URI within Document Content

Within certain document media types, the base URI of the document can
be embedded within the content itself such that it can be readily
obtained by a parser. This can be useful for descriptive documents,
such as tables of contents, which may be transmitted to others through
protocols other than their usual retrieval context (e.g., E-Mail or
USENET news).
It is beyond the scope of this document to specify how, for each
media type, the base URI can be embedded. It is assumed that user
agents manipulating such media types will be able to obtain the
appropriate syntax from that media type's specification. An example
of how the base URI can be embedded in the Hypertext Markup Language
(HTML) [RFC1866] is provided in Appendix D.
A mechanism for embedding the base URI within MIME container types
(e.g., the message and multipart types) is defined by MHTML
[RFC2110]. Protocols that do not use the MIME message header syntax,
but which do allow some form of tagged metainformation to be included
within messages, may define their own syntax for defining the base
URI as part of a message.

5.1.2. Base URI from the Encapsulating Entity

If no base URI is embedded, the base URI of a document is defined by
the document's retrieval context. For a document that is enclosed
within another entity (such as a message or another document), the
retrieval context is that entity; thus, the default base URI of the
document is the base URI of the entity in which the document is
encapsulated.

5.1.3. Base URI from the Retrieval URI

If no base URI is embedded and the document is not encapsulated
within some other entity (e.g., the top level of a composite entity),
then, if a URI was used to retrieve the base document, that URI shall
be considered the base URI. Note that if the retrieval was the
result of a redirected request, the last URI used (i.e., that which
resulted in the actual retrieval of the document) is the base URI.

5.1.4. Default Base URI

If none of the conditions described in Sections 5.1.1--5.1.3 apply,
then the base URI is defined by the context of the application.
Since this definition is necessarily application-dependent, failing
to define the base URI using one of the other methods may result in
the same content being interpreted differently by different types of
application.
It is the responsibility of the distributor(s) of a document
containing relative URI to ensure that the base URI for that document
can be established. It must be emphasized that relative URI cannot
be used reliably in situations where the document's base URI is not
well-defined.

5.2. Resolving Relative References to Absolute Form

This section describes an example algorithm for resolving URI
references that might be relative to a given base URI.
The base URI is established according to the rules of Section 5.1 and
parsed into the four main components as described in Section 3. Note
that only the scheme component is required to be present in the base
URI; the other components may be empty or undefined. A component is
undefined if its preceding separator does not appear in the URI
reference; the path component is never undefined, though it may be
empty. The base URI's query component is not used by the resolution
algorithm and may be discarded.
For each URI reference, the following steps are performed in order:
1) The URI reference is parsed into the four potential components and
fragment identifier, as described in Section 4.3.
2) If the path component is empty and the scheme, authority, and
query components are undefined, then it is a reference to the
current document and we are done. Otherwise, the reference URI's
query and fragment components are defined as found (or not found)
within the URI reference and not inherited from the base URI.
3) If the scheme component is defined, indicating that the reference
starts with a scheme name, then the reference is interpreted as an
absolute URI and we are done. Otherwise, the reference URI's
scheme is inherited from the base URI's scheme component.
Due to a loophole in prior specifications [RFC1630], some parsers
allow the scheme name to be present in a relative URI if it is the
same as the base URI scheme. Unfortunately, this can conflict
with the correct parsing of non-hierarchical URI. For backwards
compatibility, an implementation may work around such references
by removing the scheme if it matches that of the base URI and the
scheme is known to always use the <hier_part> syntax. The parser
can then continue with the steps below for the remainder of the
reference components. Validating parsers should mark such a
malformed relative reference as an error.
4) If the authority component is defined, then the reference is a
network-path and we skip to step 7. Otherwise, the reference
URI's authority is inherited from the base URI's authority
component, which will also be undefined if the URI scheme does not
use an authority component.
5) If the path component begins with a slash character ("/"), then
the reference is an absolute-path and we skip to step 7.
6) If this step is reached, then we are resolving a relative-path
reference. The relative path needs to be merged with the base
URI's path. Although there are many ways to do this, we will
describe a simple method using a separate string buffer.
a) All but the last segment of the base URI's path component is
copied to the buffer. In other words, any characters after the
last (right-most) slash character, if any, are excluded.
b) The reference's path component is appended to the buffer
string.
c) All occurrences of "./", where "." is a complete path segment,
are removed from the buffer string.
d) If the buffer string ends with "." as a complete path segment,
that "." is removed.
e) All occurrences of "<segment>/../", where <segment> is a
complete path segment not equal to "..", are removed from the
buffer string. Removal of these path segments is performed
iteratively, removing the leftmost matching pattern on each
iteration, until no matching pattern remains.
f) If the buffer string ends with "<segment>/..", where <segment>
is a complete path segment not equal to "..", that
"<segment>/.." is removed.
g) If the resulting buffer string still begins with one or more
complete path segments of "..", then the reference is
considered to be in error. Implementations may handle this
error by retaining these components in the resolved path (i.e.,
treating them as part of the final URI), by removing them from
the resolved path (i.e., discarding relative levels above the
root), or by avoiding traversal of the reference.
h) The remaining buffer string is the reference URI's new path
component.
7) The resulting URI components, including any inherited from the
base URI, are recombined to give the absolute form of the URI
reference. Using pseudocode, this would be
result = ""
if scheme is defined then
    append scheme to result
    append ":" to result
if authority is defined then
    append "//" to result
    append authority to result
append path to result
if query is defined then
    append "?" to result
    append query to result
if fragment is defined then
    append "#" to result
    append fragment to result
return result
Note that we must be careful to preserve the distinction between a
component that is undefined, meaning that its separator was not
present in the reference, and a component that is empty, meaning
that the separator was present and was immediately followed by the
next component separator or the end of the reference.
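The steps above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a reference
implementation: the names Parts, parse, merge, recombine, and resolve
are hypothetical; parsing uses a regular expression in the style of
Appendix B; undefined components are represented by None and empty
ones by ""; a same-document reference (step 2) is reported as the
base URI plus any fragment; path segments are taken to be non-empty
in steps 6e and 6f; and leading ".." segments (step 6g) are retained.

```python
import re
from typing import NamedTuple, Optional

_URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)
# A non-empty complete path segment other than "..", followed by "/..".
_SEG_DOTDOT = r"(?<![^/])(?!\.\./)[^/]+/\.\."


class Parts(NamedTuple):
    scheme: Optional[str]     # None means undefined, "" means empty
    authority: Optional[str]
    path: str                 # never undefined, though it may be ""
    query: Optional[str]
    fragment: Optional[str]


def parse(uri: str) -> Parts:
    m = _URI_RE.match(uri)
    return Parts(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


def merge(base_path: str, ref_path: str) -> str:
    buf = base_path[: base_path.rfind("/") + 1]            # 6a
    buf += ref_path                                         # 6b
    buf = re.sub(r"(?<![^/])\./", "", buf)                  # 6c
    buf = re.sub(r"(?<![^/])\.\Z", "", buf)                 # 6d
    while True:                                             # 6e: leftmost first
        reduced = re.sub(_SEG_DOTDOT + "/", "", buf, count=1)
        if reduced == buf:
            break
        buf = reduced
    buf = re.sub(_SEG_DOTDOT + r"\Z", "", buf)              # 6f
    return buf                                              # 6g (retained), 6h


def recombine(scheme, authority, path, query, fragment) -> str:
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result


def resolve(base: str, ref: str) -> str:
    b, r = parse(base), parse(ref)                          # step 1
    if (r.path == "" and r.scheme is None
            and r.authority is None and r.query is None):   # step 2
        # Assumption: the base URI stands in for "the current document".
        return base + ("#" + r.fragment if r.fragment is not None else "")
    if r.scheme is not None:                                # step 3
        return ref
    authority, path = r.authority, r.path
    if authority is None:                                   # step 4
        authority = b.authority
        if not path.startswith("/"):                        # step 5
            path = merge(b.path, path)                      # step 6
    return recombine(b.scheme, authority, path, r.query, r.fragment)  # step 7
```

With the base URI "http://a/b/c/d;p?q", this sketch resolves "../g"
to "http://a/b/g", "?y" to "http://a/b/c/?y", and "../../../g" to
"http://a/../g", the last of which keeps the excess ".." segment as
permitted by step 6g.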
The above algorithm is intended to provide an example by which the
output of implementations can be tested -- implementation of the
algorithm itself is not required. For example, some systems may find
it more efficient to implement step 6 as a pair of segment stacks
being merged, rather than as a series of string pattern replacements.
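One possible shape for the stack-based alternative is sketched below.
It is an assumption-laden illustration rather than a prescribed
method: the name merge_with_stack is hypothetical, excess ".."
segments are retained (one of the options in step 6g), and empty
segments, including the marker for the leading slash, are never
popped.

```python
def merge_with_stack(base_path: str, ref_path: str) -> str:
    """Merge a relative path with a base path using a segment stack."""
    # All but the last segment of the base path. For a path starting with
    # "/", split() yields a leading "" that serves as the root marker.
    stack = base_path.split("/")[:-1]
    ref_segments = ref_path.split("/")
    for i, seg in enumerate(ref_segments):
        last = i == len(ref_segments) - 1
        if seg == ".":
            # "." is dropped; as the final segment it leaves a trailing "/".
            if last:
                stack.append("")
        elif seg == ".." and stack and stack[-1] not in ("", ".."):
            # ".." cancels the preceding ordinary segment.
            stack.pop()
            if last:
                stack.append("")
        else:
            # Ordinary segment, or a ".." with nothing left to cancel.
            stack.append(seg)
    return "/".join(stack)
```

For the base path "/b/c/d;p", merging "../g" gives "/b/g" and merging
"../../../g" gives "/../g", matching what the string-replacement
description of step 6 produces for the same inputs.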
Note: Some WWW client applications will fail to separate the
reference's query component from its path component before merging
the base and reference paths in step 6 above. This may result in
a loss of information if the query component contains the strings
"/../" or "/./".
Resolution examples are provided in Appendix C.

6. URI Normalization and Equivalence

In many cases, different URI strings may identify the same resource.
For example, the host names used in URLs are case insensitive, so the
URL <http://www.XEROX.com> is equivalent to <http://www.xerox.com>.
In general, the rules for equivalence and the definition of a normal
form, if any, are scheme dependent. When a scheme uses elements of
the common syntax, it also uses the common syntax equivalence rules:
the scheme and hostname are case insensitive, and a URL with an
explicit ":port", where the port is the default for the scheme, is
equivalent to one where the port is elided.
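The common-syntax rules can be illustrated with a small comparison
key. This is a sketch only: the function name equivalence_key and the
table of default ports are assumptions chosen for illustration,
userinfo and fragments are ignored, and scheme-specific rules (for
example, any treatment of an empty path) are deliberately left out.

```python
from urllib.parse import urlsplit

# Illustrative subset; a real application would use the defaults
# defined by each scheme it supports.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def equivalence_key(url: str):
    """Return a tuple that is equal for URLs equivalent under the common rules."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()            # scheme is case insensitive
    host = (parts.hostname or "").lower()    # hostname is case insensitive
    port = parts.port
    if port == DEFAULT_PORTS.get(scheme):    # explicit default port is elided
        port = None
    return (scheme, host, port, parts.path, parts.query)
```

Two URLs compare as equivalent under this sketch when their keys are
equal, so <http://www.XEROX.com> and <http://www.xerox.com:80> share a
key, while <http://www.xerox.com:8080> does not.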

7. Security Considerations

A URI does not in itself pose a security threat. Users should beware
that there is no general guarantee that a URL, which at one time
located a given resource, will continue to do so. Nor is there any
guarantee that a URL will not locate a different resource at some
later point in time, due to the lack of any constraint on how a given
authority apportions its namespace. Such a guarantee can only be
obtained from the person(s) controlling that namespace and the
resource in question. A specific URI scheme may include additional
semantics, such as name persistence, if those semantics are required
of all naming authorities for that scheme.
It is sometimes possible to construct a URL such that an attempt to
perform a seemingly harmless, idempotent operation, such as the
retrieval of an entity associated with the resource, will in fact
cause a possibly damaging remote operation to occur. The unsafe URL