rfc2396.txt
3.1. Scheme Component
Just as there are many different methods of access to resources,
there are a variety of schemes for identifying such resources. The
URI syntax consists of a sequence of components separated by reserved
characters, with the first component defining the semantics for the
remainder of the URI string.
Scheme names consist of a sequence of characters beginning with a
lower case letter and followed by any combination of lower case
letters, digits, plus ("+"), period ("."), or hyphen ("-"). For
resiliency, programs interpreting URI should treat upper case letters
as equivalent to lower case in scheme names (e.g., allow "HTTP" as
well as "http").
scheme = alpha *( alpha | digit | "+" | "-" | "." )
Relative URI references are distinguished from absolute URI in that
they do not begin with a scheme name. Instead, the scheme is
inherited from the base URI, as described in Section 5.2.
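The scheme rule above can be checked with a short Python sketch. The function name split_scheme is an illustrative assumption, not part of this specification. The pattern follows the "scheme" production but also accepts upper case letters, as the text recommends for resiliency, and normalizes the result to lower case.

```python
import re

# scheme = alpha *( alpha | digit | "+" | "-" | "." )
# Upper case letters are accepted here and folded to lower case.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def split_scheme(reference):
    """Return (scheme, rest); scheme is None when no scheme name is present."""
    head, colon, rest = reference.partition(":")
    if colon and SCHEME_RE.fullmatch(head):
        return head.lower(), rest
    # No leading scheme name: treat the input as a relative reference.
    return None, reference
```

A reference such as "../a/b" yields no scheme, so the scheme would be taken from the base URI.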
3.2. Authority Component
Many URI schemes include a top hierarchical element for a naming
authority, such that the namespace defined by the remainder of the
URI is governed by that authority. This authority component is
typically defined by an Internet-based server or a scheme-specific
registry of naming authorities.
authority = server | reg_name
The authority component is preceded by a double slash "//" and is
terminated by the next slash "/", question-mark "?", or by the end of
the URI. Within the authority component, the characters ";", ":",
"@", "?", and "/" are reserved.
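As a minimal sketch of the delimiting rule above, the following Python function cuts the authority out of the text that follows the scheme and colon. The name split_authority is hypothetical, and the input is assumed to have had any fragment identifier removed already.

```python
def split_authority(scheme_specific):
    """Split '//authority/path?query' into (authority, remainder).

    Returns (None, input) when the input does not begin with "//".
    """
    if not scheme_specific.startswith("//"):
        return None, scheme_specific
    rest = scheme_specific[2:]
    # The authority ends at the first "/" or "?", or at the end of the input.
    end = len(rest)
    for delimiter in "/?":
        position = rest.find(delimiter)
        if position != -1:
            end = min(end, position)
    return rest[:end], rest[end:]
```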
Berners-Lee, et. al. Standards Track [Page 12]
RFC 2396 URI Generic Syntax August 1998
An authority component is not required for a URI scheme to make use
of relative references. A base URI without an authority component
implies that any relative reference will also be without an authority
component.
3.2.1. Registry-based Naming Authority
The structure of a registry-based naming authority is specific to the
URI scheme, but constrained to the allowed characters for an
authority component.
reg_name = 1*( unreserved | escaped | "$" | "," |
               ";" | ":" | "@" | "&" | "=" | "+" )
3.2.2. Server-based Naming Authority
URL schemes that involve the direct use of an IP-based protocol to a
specified server on the Internet use a common syntax for the server
component of the URI's scheme-specific data:
<userinfo>@<host>:<port>
where <userinfo> may consist of a user name and, optionally, scheme-
specific information about how to gain authorization to access the
server. The parts "<userinfo>@" and ":<port>" may be omitted.
server = [ [ userinfo "@" ] hostport ]
The user information, if present, is followed by a commercial at-sign
"@".
userinfo = *( unreserved | escaped |
              ";" | ":" | "&" | "=" | "+" | "$" | "," )
Some URL schemes use the format "user:password" in the userinfo
field. This practice is NOT RECOMMENDED, because the passing of
authentication information in clear text (such as URI) has proven to
be a security risk in almost every case where it has been used.
The host is a domain name of a network host, or its IPv4 address as a
set of four decimal digit groups separated by ".". Literal IPv6
addresses are not supported.
hostport = host [ ":" port ]
host = hostname | IPv4address
hostname = *( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel = alpha | alpha *( alphanum | "-" ) alphanum
IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
port = *digit
Hostnames take the form described in Section 3 of [RFC1034] and
Section 2.1 of [RFC1123]: a sequence of domain labels separated by
".", each domain label starting and ending with an alphanumeric
character and possibly also containing "-" characters. The rightmost
domain label of a fully qualified domain name will never start with a
digit, thus syntactically distinguishing domain names from IPv4
addresses, and may be followed by a single "." if it is necessary to
distinguish between the complete domain name and any local domain.
To actually be "Uniform" as a resource locator, a URL hostname should
be a fully qualified domain name. In practice, however, the host
component may be a local domain literal.
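The host grammar can be transcribed into regular expressions fairly directly. The Python sketch below is one such transcription, with classify_host as an assumed helper name. It follows the grammar as written, so it does not range-check the digit groups of an IPv4 address, and it relies on the rule that a toplabel never starts with a digit to tell the two forms apart.

```python
import re

# domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# toplabel = alpha | alpha *( alphanum | "-" ) alphanum
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# hostname = *( domainlabel "." ) toplabel [ "." ]
HOSTNAME_RE = re.compile(rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?")
# IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
IPV4_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")


def classify_host(host):
    """Return "IPv4address", "hostname", or None if neither rule matches."""
    if IPV4_RE.fullmatch(host):
        return "IPv4address"
    if HOSTNAME_RE.fullmatch(host):
        return "hostname"
    return None
```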
Note: A suitable representation for including a literal IPv6
address as the host part of a URL is desired, but has not yet been
determined or implemented in practice.
The port is the network port number for the server. Most schemes
designate protocols that have a default port number. Another port
number may optionally be supplied, in decimal, separated from the
host by a colon. If the port is omitted, the default port number is
assumed.
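To illustrate the <userinfo>@<host>:<port> layout and the default-port rule, here is a hedged Python sketch. The function split_server and its default_port parameter are assumptions made for this example; the caller is expected to supply whatever default its scheme defines.

```python
def split_server(server, default_port=None):
    """Split 'userinfo@host:port' into (userinfo, host, port).

    userinfo is None when "userinfo@" is omitted; port falls back to
    default_port when ":port" is omitted or empty (port = *digit).
    """
    userinfo, at_sign, hostport = server.rpartition("@")
    if not at_sign:
        userinfo = None
    host, colon, port = hostport.partition(":")
    if colon and port:
        if not (port.isascii() and port.isdigit()):
            raise ValueError(f"port is not decimal: {port!r}")
        return userinfo, host, int(port)
    return userinfo, host, default_port
```

For example, split_server("www.example.com", default_port=80) leaves the port at the supplied default, while an explicit ":2121" overrides it.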
3.3. Path Component
The path component contains data, specific to the authority (or the
scheme if there is no authority component), identifying the resource
within the scope of that scheme and authority.
path = [ abs_path | opaque_part ]
path_segments = segment *( "/" segment )
segment = *pchar *( ";" param )
param = *pchar
pchar = unreserved | escaped |
        ":" | "@" | "&" | "=" | "+" | "$" | ","
The path may consist of a sequence of path segments separated by a
single slash "/" character. Within a path segment, the characters
"/", ";", "=", and "?" are reserved. Each path segment may include a
sequence of parameters, indicated by the semicolon ";" character.
The parameters are not significant to the parsing of relative
references.
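The segment and parameter structure described above might be taken apart as in the following Python sketch. The name split_path and the returned list-of-tuples shape are choices made for this illustration only.

```python
def split_path(abs_path):
    """Split an abs_path such as '/a;v=1/b' into [(name, [params]), ...]."""
    if not abs_path.startswith("/"):
        raise ValueError("expected an abs_path beginning with '/'")
    result = []
    # Segments are separated by "/"; within a segment, ";" introduces
    # each parameter.
    for segment in abs_path[1:].split("/"):
        name, *params = segment.split(";")
        result.append((name, params))
    return result
```

A trailing slash produces a final empty segment, which matches the grammar since a segment may be empty.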
3.4. Query Component
The query component is a string of information to be interpreted by
the resource.
query = *uric
Within a query component, the characters ";", "/", "?", ":", "@",
"&", "=", "+", ",", and "$" are reserved.
4. URI References
The term "URI-reference" is used here to denote the common usage of a
resource identifier. A URI reference may be absolute or relative,
and may have additional information attached in the form of a
fragment identifier. However, "the URI" that results from such a
reference includes only the absolute URI after the fragment
identifier (if any) is removed and after any relative URI is resolved
to its absolute form. Although it is possible to limit the
discussion of URI syntax and semantics to that of the absolute
result, most usage of URI is within general URI references, and it is
impossible to obtain the URI from such a reference without also
parsing the fragment and resolving the relative form.
URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]
The syntax for relative URI is a shortened form of that for absolute
URI, where some prefix of the URI is missing and certain path
components ("." and "..") have a special meaning when, and only when,
interpreting a relative path. The relative URI syntax is defined in
Section 5.
4.1. Fragment Identifier
When a URI reference is used to perform a retrieval action on the
identified resource, the optional fragment identifier, separated from
the URI by a crosshatch ("#") character, consists of additional
reference information to be interpreted by the user agent after the
retrieval action has been successfully completed. As such, it is not
part of a URI, but is often used in conjunction with a URI.
fragment = *uric
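Because "#" cannot appear unescaped elsewhere in a URI reference, separating the fragment is a single split at the first crosshatch. A minimal Python sketch, with split_fragment as an assumed name:

```python
def split_fragment(uri_reference):
    """Return (uri, fragment); fragment is None when no "#" is present."""
    uri, crosshatch, fragment = uri_reference.partition("#")
    # An empty fragment ("doc#") is kept distinct from an absent one.
    return uri, (fragment if crosshatch else None)
```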
The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference. Therefore, the format and interpretation of
fragment identifiers is dependent on the media type [RFC2046] of the
retrieval result. The character restrictions described in Section 2
for URI also apply to the fragment in a URI-reference. Individual
media types may define additional restrictions or structure within
the fragment for specifying different types of "partial views" that
can be identified within that media type.
A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.
4.2. Same-document References
A URI reference that does not contain a URI is a reference to the
current document. In other words, an empty URI reference within a
document is interpreted as a reference to the start of that document,
and a reference containing only a fragment identifier is a reference
to the identified fragment of that document. Traversal of such a
reference should not result in an additional retrieval action.
However, if the URI reference occurs in a context that is always
intended to result in a new request, as in the case of HTML's FORM
element, then an empty URI reference represents the base URI of the
current document and should be replaced by that URI when transformed
into a request.
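The same-document cases above could be handled along the lines of this Python sketch. The function name, the always_new_request flag, and the action labels it returns are all hypothetical. The sketch applies the base-URI substitution only to a completely empty reference, which is the one case the text describes.

```python
def interpret_reference(uri_reference, base_uri, always_new_request=False):
    """Return an (action, value) pair for a reference found in a document.

    The action names are illustrative labels, not terms defined here.
    """
    uri, crosshatch, fragment = uri_reference.partition("#")
    if uri:
        # A URI is present, so this is not a same-document reference.
        return ("not-same-document", uri_reference)
    if uri_reference == "" and always_new_request:
        # Context that always issues a request: use the base URI.
        return ("request", base_uri)
    if crosshatch:
        # Only a fragment identifier: no new retrieval is needed.
        return ("goto-fragment", fragment)
    return ("goto-start", None)
```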
4.3. Parsing a URI Reference
A URI reference is typically parsed according to the four main
components and fragment identifier in order to determine what
components are present and whether the reference is relative or
absolute. The individual components are then parsed for their
subparts and, if not opaque, to verify their validity.
Although the BNF defines what is allowed in each component, it is
ambiguous in terms of differentiating between an authority component
and a path component that begins with two slash characters. The
greedy algorithm is used for disambiguation: the left-most matching
rule soaks up as much of the URI reference string as it is capable of
matching. In other words, the authority component wins.
Readers familiar with regular expressions should see Appendix B for a
concrete parsing example and test oracle.
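For illustration, the regular expression given in Appendix B can be used from Python as sketched below. The wrapper parse_reference and the dictionary keys are assumptions of this example. Because "[^/?#]*" is greedy, text following a leading "//" is captured as the authority, which is the disambiguation described above.

```python
import re

# Regular expression from Appendix B; groups 2, 4, 5, 7, and 9 hold the
# scheme, authority, path, query, and fragment respectively.
URI_REFERENCE_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def parse_reference(uri_reference):
    """Break a URI reference into its five components (None if absent)."""
    match = URI_REFERENCE_RE.match(uri_reference)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }
```

Every part of the expression is optional, so it matches any input; it separates components but does not validate them.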
5. Relative URI References
It is often the case that a group or "tree" of documents has been
constructed to serve a common purpose; the vast majority of URI in
these documents point to resources within the tree rather than
outside of it. Similarly, documents located at a particular site are
much more likely to refer to other resources at that site than to
resources at remote sites.
Relative addressing of URI allows document trees to be partially
independent of their location and access scheme. For instance, it is
possible for a single set of hypertext documents to be simultaneously
accessible and traversable via each of the "file", "http", and "ftp"
schemes if the documents refer to each other using relative URI.
Furthermore, such document trees can be moved, as a whole, without
changing any of the relative references. Experience within the WWW
has demonstrated that the ability to perform relative referencing is
necessary for the long-term usability of embedded URI.
The syntax for relative URI takes advantage of the <hier_part> syntax
of <absoluteURI> (Section 3) in order to express a reference that is
relative to the namespace of another hierarchical URI.
relativeURI = ( net_path | abs_path | rel_path ) [ "?" query ]
A relative reference beginning with two slash characters is termed a
network-path reference, as defined by <net_path> in Section 3. Such
references are rarely used.
A relative reference beginning with a single slash character is
termed an absolute-path reference, as defined by <abs_path> in
Section 3.
A relative reference that does not begin with a scheme name or a
slash character is termed a relative-path reference.
rel_path = rel_segment [ abs_path ]
rel_segment = 1*( unreserved | escaped |
                  ";" | "@" | "&" | "=" | "+" | "$" | "," )
Within a relative-path reference, the complete path segments "." and
".." have special meanings: "the current hierarchy level" and "the
level above this hierarchy level", respectively. Although this is
very similar to their use within Unix-based filesystems to indicate
directory levels, these path components are only considered special
when resolving a relative-path reference to its absolute form
(Section 5.2).
Authors should be aware that a path segment which contains a colon
character cannot be used as the first segment of a relative URI path
(e.g., "this:that"), because it would be mistaken for a scheme name.
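The three forms of relative reference, and the colon restriction on the first segment, can be summarized in a small Python sketch. The function classify_relative is a hypothetical helper. It returns None when the first segment is empty or contains a colon, since rel_segment requires at least one character and does not allow ":".

```python
import re


def classify_relative(reference):
    """Name the kind of relative reference, or return None if it is not one."""
    if reference.startswith("//"):
        return "network-path"
    if reference.startswith("/"):
        return "absolute-path"
    # The first segment ends at the first "/", "?", or "#".
    first_segment = re.split(r"[/?#]", reference, maxsplit=1)[0]
    if not first_segment or ":" in first_segment:
        return None
    return "relative-path"
```

Prefixing the reference with "./" is one way to keep a colon in the first real segment: "./this:that" is classified as a relative-path reference, while "this:that" is not.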