Berners-Lee [Page 7]
RFC 1630 URIs in WWW June 1994
Encoding reserved characters
When a system uses a local addressing scheme, it is useful to provide
a mapping from local addresses into URIs so that objects within the
addressing scheme may be referred to globally, and possibly accessed
through gateway servers.
For a new naming scheme, any mapping scheme may be defined provided
it is unambiguous, reversible, and provides valid URIs. It is
recommended that where hierarchical aspects to the local naming
scheme exist, they be mapped onto the hierarchical URL path syntax in
order to allow the partial form to be used.
It is also recommended that the conventional scheme below be used in
all cases except for any scheme which encodes binary data as opposed
to text, in which case a more compact encoding such as pure
hexadecimal or base 64 might be more appropriate. For example, the
conventional URI encoding method is used for mapping WAIS, FTP,
Prospero and Gopher addresses in the URI specification.
CONVENTIONAL URI ENCODING SCHEME
Where the local naming scheme uses ASCII characters which are not
allowed in the URI, these may be represented in the URL by a
percent sign "%" immediately followed by two hexadecimal digits
(0-9, A-F) giving the ISO Latin 1 code for that character.
Character codes other than those allowed by the syntax shall not
be used unencoded in a URI.
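The conventional encoding can be sketched in a few lines of Python.
This is a minimal illustration, not a definitive implementation: the
ALLOWED set and the function names encode_local_name and
decode_local_name are assumptions made for the example, and the real
set of allowed characters is the one given by the URI syntax.

```python
import string

# Assumed "allowed" set, for illustration only; the URI syntax defines the real one.
ALLOWED = set(string.ascii_letters + string.digits + "-_.")

def encode_local_name(name: str) -> str:
    """Replace every character outside ALLOWED by %XX (its ISO Latin 1 code)."""
    out = []
    for ch in name:
        if ch in ALLOWED:
            out.append(ch)
        else:
            # Raises UnicodeEncodeError for characters outside ISO Latin 1.
            code = ch.encode("latin-1")[0]
            out.append("%{:02X}".format(code))
    return "".join(out)

def decode_local_name(encoded: str) -> str:
    """Reverse encode_local_name; reject a % not followed by two hex digits."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "%":
            pair = encoded[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("'%' not followed by two hexadecimal digits")
            out.append(bytes([int(pair, 16)]).decode("latin-1"))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)
```

Because the percent sign itself is outside the assumed ALLOWED set, it
is always written as %25, which keeps the mapping reversible.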
REDUCED OR INCREASED SAFE CHARACTER SETS
The same encoding method may be used for encoding characters whose
use, although technically allowed in a URI, would be unwise due to
problems of corruption by imperfect gateways or misrepresentation
due to the use of variant character sets, or which would simply be
awkward in a given environment. Because a % sign always indicates
an encoded character, a URI may be made "safer" simply by encoding
any characters considered unsafe, while leaving already encoded
characters still encoded. Similarly, in cases where a larger set
of characters is acceptable, % signs can be selectively and
reversibly expanded.
Before two URIs can be compared, it is therefore necessary to
bring them to the same encoding level.
However, the reserved characters mentioned above have a quite
different significance when encoded, and so may NEVER be encoded
and unencoded in this way.
The percent sign intended as such must always be encoded, as its
presence otherwise always indicates an encoding. Sequences which
start with a percent sign but are not followed by two hexadecimal
characters are reserved for future extension. (See Example 3.)
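As a sketch of the "safer" direction described above, the following
Python function encodes an extra set of characters while copying
existing % sequences through untouched. The name make_safer and its
parameters are hypothetical, and the caller is assumed to keep reserved
characters out of the unsafe set, since encoding those would change
their meaning.

```python
def make_safer(uri: str, unsafe: str) -> str:
    """Encode each character listed in `unsafe`, leaving existing %XX alone.

    Assumption: `unsafe` contains no reserved characters; the caller is
    responsible for that, as encoding them would change the URI's meaning.
    """
    if "%" in unsafe:
        raise ValueError("existing % signs already mark encodings")
    out = []
    i = 0
    while i < len(uri):
        ch = uri[i]
        if ch == "%":
            out.append(uri[i:i + 3])  # existing escape: copy through untouched
            i += 3
        elif ch in unsafe:
            out.append("%{:02X}".format(ch.encode("latin-1")[0]))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, with a hypothetical URI containing a tilde,
make_safer("http://info.cern.ch/~tim/a%2Db", "~") yields
"http://info.cern.ch/%7Etim/a%2Db": the tilde is encoded and the
existing %2D is left as it was.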
Example 1
The URIs
http://info.cern.ch/albert/bertram/marie-claude
and
http://info.cern.ch/albert/bertram/marie%2Dclaude
are identical, as the %2D encodes a hyphen character.
Example 2
The URIs
http://info.cern.ch/albert/bertram/marie-claude
and
http://info.cern.ch/albert/bertram%2Fmarie-claude
are NOT identical, as in the second case the encoded slash does not
have hierarchical significance.
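Examples 1 and 2 can be reproduced with a small normalization step that
brings two URIs to the same encoding level before comparing them. The
sketch below is deliberately conservative: it decodes an escape only
when the decoded character belongs to an assumed ALWAYS_SAFE set
(letters, digits, hyphen, underscore, period) and leaves every other
escape encoded, so an encoded reserved character such as %2F is never
turned back into a slash. The names ALWAYS_SAFE and normalize are
assumptions made for the example.

```python
import string

# Assumed set of characters that mean the same encoded or unencoded.
ALWAYS_SAFE = set(string.ascii_letters + string.digits + "-_.")

def normalize(uri: str) -> str:
    """Decode %XX only for ALWAYS_SAFE characters; keep other escapes."""
    out = []
    i = 0
    while i < len(uri):
        if uri[i] == "%":
            pair = uri[i + 1:i + 3]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("'%' not followed by two hexadecimal digits")
            decoded = chr(int(pair, 16))
            if decoded in ALWAYS_SAFE:
                out.append(decoded)             # e.g. %2D becomes "-"
            else:
                out.append("%" + pair.upper())  # e.g. %2f stays as %2F
            i += 3
        else:
            out.append(uri[i])
            i += 1
    return "".join(out)

def same_uri(a: str, b: str) -> bool:
    """Compare two URIs after bringing them to the same encoding level."""
    return normalize(a) == normalize(b)
```

With this sketch the two URIs of Example 1 compare equal, while those
of Example 2 do not, because the encoded slash stays encoded.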
Example 3
The URIs
fxqn:/us/va/reston/cnri/ietf/24/asdf%*.fred
and
news:12345667123%asdghfh@info.cern.ch
are illegal, as all % characters imply encodings, and there is no
decoding defined for "%*" or "%as" in this recommendation.
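A check for the situation in Example 3 might look like the following
Python sketch. The function name has_valid_escapes is hypothetical, and
accepting lower-case hexadecimal digits is an assumption of the
example; the text above names only 0-9 and A-F.

```python
import re

# A percent sign that is NOT followed by two hexadecimal digits.
# Accepting a-f as well as A-F is an assumption of this sketch.
_BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def has_valid_escapes(uri: str) -> bool:
    """Return True if every % in the URI starts a %XX sequence."""
    return _BAD_PERCENT.search(uri) is None
```

Applied to the two URIs of Example 3 the function returns False in both
cases, since neither "%*" nor "%as" is a percent sign followed by two
hexadecimal digits.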
Partial (relative) form
Within an object whose URI is well defined, the URI of another object
may be given in abbreviated form, where parts of the two URIs are the
same. This allows objects within a group to refer to each other
without requiring the space for a complete reference, and it
incidentally allows the group of objects to be moved without changing
any references. It must be emphasized that when a reference is
passed in anything other than a well controlled context, the full
form must always be used.
In World-Wide Web applications, the context URI is that of the
document or object containing a reference. In this case partial URIs
can be generated in virtual objects or stored in real objects,
without the need for dramatic change if the higher-order parts of a
hierarchical naming system are modified. Apart from terseness, this
gives greater robustness to practical systems, by enabling
information hiding between system components.
The partial form relies on a property of the URI syntax that certain
characters ("/") and certain path elements ("..", ".") have a
significance reserved for representing a hierarchical space, and must
be recognized as such by both clients and servers.
A partial form can be distinguished from an absolute form in that the
latter must have a colon and that colon must occur before any slash
characters. Systems not requiring partial forms should not use any
unencoded slashes in their naming schemes. If they do, absolute URIs
will still work, but confusion may result. (See note on Gopher
below.)
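The test for an absolute form described above (a colon that occurs
before any slash) can be written directly. This is a minimal Python
sketch; the function name is_absolute is an assumption of the example.

```python
def is_absolute(uri: str) -> bool:
    """True if the URI has a colon and that colon precedes any slash."""
    colon = uri.find(":")
    if colon == -1:
        return False
    slash = uri.find("/")
    return slash == -1 or colon < slash
```

Under this test "magic://a/b" and "g:h" are absolute, while "g", "/g"
and "../g" are partial forms.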
The rules for the use of a partial name relative to the URI of the
context are:
If the scheme parts are different, the whole absolute URI must
be given. Otherwise, the scheme is omitted, and:
If the partial URI starts with a non-zero number of consecutive
slashes, then everything from the context URI up to (but not
including) the first occurrence of exactly the same number of
consecutive slashes which has no greater number of consecutive
slashes anywhere to the right of it is taken to be the same and
so prepended to the partial URL to form the full URL. Otherwise:
The last part of the path of the context URI (anything following
the rightmost slash) is removed, and the given partial URI
appended in its place, and then:
Within the result, all occurrences of "xxx/../" or "/." are
recursively removed, where xxx, ".." and "." are complete path
elements.
Note: Trailing slashes
If a path of the context locator ends in slash, partial URIs are
treated differently to the URI with the same path but without a
trailing slash. The trailing slash indicates a void segment of the
path.
Note: Gopher
The gopher system does not have the concept of relative URIs, and the
gopher community currently allows / as data characters in gopher URIs
without escaping them to %2F. Relative forms may not in general be
used for documents served by gopher servers. If they are used, then
WWW software assumes, normally correctly, that in fact they do have
hierarchical significance despite the specifications. The use of HTTP
rather than gopher protocol is however recommended.
Examples
In the context of URI
magic://a/b/c//d/e/f
the partial URIs would expand as follows:
g        magic://a/b/c//d/e/g
/g       magic://a/g
//g      magic://g
../g     magic://a/b/c//d/g
g:h      g:h
and in the context of the URI
magic://a/b/c//d/e/
the results would be exactly the same.
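The rows of these examples that do not begin with a slash can be
reproduced with the Python sketch below. It is a simplified reading of
the rules under stated assumptions, not a complete resolver: the
function names are hypothetical, partial URIs that begin with one or
more slashes are deliberately left out (the sketch raises
NotImplementedError for them), and empty path segments such as the one
in "c//d" are never removed by "..".

```python
def is_absolute(uri: str) -> bool:
    """True if the URI has a colon and that colon precedes any slash."""
    colon = uri.find(":")
    slash = uri.find("/")
    return colon != -1 and (slash == -1 or colon < slash)

def remove_dot_segments(path: str) -> str:
    """Remove "xxx/../" and "/." where xxx, ".." and "." are whole segments."""
    segments = path.split("/")
    last = len(segments) - 1
    out = []
    for i, seg in enumerate(segments):
        if seg == "." and i > 0:
            continue  # drops "/."
        if seg == ".." and i < last and out and out[-1] not in ("", ".."):
            out.pop()  # drops "xxx/../"
            continue
        out.append(seg)
    return "/".join(out)

def resolve(context: str, partial: str) -> str:
    """Expand a partial URI relative to an absolute context URI."""
    if is_absolute(partial):
        return partial  # scheme given: the partial is already a full URI
    if partial.startswith("/"):
        raise NotImplementedError("leading-slash forms are outside this sketch")
    scheme, _, rest = context.partition(":")
    # Remove anything following the rightmost slash, then append the partial.
    base = rest[: rest.rfind("/") + 1]
    return scheme + ":" + remove_dot_segments(base + partial)
```

For example, resolve("magic://a/b/c//d/e/f", "../g") gives
"magic://a/b/c//d/g", and the same call with the context
"magic://a/b/c//d/e/" gives the same result.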
Fragment-id
This represents a part of, fragment of, or a sub-function within, an
object. Its syntax and semantics are defined by the application
responsible for the object, or the specification of the content type
of the object. The only definition here is of the allowed characters
by which it may be represented in a URL.
Specific syntaxes for representing fragments in text documents by
line and character range, or in graphics by coordinates, or in
structured documents using ladders, are suitable for standardization
but not defined here.
The fragment-id follows the URL of the whole object from which it is
separated by a hash sign (#). If the fragment-id is void, the hash
sign may be omitted: A void fragment-id with or without the hash sign
means that the URL refers to the whole object.
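Separating the fragment-id from the URL of the whole object is a single
split at the first hash sign. The Python sketch below illustrates this;
the function name split_fragment is an assumption of the example.

```python
def split_fragment(url: str) -> tuple:
    """Split at the first '#'; return (URL of whole object, fragment-id).

    A void fragment-id, with or without the hash sign, comes back as "".
    """
    whole, _, fragment = url.partition("#")
    return whole, fragment
```

Both "doc" and "doc#" therefore yield a void fragment-id, which matches
the statement that either form refers to the whole object.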
While this hook is allowed for identification of fragments, the
question of addressing of parts of objects, or of the grouping of
objects and relationship between contained and containing objects, is
not addressed by this document.
Fragment identifiers do NOT address the question of objects which are
different versions of a "living" object, nor of expressing the
relationships between different versions and the living object.
There is no implication that a fragment identifier refers to anything
which can be extracted as an object in its own right. It may, for
example, refer to an indivisible point within an object.
Specific Schemes
The mapping for URIs onto some existing standard and experimental
protocols is outlined in the BNF syntax definition. Notes on
particular protocols follow. These URIs are frequently referred to
as URLs, though the exact definition of the term URL is still under
discussion (March 1993). The schemes covered are:
http      Hypertext Transfer Protocol (examples)
ftp       File Transfer protocol
gopher    Gopher protocol
mailto    Electronic mail address
news      Usenet news
telnet, rlogin and tn3270
          Reference to interactive sessions
wais      Wide Area Information Servers
file      Local file access
The following schemes are proposed as essential to the unification of
the web with electronic mail, but not currently (to the author's
knowledge) implemented:
mid       Message identifiers for electronic mail
cid       Content identifiers for MIME body part
The schemes for X.500, network management database, and Whois++ have
not been specified and may be the subject of further study. Schemes
for Prospero, and restricted NNTP use are not currently implemented
as far as the author is aware.
The "urn" prefix is reserved for use in encoding a Uniform Resource
Name when that has been developed by the IETF working group.
New schemes may be registered at a later time.
HTTP
The HTTP protocol specifies that the path is handled transparently by
those who handle URLs, except for the servers which de-reference
them. The path is passed by the client to the server with any
request, but is not otherwise understood by the client.
The host details are not passed on to the server when the URL is an
HTTP URL which refers to the server in question. In this case the
string sent starts with the slash which follows the host details.
However, when an HTTP server is being used as a gateway (or "proxy")
then the entire URI, whether HTTP or some other scheme, is passed on
the HTTP command line. The search part, if present, is sent as part
of the HTTP command, and may in this respect be treated as part of
the path. No fragment-id part of a WWW URI (the hash sign and
following) is sent with the request. Spaces and control characters
in URLs must be escaped for transmission in HTTP, as must other
disallowed characters.
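What a client sends in each case can be sketched as follows. This is an
illustration under stated assumptions rather than a definition of the
protocol: the function name request_string and the via_proxy flag are
hypothetical, and sending "/" when no slash follows the host details is
an assumption of the sketch.

```python
def request_string(uri: str, via_proxy: bool = False) -> str:
    """Return the string a client would place on the HTTP command line."""
    # The fragment-id (hash sign and following) is never sent.
    without_fragment = uri.partition("#")[0]
    if via_proxy:
        # A gateway receives the entire URI, whatever its scheme.
        return without_fragment
    prefix = "http://"
    if not without_fragment.startswith(prefix):
        raise ValueError("direct requests in this sketch handle http: only")
    # The string sent starts with the slash which follows the host details.
    slash = without_fragment.find("/", len(prefix))
    if slash == -1:
        return "/"  # assumption: no path given, send the root
    return without_fragment[slash:]
```

The search part stays attached to the path, so a URI ending in
"/Phonebook?dobbins" is sent with "?dobbins" included.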
EXAMPLES
These examples are not part of the specification: they are
provided as illustrations only. The URI of the "welcome" page to a
server is conventionally
http://www.my.work.com/
As the rest of the URL (after the hostname and port) is opaque
to the client, it shows great variety but the following are all
fairly typical.
http://www.my.uni.edu/info/matriculation/enroling.html
http://info.my.org/AboutUs/Phonebook
http://www.library.my.town.va.us/Catalogue/76523471236%2Fwen44--4.98
http://www.my.org/462F4F2D4241522A314159265358979323846
A URL for a server on a different port to 80 looks like
http://info.cern.ch:8000/imaginary/test
A reference to a particular part of a document may, including the
fragment identifier, look like
http://www.myu.edu/org/admin/people#andy
in which case the string "#andy" is not sent to the server, but is
retained by the client and used when the whole object has been
retrieved.
A search on a text database might look like
http://info.my.org/AboutUs/Index/Phonebook?dobbins
and on another database
http://info.cern.ch/RDB/EMP?*%20where%20name%%3Ddobbins
In all cases the client passes the path string to the server
uninterpreted, and for the client to deduce anything from
FTP
The ftp: prefix indicates that the FTP protocol is used, as defined
in STD 9, RFC 959 or any successor. The port number, if present,
gives the port of the FTP server if not the FTP default.
User name and password
The syntax allows for the inclusion of a user name and even a
password for those systems which do not use the anonymous FTP
convention. The default, however, if no user or password is
supplied, will be to use that convention, viz. that the user name
is "anonymous" and the password the user's Internet-style mail
address.
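The default can be sketched in Python as follows. The function name
ftp_login, the mail_address parameter, and the example host names are
hypothetical, and the sketch assumes the user name and password appear
as "user:password@" before the host, with any percent-encoding left
undecoded.

```python
def ftp_login(url: str, mail_address: str) -> tuple:
    """Return the (user, password) pair a client would use for an ftp: URL."""
    prefix = "ftp://"
    if not url.startswith(prefix):
        raise ValueError("not an ftp: URL")
    # Everything between "ftp://" and the next slash: [user[:password]@]host[:port]
    login = url[len(prefix):].split("/", 1)[0]
    userinfo, at, _hostport = login.rpartition("@")
    if not at:
        # No user or password supplied: fall back to the anonymous convention.
        return "anonymous", mail_address
    user, colon, password = userinfo.partition(":")
    # Assumption: a user name without a password yields None for the password.
    return user, (password if colon else None)
```

For a hypothetical URL such as "ftp://ftp.example.org/pub/file" and a
user whose mail address is "me@example.org", the sketch returns
("anonymous", "me@example.org").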