Network Working Group                                     T. Berners-Lee
Request for Comments: 1630                                          CERN
Category: Informational                                        June 1994


                 Universal Resource Identifiers in WWW

                A Unifying Syntax for the Expression of
             Names and Addresses of Objects on the Network
                     as used in the World-Wide Web

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   Note that the work contained in this memo does not describe an
   Internet standard.  An Internet standard for general Resource
   Identifiers is under development within the IETF.

Introduction

   This document defines the syntax used by the World-Wide Web
   initiative to encode the names and addresses of objects on the
   Internet.  The web is considered to include objects accessed using an
   extendable number of protocols, existing, invented for the web
   itself, or to be invented in the future.  Access instructions for an
   individual object under a given protocol are encoded into forms of
   address string.  Other protocols allow the use of object names of
   various forms.  In order to abstract the idea of a generic object,
   the web needs the concepts of the universal set of objects, and of
   the universal set of names or addresses of objects.

   A Universal Resource Identifier (URI) is a member of this universal
   set of names in registered name spaces and addresses referring to
   registered protocols or name spaces.  A Uniform Resource Locator
   (URL), defined elsewhere, is a form of URI which expresses an address
   which maps onto an access algorithm using network protocols. Existing
   URI schemes which correspond to the (still mutating) concept of IETF
   URLs are listed here. The Uniform Resource Name (URN) debate attempts
   to define a name space (and presumably resolution protocols) for
   persistent object names. This area is not addressed by this document,
   which is written in order to document existing practice and provide a
   reference point for URL and URN discussions.




Berners-Lee                                                     [Page 1]

RFC 1630                      URIs in WWW                      June 1994


   The World-Wide Web protocols are discussed on the mailing list
   www-talk-request@info.cern.ch; the newsgroup comp.infosystems.www
   is preferable for beginners' questions. The mailing list
   uri-request@bunyip.com carries discussion related particularly to
   the URI issue.  The author may be contacted as timbl@info.cern.ch.

   This document is available in hypertext form at:

   http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html

The Need For a Universal Syntax

   This section describes the concept of the URI and does not form part
   of the specification.

   Many protocols and systems for document search and retrieval are
   currently in use, and many more protocols or refinements of existing
   protocols are to be expected in a field whose expansion is explosive.

   These systems aim to achieve global search and readership of
   documents across differing computing platforms, despite a
   plethora of protocols and data formats.  As protocols evolve,
   gateways can allow global access to remain possible. As data formats
   evolve, format conversion programs can preserve global access.  There
   is one area, however, in which it is impractical to make conversions,
   and that is in the names and addresses used to identify objects.
   This is because names and addresses of objects are passed on in so
   many ways, from the backs of envelopes to hypertext objects, and may
   have a long life.

   A common feature of almost all the data models of past and proposed
   systems is something which can be mapped onto a concept of "object"
   and some kind of name, address, or identifier for that object.  One
   can therefore define a set of name spaces in which these objects can
   be said to exist.

   Practical systems need to access and mix objects which are part of
   different existing and proposed systems.  Therefore, the concept of
   the universal set of all objects, and hence the universal set of
   names and addresses, in all name spaces, becomes important.  This
   allows names in different spaces to be treated in a common way, even
   though names in different spaces have differing characteristics, as
   do the objects to which they refer.

   URIs

      This document defines a way to encapsulate a name in any
      registered name space, and label it with the name space,
      producing a member of the universal set.  Such an encoded and
      labelled member of this set is known as a Universal Resource
      Identifier, or URI.

      The universal syntax allows access to objects available using
      existing protocols, and may be extended as technology develops.

      The specification of the URI syntax does not imply anything about
      the properties of names and addresses in the various name spaces
      which are mapped onto the set of URI strings.  The properties
      follow from the specifications of the protocols and the associated
      usage conventions for each scheme.

   URLs

      For existing Internet access protocols, it is necessary in most
      cases to define the encoding of the access algorithm into
      something concise enough to be termed an address.  URIs which
      refer to objects accessed with existing protocols are known as
      "Uniform Resource Locators" (URLs) and are listed here as used in
      WWW, but are to be formally defined in a separate document.

   URNs

      There is currently a drive to define a space of more persistent
      names than any URLs.  These "Uniform Resource Names" are the
      subject of an IETF working group's discussions.  (See Sollins and
      Masinter, Functional Specifications for URNs, circulated
      informally.)

      The URI syntax and URL forms have been in widespread use by
      World-Wide Web software since 1990.

Design Criteria and Choices

   This section is not part of the specification: it is simply an
   explanation of the way in which the specification was derived.

   Design criteria

      The syntax was designed to be:

      Extensible              New naming schemes may be added later.

      Complete                It is possible to encode any naming
                              scheme.

      Printable               It is possible to express any URI using
                              7-bit ASCII characters so that URIs may,
                              if necessary, be passed using pen and ink.

   Choices for a universal syntax

      For the syntax itself there is little choice except for the order
      and punctuation of the elements, and the acceptable characters and
      escaping rules.

      The extensibility requirement is met by allowing an arbitrary (but
      registered) string to be used as a prefix.  A prefix is chosen as
      left to right parsing is more common than right to left.  The
      choice of a colon as separator of the prefix from the rest of the
      URI was arbitrary.

      The decoding of the rest of the string is defined as a function of
      the prefix.  New prefixes are introduced for new schemes as
      necessary, in agreement with the registration authority.  The
      registration of a new scheme clearly requires the definition of
      the decoding of the URI into a given name space, and a definition
      of the properties and, where applicable, resolution protocols, for
      the name space.

      The completeness requirement is easily met by allowing
      particularly strange or plain binary names to be encoded in base
      16 or 64 using the acceptable characters.
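
      As a minimal sketch of the base 16 option, assuming Python and
      hypothetical helper names (neither function is defined by this
      memo), an arbitrary binary name can be mapped onto the digits and
      the letters A to F, and recovered again:

```python
import base64


def encode_binary_name(raw: bytes) -> str:
    """Encode arbitrary bytes as base 16 text (digits 0-9, letters A-F)."""
    return base64.b16encode(raw).decode("ascii")


def decode_binary_name(text: str) -> bytes:
    """Recover the original bytes; lower-case hex digits are accepted too."""
    return base64.b16decode(text, casefold=True)


name = bytes([0x00, 0xFF, 0x41])
encoded = encode_binary_name(name)
print(encoded)
print(decode_binary_name(encoded) == name)
```

      Base 16 is used here because its alphabet contains only letters
      and digits.  The common base 64 alphabet includes "+" and "/",
      so a scheme choosing base 64 would need to take more care over
      which characters it uses.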

      The printability requirement could have been met by requiring all
      schemes to encode characters not part of a basic set.  This led to
      many discussions of what the basic set should be.  A difficult
      case, for example, is when an ISO Latin-1 string appears in a URL,
      and within an application with ISO Latin-1 capability, it can be
      handled intact.  However, for transport in general, the non-ASCII
      characters need to be escaped.

      The solution to this was to specify a safe set of characters, and
      a general escaping scheme which may be used for encoding "unsafe"
      characters.  This "safe" set is suitable, for example, for use in
      electronic mail.  This is the canonical form of a URI.

      The choice of escape character for introducing representations of
      non-allowed characters also tends to be a matter of taste.  An
      ANSI standard exists in the C language, using the back-slash
      character "\".  The use of this character on unix command lines,
      however, can be a problem as it is interpreted by many shell
      programs, and would itself have to be escaped.  It is also a
      character which is not available on certain keyboards.  The equals
      sign is commonly used in the encoding of names having
      attribute=value pairs.  The percent sign was eventually chosen as
      a suitable escape character.

      There is a conflict between the need to be able to represent many
      characters including spaces within a URI directly, and the need to
      be able to use a URI in environments which have limited character
      sets or in which certain characters are prone to corruption.  This
      conflict has been resolved by use of a hexadecimal escaping
      method which may be applied to any characters forbidden in a given
      context.  When URLs are moved between contexts, the set of
      characters escaped may be enlarged or reduced unambiguously.
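
      The hexadecimal escaping method might be sketched as follows.
      This is an illustration only: the function names, the default
      "safe" set (ASCII letters and digits), and the assumption that
      the text is ISO Latin-1 are choices made for this sketch and are
      not taken from this memo.

```python
import string

# Assumed default "safe" set for this sketch: ASCII letters and digits.
DEFAULT_SAFE = frozenset(string.ascii_letters + string.digits)


def escape(text: str, safe=DEFAULT_SAFE) -> str:
    """Replace every character outside `safe` with "%" and two hex digits."""
    out = []
    for byte in text.encode("latin-1"):  # assumption: ISO Latin-1 input
        ch = chr(byte)
        # The percent sign is always escaped, whatever `safe` contains.
        if ch in safe and ch != "%":
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def unescape(text: str) -> str:
    """Reverse escape(); raises ValueError on a malformed "%" sequence."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return out.decode("latin-1")


strict = escape("a b/c")                        # slash escaped
relaxed = escape("a b/c", DEFAULT_SAFE | {"/"})  # slash allowed
print(strict, relaxed)
print(unescape(strict) == unescape(relaxed) == "a b/c")
```

      Both forms decode to the same string, which is the sense in
      which the escaped set can be enlarged or reduced without
      ambiguity when a URL moves between contexts.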

      The use of white space characters is risky in URIs to be printed
      or sent by electronic mail, and the use of multiple white space
      characters is very risky.  This is because of the frequent
      introduction of extraneous white space when lines are wrapped by
      systems such as mail, or sheer necessity of narrow column width,
      and because of the inter-conversion of various forms of white
      space which occurs during character code conversion and the
      transfer of text between applications.  This is why the canonical
      form for URIs has all white spaces encoded.

Recommendations

   This section describes the syntax for URIs as used in the World-Wide
   Web initiative.  The generic syntax provides a framework for new
   schemes for names to be resolved using as yet undefined protocols.

URI syntax

   A complete URI consists of a naming scheme specifier followed by a
   string whose format is a function of the naming scheme.  For locators
   of information on the Internet, a common syntax is used for the IP
   address part. A BNF description of the URL syntax is given in a
   later section. The components are as follows.  Fragment identifiers
   and relative URIs are not involved in the basic URL definition.

   SCHEME

      Within the URI of an object, the first element is the name of the
      scheme, separated from the rest of the object by a colon.

   PATH

      The rest of the URI follows the colon in a format depending on the
      scheme. The path is interpreted in a manner dependent on the
      protocol being used.  However, when it contains slashes, these
      must imply a hierarchical structure.
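
      The division into SCHEME and PATH can be sketched as a split at
      the first colon.  The function name split_scheme is hypothetical,
      and the sketch assumes a complete URI (relative forms carry no
      scheme and are rejected here).

```python
def split_scheme(uri: str):
    """Split a complete URI into (scheme, rest) at the first colon."""
    scheme, colon, rest = uri.partition(":")
    if not colon or not scheme:
        raise ValueError("not a complete URI: no scheme prefix found")
    # Later colons belong to the scheme-specific part and are left alone.
    return scheme, rest


scheme, rest = split_scheme(
    "http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html"
)
print(scheme)
print(rest)
```

      How the remainder is interpreted is left to the scheme, so the
      sketch returns it unparsed.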

Reserved characters

   The path in the URI has a significance defined by the particular
   scheme.  Typically, it is used to encode a name in a given name
   space, or an algorithm for accessing an object.  In either case, the
   encoding may use those characters allowed by the BNF syntax, or
   hexadecimal encoding of other characters.

   Some of the reserved characters have special uses as defined here.

   THE PERCENT SIGN

      The percent sign ("%", ASCII 25 hex) is used as the escape
      character in the encoding scheme and is never allowed for anything
      else.
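
      Because the percent sign may only introduce an escape, a string
      can be checked for stray percent signs.  The sketch below assumes
      that an escape is a "%" followed by exactly two hexadecimal
      digits; the function name is hypothetical.

```python
import re

# A "%" that is NOT followed by two hexadecimal digits.
_STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_stray_percent(uri: str) -> bool:
    """Return True if any "%" in `uri` does not start a two-digit escape."""
    return _STRAY_PERCENT.search(uri) is not None


print(has_stray_percent("a%20b"))  # escape sequence only
print(has_stray_percent("100%"))   # trailing "%" with no digits
```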

   HIERARCHICAL FORMS

      The slash ("/", ASCII 2F hex) character is reserved for the
      delimiting of substrings whose relationship is hierarchical.  This
      enables partial forms of the URI.  Substrings consisting of single
      or double dots ("." or "..") are similarly reserved.

      The significance of the slash between two segments is that the
      segment of the path to the left is more significant than the
      segment of the path to the right.  ("Significance" in this case
      refers solely to closeness to the root of the hierarchical
      structure and makes no value judgement!)

      Note

         The similarity to unix and other disk operating system filename
         conventions should be taken as purely coincidental, and should
         not be taken to indicate that URIs should be interpreted as
         file names.

   HASH FOR FRAGMENT IDENTIFIERS

      The hash ("#", ASCII 23 hex) character is reserved as a delimiter
      to separate the URI of an object from a fragment identifier.
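
      A minimal sketch of this delimiter, assuming Python and a
      hypothetical function name; the fragment identifier "z1" in the
      example is invented for illustration:

```python
def split_fragment(uri: str):
    """Split at the first "#" into (object URI, fragment or None)."""
    object_uri, hash_sign, fragment = uri.partition("#")
    return object_uri, (fragment if hash_sign else None)


print(split_fragment(
    "http://info.cern.ch/hypertext/WWW/Addressing/URL/URI_Overview.html#z1"
))
```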

   QUERY STRINGS

      The question mark ("?", ASCII 3F hex) is used to delimit the
      boundary between the URI of a queryable object, and a set of words
      used to express a query on that object.  When this form is used,
      the combined URI stands for the object which results from the
      query being applied to the original object.

      Within the query string, the plus sign is reserved as shorthand
      notation for a space.  Therefore, real plus signs must be encoded.
      This method was used to make query URIs easier to pass in systems
      which did not allow spaces.

      The query string represents some operation applied to the object,
      but this specification gives no common syntax or semantics for it.
      In practice the syntax and semantics may depend on the scheme and
      may even depend on the base URI.
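
      The treatment of the question mark and the plus sign can be
      sketched as below.  The function names, the choice of letters and
      digits as the unescaped set, the ISO Latin-1 assumption, and the
      example path "index" are all assumptions of this sketch.

```python
import re
import string

_UNESCAPED = frozenset(string.ascii_letters + string.digits)


def split_query(uri: str):
    """Split at the first "?" into (object URI, query or None)."""
    object_uri, mark, query = uri.partition("?")
    return object_uri, (query if mark else None)


def encode_query(text: str) -> str:
    """Spaces become "+"; a real "+" and other characters become %XX."""
    out = []
    for byte in text.encode("latin-1"):  # assumption: ISO Latin-1 input
        ch = chr(byte)
        if ch == " ":
            out.append("+")
        elif ch in _UNESCAPED:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def decode_query(query: str) -> str:
    """Turn "+" back into spaces first, then undo the %XX escapes."""
    spaced = query.replace("+", " ")
    return re.sub(
        r"%([0-9A-Fa-f]{2})",
        lambda match: chr(int(match.group(1), 16)),
        spaced,
    )


base, query = split_query("http://info.cern.ch/index?uri+syntax")
print(base, "|", decode_query(query))
print(encode_query("1 + 1"))
```

      The order in decode_query matters: replacing "+" before undoing
      the escapes keeps an encoded plus sign ("%2B") from being
      mistaken for a space.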

   OTHER RESERVED CHARACTERS

      The asterisk ("*", ASCII 2A hex) and exclamation mark ("!", ASCII
      21 hex) are reserved for use as having special significance within
      specific schemes.

Unsafe characters

   In canonical form, certain characters such as spaces, control
   characters, some characters whose ASCII code is used differently in
   different national character variant 7-bit sets, and all 8-bit
   characters beyond DEL (7F hex) of the ISO Latin-1 set, shall not be
   used unencoded. This is a recommendation for trouble-free
   interchange, and as indicated below, the encoded set may be extended
   or reduced.
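
   As a rough sketch, a string can be scanned for characters from three
   of these classes: spaces, control characters (including DEL), and
   8-bit characters beyond DEL.  The national-variant characters are
   not enumerated in this section, so the sketch deliberately leaves
   them out.  The function names are hypothetical.

```python
def is_unsafe(ch: str) -> bool:
    """True for space, control characters, DEL, and anything above 7F hex."""
    code = ord(ch)
    return code <= 0x20 or code >= 0x7F


def unsafe_characters(text: str):
    """List the characters that should not appear unencoded, in order."""
    return [ch for ch in text if is_unsafe(ch)]


print(unsafe_characters("a b\tc\u00e9"))
```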