element. [SGML]
entity
data with an associated notation or interpretation; for
example, a sequence of octets associated with an
Internet Media Type. [SGML]
fragment identifier
the portion of an HREF attribute value following the `#'
character which modifies the presentation of the
destination of a hyperlink.
form data set
a sequence of name/value pairs; the names are given by
an HTML document and the values are given by a user.
HTML document
An SGML document conforming to this document type
definition.
hyperlink
a relationship between two anchors, called the head and
the tail. The link goes from the tail to the head. The
head and tail are also known as destination and source,
respectively.
Berners-Lee & Connolly Standards Track [Page 7]
RFC 1866 Hypertext Markup Language - 2.0 November 1995
markup
Syntactically delimited characters added to the data of
a document to represent its structure. There are four
different kinds of markup: descriptive markup (tags),
references, markup declarations, and processing
instructions. [SGML]
may
A document or user interface is conforming whether this
statement applies or not.
media type
an Internet Media Type, as per [IMEDIA].
message entity
a head and body. The head is a collection of name/value
fields, and the body is a sequence of octets. The head
defines the content type and content transfer encoding
of the body. [MIME]
minimally conforming HTML user agent
A user agent that conforms to this specification except
for form processing. It may only process level 1 HTML
documents.
must
Documents or user agents in conflict with this statement
are not conforming.
numeric character reference
markup that refers to a character by its code position
in the document character set.
SGML document
A sequence of characters organized physically as a set
of entities and logically into a hierarchy of elements.
An SGML document consists of data characters and markup;
the markup describes the structure of the information
and an instance of that structure. [SGML]
shall
If a document or user agent conflicts with this
statement, it does not conform to this specification.
should
If a document or user agent conflicts with this
statement, undesirable results may occur in practice
even though it conforms to this specification.
start-tag
Descriptive markup that identifies the start of an
element and specifies its generic identifier and
attributes. [SGML]
syntax-reference character set
A coded character set whose range includes all
characters used for markup; e.g. name characters and
delimiter characters.
tag
Markup that delimits an element. A tag includes a name
which refers to an element declaration in the DTD, and
may include attributes. [SGML]
text entity
A finite sequence of characters. A text entity typically
takes the form of a sequence of octets with some
associated character encoding scheme, transmitted over
the network or stored in a file. [SGML]
typical
Typical processing is described for many elements. This
is not a mandatory part of the specification but is
given as guidance for designers and to help explain the
uses for which the elements were intended.
URI
A Uniform Resource Identifier is a formatted string that
serves as an identifier for a resource, typically on the
Internet. URIs are used in HTML to identify the anchors
of hyperlinks. URIs in common practice include Uniform
Resource Locators (URLs)[URL] and Relative URLs
[RELURL].
user agent
A component of a distributed system that presents an
interface and processes requests on behalf of a user;
for example, a www browser or a mail user agent.
WWW
The World-Wide Web is a hypertext-based, distributed
information system created by researchers at CERN in
Switzerland. <URL:http://www.w3.org/>
3. HTML as an Application of SGML
HTML is an application of ISO 8879:1986 -- Standard Generalized
Markup Language (SGML). SGML is a system for defining structured
document types and markup languages to represent instances of those
document types [SGML]. The public text -- DTD and SGML declaration --
of the HTML document type definition is provided in 9, "HTML Public
Text".
The term "HTML" refers to both the document type defined here and the
markup language for representing instances of this document type.
3.1. SGML Documents
An HTML document is an SGML document; that is, a sequence of
characters organized physically into a set of entities, and logically
as a hierarchy of elements.
In the SGML specification, the first production of the SGML syntax
grammar separates an SGML document into three parts: an SGML
declaration, a prologue, and an instance. For the purposes of this
specification, the prologue is a DTD. This DTD describes another
grammar: the start symbol is given in the doctype declaration, the
terminals are data characters and tags, and the productions are
determined by the element declarations. The instance must conform to
the DTD, that is, it must be in the language defined by this grammar.
The SGML declaration determines the lexicon of the grammar. It
specifies the document character set, which determines a character
repertoire that contains all characters that occur in all text
entities in the document, and the code positions associated with
those characters.
The SGML declaration also specifies the syntax-reference character
set of the document, and a few other parameters that bind the
abstract syntax of SGML to a concrete syntax. This concrete syntax
determines how the sequence of characters of the document is mapped
to a sequence of terminals in the grammar of the prologue.
For example, consider the following document:
<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<title>Parsing Example</title>
<p>Some text. <em>&#42;wow&#42;</em></p>
An HTML user agent should use the SGML declaration that is given in
9.5, "SGML Declaration for HTML". According to its document character
set, `&#42;' refers to an asterisk character, `*'.
The instance above is regarded as the following sequence of
terminals:
1. start-tag: TITLE
2. data characters: "Parsing Example"
3. end-tag: TITLE
4. start-tag: P
5. data characters: "Some text. "
6. start-tag: EM
7. data characters: "*wow*"
8. end-tag: EM
9. end-tag: P
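As a rough illustration of the terminal sequence above, the following Python sketch feeds the example document (written with numeric character references for the asterisks) to the standard library's html.parser module. That module is a lenient HTML tokenizer, not an SGML parser driven by the SGML declaration, so this only approximates the process described here. The names TerminalCollector and terminals_of and the tuple layout are assumptions made for this example.

```python
from html.parser import HTMLParser

DOCUMENT = (
    '<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<title>Parsing Example</title>\n"
    "<p>Some text. <em>&#42;wow&#42;</em></p>"
)


class TerminalCollector(HTMLParser):
    """Hypothetical helper: records tags and data as (kind, value) pairs."""

    def __init__(self):
        # convert_charrefs=True resolves &#42; to "*" before handle_data.
        super().__init__(convert_charrefs=True)
        self.terminals = []

    def handle_starttag(self, tag, attrs):
        self.terminals.append(("start-tag", tag.upper()))

    def handle_endtag(self, tag):
        self.terminals.append(("end-tag", tag.upper()))

    def handle_data(self, data):
        if self.terminals and self.terminals[-1][0] == "data characters":
            # Merge adjacent chunks so each run of data is one terminal.
            kind, text = self.terminals[-1]
            self.terminals[-1] = (kind, text + data)
        elif data.strip():
            # Skip the line breaks that merely separate the tags.
            self.terminals.append(("data characters", data))


def terminals_of(document):
    collector = TerminalCollector()
    collector.feed(document)
    collector.close()
    return collector.terminals


if __name__ == "__main__":
    for number, (kind, value) in enumerate(terminals_of(DOCUMENT), start=1):
        print(f"{number}. {kind}: {value!r}")
```

The doctype declaration is reported through a separate callback that this sketch leaves unhandled, which is why it does not appear among the collected terminals.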
The start symbol of the DTD grammar is HTML, and the productions are
given in the public text identified by `-//IETF//DTD HTML 2.0//EN'
(9.1, "HTML DTD"). The terminals above parse as:
    HTML
     |
     \-HEAD
     |  |
     |  \-TITLE
     |      |
     |      \-<TITLE>
     |      |
     |      \-"Parsing Example"
     |      |
     |      \-</TITLE>
     |
     \-BODY
       |
       \-P
         |
         \-<P>
         |
         \-"Some text. "
         |
         \-EM
         |  |
         |  \-<EM>
         |  |
         |  \-"*wow*"
         |  |
         |  \-</EM>
         |
         \-</P>
Some of the elements are delimited explicitly by tags, while the
boundaries of others are inferred. The <HTML> element contains a
<HEAD> element and a <BODY> element. The <HEAD> contains <TITLE>,
which is explicitly delimited by start- and end-tags.
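The inference of the HTML, HEAD, and BODY boundaries can be sketched in Python as follows. This is a simplified illustration, not the SGML omitted-tag algorithm: it assumes well-nested input, treats TITLE as the only head element, and represents each element as a two-item list of name and children without recording the tags themselves. The names HEAD_ELEMENTS, TERMINALS, and build_tree are hypothetical.

```python
# Simplification: only TITLE is routed to HEAD in this sketch.
HEAD_ELEMENTS = {"TITLE"}

TERMINALS = [
    ("start-tag", "TITLE"),
    ("data", "Parsing Example"),
    ("end-tag", "TITLE"),
    ("start-tag", "P"),
    ("data", "Some text. "),
    ("start-tag", "EM"),
    ("data", "*wow*"),
    ("end-tag", "EM"),
    ("end-tag", "P"),
]


def build_tree(terminals):
    """Build [name, children] nodes, supplying HTML, HEAD and BODY."""
    head = ["HEAD", []]
    body = ["BODY", []]
    root = ["HTML", [head, body]]
    stack = []  # elements opened by an explicit start-tag and not yet closed
    for kind, value in terminals:
        if stack:
            parent = stack[-1]
        elif kind == "start-tag" and value in HEAD_ELEMENTS:
            parent = head  # inferred <HEAD>
        else:
            parent = body  # inferred <BODY>
        if kind == "start-tag":
            node = [value, []]
            parent[1].append(node)
            stack.append(node)
        elif kind == "end-tag":
            stack.pop()  # assumes the end-tag matches the open element
        else:
            parent[1].append(value)
    return root


if __name__ == "__main__":
    print(build_tree(TERMINALS))
```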
3.2. HTML Lexical Syntax
SGML specifies an abstract syntax and a reference concrete syntax.
Aside from certain quantities and capacities (e.g. the limit on the
length of a name), all HTML documents use the reference concrete
syntax. In particular, all markup characters are in the repertoire of
[ISO-646]. Data characters are drawn from the document character set
(see 6, "Characters, Words, and Paragraphs").
A complete discussion of SGML parsing, e.g. the mapping of a sequence
of characters to a sequence of tags and data, is left to the SGML
standard[SGML]. This section is only a summary.
3.2.1. Data Characters
Any sequence of characters that do not constitute markup (see 9.6
"Delimiter Recognition" of [SGML]) are mapped directly to strings of
data characters. Some markup also maps to data character strings.
Numeric character references map to single-character strings, via the
document character set. Each reference to one of the general entities
defined in the HTML DTD maps to a single-character string.
For example,
abc&lt;def    => "abc","<","def"
abc&#60;def   => "abc","<","def"
The terminating semicolon on entity or numeric character references
is only necessary when the character following the reference would
otherwise be recognized as part of the name (see 9.4.5 "Reference
End" in [SGML]).
abc &lt def   => "abc ","<"," def"
abc &#60 def  => "abc ","<"," def"
An ampersand is only recognized as markup when it is followed by a
letter or a `#' and a digit:
abc & lt def => "abc & lt def"
abc &# 60 def => "abc &# 60 def"
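The recognition rules in this section can be approximated with a short Python sketch. It assumes only four general entities (lt, gt, amp, quot), resolves numeric references with chr() (which agrees with the ISO 8859-1 code positions for values below 256), and leaves unknown entity names untouched. The names REFERENCE, ENTITIES, and to_data_strings are hypothetical, and this is not a full implementation of SGML delimiter recognition.

```python
import re

# "&" counts as markup only when followed by a letter, or by "#" and a digit.
# The trailing semicolon is optional; a following space is NOT consumed.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")

# Assumption: a small subset of the general entities, for illustration only.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}


def to_data_strings(text):
    """Split text into data character strings, resolving references."""
    parts = []
    pos = 0
    for match in REFERENCE.finditer(text):
        number, name = match.groups()
        if number is not None:
            char = chr(int(number))
        elif name in ENTITIES:
            char = ENTITIES[name]
        else:
            continue  # unknown name: keep it as part of the surrounding text
        if match.start() > pos:
            parts.append(text[pos:match.start()])
        parts.append(char)
        pos = match.end()
    if pos < len(text):
        parts.append(text[pos:])
    return parts


if __name__ == "__main__":
    for sample in ("abc&lt;def", "abc &#60 def", "abc & lt def"):
        print(sample, "=>", to_data_strings(sample))
```

Because name characters are matched greedily, an input such as abc&ltdef is read as a reference to an entity named ltdef, which is the case where the terminating semicolon becomes necessary.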
A useful technique for translating plain text to HTML is to replace
each '<', '&', and '>' by an entity reference or numeric character
reference as follows:
          ENTITY      NUMERIC
CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
--------- ----------  -----------  ---------------------
    &     &amp;       &#38;        Ampersand
    <     &lt;        &#60;        Less than
    >     &gt;        &#62;        Greater than
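The translation technique in the table can be sketched in Python as follows, using the entity reference column. The function name escape_plain_text is an assumption for this example; the standard library's html.escape with quote=False performs the same three replacements.

```python
import html


def escape_plain_text(text):
    """Replace '&', '<' and '>' with entity references."""
    # The ampersand must be handled first; otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )


if __name__ == "__main__":
    sample = "if a < b && b > c"
    print(escape_plain_text(sample))
    # The standard library helper gives the same result for these characters.
    print(html.escape(sample, quote=False))
```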
NOTE - There are SGML mechanisms, CDATA and RCDATA
declared content, that allow most `<', `>', and `&'
characters to be entered without the use of entity
references. Because these mechanisms tend to be used and
implemented inconsistently, and because they conflict