       |           +------item
       |           +------item
       |           +------item
       |           +------item
       |           +------item
       |           +------item
       |           +------item
       |           +------item
       |           +------item       
       |
       |----- directions
 
       Figure 1 : XML document tree

If an element contains no data, you can use the following syntax:

    <foo></foo>

Similarly, the following shorthand notation (an empty-element tag) is used to
specify elements that accept no data.

    <foo/>

XML places a number of restrictions on element names.  Specifically, element names
must start with a letter or underscore (_) and may only contain letters, digits,
periods, hyphens, and underscores thereafter.  (Colons are also permitted, but they
are reserved for use with namespaces.)  There is no length limitation, although
you probably don't want to use extremely long names unless your intent is to annoy users.
In addition, element names are case sensitive.  Therefore, <title> is different from
<TITLE> or <Title>.
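
The naming rules can be tried out from Python.  The following is only a sketch:
the regular expression NAME_RE and the helper looks_like_valid_name() are
made-up names, and the pattern is an ASCII-only approximation of the rule
described above (real XML also accepts non-ASCII letters and colons).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, ASCII-only approximation of the element naming rule:
# a letter or underscore, then letters, digits, periods, hyphens, underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def looks_like_valid_name(name):
    """Return True if name fits the simplified naming rule."""
    return NAME_RE.match(name) is not None

# Element names are case sensitive, so these are three different elements.
doc = ET.fromstring("<doc><title>a</title><TITLE>b</TITLE><Title>c</Title></doc>")
tags = [child.tag for child in doc]
print(tags)
print(looks_like_valid_name("item"), looks_like_valid_name("1item"))
```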

Since the special characters < and > are used to denote tags, they should
not appear literally in the text of an XML document.  (Strictly speaking, only
< must be escaped; > is escaped by convention.)  Instead, the special sequences
&lt; and &gt; are used to denote these characters.  For example:
 
   <para>The &lt;BLINK&gt; tag rulez man!</para>
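
In Python, this substitution does not have to be done by hand.  As a small
sketch, the escape() and unescape() functions in the standard xml.sax.saxutils
module convert between the raw and escaped forms (the variable names are only
for illustration):

```python
from xml.sax.saxutils import escape, unescape

raw = "The <BLINK> tag rulez man!"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;
escaped = escape(raw)

# unescape() reverses the substitution
restored = unescape(escaped)

print("<para>%s</para>" % escaped)
```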

A similar syntax of &#nnnn; where nnnn is a decimal number is used to specify
an arbitrary Unicode character. For example:

    Jalape&#241;o pepper

Characters can also be specified by hexadecimal code using &#xhhhh; as follows:

    Jalape&#xf1;o pepper
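
A parser replaces both forms with the same character.  The following sketch
uses the standard xml.etree.ElementTree module; the enclosing <item> element
is added here only so that the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same character (n with tilde)
decimal = ET.fromstring("<item>Jalape&#241;o pepper</item>").text
hexadecimal = ET.fromstring("<item>Jalape&#xf1;o pepper</item>").text

print(decimal == hexadecimal)
```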

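Since trying to remember numeric character codes is somewhat inconvenient, 
common characters are sometimes referenced by special names.  Apart from a
handful of predefined names, such names must be declared in the DTD before they
can be used.  For example: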

    Jalape&ntilde;o pepper

In certain cases, using the special character codes is awkward or
inconvenient.  For example, if you wanted to include an interactive Python
session in a document, you might have to write the following:

    &gt;&gt;&gt; import xml.sax

However, XML allows raw character data to be inserted into a document as follows:

    <![CDATA[>>> import xml.sax]]>
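
To an application, the two forms are equivalent.  As a quick check (a sketch
using xml.etree.ElementTree, with an enclosing <code> element assumed for
illustration), both produce the same text:

```python
import xml.etree.ElementTree as ET

# The same text written with character escapes and as a CDATA section
escaped = ET.fromstring("<code>&gt;&gt;&gt; import xml.sax</code>").text
cdata = ET.fromstring("<code><![CDATA[>>> import xml.sax]]></code>").text

print(escaped == cdata)
```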

In general, repeated whitespace such as newlines and extra spaces is treated as
insignificant by most applications that process XML documents.  However, documents
may specify alternate whitespace handling rules in a number of ways.  For instance,
a CDATA section described above can be used to include preformatted text.  Similarly,
a document element can indicate that spacing should be preserved by including the
xml:space attribute as follows:

    <code xml:space="preserve">
    for i in range(0,10):
         print i
    </code>

xml:space is a reserved attribute that is either "default" or "preserve".  

Just as XML allows user-definable elements, elements are allowed to
have an arbitrary number of user-definable attributes.  Attributes
are additional properties that can be attached to each element.  For
example:

    <item num="1"> Jalapeno pepper, diced </item>
    <item num="12" units="bottles"> Ice-cold beer </item>

Attributes are always defined by a name=value pair as shown.  Like
element names, attribute names must start with a letter or underscore and
are case-sensitive.  The value is always a string enclosed within
double quotes (") or single quotes (').   Unlike HTML, an attribute value
must always be supplied and it must always be enclosed in quotes.

If an attribute value contains quotes, it should be enclosed by a different type of quote.
For example:

    <expletive type="d'oh">
    <door height='84"'>

If both kinds of quotes appear, the one that matches the enclosing quotes
must be escaped with an entity reference such as &quot; (") or &apos; (').
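
From Python, attributes show up as strings attached to each element.  The
sketch below reads the attributes of one of the <item> elements shown earlier
with xml.etree.ElementTree, and uses quoteattr() from xml.sax.saxutils to pick
suitable quotes when writing a value.  The "color" attribute and the
height-like value containing both kinds of quotes are made up for
illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

item = ET.fromstring('<item num="12" units="bottles"> Ice-cold beer </item>')

num = item.get("num")          # attribute values are always strings
units = item.get("units")
missing = item.get("color")    # hypothetical attribute; absent, so None

# quoteattr() chooses the quote style and escapes only when it has to
quoted_single = quoteattr("d'oh")      # value contains a single quote
quoted_double = quoteattr('84"')       # value contains a double quote
quoted_both = quoteattr('6\' 2"')      # value contains both kinds

print(num, units, missing)
print(quoted_single, quoted_double, quoted_both)
```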

5. DTDs
-------
One of XML's greatest strengths is that it allows you to create highly
customized documents by defining new types of elements and attributes.
However, doing this in a completely haphazard manner may lead to
complete chaos. To keep things under control, many XML documents are
written according to a specification known as a DTD (Document Type
Definition).

A DTD is a formal specification that enumerates all of the allowable
elements, attributes, and values that may appear in a document.  Not
only that, a DTD precisely defines the structure of how these elements
are supposed to be used in relation to each other within the
document tree. For instance, in the recipe example, a DTD would specify
that <recipe> is supposed to be the top level element, <item> can only
appear inside <ingredients>, "num" and "units" are the only allowable
attributes of <item>, the "num" attribute has to be a number and so
forth.

DTD information is specified using XML markup declarations, as denoted
by the special <!....> syntax described previously.  The declaration of
a DTD usually looks like this:

    <!DOCTYPE recipe [
    <!-- The RECIPE DTD appears here -->
    <!......>
    <!......>
    ...
    ]>
 
Each declaration within the DTD then defines additional information
about the document. For example, if the recipe document were to
include a DTD it might look like this:

    <?xml version="1.0" encoding="utf-8"?>

    <!DOCTYPE recipe [
    <!-- The RECIPE DTD appears here -->
       <!ELEMENT recipe (title, description?, ingredients, directions)>
       <!ELEMENT ingredients (item+)>
       <!ELEMENT title (#PCDATA)>
       <!ELEMENT description (#PCDATA)>
       <!ELEMENT item (#PCDATA)>
       <!ELEMENT directions (#PCDATA)>
       <!ATTLIST item num    CDATA    #REQUIRED
                      units  (C | tsp | tbl | bottles | none)  "none">
    ]>
    <!-- End of DTD -->

    <recipe>
        ...
    </recipe>

In words, this DTD defines the following properties:

   -  The <recipe> element must include a <title> element, an optional
      <description> element, an <ingredients> element, and a 
      <directions> element in that order.   No element may appear
      more than once.  Furthermore, a <recipe> may not contain 
      any additional text (it can only contain those four elements).

   -  The <ingredients> element must contain one or more <item>
      elements.   

   -  The <title>, <description>, <item>, and <directions> elements can
      contain arbitrary parsable character data (#PCDATA).  However,
      they may not contain any other elements.

   -  The <item> element has two allowable attributes "num" and "units".
      The "num" attribute is required.  The "units" attribute is optional
      and is restricted to specific values such as "C", "tsp", "tbl", and
      so forth.  It has a default value of "none".
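
To make these rules concrete, here is a sketch that checks them by hand in
Python.  This is not DTD validation: the parsers in the standard library do
not validate against a DTD, so the function check_recipe(), the constants,
and the two sample documents are hypothetical and simply restate the rules
listed above as code.

```python
import xml.etree.ElementTree as ET

ALLOWED_UNITS = {"C", "tsp", "tbl", "bottles", "none"}
LEAF_TAGS = {"title", "description", "item", "directions"}

def check_recipe(root):
    """Return a list of rule violations (empty if none were found)."""
    errors = []
    if root.tag != "recipe":
        errors.append("top-level element must be <recipe>")
    # (title, description?, ingredients, directions): fixed order
    order = [child.tag for child in root]
    if order not in (["title", "description", "ingredients", "directions"],
                     ["title", "ingredients", "directions"]):
        errors.append("unexpected children of <recipe>: %r" % order)
    # <recipe> may not contain text of its own (whitespace is tolerated here)
    stray = [root.text] + [child.tail for child in root]
    if any(s and s.strip() for s in stray):
        errors.append("<recipe> may not contain text")
    for ingredients in root.findall("ingredients"):
        items = list(ingredients)
        if not items or any(i.tag != "item" for i in items):
            errors.append("<ingredients> needs one or more <item> elements")
        for item in ingredients.findall("item"):
            if "num" not in item.attrib:
                errors.append("<item> requires a num attribute")
            if set(item.attrib) - {"num", "units"}:
                errors.append("<item> allows only num and units")
            # a missing units attribute takes the default value "none"
            if item.get("units", "none") not in ALLOWED_UNITS:
                errors.append("bad units value: %r" % item.get("units"))
    # #PCDATA elements may not contain other elements
    for elem in root.iter():
        if elem.tag in LEAF_TAGS and len(elem):
            errors.append("<%s> may not contain elements" % elem.tag)
    return errors

good = ET.fromstring(
    '<recipe><title>Famous Guacamole</title><ingredients>'
    '<item num="1">Jalapeno pepper, diced</item>'
    '<item num="12" units="bottles">Ice-cold beer</item>'
    '</ingredients><directions>Combine.</directions></recipe>')
bad = ET.fromstring(
    '<recipe><ingredients><item units="cups">Beer</item></ingredients>'
    '<title>x</title><directions>y</directions></recipe>')

print(check_recipe(good))
print(check_recipe(bad))
```

A validating parser does this work automatically from the DTD, which is the
main practical reason for writing one.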

When an XML document includes a DTD directly in the document file as
shown, it is known as an internal DTD--meaning that the DTD only
applies to that document.  However, since it is usually more useful to
apply a DTD to a whole collection of XML documents, it is more common
to include the DTD from a separate location like this:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
       ...
    </recipe>

In this case, the recipe DTD is read from a file recipe.dtd.  The contents of
this file include the same DTD as before except that the enclosing
<!DOCTYPE recipe [ ... ]> text is not included.  For example:

     <!-- recipe.dtd -->
     <!ELEMENT recipe (title, description?, ingredients, directions)>
     <!ELEMENT ingredients (item+)>
     <!ELEMENT title (#PCDATA)>
     <!ELEMENT description (#PCDATA)>
     <!ELEMENT item (#PCDATA)>
     <!ELEMENT directions (#PCDATA)>
     <!ATTLIST item num    CDATA    #REQUIRED
                    units  (C | tsp | tbl | bottles | none)  "none">
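
As a sketch of how the external reference appears to a program, the standard
xml.dom.minidom module records the document type name and the system
identifier from the <!DOCTYPE> line.  It does not validate, and with its
default settings it is not expected to read recipe.dtd at all, so the file
does not need to exist for this example.  The empty <recipe/> element stands
in for a real document.

```python
from xml.dom import minidom

source = (b'<?xml version="1.0" encoding="utf-8"?>\n'
          b'<!DOCTYPE recipe SYSTEM "recipe.dtd">\n'
          b'<recipe/>')

doc = minidom.parseString(source)

# The DOCTYPE declaration is available as the document's doctype node
print(doc.doctype.name)        # name of the top-level element
print(doc.doctype.systemId)    # where the external DTD is located
```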


When a DTD is read from an external location, it is known as an
external DTD.  However, in some cases, a document may supply DTD
declarations both externally and internally.  For example:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd" [
        <!-- Internal DTD subset -->
        <!ENTITY GuacImage SYSTEM "guac.gif" NDATA GIF>
        ...
    ]>
    <recipe>
       ...
    </recipe>

Here, the external DTD is used to define general document structure 
whereas the internal DTD subset is used to define features specific to
the document.  In this example, the internal DTD subset defines a reference
to an unparsed entity guac.gif (described shortly).

Writing DTDs is an advanced topic and this chapter has
only provided a high-level view of what a DTD actually looks like when
it is included in a document.  Since many XML documents do not include
DTDs and DTDs generally don't play a central role in most XML
processing applications (other than document validation), no further
details are provided here.  You should consult an XML book for the gory
details of DTD design and creation.

6. Physical Structure and Entities
----------------------------------
Complicated XML documents are rarely created by including everything
in a single file.  Instead, different pieces of the document tend to
be included from other files or public URLs.  For example, in the previous
section, a declaration such as this was used to include a DTD from a
separate file:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE recipe SYSTEM "recipe.dtd">
    <recipe>
       ...
    </recipe>

More formally, when an XML document is composed from multiple pieces
like this, each "piece" is known as an "entity."  In simple terms, you might
think of entities as being files.  For example, the external DTD recipe.dtd
is an entity and the XML document itself is an entity.   If you were composing
a very large XML document such as a book, you might divide it up into a
collection of smaller files for each chapter like this:

       Book.xml
        |
        |-------> Book.dtd
        |-------> chap1.xml
        |-------> chap2.xml
        ...
        |-------> chap37.xml
        |-------> index.xml

In this case, each file is an entity.  One reason for using the
"entity" terminology is that entities are really more general than
simple files.  For example, you could also compose a document by
linking to documents elsewhere on the internet (as specified by a
URL).  For example:

       Book.xml
        |
        |------> http://www.newriders.com/Book.dtd
        |------> chap1.xml
        |------> chap2.xml
        |------> http://www.dead.com/book/draft/chap3.xml
        ...

XML also allows entities to be defined internally, in which case they work
like a simple macro.

Entities are always declared in a DTD using an <!ENTITY ...> declaration.
The simplest kind of entity is an internal general entity, which is 
defined like this:

   <!DOCTYPE recipe [
      ...
      <!ENTITY guac "Guacamole">
   ]>

In this case, the entity works exactly like a simple macro.  To use it, write
the entity name between & and ; like this:

   <title>Famous &guac;</title>

In this case, the &guac; sequence is simply replaced with the text "Guacamole".
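
The replacement is carried out by the parser, so an application only sees the
expanded text.  As a sketch, the standard xml.etree.ElementTree module expands
an entity declared in the internal DTD subset like this (the surrounding
<recipe> document is filled in here only to make the example complete):

```python
import xml.etree.ElementTree as ET

source = """<!DOCTYPE recipe [
   <!ENTITY guac "Guacamole">
]>
<recipe>
   <title>Famous &guac;</title>
</recipe>"""

# The parser substitutes the entity's text before the application sees it
title = ET.fromstring(source).find("title").text
print(title)
```
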
XML provides a few predefined entities that were already described in a previous section:

    &lt;        <
