isn't really wide, you won't see them all.</p>
</li>
<li>
<p> The single-tag empty element you defined (<code><item/></code>)
is treated exactly the same as a two-tag empty element (<code><item></item></code>).
It is, for all intents and purposes, identical. (It's just easier to type
and consumes less space.) </p>
</li>
</ul>
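<p>You can check that the two empty-element forms are equivalent by recording the events the parser reports for each. The sketch below is not part of the Echo program. It uses the SAX2 <code>DefaultHandler</code> class and the JAXP <code>SAXParserFactory</code> rather than the SAX1 <code>DocumentHandler</code> interface used in this tutorial, and the class name <code>EmptyElementDemo</code> is hypothetical.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: a SAX2 sketch, not the tutorial's SAX1 Echo code.
class EmptyElementDemo {
    // Parses the XML text and records every start-element and end-element event.
    static List<String> events(String xml) {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                seen.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                seen.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> shortForm = events("<list><item/></list>");
        List<String> longForm = events("<list><item></item></list>");
        System.out.println(shortForm);                  // [start list, start item, end item, end list]
        System.out.println(shortForm.equals(longForm)); // true
    }
}
```

<p>Both forms produce a start event immediately followed by an end event for <code>item</code>, so a SAX application cannot tell them apart.</p>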
<h3><a name="identifying"></a>Identifying the Events</h3>
<p>This version of the echo program might be useful for displaying an XML file,
but it's not telling you much about what's going on in the parser. The next
step is to modify the program so that you see where the spaces and vertical
lines are coming from.</p>
<blockquote>
<p><b>Note:</b> The code discussed in this section is in <a href="work/Echo02.java"><code>Echo02.java</code></a>.
The output it produces is contained in <a href="work/Echo02-01.log"><code>Echo02-01.log</code></a>.
</p>
</blockquote>
<p> Make the changes highlighted below to identify the events as they occur:</p>
<pre> public void startDocument ()
throws SAXException
{
<new><b> nl();
nl();
emit ("START DOCUMENT");
nl();
</b></new> emit ("<?xml version='1.0' encoding='UTF-8'?>");
<old><strike>nl();</strike></old>
}
public void endDocument ()
throws SAXException
{
<new><b> nl(); emit ("END DOCUMENT");
</b></new> try {
...
}
public void startElement (String name, AttributeList attrs)
throws SAXException
{
<new><b> nl(); emit ("ELEMENT: ");
</b></new> emit ("<"+name);
if (attrs != null) {
for (int i = 0; i < attrs.getLength (); i++) {
<old><strike>emit (" ");</strike></old>
<old><strike>emit (attrs.getName(i)+"=\""+attrs.getValue (i)+"\"");</strike></old>
<new><b> nl();
emit(" ATTR: ");
emit (attrs.getName (i));
emit ("\t\"");
emit (attrs.getValue (i));
emit ("\"");
</b></new> }
}
<new><b> if (attrs != null && attrs.getLength() > 0) nl();
</b></new> emit (">");
}
public void endElement (String name)
throws SAXException
{
<new><b> nl();
emit ("END_ELM: ");
</b></new> emit ("</"+name+">");
}
public void characters (char buf [], int offset, int len)
throws SAXException
{
<new><b> nl(); emit ("CHARS: |"); </b></new>
String s = new String(buf, offset, len);
emit (s);
<new><b> emit ("|");
</b></new> }
</pre>
<p>Compile and run this version of the program to produce a more informative output
listing. The attributes are now shown one per line, which is nice. But, more
importantly, output lines like this one:</p>
<blockquote>
<pre>CHARS: |
|
</pre>
</blockquote>
<p>show that the <code>characters</code> method is responsible for echoing both
the spaces that create the indentation and the multiple newlines that separate
the elements.</p>
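<p>For readers working with a current JDK, here is a minimal sketch of the same event labeling written against the SAX2 <code>DefaultHandler</code> API, which replaces the SAX1 <code>AttributeList</code> and <code>DocumentHandler</code> types used above. The class name <code>EventEcho</code> and the choice to collect output in a <code>StringBuilder</code> instead of a <code>Writer</code> are assumptions made so that the result is easy to inspect; the labels match the modified Echo code.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 adaptation of the event-labeling Echo program.
class EventEcho extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    private void nl() { out.append('\n'); }
    private void emit(String s) { out.append(s); }

    @Override public void startDocument() { nl(); emit("START DOCUMENT"); }
    @Override public void endDocument()   { nl(); emit("END DOCUMENT"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        nl(); emit("ELEMENT: <" + qName);
        // SAX2 always passes an Attributes object (possibly empty), never null
        for (int i = 0; i < attrs.getLength(); i++) {
            nl(); emit("   ATTR: " + attrs.getQName(i) + "\t\"" + attrs.getValue(i) + "\"");
        }
        if (attrs.getLength() > 0) nl();
        emit(">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nl(); emit("END_ELM: </" + qName + ">");
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        // The vertical bars make whitespace-only text visible
        nl(); emit("CHARS: |" + new String(buf, offset, len) + "|");
    }

    static String echo(String xml) {
        EventEcho handler = new EventEcho();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(echo("<slideshow title='Demo'>\n  <slide/>\n</slideshow>"));
    }
}
```

<p>Running it shows <code>CHARS: |</code> lines that span two output lines, because the text between the tags consists of a newline followed by the indentation spaces.</p>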
<blockquote>
<p><b><a name="lineEndings"></a>Note: </b>The XML specification requires all
input line separators (a carriage-return/linefeed pair or a lone carriage return)
to be normalized to a single newline before the application sees them. The newline character
is specified as <code>\n</code> in Java, C, and Unix systems, but goes by
the alias "linefeed" (LF) in Windows systems.</p>
</blockquote>
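<p>The normalization described in the note can be observed directly. In this sketch (the class name <code>LineEndingDemo</code> is hypothetical, and it uses the SAX2 <code>DefaultHandler</code> rather than the tutorial's SAX1 interface) the input contains a Windows-style CR LF pair and a lone CR, and the handler concatenates all character data it receives.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class showing XML end-of-line normalization.
class LineEndingDemo {
    // Returns all character data reported by the parser, concatenated.
    static String text(String xml) {
        StringBuilder sb = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] buf, int offset, int len) {
                sb.append(buf, offset, len);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // CR LF and a lone CR in the input...
        String reported = text("<a>one\r\ntwo\rthree</a>");
        // ...both reach the handler as a single '\n'
        System.out.println(reported.equals("one\ntwo\nthree")); // true
    }
}
```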
<h3><a name="compressing"></a>Compressing the Output</h3>
<p>To make the output more readable, modify the program so that it echoes character
data only when that data contains something other than whitespace.</p>
<blockquote>
<p><b>Note:</b> The code discussed in this section is in <a href="work/Echo03.java"><code>Echo03.java</code></a>.
</p>
</blockquote>
<p>Make the changes shown below to suppress output of characters that are all
whitespace:</p>
<pre> public void characters (char buf [], int offset, int len)
throws SAXException
{
<old><strike>nl(); emit ("CHARS: |");</strike></old>
<new><b> nl(); emit ("CHARS: ");
</b></new> String s = new String(buf, offset, len);
<old><strike>emit (s);</strike></old>
<old><strike>emit ("|");</strike></old>
<new><b> if (!s.trim().equals("")) emit (s);
</b></new> }
</pre>
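<p>The same filter can be sketched in SAX2 form. As in the modified <code>characters</code> method, the <code>CHARS:</code> label is always recorded but whitespace-only text is dropped, which is why empty <code>CHARS:</code> lines still appear in the output. The class name <code>CompressedEcho</code> and the list-of-lines output are assumptions made for illustration.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 sketch of the whitespace-suppressing Echo variant.
class CompressedEcho extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        lines.add("ELEMENT: <" + qName + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lines.add("END_ELM: </" + qName + ">");
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        String s = new String(buf, offset, len);
        // Label always written; text written only if it is not all whitespace
        lines.add(s.trim().isEmpty() ? "CHARS: " : "CHARS: " + s);
    }

    static List<String> echo(String xml) {
        CompressedEcho handler = new CompressedEcho();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        for (String line : echo("<slideshow>\n  <slide><title>Hi</title></slide>\n</slideshow>")) {
            System.out.println(line);
        }
    }
}
```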
<p>If you run the program now, you will see that you have eliminated the indentation
as well, because the indent space is part of the whitespace that precedes the
start of an element. Add the code highlighted below to manage the indentation:</p>
<pre>
static private Writer out;
<new><b>
private String indentString = " "; // Amount to indent
private int indentLevel = 0;
</b></new>
...
public void startElement (String name, AttributeList attrs)
throws SAXException
{
<new><b> indentLevel++;
</b></new> nl(); emit ("ELEMENT: ");
...
}
public void endElement (String name)
throws SAXException
{
nl();
emit ("END_ELM: ");
emit ("</"+name+">");
<new><b> indentLevel--;
</b></new> }
...
private void nl ()
throws SAXException
{
...
try {
out.write (lineEnd);
<new><b> for (int i=0; i < indentLevel; i++) out.write(indentString);
</b></new>
} catch (IOException e) {
...
}
</pre>
<p>This code sets up an indent string, keeps track of the current indent level,
and outputs the indent string once per level whenever the <code>nl</code> method is called.
If you set the indent string to "", the output will be un-indented.
(Try it. You'll see why it's worth the work to add the indentation.)</p>
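<p>The indentation logic can be tried in isolation with a small sketch. It follows the same pattern as the code above (increment the level before the start-element line, decrement it after the end-element line, and write the indent string once per level in <code>nl</code>), but it is a SAX2 adaptation. The class name <code>IndentingEcho</code>, the constructor parameter, and the choice to drop whitespace-only text entirely are assumptions.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 sketch of the indenting Echo logic.
class IndentingEcho extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final String indentString; // amount to indent per level
    private int indentLevel = 0;

    IndentingEcho(String indentString) { this.indentString = indentString; }

    // New line, then one copy of the indent string per nesting level
    private void nl() {
        out.append('\n');
        for (int i = 0; i < indentLevel; i++) out.append(indentString);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        indentLevel++;
        nl(); out.append("ELEMENT: <").append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nl(); out.append("END_ELM: </").append(qName).append('>');
        indentLevel--;
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        String s = new String(buf, offset, len);
        if (!s.trim().isEmpty()) { nl(); out.append("CHARS: ").append(s); }
    }

    static String echo(String xml, String indentString) {
        IndentingEcho handler = new IndentingEcho(indentString);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        String xml = "<slideshow>\n  <slide><title>Overview</title></slide>\n</slideshow>";
        System.out.println(echo(xml, "    ")); // indented
        System.out.println(echo(xml, ""));     // un-indented, for comparison
    }
}
```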
<p>You'll be happy to know that you have reached the end of the "mechanical"
code you have to add to the Echo program. From here on, you'll be doing things
that give you more insight into how the parser works. The steps you've taken
so far, though, have given you a lot of insight into how the parser sees the
XML data it processes. They have also given you a helpful debugging tool you can
use to see what the parser sees.</p>
<h3><a name="inspecting"></a>Inspecting the Output</h3>
<p>The complete output for this version of the program is contained in <a href="work/Echo03-01.log"><code>Echo03-01.log</code></a>.
Part of that output is shown here:</p>
<pre> ELEMENT: <slideshow
...
CHARS:
CHARS:
ELEMENT: <slide
...
END_ELM: </slide>
CHARS:
CHARS:
</pre>
<p>Note that the <code>characters</code> method was invoked twice in a row. Inspecting
the source file <a href="samples/slideSample01.xml"><code>slideSample01.xml</code></a>
shows that there is a comment before the first slide. The first call to <code>characters</code>
comes before that comment. The second call comes after. (Later on, you'll see
how to be notified when the parser encounters a comment, although in most cases
you won't need such notifications.)</p>
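<p>A quick way to confirm that the comment is what splits the text into two <code>characters</code> calls is to ask for comment notifications as well. This sketch assumes the SAX2 extension class <code>org.xml.sax.ext.DefaultHandler2</code>, which implements <code>LexicalHandler</code>, registered through the standard <code>http://xml.org/sax/properties/lexical-handler</code> property; the tutorial covers comment handling later in its own way. The class name <code>CommentSplitDemo</code> is hypothetical. Adjacent text chunks are merged so the result does not depend on how a particular parser buffers text.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: records element, text, and comment events in order.
class CommentSplitDemo extends DefaultHandler2 {
    private final List<String> events = new ArrayList<>();

    private void record(String event) {
        // Merge consecutive text chunks into a single CHARS entry
        if (event.equals("CHARS") && !events.isEmpty()
                && events.get(events.size() - 1).equals("CHARS")) return;
        events.add(event);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        record("ELEMENT: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        record("END_ELM: " + qName);
    }

    @Override
    public void characters(char[] buf, int offset, int len) { record("CHARS"); }

    // LexicalHandler callback, delivered only when registered via setProperty
    @Override
    public void comment(char[] buf, int offset, int len) { record("COMMENT"); }

    static List<String> events(String xml) {
        CommentSplitDemo handler = new CommentSplitDemo();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        String xml = "<slideshow>\n  <!-- TITLE SLIDE -->\n  <slide/>\n</slideshow>";
        System.out.println(events(xml));
    }
}
```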
<p>Note, too, that the <code>characters</code> method is invoked after the first
slide element, as well as before. When you are thinking in terms of hierarchically
structured data, that seems odd. After all, you intended for the <code>slideshow</code>
element to contain <code>slide</code> elements, not text. Later on, you'll see
how to restrict the <code>slideshow</code> element using a DTD. When you do
that, the <code>characters</code> method will no longer be invoked for the
whitespace between the <code>slide</code> elements. </p>
<p>In the absence of a DTD, though, the parser must assume that any element it
sees contains text like that in the first item element of the overview slide:</p>
<blockquote>
<pre><item>Why <em>WonderWidgets</em> are great</item></pre>
</blockquote>
<p>Here, the hierarchical structure looks like this:</p>
<blockquote>
<pre>ELEMENT: <item>
CHARS: Why
ELEMENT: <em>
CHARS: WonderWidgets
END_ELM: </em>
CHARS: are great
END_ELM: </item>
</pre>
</blockquote>
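<p>The hierarchy above can be reproduced with a handler that buffers character data and writes it out as one <code>CHARS</code> line whenever an element starts or ends. Buffering is a technique swapped in for this sketch, not something the Echo program does: SAX allows a parser to deliver one run of text in several <code>characters</code> calls, and the buffer hides that. The class name <code>MixedContentEcho</code> is hypothetical, and the vertical bars mark where each run of text begins and ends.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX2 sketch that shows text interleaved with elements.
class MixedContentEcho extends DefaultHandler {
    private final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Emit buffered text (unless it is all whitespace) and clear the buffer
    private void flushText() {
        if (!text.toString().trim().isEmpty()) lines.add("CHARS: |" + text + "|");
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        flushText();
        lines.add("ELEMENT: <" + qName + ">");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        lines.add("END_ELM: </" + qName + ">");
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        text.append(buf, offset, len);
    }

    static List<String> echo(String xml) {
        MixedContentEcho handler = new MixedContentEcho();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler.lines;
    }

    public static void main(String[] args) {
        for (String line : echo("<item>Why <em>WonderWidgets</em> are great</item>")) {
            System.out.println(line);
        }
    }
}
```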
<h3><a name="docsAndData"></a>Documents and Data</h3>
<p>In this example, it's clear that there are characters intermixed with the hierarchical
structure of the elements. The fact that text can surround elements (or be prevented
from doing so with a DTD or schema) helps to explain why you sometimes hear
talk about "XML data" and other times hear about "XML documents".
XML comfortably handles both structured data and text documents that include
markup. The only difference between the two is whether or not text is allowed
between the elements.</p>
<blockquote>
<p><b>Note: </b><br>
In an upcoming section of this tutorial, you will work with the <code>ignorableWhitespace</code>
method in the <code>DocumentHandler</code> interface. This method can only
be invoked when a DTD is present. If a DTD specifies that <code>slideshow</code>
does not contain text, then all of the whitespace surrounding the <code>slide</code>
elements is by definition ignorable. On the other hand, if <code>slideshow</code>
can contain text (which must be assumed to be true in the absence of a DTD),
then the parser must assume that the spaces and line breaks it sees between the <code>slide</code>
elements are significant parts of the document. </p>
</blockquote>
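<p>The note's claim can be sketched by counting which callback receives the whitespace between the <code>slide</code> elements, with and without a DTD that declares <code>slideshow</code> as containing only <code>slide</code> elements. This sketch uses the SAX2 <code>DefaultHandler.ignorableWhitespace</code> method and turns on validation with <code>SAXParserFactory.setValidating(true)</code> for the DTD case, because SAX requires validating parsers to report element-content whitespace this way, while non-validating parsers only may. The class name <code>IgnorableWhitespaceDemo</code>, the <code>run</code> helper, and the small internal DTD are assumptions.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts characters() vs ignorableWhitespace() calls.
class IgnorableWhitespaceDemo extends DefaultHandler {
    int charactersCalls = 0;
    int ignorableCalls = 0;

    @Override
    public void characters(char[] buf, int offset, int len) { charactersCalls++; }

    @Override
    public void ignorableWhitespace(char[] buf, int offset, int len) { ignorableCalls++; }

    static IgnorableWhitespaceDemo run(String xml, boolean validate) {
        IgnorableWhitespaceDemo handler = new IgnorableWhitespaceDemo();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validate); // apply the DTD's content models
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String body = "<slideshow>\n  <slide/>\n  <slide/>\n</slideshow>";
        String dtd = "<!DOCTYPE slideshow [\n"
                   + "  <!ELEMENT slideshow (slide*)>\n"
                   + "  <!ELEMENT slide EMPTY>\n"
                   + "]>\n";
        IgnorableWhitespaceDemo withDtd = run(dtd + body, true);
        IgnorableWhitespaceDemo noDtd = run(body, false);
        System.out.println("with DTD:    characters=" + withDtd.charactersCalls
                + ", ignorable=" + withDtd.ignorableCalls);
        System.out.println("without DTD: characters=" + noDtd.charactersCalls
                + ", ignorable=" + noDtd.ignorableCalls);
    }
}
```

<p>With the DTD, all of the whitespace inside <code>slideshow</code> goes to <code>ignorableWhitespace</code>. Without it, the same whitespace arrives through <code>characters</code>. The exact counts depend on how the parser chunks the text.</p>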
<blockquote>
<hr size=4>
</blockquote>
<p>
<p>
<table width="100%">
<tr>
<td align=left> <a href="1_write.html"><img src="../images/PreviousArrow.gif" width=26 height=26 align=top border=0 alt="Previous | "></a><a
href="2b_echo.html"><img src="../images/NextArrow.gif" width=26 height=26 align=top border=0 alt="Next | "></a><a href="../alphaIndex.html"><img src="../images/xml_IDX.gif" width=26 height=26 align=top border=0 alt="Index | "></a><a href="../TOC.html"><img
src="../images/xml_TOC.gif" width=26 height=26 align=top border=0 alt="TOC | "></a><a href="../index.html"><img
src="../images/xml_Top.gif" width=26 height=26 align=top border=0 alt="Top | "></a>
</td>
<td align=right><strong><em><a href="index.html">Top</a></em></strong> <a href="../TOC.html#intro"><strong><em>Contents</em></strong></a>
<a href="../alphaIndex.html"><strong><em>Index</em></strong></a> <a href="../glossary.html"><strong><em>Glossary</em></strong></a>
</td>
</tr>
</table>
</body>
</html>