<P>Element attributes are listed together on a single line. Unless your window is very wide, you won't see them all.</P> </LI> <LI> <P> The single-tag empty element you defined (<CODE>&lt;item/&gt;</CODE>) is treated exactly the same as a two-tag empty element (<CODE>&lt;item&gt;&lt;/item&gt;</CODE>). The two forms are identical for all practical purposes; the single-tag form is simply easier to type and takes less space. </P> </LI></UL><H3><A NAME=identifying></A>Identifying the Events</H3><P>This version of the echo program might be useful for displaying an XML file, but it doesn't tell you much about what's going on in the parser. The next step is to modify the program so that you can see where the spaces and newlines in the output are coming from.</P><BLOCKQUOTE> <P><B>Note:</B> The code discussed in this section is in <A HREF=work/Echo02.java><CODE>Echo02.java</CODE></A>. The output it produces is contained in <A HREF=work/Echo02-01.log><CODE>Echo02-01.log</CODE></A>. </P></BLOCKQUOTE><P> Make the changes highlighted below to identify the events as they occur:</P><PRE>
    public void startDocument ()
    throws SAXException
    {<NEW><B>
        nl();
        nl();
        emit ("START DOCUMENT");
        nl();</B></NEW>
        emit ("&lt;?xml version='1.0' encoding='UTF-8'?&gt;");
        <OLD><STRIKE>nl();</STRIKE></OLD>
    }

    public void endDocument ()
    throws SAXException
    {<NEW><B>
        nl();
        emit ("END DOCUMENT");</B></NEW>
        try {
        ...
    }

    public void startElement (String name, AttributeList attrs)
    throws SAXException
    {<NEW><B>
        nl();
        emit ("ELEMENT: ");</B></NEW>
        emit ("&lt;"+name);
        if (attrs != null) {
            for (int i = 0; i &lt; attrs.getLength (); i++) {
                <OLD><STRIKE>emit (" ");</STRIKE></OLD>
                <OLD><STRIKE>emit (attrs.getName(i)+"=\""+attrs.getValue (i)+"\"");</STRIKE></OLD><NEW><B>
                nl();
                emit(" ATTR: ");
                emit (attrs.getName (i));
                emit ("\t\"");
                emit (attrs.getValue (i));
                emit ("\"");</B></NEW>
            }
        }<NEW><B>
        // Guard against a null list, as the loop above does
        if (attrs != null &amp;&amp; attrs.getLength() &gt; 0) nl();</B></NEW>
        emit ("&gt;");
    }

    public void endElement (String name)
    throws SAXException
    {<NEW><B>
        nl();
        emit ("END_ELM: ");</B></NEW>
        emit ("&lt;/"+name+"&gt;");
    }

    public void characters (char buf [], int offset, int len)
    throws SAXException
    {<NEW><B>
        nl();
        emit ("CHARS: |");</B></NEW>
        String s = new String(buf, offset, len);
        emit (s);<NEW><B>
        emit ("|");</B></NEW>
    }
</PRE><P>Compile and run this version of the program to produce a more informative listing. The attributes are now shown one per line, which is easier to read. More importantly, output lines like this one:</P><BLOCKQUOTE> <PRE>CHARS: | |</PRE></BLOCKQUOTE><P>show that the <CODE>characters</CODE> method is responsible for echoing both the spaces that create the indentation and the multiple newlines that separate the elements.</P><BLOCKQUOTE> <P><B><A NAME=lineEndings></A>Note: </B>The XML specification requires parsers to normalize all input line separators to a single newline.
The newline character is written as <CODE>\n</CODE> in Java and C and is the standard line separator on Unix systems. It is also known as "linefeed" (LF), the name used for it in Windows, where lines normally end with a carriage return followed by a linefeed.</P></BLOCKQUOTE><H3><A NAME=compressing></A>Compressing the Output</H3><P>To make the output more readable, modify the program so that it only outputs character data that contains something other than whitespace.</P><BLOCKQUOTE> <P><B>Note:</B> The code discussed in this section is in <A HREF=work/Echo03.java><CODE>Echo03.java</CODE></A>. </P></BLOCKQUOTE><P>Make the changes shown below to suppress output of character data that is all whitespace:</P><PRE>
    public void characters (char buf [], int offset, int len)
    throws SAXException
    {
        <OLD><STRIKE>nl(); emit ("CHARS: |");</STRIKE></OLD><NEW><B>
        nl();
        emit ("CHARS: ");</B></NEW>
        String s = new String(buf, offset, len);
        <OLD><STRIKE>emit (s);</STRIKE></OLD>
        <OLD><STRIKE>emit ("|");</STRIKE></OLD><NEW><B>
        if (!s.trim().equals("")) emit (s);</B></NEW>
    }
</PRE><P>If you run the program now, you will see that the indentation has disappeared as well, because the indentation is part of the whitespace that precedes the start of each element. Add the code highlighted below to manage the indentation:</P><PRE>
    static private Writer out;<NEW><B>
    private String indentString = " "; // Amount to indent
    private int indentLevel = 0;</B></NEW>
    ...

    public void startElement (String name, AttributeList attrs)
    throws SAXException
    {<NEW><B>
        indentLevel++;</B></NEW>
        nl();
        emit ("ELEMENT: ");
        ...
    }

    public void endElement (String name)
    throws SAXException
    {
        nl();
        emit ("END_ELM: ");
        emit ("&lt;/"+name+"&gt;");<NEW><B>
        indentLevel--;</B></NEW>
    }
    ...

    private void nl ()
    throws SAXException
    {
        ...
        try {
            out.write (lineEnd);<NEW><B>
            for (int i=0; i &lt; indentLevel; i++) out.write(indentString);</B></NEW>
        } catch (IOException e) {
        ...
    }
</PRE><P>This code sets up an indent string, keeps track of the current indent level, and writes the indent string once per level whenever the <CODE>nl</CODE> method is called.
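</P><P>For comparison, the same two ideas (suppressing whitespace-only character data and indenting each line by nesting depth) can be sketched against the SAX 2 API, <CODE>org.xml.sax.helpers.DefaultHandler</CODE>, instead of the SAX 1 <CODE>DocumentHandler</CODE> and <CODE>AttributeList</CODE> used in this lesson. This is an illustrative assumption, not the tutorial's <CODE>Echo03.java</CODE>: the class name <CODE>IndentEcho</CODE>, its <CODE>echo</CODE> method, and the four-space indent are hypothetical, output is collected in a <CODE>StringBuilder</CODE> rather than a <CODE>Writer</CODE>, and <CODE>String.repeat</CODE> requires Java 11 or later.</P><PRE>
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX 2 sketch (not Echo03.java): echoes parser events,
// indenting by element depth and suppressing whitespace-only text.
public class IndentEcho extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private final String indentString = "    "; // amount to indent per level (assumed)
    private int indentLevel = 0;

    // Start a new line, then write one indentString per nesting level.
    private void nl() {
        out.append('\n').append(indentString.repeat(indentLevel));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        indentLevel++;
        nl();
        out.append("ELEMENT: ").append(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        nl();
        out.append("END_ELM: ").append(qName);
        indentLevel--;
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        nl();
        out.append("CHARS: ");
        String s = new String(buf, offset, len);
        // Echo the text only if it contains something besides whitespace.
        if (!s.trim().isEmpty()) {
            out.append(s);
        }
    }

    // Parses the stream with a default (non-validating) SAX parser
    // and returns the echoed listing.
    public static String echo(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            IndentEcho handler = new IndentEcho();
            parser.parse(in, handler);
            return handler.out.toString();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need not handle checked exceptions.
            throw new IllegalStateException("echo failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(echo(in));
        }
    }
}
</PRE><P>Assuming the file is saved as <CODE>IndentEcho.java</CODE>, running <CODE>java IndentEcho slideSample01.xml</CODE> should print one event per line. As in the lesson's version, a whitespace-only chunk still produces a bare <CODE>CHARS:</CODE> label; only its content is suppressed.</P><P>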
If you set the indent string to "", the output will be unindented. (Try it. You'll see why the indentation is worth the extra work.)</P><P>You'll be happy to know that you have reached the end of the "mechanical" code you have to add to the Echo program. From here on, you'll be doing things that give you more insight into how the parser works. The steps you've taken so far, though, have already shown you a lot about how the parser sees the XML data it processes. They have also given you a helpful debugging tool for seeing what the parser sees.</P><H3><A NAME=inspecting></A>Inspecting the Output</H3><P>The complete output for this version of the program is contained in <A HREF=work/Echo03-01.log><CODE>Echo03-01.log</CODE></A>. Part of that output is shown here:</P><PRE>
ELEMENT: &lt;slideshow
...
CHARS:
CHARS:
ELEMENT: &lt;slide
...
END_ELM: &lt;/slide&gt;
CHARS:
CHARS:
</PRE><P>Note that the <CODE>characters</CODE> method was invoked twice in a row. Inspecting the source file <A HREF=samples/slideSample01.xml><CODE>slideSample01.xml</CODE></A> shows that there is a comment before the first slide. The first call to <CODE>characters</CODE> comes before that comment, and the second comes after it. (Later on, you'll see how to be notified when the parser encounters a comment, although in most cases you won't need such notifications.)</P><P>Note, too, that the <CODE>characters</CODE> method is invoked after the first slide element, as well as before. When you are thinking in terms of hierarchically structured data, that seems odd. After all, you intended the <CODE>slideshow</CODE> element to contain <CODE>slide</CODE> elements, not text. Later on, you'll see how to restrict the <CODE>slideshow</CODE> element using a DTD. When you do that, the <CODE>characters</CODE> method will no longer be invoked for the whitespace between the <CODE>slide</CODE> elements.
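</P><P>To watch these calls directly, a small tracer can log one line per event. It prints <CODE>ELEMENT:</CODE>, <CODE>END_ELM:</CODE>, and <CODE>CHARS:</CODE> lines like the Echo programs in this lesson, but it is an illustrative sketch rather than tutorial code: it assumes the SAX 2 <CODE>DefaultHandler</CODE>, and the class name <CODE>EventTrace</CODE> and its <CODE>trace</CODE> method are hypothetical. Newlines inside character data are shown as <CODE>\n</CODE> so that whitespace-only calls stand out.</P><PRE>
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX 2 sketch: logs one line per event so you can see
// exactly where characters() is called between elements.
public class EventTrace extends DefaultHandler {
    private final StringBuilder log = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        log.append("ELEMENT: ").append(qName).append('\n');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.append("END_ELM: ").append(qName).append('\n');
    }

    @Override
    public void characters(char[] buf, int offset, int len) {
        // Make newlines visible so whitespace-only calls stand out.
        String s = new String(buf, offset, len).replace("\n", "\\n");
        log.append("CHARS: |").append(s).append("|\n");
    }

    // Returns the event log for a document, one event per line.
    public static String trace(InputStream in) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return handler.log.toString();
        } catch (Exception e) {
            // Wrap parser and I/O failures in an unchecked exception.
            throw new IllegalStateException("trace failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(trace(in));
        }
    }
}
</PRE><P>Given a small document in which a comment and an empty <CODE>slide</CODE> element sit inside <CODE>slideshow</CODE>, the tracer should report at least two <CODE>CHARS:</CODE> lines before <CODE>ELEMENT: slide</CODE> (one on each side of the comment) and another after <CODE>END_ELM: slide</CODE>. The exact count can vary, because SAX allows a parser to split a single run of character data across several <CODE>characters</CODE> calls.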
</P><P>In the absence of a DTD, though, the parser must assume that any element it sees may contain text, like the first <CODE>item</CODE> element of the overview slide:</P><BLOCKQUOTE> <PRE>&lt;item&gt;Why &lt;em&gt;WonderWidgets&lt;/em&gt; are great&lt;/item&gt;</PRE></BLOCKQUOTE><P>Here, the hierarchical structure looks like this:</P><BLOCKQUOTE> <PRE>
ELEMENT: &lt;item&gt;
CHARS: Why
   ELEMENT: &lt;em&gt;
   CHARS: WonderWidgets
   END_ELM: &lt;/em&gt;
CHARS: are great
END_ELM: &lt;/item&gt;
</PRE></BLOCKQUOTE><H3><A NAME=docsAndData></A>Documents and Data</H3><P>In this example, text is clearly intermixed with the hierarchical structure of the elements. The fact that text can surround elements (or be prevented from doing so by a DTD or schema) helps explain why you sometimes hear about "XML data" and at other times about "XML documents". XML comfortably handles both structured data and text documents that include markup. The only difference between the two is whether or not text is allowed between the elements.</P><BLOCKQUOTE> <P><B>Note: </B><BR> In an upcoming section of this tutorial, you will work with the <CODE>ignorableWhitespace</CODE> method in the <CODE>DocumentHandler</CODE> interface. This method can only be invoked when a DTD is present. If a DTD specifies that <CODE>slideshow</CODE> does not contain text, then all of the whitespace surrounding the <CODE>slide</CODE> elements is by definition ignorable. On the other hand, if <CODE>slideshow</CODE> can contain text (which must be assumed to be true in the absence of a DTD), then the parser must assume that the spaces and newlines it sees between the <CODE>slide</CODE> elements are significant parts of the document.
</P></BLOCKQUOTE><BLOCKQUOTE><HR SIZE=4></BLOCKQUOTE><P><P> <TABLE WIDTH=100%><TR> <TD ALIGN=left> <A HREF=1_write.html><IMG SRC=../images/PreviousArrow.gif WIDTH=26 HEIGHT=26 ALIGN=top BORDER=0 ALT="Previous | "></A><A HREF=2b_echo.html><IMG SRC=../images/NextArrow.gif WIDTH=26 HEIGHT=26 ALIGN=top BORDER=0 ALT="Next | "></A><A HREF=../alphaIndex.html><IMG SRC=../images/xml_IDX.gif WIDTH=26 HEIGHT=26 ALIGN=top BORDER=0 ALT="Index | "></A><A HREF=../TOC.html><IMG SRC=../images/xml_TOC.gif WIDTH=26 HEIGHT=26 ALIGN=top BORDER=0 ALT="TOC | "></A><A HREF=../index.html><IMG SRC=../images/xml_Top.gif WIDTH=26 HEIGHT=26 ALIGN=top BORDER=0 ALT="Top | "></A> </TD><TD ALIGN=right><STRONG><EM><A HREF=index.html>Top</A></EM></STRONG> <A HREF=../TOC.html#intro><STRONG><EM>Contents</EM></STRONG></A> <A HREF=../alphaIndex.html><STRONG><EM>Index</EM></STRONG></A> <A HREF=../glossary.html><STRONG><EM>Glossary</EM></STRONG></A></TD></TR></TABLE></BODY></HTML>