</p><a name="wp68348"> </a><p class="pBody">So for simple data structures like the address book above, you could save yourself a bit of work by using JDOM or dom4j. It may make sense to use one of those models even when the data is technically "mixed", as long as there is always one (and only one) segment of text for a given node.</p><a name="wp68375"> </a><p class="pBody">Here is an example of that kind of structure, which would also be easily processed in JDOM or dom4j:</p><div class="pPreformattedRelative"><pre class="pPreformattedRelative">
<addressbook>
  <entry>Fred
    <email>fred@home</email>
  </entry>
  ...
</addressbook><a name="wp68391"> </a></pre></div><a name="wp68453"> </a><p class="pBody">Here, each entry has a bit of identifying text, followed by other elements. With this structure, the program could navigate to an entry, invoke <code class="cCode">text()</code> to find out who it belongs to, and process the <code class="cCode"><email></code> subelement if it is at the correct node.</p><a name="wp68475"> </a><h3 class="pHeading2">Increasing the Complexity</h3><a name="wp68485"> </a><p class="pBody">But to get a full understanding of the kind of processing you need to do when searching or manipulating a DOM, it is important to know the kinds of nodes that a DOM can conceivably contain.</p><a name="wp68500"> </a><p class="pBody">Here is an example that tries to bring the point home. It is a representation of this data:</p><div class="pPreformattedRelative"><pre class="pPreformattedRelative">
<sentence>
  The &projectName; <![CDATA[<i>project</i>]]> is
  <?editor: red?><bold>important</bold><?editor: normal?>.
</sentence><a name="wp68518"> </a></pre></div><a name="wp68548"> </a><p class="pBody">This sentence contains an <span style="font-style: italic">entity reference</span> -- a pointer to an "entity" that is defined elsewhere. In this case, the entity contains the name of the project. 
The example also contains a <code class="cCode">CDATA</code> section (uninterpreted data, like <code class="cCode"><pre></code> data in HTML), as well as <span style="font-style: italic">processing instructions </span>(<code class="cCode"><?...?></code>) that in this case tell the editor which color to use when rendering the text.</p><a name="wp68588"> </a><p class="pBody">Here is the DOM structure for that data. It's fairly representative of the kind of structure that a robust application should be prepared to handle:</p><div class="pPreformattedRelative"><pre class="pPreformattedRelative">
+ ELEMENT: sentence
   + TEXT: The
   + ENTITY REF: projectName
      + COMMENT: The latest name we're using
      + TEXT: Eagle
   + CDATA: <i>project</i>
   + TEXT: is
   + PI: editor: red
   + ELEMENT: bold
      + TEXT: important
   + PI: editor: normal<a name="wp68598"> </a></pre></div><a name="wp68732"> </a><p class="pBody">This example depicts the kinds of nodes that may occur in a DOM. Although your application may be able to ignore most of them most of the time, a truly robust implementation needs to recognize and deal with each of them.</p><a name="wp68860"> </a><p class="pBody">Similarly, the process of navigating to a node involves processing subelements, ignoring the ones you don't care about and inspecting the ones you do care about, until you find the node you are interested in.</p><a name="wp69079"> </a><p class="pBody">Often, in such cases, you are interested in finding a node that contains specific text. For example, in The DOM API you saw an example where you wanted to find a <code class="cCode"><coffee> </code>node whose <code class="cCode"><name></code> element contains the text "Mocha Java". To carry out that search, the program needed to work through the list of <code class="cCode"><coffee> </code>elements and, for each one, (a) get the <code class="cCode"><name></code> element under it and (b) examine the <code class="cCode">TEXT</code> node under that element. 
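</p><p class="pBody">The search just described can be sketched with the standard JAXP and <code class="cCode">org.w3c.dom</code> APIs roughly as follows. This is a minimal illustration, not the code from The DOM API: the class name <code class="cCode">CoffeeSearch</code>, its <code class="cCode">parse</code> and <code class="cCode">findCoffee</code> methods, and the <code class="cCode"><priceList></code> and <code class="cCode"><price></code> elements in the sample data are assumptions made up for this example. It expects the text of each <code class="cCode"><name></code> to sit in a single <code class="cCode">TEXT</code> node.</p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and method signatures are illustrative only.
final class CoffeeSearch {

    /** Parses an XML string into a DOM. Checked exceptions are wrapped to keep the sketch short. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns the first coffee element whose name child holds exactly the wanted text, or null. */
    static Element findCoffee(Document doc, String wanted) {
        NodeList coffees = doc.getElementsByTagName("coffee");
        for (int i = 0; i < coffees.getLength(); i++) {
            Element coffee = (Element) coffees.item(i);
            // (a) Get the name element under this coffee element, skipping other children.
            for (Node child = coffee.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "name".equals(child.getNodeName())) {
                    // (b) Examine the TEXT node under that element.
                    // Assumption: the name is stored in one TEXT node, not CDATA or an entity reference.
                    Node text = child.getFirstChild();
                    if (text != null
                            && text.getNodeType() == Node.TEXT_NODE
                            && wanted.equals(text.getNodeValue())) {
                        return coffee;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample data; only coffee and name come from the description above.
        String xml =
            "<priceList>"
            + "<coffee><name>Mocha Java</name><price>11.95</price></coffee>"
            + "<coffee><name>Sumatra</name><price>12.50</price></coffee>"
            + "</priceList>";
        Element match = findCoffee(parse(xml), "Mocha Java");
        if (match == null) {
            System.out.println("not found");
        } else {
            System.out.println("found, price "
                + match.getElementsByTagName("price").item(0).getTextContent());
        }
    }
}
```

<p class="pBody">Calling <code class="cCode">CoffeeSearch.findCoffee(doc, "Mocha Java")</code> returns the matching <code class="cCode"><coffee></code> element, or <code class="cCode">null</code> when no <code class="cCode"><name></code> holds exactly that text. The comparison is exact, so stray whitespace inside <code class="cCode"><name></code> would defeat it.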
</p><a name="wp69139"> </a><p class="pBody">That example made some simplifying assumptions, however. It assumed that processing instructions, comments, <code class="cCode">CDATA</code> nodes, and entity references would not exist in the data structure. Many simple applications can get away with such assumptions. Truly robust applications, on the other hand, need to be prepared to deal with all kinds of valid XML data.</p><a name="wp73222"> </a><p class="pBody">(A "simple" application will work only so long as the input data contains the simplified XML structures it expects. But there are no validation mechanisms to ensure that more complex structures will not exist. After all, XML was specifically designed to allow them.)</p><a name="wp69158"> </a><p class="pBody">To be more robust, the sample code described in The DOM API would have to do these things:</p><div class="pSmartList1"><ol type="1" class="pSmartList1"><a name="wp69170"> </a><div class="pSmartList1"><li>When searching for the <code class="cCode"><name></code> element:</li></div><div class="pSmartList2"><ol type="a" class="pSmartList2"><a name="wp73140"> </a><div class="pSmartList2"><li>Ignore comments, attributes, and processing instructions.</li></div><a name="wp73107"> </a><div class="pSmartList2"><li>Allow for the possibility that the <code class="cCode"><coffee></code> subelements do not occur in the expected order.</li></div><a name="wp73118"> </a><div class="pSmartList2"><li>Skip over <code class="cCode">TEXT</code> nodes that contain ignorable whitespace, if not validating.</li></div></ol></div><a name="wp69194"> </a><div class="pSmartList1"><li>When extracting text for a node:</li></div><div class="pSmartList2"><ol type="a" class="pSmartList2"><a name="wp69187"> </a><div class="pSmartList2"><li>Extract text from <code class="cCode">CDATA</code> nodes as well as text nodes.</li></div><a name="wp69219"> </a><div class="pSmartList2"><li>Ignore comments, attributes, and processing instructions 
when gathering the text.</li></div><a name="wp73166"> </a><div class="pSmartList2"><li>If an entity reference node or another element node is encountered, recurse. (That is, apply the text-extraction procedure to all subnodes.)</li></div></ol></div></ol></div><hr><a name="wp73188"> </a><p class="pNote">Note: The JAXP 1.2 parser does not insert entity reference nodes into the DOM. Instead, it inserts a <code class="cCode">TEXT</code> node containing the contents of the reference. The JAXP 1.1 parser, which is built into the 1.4 platform, does insert entity reference nodes. So a robust implementation that is parser-independent needs to be prepared to handle entity reference nodes.</p><hr><a name="wp69233"> </a><p class="pBody">Many applications, of course, won't have to worry about such things, because the kind of data they see will be strictly controlled. But if the data can come from a variety of external sources, then the application will probably need to take these possibilities into account.</p><a name="wp75710"> </a><p class="pBody">The code you need to carry out these functions is given near the end of the DOM tutorial in <a href="JAXPDOM7.html#wp75112">Searching for Nodes</a> and <a href="JAXPDOM7.html#wp75118">Obtaining Node Content</a>. Right now, the goal is simply to determine whether DOM is suitable for your application. </p><a name="wp68875"> </a><h3 class="pHeading2">Choosing Your Model</h3><a name="wp68885"> </a><p class="pBody">As you can see, when you are using DOM, even a simple operation like getting the text from a node can take a bit of programming. So if your programs will be handling simple data structures, JDOM, dom4j, or even the 1.4 regular expression package (<code class="cCode">java.util.regex</code>) may be more appropriate for your needs.</p><a name="wp68905"> </a><p class="pBody">For full-fledged documents and complex applications, on the other hand, DOM gives you a lot of flexibility. 
And if you need to use XML Schema, then once again DOM is the way to go, at least for now.</p><a name="wp69263"> </a><p class="pBody">If you will be processing both documents <span style="font-style: italic">and</span> data in the applications you develop, then DOM may still be your best choice. After all, once you have written the code to examine and process a DOM structure, it is fairly easy to customize it for a specific purpose. So choosing to do everything in DOM means you'll only have to deal with one set of APIs, rather than two.</p><a name="wp68937"> </a><p class="pBody">Plus, the DOM standard <span style="font-style: italic">is</span> a standard. It is robust and complete, and it has many implementations. That is a significant decision-making factor for many large installations -- particularly for production applications, where it helps avoid large rewrites in the event of an API change.</p><a name="wp68951"> </a><p class="pBody">Finally, even though the text in an address book may not permit bold, italics, colors, and font sizes today, someday you may want to handle those things. 
Since DOM will handle virtually anything you throw at it, choosing DOM makes it easier to "future-proof" your application.</p> </blockquote> <p><font size="-1">All of the material in <em>The J2EE(TM) 1.4 Tutorial</em> is <a href="J2EETutorialFront2.html">copyright</a>-protected and may not be published in other works without express written permission from Sun Microsystems.</font> </body></html>