📄 xmlreader.html

Open-source XML parsing code, libxml2 version 2.6.29, with GB2312 support. Very useful when sending XML in network messages.
<pre>def processNode(reader):
    print "%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value())</pre>
<p>The result of the test is:</p>
<pre>0 1 doc 0 None
1 1 a 1 None
1 1 b 0 None
2 3 #text 0 some text
1 15 b 0 None
1 3 #text 0
1 1 c 1 None
0 15 doc 0 None</pre>
<p>There are a few things to note:</p>
<ul>
  <li>the depth value (first column) increases as child nodes are
    explored</li>
  <li>the text node child of the b element is of type 3 and carries its
    content as its value</li>
  <li>there is a text node containing the line return between elements b
    and c</li>
  <li>elements have the Value None (or NULL in C)</li>
</ul>
<p>The equivalent routine for <code>processNode()</code> as used by
<code>xmllint --stream --debug</code> is the following and can be found in
the xmllint.c module in the source distribution:</p>
<pre>static void processNode(xmlTextReaderPtr reader) {
    xmlChar *name, *value;

    name = xmlTextReaderName(reader);
    if (name == NULL)
        name = xmlStrdup(BAD_CAST "--");
    value = xmlTextReaderValue(reader);

    printf("%d %d %s %d",
            xmlTextReaderDepth(reader),
            xmlTextReaderNodeType(reader),
            name,
            xmlTextReaderIsEmptyElement(reader));
    xmlFree(name);
    if (value == NULL)
        printf("\n");
    else {
        printf(" %s\n", value);
        xmlFree(value);
    }
}</pre>
<h2><a name="Extracting1">Extracting information for the attributes</a></h2>
<p>The previous examples don't indicate how attributes are processed. The
simple test "<code>&lt;doc a="b"/&gt;</code>" provides the following
result:</p>
<pre>0 1 doc 1 None</pre>
<p>This shows that attribute nodes are not traversed by default. The
<em>HasAttributes</em> property allows their presence to be detected. To check
their content the API has special instructions.
Basically two kinds of operations are possible:</p>
<ol>
  <li>to move the reader to the attribute nodes of the current element, in
    which case the cursor is positioned on the attribute node</li>
  <li>to directly query the element node for the attribute value</li>
</ol>
<p>In both cases the attribute can be designated either by its position in the
list of attributes (<em>MoveToAttributeNo</em> or <em>GetAttributeNo</em>) or
by its name (and namespace):</p>
<ul>
  <li><em>GetAttributeNo</em>(no): provides the value of the attribute with
    the specified index no relative to the containing element.</li>
  <li><em>GetAttribute</em>(name): provides the value of the attribute with
    the specified qualified name.</li>
  <li><em>GetAttributeNs</em>(localName, namespaceURI): provides the value of
    the attribute with the specified local name and namespace URI.</li>
  <li><em>MoveToAttributeNo</em>(no): moves the position of the current
    instance to the attribute with the specified index relative to the
    containing element.</li>
  <li><em>MoveToAttribute</em>(name): moves the position of the current
    instance to the attribute with the specified qualified name.</li>
  <li><em>MoveToAttributeNs</em>(localName, namespaceURI): moves the position
    of the current instance to the attribute with the specified local name
    and namespace URI.</li>
  <li><em>MoveToFirstAttribute</em>: moves the position of the current
    instance to the first attribute associated with the current node.</li>
  <li><em>MoveToNextAttribute</em>: moves the position of the current
    instance to the next attribute associated with the current node.</li>
  <li><em>MoveToElement</em>: moves the position of the current instance to
    the node that contains the current Attribute node.</li>
</ul>
<p>After modifying the processNode() function to show attributes:</p>
<pre>def processNode(reader):
    print "%d %d %s %d %s" % (reader.Depth(), reader.NodeType(),
                              reader.Name(), reader.IsEmptyElement(),
                              reader.Value())
    if reader.NodeType() == 1: # Element
        while reader.MoveToNextAttribute():
            print "-- %d %d (%s) [%s]" % (reader.Depth(), reader.NodeType(),
                                          reader.Name(), reader.Value())</pre>
<p>The output for the same input document reflects the attribute:</p>
<pre>0 1 doc 1 None
-- 1 2 (a) [b]</pre>
<p>There are a couple of things to note about attribute processing:</p>
<ul>
  <li>Their depth is that of the carrying element plus one.</li>
  <li>Namespace declarations are seen as attributes, as in DOM.</li>
</ul>
<h2><a name="Validating">Validating a document</a></h2>
<p>The libxml2 implementation adds some extra features on top of the
XmlTextReader API. The main one is the ability to validate the parsed document
against its DTD progressively. This is simply the activation of the associated
feature of the parser used by the reader structure. There are a few options
available, defined as the enum xmlParserProperties in the libxml/xmlreader.h
header file:</p>
<ul>
  <li>XML_PARSER_LOADDTD: force loading the DTD (without validating)</li>
  <li>XML_PARSER_DEFAULTATTRS: force attribute defaulting (this also implies
    loading the DTD)</li>
  <li>XML_PARSER_VALIDATE: activate DTD validation (this also implies loading
    the DTD)</li>
  <li>XML_PARSER_SUBST_ENTITIES: substitute entities on the fly; entity
    reference nodes are not generated and are replaced by their expanded
    content.</li>
  <li>more settings might be added; those were the ones available at the
    2.5.0 release...</li>
</ul>
<p>The GetParserProp() and SetParserProp() methods can then be used to get
and set the values of those parser properties of the reader.
For example:</p>
<pre>def parseAndValidate(file):
    reader = libxml2.newTextReaderFilename(file)
    reader.SetParserProp(libxml2.PARSER_VALIDATE, 1)
    ret = reader.Read()
    while ret == 1:
        ret = reader.Read()
    if ret != 0:
        print "Error parsing and validating %s" % (file)</pre>
<p>This routine will parse and validate the file. Error messages can be
captured by registering an error handler. See python/tests/reader2.py for
more complete Python examples. At the C level the equivalent call to activate
the validation feature is just:</p>
<pre>ret = xmlTextReaderSetParserProp(reader, XML_PARSER_VALIDATE, 1)</pre>
<p>and a return value of 0 indicates success.</p>
<h2><a name="Entities">Entity substitution</a></h2>
<p>By default the xmlReader will report entities as such and not replace them
with their content. This default behaviour can however be overridden using:</p>
<p><code>reader.SetParserProp(libxml2.PARSER_SUBST_ENTITIES,1)</code></p>
<h2><a name="L1142">Relax-NG Validation</a></h2>
<p style="font-size: 10pt">Introduced in version 2.5.7</p>
<p>Libxml2 can now validate the document being read with the xmlReader against
Relax-NG schemas. While the Relax NG validator can't always work in a
streamable mode, only subsets which cannot be reduced to regular expressions
need to have their subtree expanded for validation.
In practice this means that, as long as the schema for the top-level element
content can be expressed as a regexp, only a chunk of the document needs to be
parsed at any time while validating.</p>
<p>The steps to do so are:</p>
<ul>
  <li>create a reader working on a document as usual</li>
  <li>before any call to Read(), associate it with a Relax NG schema, either
    the preparsed schema or the URL of the schema to use</li>
  <li>errors will be reported the usual way, and the validity status can be
    obtained using the IsValid() interface of the reader, as for DTDs.</li>
</ul>
<p>Example, assuming the reader has already been created and that the schema
string contains the Relax-NG schema:</p>
<pre><code>rngp = libxml2.relaxNGNewMemParserCtxt(schema, len(schema))<br>rngs = rngp.relaxNGParse()<br>reader.RelaxNGSetSchema(rngs)<br>ret = reader.Read()<br>while ret == 1:<br>    ret = reader.Read()<br>if ret != 0:<br>    print "Error parsing the document"<br>if reader.IsValid() != 1:<br>    print "Document failed to validate"</code><br></pre>
<p>See <code>reader6.py</code> in the sources or documentation for a complete
example.</p>
<h2><a name="Mixing">Mixing the reader and tree or XPath operations</a></h2>
<p style="font-size: 10pt">Introduced in version 2.5.7</p>
<p>While the reader is a streaming interface, its underlying implementation
is based on the DOM builder of libxml2. As a result it is relatively simple
to mix operations based on both models under some constraints. To do so the
reader has an Expand() operation which grows the subtree under the
current node. It returns a pointer to a standard node which can be
manipulated in the usual ways. The node will have all its ancestors and its
full subtree available. Usual operations like XPath queries can be used on
that reduced view of the document.
Here is an example extracted from reader5.py in the sources which extracts and
prints the bibliography entry for the "Dragon" compiler book from the XML 1.0
recommendation:</p>
<pre>f = open('../../test/valid/REC-xml-19980210.xml')
input = libxml2.inputBuffer(f)
reader = input.newTextReader("REC")
res=""
while reader.Read():
    while reader.Name() == 'bibl':
        node = reader.Expand()            # expand the subtree
        if node.xpathEval("@id = 'Aho'"): # use XPath on it
            res = res + node.serialize()
        if reader.Next() != 1:            # skip the subtree
            break</pre>
<p>Note, however, that the node instance returned by the Expand() call is only
valid until the next Read() operation. The Expand() operation does not
affect the Read() ones. Usually, however, the full subtree is no longer
useful once it has been processed, and the Next() operation allows skipping it
completely and proceeding to the successor; it returns 0 if the document end
is reached.</p>
<p><a href="mailto:xml@gnome.org">Daniel Veillard</a></p>
<p>$Id: xmlreader.html 3320 2005-10-18 19:11:55Z veillard $</p>
<p></p>
</body></html>