characters:  u'   Famous Guacamole'
characters:  u'\n'
characters:  u'   '
endElement (19,3,None,guac.xml):  title
characters:  u'\n'
characters:  u'   '

These location functions are intended primarily for error recovery and
reporting.  Since the values might be approximate, the returned
locations should not be used as the basis for editing the original
document.
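The location values come from the Locator object that the parser passes to setDocumentLocator() before the document starts.  The following minimal Python 3 sketch records the line on which each element starts, which is the kind of reporting these values are meant for.  LineTracker and the sample document are made up for illustration.

```python
import xml.sax
from xml.sax import handler

class LineTracker(handler.ContentHandler):
    """Record (element name, line number) for every start tag."""
    def __init__(self):
        super().__init__()
        self.locator = None
        self.positions = []

    def setDocumentLocator(self, locator):
        # The parser supplies the locator before startDocument() is called.
        self.locator = locator

    def startElement(self, name, attrs):
        # Good enough for messages; not meant for editing the source text.
        self.positions.append((name, self.locator.getLineNumber()))

doc = b"<recipe>\n  <title>Famous Guacamole</title>\n</recipe>"
tracker = LineTracker()
xml.sax.parseString(doc, tracker)
print(tracker.positions)  # [('recipe', 1), ('title', 2)]
```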

c.startDocument()
c.endDocument()

The startDocument() and endDocument() methods are called to indicate
the start and end of the document.  They take no arguments.
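A common pattern is to reset per-document state in startDocument() and produce a summary in endDocument().  Here is a small Python 3 sketch of that pattern.  ElementCounter and the sample document are illustrative.

```python
import xml.sax
from xml.sax import handler

class ElementCounter(handler.ContentHandler):
    def startDocument(self):
        # Called once per parse, before any element events.
        self.count = 0
        self.done = False

    def startElement(self, name, attrs):
        self.count += 1

    def endDocument(self):
        # Called once per parse, after the last event.
        self.done = True

counter = ElementCounter()
xml.sax.parseString(
    b"<recipe><title/><ingredients><item/><item/></ingredients></recipe>",
    counter)
print(counter.count)  # 5
```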

c.startPrefixMapping(prefix,uri)
c.endPrefixMapping(prefix)

The startPrefixMapping() and endPrefixMapping() methods are used if
you want to do your own tracking of XML namespaces.  For this to work,
feature_namespaces must be enabled.  If you have a namespace
declaration like this,

   <foo:bar  xmlns:foo="http://dead.com/foo">
   ...
   </foo:bar>

the following calls are made:

   startPrefixMapping("foo","http://dead.com/foo")
   ...
   endPrefixMapping("foo")

Normally it is not necessary to define these methods, since the parser
already handles namespaces itself and reports namespace-qualified names
to startElementNS() and endElementNS().
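To see these calls in practice, the following Python 3 sketch gets a parser from make_parser(), enables feature_namespaces, and records the prefix-mapping events for the foo:bar element shown earlier.  PrefixTracker is a hypothetical name.

```python
import io
from xml.sax import handler, make_parser

class PrefixTracker(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startPrefixMapping(self, prefix, uri):
        self.events.append(("start", prefix, uri))

    def endPrefixMapping(self, prefix):
        self.events.append(("end", prefix))

parser = make_parser()
# Namespace processing is off by default; prefix events require it.
parser.setFeature(handler.feature_namespaces, True)
tracker = PrefixTracker()
parser.setContentHandler(tracker)
parser.parse(io.BytesIO(b'<foo:bar xmlns:foo="http://dead.com/foo"></foo:bar>'))
print(tracker.events)  # [('start', 'foo', 'http://dead.com/foo'), ('end', 'foo')]
```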

c.startElement(name,attrs)
c.endElement(name)

The startElement() and endElement() methods are used to handle the start
and end of XML elements.  name is the element name and attrs is a special
AttributesImpl object that supports the following methods:

    attrs.getLength()          - Return number of attributes
    attrs.getNames()           - Return attribute names
    attrs.getType(name)        - Return the type of attribute name 
                                 (usually 'CDATA')
    attrs.getValue(name)       - Return value of attribute name

The above methods are only invoked if a document is being processed
without namespace support.  If namespaces are enabled, the following
methods are used instead:

c.startElementNS(name,qname,attrs)
c.endElementNS(name,qname)

The startElementNS() and endElementNS() methods are used to handle
elements when XML namespace handling is enabled (feature_namespaces).
In this case, name is a tuple (uri,localname) and qname is the fully
qualified element name (which is usually None unless
feature_namespace_prefixes has been enabled).  attrs is a special
AttributesNSImpl object that supports the attribute methods listed
above in addition to the following:

    attrs.getValueByQName(qname)    - Return value for qualified name
    attrs.getNameByQName(qname)     - Return (namespace,localname) pair
                                      for a qualified name
    attrs.getQNameByName(name)      - Return qualified name for a
                                      (namespace,localname) pair
    attrs.getQNames()               - Return qualified names of all
                                      attributes

Here is a small example:

from xml.sax import handler

# Requires feature_namespaces to be enabled on the parser.
class SimpleHandler(handler.ContentHandler):
    def startElementNS(self,name,qname,attrs):
        print 'startElementNS: ', name,qname
        for n in attrs.getNames():
            print "  attribute :", n, attrs.getValue(n)

    def endElementNS(self,name,qname):
        print 'endElementNS: ', name, qname


When run on the following input,

<?xml version="1.0"?>
<foo:bar xmlns:foo="http://dead.com">
   <foo:text foo:name="blah" type="whatever">
      Hi there
   </foo:text>
</foo:bar>

the following output is produced:

startElementNS:  (u'http://dead.com', u'bar') None
startElementNS:  (u'http://dead.com', u'text') None
  attribute : (u'http://dead.com', u'name') blah
  attribute : (None, u'type') whatever
endElementNS:  (u'http://dead.com', u'text') None
endElementNS:  (u'http://dead.com', u'bar') None
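SimpleHandler assumes a parser with namespace processing switched on.  The following self-contained Python 3 sketch does that with setFeature(feature_namespaces, True) and runs on the same input.  Instead of printing, it collects the names.  The collecting attributes (elements, attributes) are illustrative.

```python
import io
from xml.sax import handler, make_parser

class SimpleHandler(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.elements = []     # (uri, localname) of each start tag
        self.attributes = {}   # (uri, localname) -> attribute value

    def startElementNS(self, name, qname, attrs):
        self.elements.append(name)
        for n in attrs.getNames():
            self.attributes[n] = attrs.getValue(n)

DOC = b"""<?xml version="1.0"?>
<foo:bar xmlns:foo="http://dead.com">
   <foo:text foo:name="blah" type="whatever">
      Hi there
   </foo:text>
</foo:bar>"""

parser = make_parser()
parser.setFeature(handler.feature_namespaces, True)  # deliver *NS events
h = SimpleHandler()
parser.setContentHandler(h)
parser.parse(io.BytesIO(DOC))
print(h.elements)
print(h.attributes)
```

Unprefixed attributes such as type come back with None as their namespace, matching the Python 2 output above.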


c.characters(content)

The characters() method is used to receive raw character data.
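Character data for a single run of text is not guaranteed to arrive in one call.  The parser may split it, for example around entity references such as &amp;.  A common approach is to buffer the pieces and join them in endElement().  The following Python 3 sketch shows this, using a hypothetical TitleCollector.

```python
import xml.sax
from xml.sax import handler

class TitleCollector(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = None   # None means "not inside <title>"
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.chunks = []

    def characters(self, content):
        if self.chunks is not None:
            self.chunks.append(content)   # may be one of several pieces

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.chunks))
            self.chunks = None

c = TitleCollector()
xml.sax.parseString(b"<recipe><title>Famous &amp; Guacamole</title></recipe>", c)
print(c.titles)  # ['Famous & Guacamole']
```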

c.ignorableWhitespace(content)

The ignorableWhitespace() method is called to supply ignorable
whitespace in a document.  This is generally only invoked if the
underlying parser is running in validation mode (feature_validation
is set).  The argument is a string containing the whitespace
characters that were ignored.

c.processingInstruction(target,data)

The processingInstruction() method is used to handle XML processing
instructions.  An XML processing instruction is enclosed in <? ... ?>.
For example:

   <?xml-stylesheet href="mystyle.css" type="text/css"?>

target is the type of instruction (e.g., "xml-stylesheet").  data
is the rest of the instruction up to the ending ?>.  This method
is not invoked for the initial <?xml ...?> declaration.
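Here is a Python 3 sketch that records processing instructions.  PICollector is illustrative.  The input begins with an <?xml ...?> declaration, and that declaration does not produce a call.

```python
import xml.sax
from xml.sax import handler

class PICollector(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.pis = []

    def processingInstruction(self, target, data):
        self.pis.append((target, data))

doc = (b'<?xml version="1.0"?>\n'
       b'<?xml-stylesheet href="mystyle.css" type="text/css"?>\n'
       b'<recipe/>')
c = PICollector()
xml.sax.parseString(doc, c)
print(c.pis)  # [('xml-stylesheet', 'href="mystyle.css" type="text/css"')]
```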

c.skippedEntity(name)

The skippedEntity() method is invoked for each skipped entity.
A non-validating parser may skip an entity if it has not seen the
entity's declaration (for example, because the entity is declared in
an external DTD that the parser did not read) or if no other
information is known about it.  This behavior depends on the parser
and on the settings of feature_external_ges and feature_external_pes.


DTDHandler
----------
The DTDHandler interface is used to process DTD declarations that
might be needed for parsing.  Typically, this only pertains to
unparsed entities and notation declarations.  For example, an
XML document might include declarations like this:

    <!DOCTYPE recipe [
       ...
       <!ELEMENT image EMPTY>
       <!ATTLIST image name ENTITY #REQUIRED
                      alt  CDATA  #IMPLIED
       >
       <!NOTATION GIF SYSTEM "CompuServe Graphics Interchange Format 87a">
       <!ENTITY GuacImage SYSTEM "guac.gif" NDATA GIF>
       ...
    ]>
    
    <recipe>
        ...
        <image name="GuacImage"/>
        ...
    </recipe>

To capture the NOTATION and ENTITY information, you write a DTDHandler
like this:

    from xml.sax import handler, make_parser

    class SimpleDTD(handler.DTDHandler):
        def notationDecl(self,name,publicid,systemid):
            print "notationDecl: ", name, publicid, systemid
        def unparsedEntityDecl(self,name,publicid,systemid,ndata):
            print "unparsedEntityDecl: ", name, publicid, systemid, ndata

    p = make_parser()
    ...
    p.setDTDHandler(SimpleDTD())
    ...
    p.parse(file)


When run on a sample file, this produces output like this:

   notationDecl:  GIF None CompuServe Graphics Interchange Format 87a
   unparsedEntityDecl:  GuacImage None guac.gif GIF

It is important to note that the notationDecl() and unparsedEntityDecl()
methods are only invoked for the declarations inside the DTD.  They are
not used when an entity is actually encountered.  For example, in the
element:

   <image name="GuacImage"/>

it is the responsibility of the ContentHandler to look at the
value of name and see if it corresponds to a known entity (this would
be done in a startElement() or startElementNS() method).  If so, the
information received from the unparsedEntityDecl() and notationDecl()
calls can be used to take appropriate action, such as loading an image
in some format or processing the file in some manner.
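One way to connect the two halves is a single object that acts as both DTDHandler and ContentHandler.  It records declarations as they arrive and consults them in startElement().  The following Python 3 sketch works under that assumption.  ImageHandler is hypothetical, and it only records the lookup rather than loading an image.

```python
import io
import xml.sax
from xml.sax import handler

class ImageHandler(handler.ContentHandler, handler.DTDHandler):
    def __init__(self):
        super().__init__()
        self.notations = {}   # notation name -> system id
        self.entities = {}    # entity name -> (system id, notation name)
        self.images = []      # (entity, file, notation system id) per <image>

    # DTDHandler methods: called while the DTD is read.
    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.entities[name] = (systemId, ndata)

    # ContentHandler method: resolve the entity named by the attribute.
    def startElement(self, name, attrs):
        if name == "image":
            ent = attrs.getValue("name")
            if ent in self.entities:
                systemId, ndata = self.entities[ent]
                self.images.append((ent, systemId, self.notations.get(ndata)))

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT image EMPTY>
  <!ATTLIST image name ENTITY #REQUIRED>
  <!NOTATION GIF SYSTEM "CompuServe Graphics Interchange Format 87a">
  <!ENTITY GuacImage SYSTEM "guac.gif" NDATA GIF>
]>
<recipe><image name="GuacImage"/></recipe>"""

h = ImageHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(h)
parser.setDTDHandler(h)
parser.parse(io.BytesIO(DOC))
print(h.images)
```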

EntityResolver
--------------
The EntityResolver class is used to intercept references to external
entities.  For example, consider the following XML:

    <!DOCTYPE ... [

        <!ENTITY commonheader SYSTEM "header.xml">
    
    ]>
    <document>
         &commonheader;
    </document>

In this case, the <!ENTITY> declaration creates an entity commonheader
that refers to a file "header.xml".  Later in the document,
&commonheader; is used to actually include the entity.  Normally, the
XML parser will try to load the corresponding header.xml file and
insert it into the input stream on its own.  If you want to change
this behavior, you can define your own EntityResolver class. For
example, this code catches references to 'header.xml' and changes
them to 'altheader.xml':

    from xml.sax import handler, make_parser

    class SimpleEntity(handler.EntityResolver):
        def resolveEntity(self,publicid,systemid):
            print "resolveEntity: ",publicid,systemid

            if systemid == 'header.xml':
                return 'altheader.xml'
            else:
                return systemid

    p = make_parser()
    p.setEntityResolver(SimpleEntity())
    ...
    p.parse(file)

Similarly, one could use an EntityResolver to retrieve external entities
from a database or some other non-traditional location.

The EntityResolver interface only defines a single method resolveEntity()
as shown above.  There are several choices for the return value.  If
a simple string is returned, the underlying XML parser uses it as the
entity's system identifier (a file name or URL) and tries to resolve
it on its own.  For instance, the above example simply changed the name
of 'header.xml' to 'altheader.xml' by returning the string 'altheader.xml'.
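The string-returning redirect can be tried end to end with the following Python 3 sketch.  It is not the original Python 2 code.  It writes three throwaway files into a temporary directory; their names and contents are chosen for illustration.  It also enables feature_external_ges, because since Python 3.7.1 the standard expat-based parser no longer loads external general entities by default.  The returned 'altheader.xml' is resolved relative to the main document's location.

```python
import os
import tempfile
from xml.sax import handler, make_parser

class SimpleEntity(handler.EntityResolver):
    def resolveEntity(self, publicid, systemid):
        # A returned string is treated as a new system identifier.
        if systemid == "header.xml":
            return "altheader.xml"
        return systemid

class TextCollector(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

DOC = """<!DOCTYPE document [
  <!ENTITY commonheader SYSTEM "header.xml">
]>
<document>&commonheader;</document>"""

def parse_in(directory):
    p = make_parser()
    p.setFeature(handler.feature_external_ges, True)  # needed on 3.7.1+
    p.setEntityResolver(SimpleEntity())
    collector = TextCollector()
    p.setContentHandler(collector)
    p.parse(os.path.join(directory, "document.xml"))
    return "".join(collector.chunks)

with tempfile.TemporaryDirectory() as d:
    for fname, text in [("document.xml", DOC),
                        ("header.xml", "<h>Original header</h>"),
                        ("altheader.xml", "<h>Alternate header</h>")]:
        with open(os.path.join(d, fname), "w") as f:
            f.write(text)
    result = parse_in(d)

print(result)  # Alternate header
```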

Alternatively, it is possible to return an open file object. For example:

    import codecs
    class SimpleEntity(handler.EntityResolver):
        def resolveEntity(self,publicid,systemid):
            print "resolveEntity: ",publicid,systemid
            if systemid == 'header.xml':
                return codecs.open("altheader.xml","r","utf-8")
            else:
                return systemid
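The same idea works without touching the file system, because the parser accepts any object with a read() method.  The following Python 3 sketch serves the entity text from an io.StringIO.  InMemoryEntity and its dictionary of entity texts are hypothetical.  On Python 3.7.1 and later, the standard library parser ignores external general entities unless feature_external_ges is enabled, so the sketch turns it on.

```python
import io
from xml.sax import handler, make_parser

class InMemoryEntity(handler.EntityResolver):
    def __init__(self, entities):
        self.entities = entities   # hypothetical: system id -> XML text
        self.requests = []

    def resolveEntity(self, publicid, systemid):
        self.requests.append((publicid, systemid))
        if systemid in self.entities:
            return io.StringIO(self.entities[systemid])  # file-like object
        return systemid

class Names(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

DOC = b"""<!DOCTYPE document [
  <!ENTITY commonheader SYSTEM "header.xml">
]>
<document>&commonheader;</document>"""

resolver = InMemoryEntity({"header.xml": "<header>Common header</header>"})
names = Names()
p = make_parser()
p.setFeature(handler.feature_external_ges, True)
p.setEntityResolver(resolver)
p.setContentHandler(names)
p.parse(io.BytesIO(DOC))
print(resolver.requests)  # [(None, 'header.xml')]
print(names.names)        # ['document', 'header']
```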


It is also possible to return an InputSource object.  InputSource is a
class defined in xml.sax.xmlreader that can be used to define alternative
input streams to the parser.  For example:

    from xml.sax import xmlreader

    class MyInputSource(xmlreader.InputSource):
        ...

    class SimpleEntity(handler.EntityResolver):
        def resolveEntity(self,publicid,systemid):
            print "resolveEntity: ",publicid,systemid
            if systemid == 'header.xml':
                return MyInputSource("altheader.xml")
            else:
                return systemid
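A subclass is not always necessary.  A plain xmlreader.InputSource can be given a byte stream with setByteStream(), and the parser then reads the entity from that stream.  This is a Python 3 sketch.  SourceEntity, the in-memory entity text, and the system id string are illustrative.  feature_external_ges is enabled because recent Python 3 versions otherwise skip external entities.

```python
import io
from xml.sax import handler, make_parser, xmlreader

class SourceEntity(handler.EntityResolver):
    def resolveEntity(self, publicid, systemid):
        if systemid == "header.xml":
            # The system id is only recorded; data comes from the byte stream.
            src = xmlreader.InputSource("altheader.xml")
            src.setByteStream(io.BytesIO(b"<header>From an InputSource</header>"))
            return src
        return systemid

class TextCollector(handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)

DOC = b"""<!DOCTYPE document [
  <!ENTITY commonheader SYSTEM "header.xml">
]>
<document>&commonheader;</document>"""

p = make_parser()
p.setFeature(handler.feature_external_ges, True)
p.setEntityResolver(SourceEntity())
collector = TextCollector()
p.setContentHandler(collector)
p.parse(io.BytesIO(DOC))
print("".join(collector.chunks))  # From an InputSource
```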


ErrorHandler
------------

The ErrorHandler class is used to intercept parsing errors and warnings.
The class defines three methods:

error(exception)

This method is called when the parser encounters a recoverable error
such as an undeclared attribute (e.g., discovered by a validating XML
parser).  If this method returns without raising the exception, parsing
will continue.  However, document information may not be delivered
correctly after an error has occurred, so the main reason for allowing
the parser to continue is to report additional errors.

fatalError(exception)

This method is invoked when the parser encounters an unrecoverable error
such as a missing closing tag or an unresolved entity.  Parsing is
expected to terminate when the method returns.

warning(exception)

This method is used by the parser to report minor warning information.
Unless an exception is raised by this method, parsing will continue
normally after return.

Here is a simple example that shows how to set up an error handler:

from xml.sax import handler, make_parser

class SimpleError(handler.ErrorHandler):
    def error(self,exception):
        print "error: ", exception

    def fatalError(self,exception):
        print "fatalError: ", exception
        raise exception

    def warning(self,exception):
        print "warning: ", exception

p = make_parser()
...
p.setErrorHandler(SimpleError())
...
p.parse(input)

As input, all of the ErrorHandler methods receive a SAXParseException
object.  If you want to propagate the error, a method can simply raise
the passed argument as shown in the fatalError() method above.
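Here is a Python 3 variant of SimpleError that records messages instead of printing them, run against a deliberately malformed document.  The report uses SAXParseException's getLineNumber() and getMessage() methods.  The sample input is made up for illustration.

```python
import xml.sax
from xml.sax import handler

class SimpleError(handler.ErrorHandler):
    """Record every problem; re-raise fatal errors so parsing stops."""
    def __init__(self):
        self.messages = []

    def error(self, exception):
        self.messages.append(("error", exception.getLineNumber(), exception.getMessage()))

    def fatalError(self, exception):
        self.messages.append(("fatalError", exception.getLineNumber(), exception.getMessage()))
        raise exception

    def warning(self, exception):
        self.messages.append(("warning", exception.getLineNumber(), exception.getMessage()))

errors = SimpleError()
caught = None
try:
    # </recipe> closes the wrong element: a fatal well-formedness error.
    xml.sax.parseString(b"<recipe><title></recipe>", handler.ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    caught = exc

print(errors.messages)
```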

SAX Utilities
-------------
The xml.sax.saxutils module contains several utility functions and
classes for writing SAX processors.

escape(data [,entities])
    This function escapes special characters such as &, <, and > in 
    a string of data.  The optional entities parameter is a dictionary
    mapping strings to entities.  For example:

      >>> from xml.sax import saxutils 
      >>> saxutils.escape("<foo>")
      '&lt;foo&gt;'
      >>> saxutils.escape(u"Jalape\xf1o", { u'\xf1' : '&ntilde;'})
      u'Jalape&ntilde;o'

XMLGenerator([out [,encoding]])
     This class implements a ContentHandler that converts SAX events
     back into an XML document.  out is a file object and encoding
     is the encoding of the output stream (default is 'iso-8859-1').
     You might use this class if you wanted to rewrite parts of
     an XML document.  For example:

     import sys
     from xml.sax import saxutils
     from xml.sax import make_parser

     class Generator(saxutils.XMLGenerator):
         def startElement(self,name,attrs):
             if name == 'ingredients':
                  name = 'ingredientlist'
             saxutils.XMLGenerator.startElement(self,name,attrs)
         def endElement(self,name):
             if name == 'ingredients':
                  name = 'ingredientlist'
             saxutils.XMLGenerator.endElement(self,name)
     p = make_parser()
     p.setContentHandler(Generator())
     p.parse(sys.argv[1])
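The Generator class above writes to sys.stdout.  For testing, it can be handy to capture the output instead.  The following Python 3 sketch passes an io.StringIO as out and "utf-8" as encoding, then parses a small made-up recipe from a byte string.

```python
import io
import xml.sax
from xml.sax import saxutils

class Generator(saxutils.XMLGenerator):
    # Rename <ingredients> to <ingredientlist>; copy everything else through.
    def startElement(self, name, attrs):
        if name == "ingredients":
            name = "ingredientlist"
        super().startElement(name, attrs)

    def endElement(self, name):
        if name == "ingredients":
            name = "ingredientlist"
        super().endElement(name)

out = io.StringIO()
xml.sax.parseString(
    b'<recipe><ingredients><item num="2">avocado</item></ingredients></recipe>',
    Generator(out, encoding="utf-8"))
print(out.getvalue())
```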

XMLFilterBase()
     This class is used to implement a filter that sits between
     the XMLReader object and the different handler classes.
     By itself it does nothing but pass events on unmodified;
     an application subclasses it when it wants to modify SAX
     event processing.
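The following is a minimal Python 3 sketch of such a filter.  It assumes the standard library's XMLFilterBase, which takes the underlying parser as its parent.  UpperCaseFilter is a hypothetical example: it upper-cases character data and forwards all other events unchanged to an XMLGenerator.

```python
import io
import xml.sax
from xml.sax import saxutils

class UpperCaseFilter(saxutils.XMLFilterBase):
    """Pass every event through, but upper-case character data."""
    def characters(self, content):
        super().characters(content.upper())

out = io.StringIO()
f = UpperCaseFilter(xml.sax.make_parser())   # the parser is the filter's parent
f.setContentHandler(saxutils.XMLGenerator(out, encoding="utf-8"))
f.parse(io.BytesIO(b"<note>Hi there</note>"))
print(out.getvalue())
```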

prepare_input_source(source [,base])
     This function is used to create an InputSource object ready
     for reading by an XML parser.  source is the name of an
     input source, a file object, or an InputSource object. base is
     an optional base URL.  The parse() method and other parts 
     of the XML package use this function to convert various types
     of input sources into an InputSource object that can be
     used by the low level XML parser.
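Here is a quick Python 3 illustration of the conversion.  A file-like object is wrapped in an InputSource, and an InputSource that already carries a stream is returned unchanged.

```python
import io
from xml.sax import saxutils, xmlreader

raw = io.BytesIO(b"<recipe/>")
src = saxutils.prepare_input_source(raw)
print(type(src).__name__, src.getByteStream() is raw)   # InputSource True

# An InputSource that already has a stream is handed back as-is.
print(saxutils.prepare_input_source(src) is src)         # True
```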

For further information
-----------------------
A detailed reference covering the entire Python SAX API can be found
in the standard Python library documentation.  However, that
documentation includes neither examples nor much explanation of how
the interfaces are supposed to be used.  The Python/XML HOWTO
includes a few simple SAX examples, but does not include a full SAX
reference.  Details about the SAX specification itself can be found at
http://www.megginson.com/SAX.  In addition, almost every XML book
includes a small section on SAX, typically describing the SAX Java
API.  Fortunately, most of this also applies to Python.