<directions> Combine all ingredients and hand whisk to desired consistency. Serve and enjoy with ice-cold beers.</directions>
</recipe>
An even easier way to accomplish this would be to use some special methods of the
Document object like this:
import sys
from xml.dom.ext import PrettyPrint
from xml.dom import minidom
doc = minidom.parse(open(sys.argv[1]))
for n in doc.getElementsByTagName("ingredients"):
    n.tagName = "ingredientlist"
for n in doc.getElementsByTagName("item"):
    n.tagName = "ingredient"
    attr = n.getAttribute("num")
    n.setAttribute("quantity",attr)
    n.removeAttribute("num")
PrettyPrint(doc)
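The PrettyPrint function comes from the PyXML package, which may not be
installed. As a rough sketch, assuming a current Python with only the
standard library, the same renaming can be tried with minidom alone. The
SAMPLE string is a cut-down stand-in for the recipe file, and
toprettyxml() is used here in place of PrettyPrint:

```python
from xml.dom import minidom

# Cut-down stand-in for the recipe document (illustration only)
SAMPLE = (
    '<recipe><ingredients>'
    '<item num="4" units="none">Large avocados, chopped</item>'
    '</ingredients></recipe>'
)

doc = minidom.parseString(SAMPLE)

# Rename the container element
for n in doc.getElementsByTagName("ingredients"):
    n.tagName = "ingredientlist"

# Rename each item and move its "num" attribute to "quantity"
for n in doc.getElementsByTagName("item"):
    n.tagName = "ingredient"
    n.setAttribute("quantity", n.getAttribute("num"))
    n.removeAttribute("num")

renamed = doc.documentElement.toxml()
print(doc.toprettyxml(indent="  "))
```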
In all, the DOM interface contains more than a hundred
functions--mostly concerned with straightforward tree manipulation and
data access. The details of the interface can be found in the Python
library documentation for the xml.dom module. Since DOM is a
standardized interface, even more information can be obtained in the
Document Object Model (DOM) Level-2 Specification available at
http://www.w3.org/TR/DOM-Level-2-Core.
Other Python XML Packages
-------------------------
Pyxie:
Pyxie is an XML processing package developed by Sean McGrath that is
described in his book "XML Processing with Python", (Prentice Hall
PTR, ISBN 0-13-021119-2). Details are available at
http://pyxie.sourceforge.net.
Pyxie works by converting XML documents to a simplified tree encoding
format known as PYX. A PYX representation of our earlier example
(generated using the Pyxie xmln tool) looks like this:
(recipe
-\n
-
(title
-
-\n
- Famous Guacamole
-\n
-
)title
-\n
-
(description
-\n
- A southwest favorite!
-\n
-
)description
-\n
-
(ingredients
-\n
-
(item
Anum 4
Aunits none
- Large avocados, chopped
)item
-\n
-
(item
Anum 1
Aunits none
- Tomato, chopped
)item
-\n
-
(item
Anum 1/2
Aunits C
- White onion, chopped
)item
-\n
-
(item
Anum 2
Aunits tbl
- Fresh squeezed lemon juice
)item
-\n
-
(item
Anum 1
Aunits none
- Jalape
-ñ
-o pepper, diced
)item
-\n
-
(item
Anum 1
Aunits tbl
- Fresh cilantro, minced
)item
-\n
-
(item
Anum 1
Aunits tbl
- Garlic, minced
)item
-\n
-
(item
Anum 3
Aunits tsp
- Salt
)item
-\n
-
(item
Anum 12
Aunits bottles
- Ice-cold beer
)item
-\n
-
)ingredients
-\n
-
(directions
-\n
- Combine all ingredients and hand whisk to desired consistency.
-\n
- Serve and enjoy with ice-cold beers.
-\n
-
)directions
-\n
)recipe
One advantage of the PYX format is that documents are easy to process
using simple line-oriented parsers. The first character
of each line determines its content: a '(' indicates a start element,
an 'A' indicates an attribute, a ')' indicates an end element, and
a '-' indicates text.
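The first-character rule described above can be sketched as a small
classifier. This is only an illustration: the function name
classify_pyx_line is hypothetical and is not part of Pyxie, and any
leading character other than the four listed is simply reported as
"unknown".

```python
def classify_pyx_line(line):
    """Return a (kind, payload) pair for one PYX line."""
    kinds = {"(": "start", ")": "end", "A": "attribute", "-": "text"}
    line = line.rstrip("\n")
    kind = kinds.get(line[:1], "unknown")
    payload = line[1:]
    if kind == "attribute":
        # Attribute lines have the form "Aname value"
        name, _, value = payload.partition(" ")
        payload = (name, value)
    return kind, payload

for raw in ["(item", "Anum 1/2", "Aunits C", "- White onion, chopped", ")item"]:
    print(classify_pyx_line(raw))
```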
In this simplified line-oriented format, it is relatively easy to
implement parsing tools without using any of the more sophisticated
XML parsing modules. It is also possible to write parsers and utilities
that process data in a style not easily incorporated in SAX or DOM. For
example, the following script reads a recipe in PYX format and extracts
the ingredient list:
# PYX example
f = open("guac.pyx")
# Read an item, return as a (text,attributes) tuple
def read_item(f):
    L = f.readline()
    attrs = { }
    text = []
    while L:
        if L[0] == 'A':
            i = L.index(' ')
            attrs[L[1:i]] = L[i+1:-1]
        elif L[0] == '-':
            text.append(L[1:-1])
        elif L[0] == ')':
            break
        L = f.readline()
    return "".join(text), attrs
# Look for ingredients and print them out
L = f.readline()
while L:
    if L[0] == '(':
        if L[1:-1] == 'ingredients':
            L = f.readline()
            # Look for items list
            while L:
                if L[0] == '(':
                    if L[1:-1] == 'item':
                        text, attrs = read_item(f)
                        print attrs, text
                elif L[0] == ')':
                    if L[1:-1] == 'ingredients': break
                L = f.readline()
            break
    L = f.readline()
f.close()
The output of this script looks like the following:
{'num': '4', 'units': 'none'} Large avocados, chopped
{'num': '1', 'units': 'none'} Tomato, chopped
{'num': '1/2', 'units': 'C'} White onion, chopped
{'num': '2', 'units': 'tbl'} Fresh squeezed lemon juice
{'num': '1', 'units': 'none'} Jalapeno pepper, diced
{'num': '1', 'units': 'tbl'} Fresh cilantro, minced
{'num': '1', 'units': 'tbl'} Garlic, minced
{'num': '3', 'units': 'tsp'} Salt
{'num': '12', 'units': 'bottles'} Ice-cold beer
The pyxie module also provides a number of utilities
for both event-driven and tree-based document processing, as well as
hooks to SAX and DOM. In addition, the PYX format is relatively easy
to generate, so a program might use it as an intermediate representation
(i.e., a program could generate PYX and have that converted to XML using
a special utility).
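To give a feel for what such a conversion involves, here is a minimal
sketch of a PYX-to-XML converter. It is not the utility shipped with
Pyxie; the function name pyx_to_xml is hypothetical, and the sketch
assumes that the only escape sequence it needs to undo in text lines is
the literal two-character sequence \n.

```python
from xml.sax.saxutils import escape, quoteattr

def pyx_to_xml(lines):
    """Convert an iterable of PYX lines to an XML string (sketch only)."""
    out = []
    pending = None   # start tag whose attribute lines are still arriving
    attrs = []

    def flush():
        # Emit the pending start tag once all of its attributes are known
        nonlocal pending
        if pending is not None:
            out.append("<" + pending + "".join(attrs) + ">")
            pending = None
            attrs.clear()

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        kind, rest = line[0], line[1:]
        if kind == "(":
            flush()
            pending = rest
        elif kind == "A":
            name, _, value = rest.partition(" ")
            attrs.append(" %s=%s" % (name, quoteattr(value)))
        elif kind == "-":
            flush()
            # Assumption: "\n" written as backslash + n stands for a newline
            out.append(escape(rest.replace("\\n", "\n")))
        elif kind == ")":
            flush()
            out.append("</%s>" % rest)
    return "".join(out)

print(pyx_to_xml(["(item", "Anum 4", "Aunits none",
                  "- Large avocados, chopped", ")item"]))
```

The start tag is held back until the first non-attribute line arrives,
because in PYX the attribute lines follow the start-element line rather
than being part of it.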
4Suite:
4Suite is a collection of open source XML processing applications developed by
Fourthought, Inc. Portions of the 4Suite package are included in the standard
PyXML distribution. However, more advanced capabilities must be downloaded
separately at http://www.fourthought.com. The package currently includes
support for the following XML-related technologies:
XPath
XPointer
XSLT
XLink
RDF
In addition, the package includes some support for XML-database
integration. Full coverage of the complete 4Suite package is
impossible here. However, some of the more interesting components are
the XPath and XSLT modules.
The xml.xpath module can be used to easily select different parts of
an XML document given a DOM tree using XPath specifiers. For
instance, suppose you wanted to select all of the ingredients from a
recipe. Here is a really easy way to do it:
import sys
from xml.dom import minidom
from xml import xpath
# Read document into a DOM tree
doc = minidom.parse(open(sys.argv[1]))
# Select all of the items
result = xpath.Evaluate('/recipe/ingredients/item',doc)
print result
The result is simply a list of DOM nodes:
[<DOM Element: item at 136487164>, <DOM Element: item at 136557964>,
<DOM Element: item at 136461572>, <DOM Element: item at 136589316>,
<DOM Element: item at 136592892>, <DOM Element: item at 136597260>,
<DOM Element: item at 136600804>, <DOM Element: item at 136604348>,
<DOM Element: item at 136607836>]
To extract information, one would then walk the list and use the
standard DOM API for retrieving information.
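As a rough illustration of walking such a list, the sketch below pulls
the attributes and text out of each item node. It assumes a current
Python with only the standard library, so the node list is obtained with
getElementsByTagName as a stand-in for the xpath.Evaluate call. The
SAMPLE document and the helper names node_text and describe_items are
made up for this example.

```python
from xml.dom import minidom

# A cut-down recipe used only for illustration
SAMPLE = """<recipe>
<ingredients>
<item num="4" units="none">Large avocados, chopped</item>
<item num="1/2" units="C">White onion, chopped</item>
</ingredients>
</recipe>"""

def node_text(node):
    """Concatenate the text children of a DOM element."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)

def describe_items(nodes):
    """Return (num, units, text) tuples for a list of <item> elements."""
    return [(n.getAttribute("num"), n.getAttribute("units"),
             node_text(n).strip()) for n in nodes]

doc = minidom.parseString(SAMPLE)
result = doc.getElementsByTagName("item")   # stand-in for xpath.Evaluate(...)
for num, units, text in describe_items(result):
    print(num, units, text)
```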
The XPath specification allows for wildcards and more advanced pattern matching
similar to regular expressions. For example, if you wanted to extract
an ingredient list, but weren't sure about the surrounding context, you might
do this:
result = xpath.Evaluate("/*/ingredients/item",doc)
If you wanted to select all items that didn't have any units
specified, you might use this:
result = xpath.Evaluate("/recipe/ingredients/item[@units='none']",doc)
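If the xml.xpath module is not available, similar selections can be
approximated with the limited XPath subset supported by
xml.etree.ElementTree in the standard library. The sketch below is an
assumption-laden stand-in rather than the 4Suite API: paths are written
relative to the root element, and the SAMPLE document is invented for
the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<recipe>
  <ingredients>
    <item num="4" units="none">Large avocados, chopped</item>
    <item num="1/2" units="C">White onion, chopped</item>
    <item num="3" units="tsp">Salt</item>
  </ingredients>
</recipe>"""

root = ET.fromstring(SAMPLE)          # root is the <recipe> element

# Roughly /recipe/ingredients/item, written relative to the root element
all_items = root.findall("./ingredients/item")

# Roughly /recipe/ingredients/item[@units='none']
no_units = root.findall("./ingredients/item[@units='none']")

print([i.text for i in all_items])
print([i.text for i in no_units])
```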
The xml.xslt module provides support for transforming XML documents
using XSLT (eXtensible Stylesheet Language Transformations). XSLT is an
XML-based language that is used to transform XML documents into other
formats such as HTML. For example, an XSLT stylesheet for our recipe
might look something like this:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/recipe">
<HTML>
<HEAD>
<TITLE>
<xsl:value-of select="title"/>
</TITLE>
</HEAD>
<BODY>
<xsl:apply-templates/>
</BODY>
</HTML>
</xsl:template>
<xsl:template match="/recipe/title">
<H1>
<xsl:apply-templates/>
</H1>
</xsl:template>
<xsl:template match="/recipe/description">
<P>
<xsl:apply-templates/>
</P>
</xsl:template>
<xsl:template match="/recipe/ingredients">
<H3>Ingredients</H3>
<UL>
<xsl:apply-templates/>
</UL>
</xsl:template>
<xsl:template match="/recipe/ingredients/item[@units='none']" xml:space="preserve">
<LI>
<xsl:value-of select="@num"/> <xsl:apply-templates/>
</LI>
</xsl:template>
<xsl:template match="/recipe/ingredients/item[@units!='none']" xml:space="preserve">
<LI>
<xsl:value-of select="@num"/> <xsl:value-of select="@units"/><xsl:apply-templates/>
</LI>
</xsl:template>
<xsl:template match="/recipe/directions">
<H3>Directions</H3>
<P>
<xsl:apply-templates/>
</P>
</xsl:template>
</xsl:stylesheet>
To use this stylesheet, you typically put a stylesheet processing instruction in the XML document.
For example:
<?xml version="1.0" encoding="iso-8859-1"?>
<?xml-stylesheet type="text/xml" href="recipe.xsl"?>
<recipe>
...
</recipe>
4Suite includes a tool, 4xslt, that can be used to process documents through an XSLT
processor. For example:
$ 4xslt guac.xml recipe.xsl
<HTML>
<HEAD>
<meta http-equiv='Content-Type' content='text/html; charset=iso-8859-1'>
<TITLE>
Famous Guacamole
</TITLE>
</HEAD>
<BODY>
<H1>
Famous Guacamole
</H1>
<P>
A southwest favorite!
</P>
<H3>Ingredients</H3>
<UL>
<LI>4 Large avocados, chopped
</LI>
<LI>1 Tomato, chopped
</LI>
<LI>1/2 C White onion, chopped
</LI>
<LI>2 tbl Fresh squeezed lemon juice
</LI>
<LI>1 Jalapeño pepper, diced
</LI>
<LI>1 tbl Fresh cilantro, minced
</LI>
<LI>1 tbl Garlic, minced
</LI>
<LI>3 tsp Salt
</LI>
<LI>12 bottles Ice-cold beer
</LI>
</UL>
<H3>Directions</H3>
<P>
Combine all ingredients and hand whisk to desired consistency.
Serve and enjoy with ice-cold beers.
</P>
</BODY>
</HTML>
In addition, an xslt module is available that can be used to gain
more control over XSLT processing.
It should be noted that although some browsers can use XSLT
stylesheets to render XML documents into HTML, this is a rather recent
development. If you care about portability, you might not want to
make such assumptions about the client's browser.
Also, even though XSLT is a relatively simple way to specify document
formatting, it is somewhat slow and requires entire XML documents to
be loaded into a DOM tree in order to operate. If you need to turn
XML into HTML quickly, it may be faster to write some special purpose
code using the SAX API or a low-level parser such as Expat.
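To make the SAX alternative concrete, here is a minimal sketch of a
handler that writes HTML as parse events arrive, without building a
tree. It assumes a current Python standard library. The class name
RecipeToHTML, the TAGS mapping, and the SAMPLE document are invented for
this example, and the output is only loosely modeled on the stylesheet
shown earlier.

```python
import io
import xml.sax
from xml.sax.saxutils import escape

class RecipeToHTML(xml.sax.ContentHandler):
    """Emit HTML as SAX events arrive, without building a tree."""
    # Hypothetical mapping from recipe elements to HTML open/close markup
    TAGS = {
        "title": ("<H1>", "</H1>\n"),
        "description": ("<P>", "</P>\n"),
        "ingredients": ("<H3>Ingredients</H3>\n<UL>\n", "</UL>\n"),
        "directions": ("<H3>Directions</H3>\n<P>", "</P>\n"),
    }

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.text = []

    def flush_text(self):
        # Text may arrive in several chunks, so buffer it and write it once
        text = "".join(self.text).strip()
        self.text = []
        if text:
            self.out.write(escape(text))

    def startElement(self, name, attrs):
        self.flush_text()
        if name == "item":
            prefix = attrs.get("num", "")
            units = attrs.get("units", "none")
            if units != "none":
                prefix += " " + units
            self.out.write("<LI>" + escape(prefix) + " ")
        elif name in self.TAGS:
            self.out.write(self.TAGS[name][0])

    def endElement(self, name):
        self.flush_text()
        if name == "item":
            self.out.write("</LI>\n")
        elif name in self.TAGS:
            self.out.write(self.TAGS[name][1])

    def characters(self, content):
        self.text.append(content)

SAMPLE = b"""<recipe><title>Famous Guacamole</title>
<ingredients>
<item num="4" units="none">Large avocados, chopped</item>
<item num="2" units="tbl">Fresh squeezed lemon juice</item>
</ingredients></recipe>"""

out = io.StringIO()
xml.sax.parseString(SAMPLE, RecipeToHTML(out))
html = out.getvalue()
print(html)
```

Because the handler writes output as soon as each element closes, memory
use stays roughly constant regardless of document size, which is the
main attraction of this approach over a tree-based transformation.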
Miscellaneous
-------------
Old Python modules? (xmllib, pyexpat)
xmlrpc?
soap?