📄 pymeldlite.py

📁 用python实现的邮件过滤器
💻 PY
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
r"""Lets you manipulate XML/XHTML using a Pythonic object model.  `PyMeldLite`is a single Python module, _PyMeldLite.py_.    It works with all versions ofPython from 2.2 upwards.  It is a restricted version of PyMeld (seehttp://www.entrian.com/PyMeld) - PyMeldLite supports only well-formed XMLwith no namespaces, whereas PyMeld supports virtually all XML or HTMLdocuments.  PyMeldLite is released under the PSF license whereas PyMeld isreleased under the Sleepycat License.  PyMeld and PyMeldLite support thesame API.*Features:* o Allows program logic and HTML to be completely separated - a graphical   designer can design the HTML in a visual XHTML editor, without needing to   deal with any non-standard syntax or non-standard attribute names.  The   program code knows nothing about XML or HTML - it just deals with objects   and attributes like any other piece of Python code. o Designed with common HTML-application programming tasks in mind.   Populating an HTML form with a record from a database is a one-liner   (using the `%` operator - see below).  Building an HTML table from a set   of records is just as easy, as shown in the example below. o Does nothing but maniplating HTML/XML, hence fits in with any other Web   toolkits you're using. o Tracebacks always point to the right place - many Python/HTML mixing   systems use exec or eval, making bugs hard to track down.*Quick overview*A `PyMeldLite.Meld` object represents an XML document, or a piece of one.All the elements in a document with `id=name` attributes are made availableby a Meld object as `object.name`.  The attributes of elements are availablein the same way.  A brief example is worth a thousand words:>>> from PyMeldLite import Meld>>> xhtml = '''<html><body>... <textarea id="message" rows="2" wrap="off">Type your message.</textarea>... </body></html>'''>>> page = Meld(xhtml)                # Create a Meld object from XHTML.>>> print page.message                # Access an element within the document.<textarea id="message" rows="2" wrap="off">Type your message.</textarea>>>> print page.message.rows           # Access an attribute of an element.2>>> page.message = "New message."     # Change the content of an element.>>> page.message.rows = 4             # Change an attribute value.>>> del page.message.wrap             # Delete an attribute.>>> print page                        # Print the resulting page.<html><body><textarea id="message" rows="4">New message.</textarea></body></html>So the program logic and the HTML are completely separated - a graphicaldesigner can design the HTML in a visual XHTML editor, without needing todeal with any non-standard syntax or non-standard attribute names.  Theprogram code knows nothing about XML or HTML - it just deals with objects andattributes like any other piece of Python code.  Populating an HTML form witha record from a database is a one-liner (using the `%` operator - see below).Building an HTML table from a set of records is just as easy, as shown in theexample below:*Real-world example:*Here's a data-driven example populating a table from a data source, basing thetable on sample data put in by the page designer.  Note that in the real worldthe HTML would normally be a larger page read from an external file, keepingthe data and presentation separate, and the data would come from an externalsource like an RDBMS.  The HTML could be full of styles, images, anything youlike and it would all work just the same.>>> xhtml = '''<html><table id="people">... <tr id="header"><th>Name</th><th>Age</th></tr>... <tr id="row"><td id="name">Example name</td><td id="age">21</td></tr>... </table></html>... '''>>> doc = Meld(xhtml)>>> templateRow = doc.row.clone()  # Take a copy of the template row, then>>> del doc.row                    # delete it to make way for the real rows.>>> for name, age in [("Richie", 30), ("Dave", 39), ("John", 78)]:...      newRow = templateRow.clone()...      newRow.name = name...      newRow.age = age...      doc.people += newRow>>> print re.sub(r'</tr>\s*', '</tr>\n', str(doc))  # Prettify the output<html><table id="people"><tr id="header"><th>Name</th><th>Age</th></tr><tr id="row"><td id="name">Richie</td><td id="age">30</td></tr><tr id="row"><td id="name">Dave</td><td id="age">39</td></tr><tr id="row"><td id="name">John</td><td id="age">78</td></tr></table></html>Note that if you were going to subsequently manipulate the table, usingPyMeldLite or JavaScript for instance, you'd need to rename each `row`,`name` and `age` element to have a unique name - you can do that by assigningto the `id` attribute but I've skipped that to make the example simpler.As the example shows, the `+=` operator appends content to an element -appending `<tr>`s to a `<table>` in this case.*Shortcut: the % operator*Using the `object.id = value` syntax for every operation can get tedious, sothere are shortcuts you can take using the `%` operator.  This works just likethe built-in `%` operator for strings.  The example above could have beenwritten like this:>>> for name, age in [("Richie", 30), ("Dave", 39), ("John", 78)]:...      doc.people += templateRow % (name, age)The `%` operator, given a single value or a sequence, assigns values toelements with `id`s in the order that they appear, just like the `%` operatorfor strings.  Note that there's no need to call `clone()` when you're using`%`, as it automatically returns a modified clone (again, just like `%` doesfor strings).  You can also use a dictionary:>>> print templateRow % {'name': 'Frances', 'age': 39}<tr id="row"><td id="name">Frances</td><td id="age">39</td></tr>That's really useful when you have a large number of data items - forexample, populating an HTML form with a record from an RDBMS becomes aone-liner.*Element content*When you refer to a named element in a document, you get a Meld objectrepresenting that whole element:>>> page = Meld('<html><span id="x">Hello world</span></html>')>>> print page.x<span id="x">Hello world</span>If you just want to get the content of the element as string, use the`_content` attribute:>>> print page.x._contentHello worldYou can also assign to `_content`, though that's directly equivalent toassigning to the tag itself:>>> page.x._content = "Hello again">>> print page<html><span id="x">Hello again</span></html>>>> page.x = "Goodbye">>> print page<html><span id="x">Goodbye</span></html>The only time that you need to assign to `_content` is when you've taken areference to an element within a document:>>> x = page.x>>> x._content = "I'm back">>> print page<html><span id="x">I'm back</span></html>Saying `x = "I'm back"` would simply re-bind `x` to the string `"I'm back"`without affecting the document.*Non-self-closing tags*Some web browsers don't cope with self-closing tags (eg. `<x/>`) in XHTML.For instance, some versions of Internet Explorer don't understand`<textarea/>` to be the equivalent of `<textarea></textarea>`, and interpretthe rest of the document as the content of the `textarea`.  For this reason,PyMeldLite has a module-level attribute called `nonSelfClose`, which is adictionary whose keys are the names of the tags that shouldn't beself-closing.  `textarea` is the only such tag by default.>>> page = Meld('''<html><textarea name='spam'/></html>''')>>> print page<html><textarea name="spam"></textarea></html>*Legal information:*_PyMeldLite.py_ is released under the terms of the Python Software FoundationLicense - see http://www.python.org/"""# PyMeldLite is released under the terms of the Python Software Foundation# License - see http://www.python.org/  It is a restricted version of PyMeld;# see http://www.entrian.com/__version__ = "1.0"__author__ = "Richie Hindle <richie@entrian.com>"# Entrian.Coverage: Pragma Stopimport sys, re, stringtry:    True, False, boolexcept NameError:    True = 1    False = 0    def bool(x):        return not not x# Entrian.Coverage: Pragma Startclass _Fail:    """Unambiguously identifies failed attribute lookups."""    pass_fail = _Fail()# Non-self-closing tags; see the module documentation.nonSelfClose = {'textarea': None}# Map high characters to charrefs.def replaceHighCharacters(match):    return "&#%d;" % ord(match.group(1))# Map meaningless low characters to '?'badxml_chars = ''.join([chr(c) for c in range(0, 32) if c not in [9, 10, 13]])badxml_map = string.maketrans(badxml_chars, '?' * len(badxml_chars))############################################################################### Super-lightweight DOM-like tree.#### The externally-visible `Meld` class is just a thin wrapper around a## lightweight DOM-like tree.  The classes `_Node`, `_RootNode`,## `_ElementNode` and `_TextNode` implement the tree, and `_TreeGenerator`## generates it from XML source.  When you do something like `page.field`,## you are given a new `Meld` instance that refers to the underlying tree.##class _Node:    """Represents a node in a document, with a dictionary of `attributes`    and a list of `children`."""    def __init__(self, attributes=None):        """Constructs a `_Node`.  Don't call this directly - `_Node` is only        ever used as a base class."""        if attributes is None:            self.attributes = {}        else:            self.attributes = attributes        self.children = []    def getElementNode(self):        """Returns the `_ElementNode` for a node.  See the `_RootNode`        documentation for why this exists.  You should always go via this        when you want the children or attributes of an element that might        be the root element of a tree."""        # _RootNode overrides this.        return self    def _cloneChildren(self, newNode):        """Populates the given node with clones of `self`'s children."""        for child in self.children:            newNode.children.append(child.clone(newNode))    def childrenToText(self):        """Asks the children to recursively textify themselves, then returns        the resulting text."""        text = []        for child in self.children:            text.append(child.toText())        return ''.join(text)class _RootNode(_Node):    """The root of a tree.  The root is always a `_RootNode` rather than the    top-level `_ElementNode` because there may be things like a `<?xml?>`    declaration outside the root element.  This is why `getElementNode()`    exists."""    def __init__(self, attributes=None):        """Constructs a `_RootNode`, optionally with a dictionary of        `attributes`."""        _Node.__init__(self, attributes)    def getElementNode(self):        """See `_Node.getElementNode()`."""        for child in self.children:            if isinstance(child, _ElementNode):                return child    def clone(self):        """Creates a deep clone of a node."""        newNode = _RootNode(self.attributes)        self._cloneChildren(newNode)        return newNode    def toText(self):        """Generates the XML source for the node."""        # A _RootNode has no text of its own.        return self.childrenToText()class _ElementNode(_Node):    """A node representing an element in a document, with a `parent`, a `tag`,    a dictionary of `attributes` and a list of `children`."""    def __init__(self, parent, tag, attributes):        """Constructs an `_ElementNode`, optionally with a dictionary of        `attributes`."""        _Node.__init__(self, attributes)        self.parent = parent        self.tag = tag    def clone(self, parent=None):        """Creates a deep clone of a node."""        newNode = _ElementNode(parent, self.tag, self.attributes.copy())        self._cloneChildren(newNode)        return newNode    def toText(self):        """Generates the XML source for the node."""        text = ['<%s' % self.tag]        attributes = self.attributes.items()        attributes.sort()        for attribute, value in attributes:            text.append(' %s="%s"' % (attribute, value))        childText = self.childrenToText()        if childText or nonSelfClose.has_key(self.tag):            text.append('>')            text.append(childText)            text.append('</%s>' % self.tag)        else:            text.append('/>')        return ''.join(text)class _TextNode(_Node):    """A tree node representing a piece of text rather than an element."""    def __init__(self, text):        """Constructs a `_TextNode`."""        _Node.__init__(self)        self._text = text    def clone(self, parent=None):        """Creates a deep clone of a node."""        return _TextNode(self._text)    def toText(self):        """Returns the text."""        return self._text# For XML parsing we use xmllib in versions prior to 2.3, because we can't# be sure that expat will be there, or that it will be a decent version.# We use expat in versions 2.3 and above, because we can be sure it will# be there and xmllib is deprecated from 2.3.# The slightly odd Entrian.Coverage pragmas in this section make sure that# whichever branch is taken, we get code coverage for that branch and no# coverage failures for the other.if sys.hexversion >> 16 < 0x203:    # Entrian.Coverage: Pragma Stop    import xmllib    class _TreeGenerator(xmllib.XMLParser):        # Entrian.Coverage: Pragma Start        """An XML parser that generates a lightweight DOM tree.  Call `feed()`        with XML source, then `close()`, then `getTree()` will give you the        tree's `_RootNode`:        >>> g = _TreeGenerator()        >>> g.feed("<xml>Stuff. ")        >>> g.feed("More stuff.</xml>")        >>> g.close()        >>> tree = g.getTree()        >>> print tree.toText()        <xml>Stuff. More stuff.</xml>        """        def __init__(self):            xmllib.XMLParser.__init__(self,                                      translate_attribute_references=False)            self.entitydefs = {}    # This is an xmllib.XMLParser attribute.            self._tree = _RootNode()            self._currentNode = self._tree            self._pendingText = []        def getTree(self):            """Returns the generated tree; call `feed` then `close` first."""            return self._tree        def _collapsePendingText(self):            """Text (any content that isn't an open/close element) is built up            in `self._pendingText` until an open/close element is seen, at            which point it gets collapsed into a `_TextNode`."""            data = ''.join(self._pendingText)            self._currentNode.children.append(_TextNode(data))            self._pendingText = []        def handle_xml(self, encoding, standalone):            xml = '<?xml version="1.0"'            if encoding:                xml += ' encoding="%s"' % encoding            if standalone:                xml += ' standalone="%s"' % standalone            xml += '?>'            self._pendingText.append(xml)        def handle_doctype(self, tag, pubid, syslit, data):            doctype = '<!DOCTYPE %s' % tag            if pubid:                doctype += ' PUBLIC "%s"' % pubid            elif syslit:                doctype += ' SYSTEM'            if syslit:
12 3 下一页
💿 文件大小 1791 K
👤 上传用户 guigong
📂 所属分类数学计算
🏷️ 相关标签

#python #邮件过滤
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -