<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>17.1.6.2 Information Discovery</title>
<META NAME="description" CONTENT="17.1.6.2 Information Discovery">
<META NAME="keywords" CONTENT="lib">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="STYLESHEET" href="lib.css" tppabs="http://www.python.org/doc/current/lib/lib.css">
<LINK REL="previous" HREF="node376.html" tppabs="http://www.python.org/doc/current/lib/node376.html">
<LINK REL="up" href="AST_Examples.html" tppabs="http://www.python.org/doc/current/lib/AST_Examples.html">
<LINK REL="next" href="module-symbol.html" tppabs="http://www.python.org/doc/current/lib/module-symbol.html">
</head>
<body>

<H3><A NAME="SECTION0019162000000000000000">
17.1.6.2 Information Discovery</A>
</H3>

<P>
Some applications benefit from direct access to the parse tree.  The
remainder of this section demonstrates how the parse tree provides
access to module documentation defined in
docstrings without
requiring that the code being examined be loaded into a running
interpreter via <tt class="keyword">import</tt>.  This can be very useful for
performing analyses of untrusted code.

<P>
Generally, the example will demonstrate how the parse tree may be
traversed to distill interesting information.  Two functions and a set
of classes are developed which provide programmatic access to
high-level function and class definitions provided by a module.  The
classes extract information from the parse tree and provide access to
the information at a useful semantic level; one function provides a
simple low-level pattern-matching capability; and the other function
defines a high-level interface to the classes by handling file
operations on behalf of the caller.  All source files mentioned here
which are not part of the Python installation are located in the
<span class="file">Demo/parser/</span> directory of the distribution.

<P>
The dynamic nature of Python allows the programmer a great deal of
flexibility, but most modules need only a limited measure of this when
defining classes, functions, and methods.  In this example, the only
definitions that will be considered are those defined at the
top level of their context, e.g., a function defined by a <tt class="keyword">def</tt>
statement at column zero of a module, but not a function defined
within a branch of an <tt class="keyword">if</tt> ... <tt class="keyword">else</tt> construct, though
there are some good reasons for doing so in some situations.  Nesting
of definitions will be handled by the code developed in the example.
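
<P>
As an aside for readers on current interpreters: the
<tt class="module">parser</tt> module was removed in Python 3.10.  The
following is a minimal sketch, not part of the original example, of the
same idea using the standard <tt class="module">ast</tt> module instead:
docstrings are read from source text without importing it, and only
top-level definitions are considered.  The name
<code>top_level_docstrings</code>, the <code>(module)</code> key, and the
sample source are assumptions made for illustration.

```python
import ast

# Sample source (an assumption for this sketch); it is never executed.
SOURCE = '''"""Some documentation.
"""

def spam():
    "Docstring of spam."
    return 1

class Eggs:
    """Docstring of Eggs."""

if True:
    def hidden():
        "Skipped: not at the top level."
'''

def top_level_docstrings(source):
    """Map '(module)' and each top-level def/class name to its docstring."""
    tree = ast.parse(source)  # parses only; nothing in source is run
    found = {"(module)": ast.get_docstring(tree)}
    # tree.body holds only statements at column zero of the module.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found[node.name] = ast.get_docstring(node)
    return found

docs = top_level_docstrings(SOURCE)
```

<P>
In this sketch <code>hidden</code> is not reported, because it is defined
inside a branch of an <tt class="keyword">if</tt> statement rather than at
the top level.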

<P>
To construct the upper-level extraction methods, we need to know what
the parse tree structure looks like and how much of it we actually
need to be concerned about.  Python uses a moderately deep parse tree,
so there are many intermediate nodes.  It is important to
read and understand the formal grammar used by Python.  This is
specified in the file <span class="file">Grammar/Grammar</span> in the distribution.
Consider the simplest case of interest when searching for docstrings:
a module consisting of a docstring and nothing else.  (See file
<span class="file">docstring.py</span>.)

<P>
<dl><dd><pre class="verbatim">
"""Some documentation.
"""
</pre></dl>

<P>
Using the interpreter to take a look at the parse tree, we find a
bewildering mass of numbers and parentheses, with the documentation
buried deep in nested tuples.

<P>
<dl><dd><pre class="verbatim">
&gt;&gt;&gt; import parser
&gt;&gt;&gt; import pprint
&gt;&gt;&gt; ast = parser.suite(open('docstring.py').read())
&gt;&gt;&gt; tup = ast.totuple()
&gt;&gt;&gt; pprint.pprint(tup)
(257,
 (264,
  (265,
   (266,
    (267,
     (307,
      (287,
       (288,
        (289,
         (290,
          (292,
           (293,
            (294,
             (295,
              (296,
               (297,
                (298,
                 (299,
                  (300, (3, '"""Some documentation.\012"""'))))))))))))))))),
   (4, ''))),
 (4, ''),
 (0, ''))
</pre></dl>

<P>
The number in the first element of each node in the tree is the node
type; it maps directly to a terminal or non-terminal symbol in the
grammar.  Unfortunately, node types are represented as integers in the
internal representation, and the Python structures generated do not
change that.  However, the <tt class='module'><a href="module-symbol.html" tppabs="http://www.python.org/doc/current/lib/module-symbol.html">symbol</a></tt> and <tt class='module'><a href="module-token.html" tppabs="http://www.python.org/doc/current/lib/module-token.html">token</a></tt> modules
provide symbolic names for the node types, along with dictionaries
which map the integers back to those names.
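
<P>
As a small sketch of that mapping, the dictionary
<code>token.tok_name</code> translates terminal node types into their
names.  The <tt class="module">symbol</tt> module offered
<code>sym_name</code> for non-terminals, but it was removed in Python 3.10,
so this sketch leaves non-terminal numbers untouched.  The helper name
<code>name_terminals</code> and the simplified sample tree are assumptions
made for illustration.

```python
import token

def name_terminals(tree):
    """Return a copy of a tuple-form parse tree with token numbers named."""
    node_type = tree[0]
    if token.ISTERMINAL(node_type):
        # Terminal nodes look like (type, text): swap the number for its name.
        return (token.tok_name[node_type],) + tuple(tree[1:])
    # Non-terminal nodes keep their number; only their children are rewritten.
    return (node_type,) + tuple(name_terminals(child) for child in tree[1:])

# Simplified, hand-written fragment (assumption: 257 and 264 stand for the
# non-terminal numbers printed above; they differ between Python versions).
sample = (257, (264, (3, '"doc"'), (4, '')), (4, ''), (0, ''))
named = name_terminals(sample)
```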

<P>
In the output presented above, the outermost tuple contains four
elements: the integer <code>257</code> and three additional tuples.  Node
type <code>257</code> has the symbolic name <tt class="constant">file_input</tt>.  Each of
these inner tuples contains an integer as the first element; these
integers, <code>264</code>, <code>4</code>, and <code>0</code>, represent the node types
<tt class="constant">stmt</tt>, <tt class="constant">NEWLINE</tt>, and <tt class="constant">ENDMARKER</tt>,
respectively.
Note that these values may change depending on the version of Python
you are using; consult <span class="file">symbol.py</span> and <span class="file">token.py</span> for
details of the mapping.  It should be fairly clear that the outermost
node is related primarily to the input source rather than the contents
of the file, and may be disregarded for the moment.  The <tt class="constant">stmt</tt>
node is much more interesting.  In particular, all docstrings are
found in subtrees which are formed exactly as this node is formed,
with the only difference being the string itself.  The association
between the docstring in a similar tree and the defined entity (class,
function, or module) which it describes is given by the position of
the docstring subtree within the tree defining the described
structure.
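
<P>
The shape described above can be checked without the
<tt class="module">parser</tt> module by rebuilding the printed tuple by
hand.  This is only a sketch: the node type numbers are copied from the
output shown earlier and are specific to the Python version that produced
it, and the helper names <code>nest</code> and <code>leftmost_leaf</code>
are hypothetical.

```python
# Node type numbers copied from the pprint output above (version-specific).
INNER_CHAIN = [266, 267, 307, 287, 288, 289, 290, 292, 293, 294,
               295, 296, 297, 298, 299, 300]

def nest(node_types, leaf):
    """Wrap leaf in one single-child node per entry of node_types."""
    node = leaf
    for node_type in reversed(node_types):
        node = (node_type, node)
    return node

def leftmost_leaf(node):
    """Follow first children down to a terminal (type, text) node."""
    while isinstance(node[1], tuple):
        node = node[1]
    return node

leaf = (3, '"""Some documentation.\n"""')
stmt = (264, (265, nest(INNER_CHAIN, leaf), (4, '')))
tup = (257, stmt, (4, ''), (0, ''))

root_type = tup[0]                             # 257, file_input
child_types = [child[0] for child in tup[1:]]  # [264, 4, 0]
docstring_node = leftmost_leaf(stmt)           # the (3, '...') STRING leaf
```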

<P>
By replacing the actual docstring with something to signify a variable
component of the tree, we allow a simple pattern matching approach to
check any given subtree for equivalence to the general pattern for
docstrings.  Since the example demonstrates information extraction, we
can safely require that the tree be in tuple form rather than list
form, which lets a variable be represented simply as a one-element
list, <code>['variable_name']</code>.  A simple recursive function can
implement the pattern matching, returning a boolean and a dictionary
mapping variable names to values.  (See file <span class="file">example.py</span>.)

<P>
<dl><dd><pre class="verbatim">
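def match(pattern, data, vars=None):
    """Match data against pattern, binding pattern variables in vars.

    Returns a (matched, vars) pair.
    """
    if vars is None:
        vars = {}
    # A one-element list such as ['docstring'] is a pattern variable:
    # it matches anything and records the matched value under its name.
    if isinstance(pattern, list):
        vars[pattern[0]] = data
        return True, vars
    # Anything other than a tuple is a literal and must compare equal.
    if not isinstance(pattern, tuple):
        return (pattern == data), vars
    # A tuple pattern matches only a tuple of the same length.
    if not isinstance(data, tuple) or len(data) != len(pattern):
        return False, vars
    # Match element by element, stopping at the first mismatch.
    for subpattern, subdata in zip(pattern, data):
        same, vars = match(subpattern, subdata, vars)
        if not same:
            return False, vars
    return True, vars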
</pre></dl>
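
<P>
A minimal usage sketch follows.  The function is restated in a form that
runs on Python 3 so that the block is self-contained, and the miniature
trees are hypothetical: <code>1000</code> is a made-up non-terminal number,
while <code>3</code> and <code>4</code> are the <tt class="constant">STRING</tt>
and <tt class="constant">NEWLINE</tt> token types.

```python
def match(pattern, data, vars=None):
    """Return (matched, vars); vars maps variable names to matched values."""
    if vars is None:
        vars = {}
    if isinstance(pattern, list):
        # ['name'] is a variable: it matches anything and records the value.
        vars[pattern[0]] = data
        return True, vars
    if not isinstance(pattern, tuple):
        # Literals (node type numbers, token text) must compare equal.
        return pattern == data, vars
    if not isinstance(data, tuple) or len(data) != len(pattern):
        return False, vars
    for subpattern, subdata in zip(pattern, data):
        same, vars = match(subpattern, subdata, vars)
        if not same:
            return False, vars
    return True, vars

# Hypothetical miniature pattern and trees, for illustration only.
pattern = (1000, (3, ['docstring']), (4, ''))
good = (1000, (3, '"""Hello."""'), (4, ''))
bad = (1000, (1, 'name'), (4, ''))

found, bound = match(pattern, good)   # the variable binds the string text
missed, _ = match(pattern, bad)       # token type 1 does not equal 3
```

<P>
Bindings collected before a mismatch are left in the dictionary, so it
should be consulted only when the first returned value is true.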

<P>
Using this simple representation for syntactic variables and the symbolic
node types, the pattern for the candidate docstring subtrees becomes
fairly readable.  (See file <span class="file">example.py</span>.)

<P>
<dl><dd><pre class="verbatim">
import symbol
import token

DOCSTRING_STMT_PATTERN = (
    symbol.stmt,
    (symbol.simple_stmt,
     (symbol.small_stmt,
      (symbol.expr_stmt,
       (symbol.testlist,
        (symbol.test,
         (symbol.and_test,
          (symbol.not_test,
           (symbol.comparison,
            (symbol.expr,
             (symbol.xor_expr,
              (symbol.and_expr,
               (symbol.shift_expr,
                (symbol.arith_expr,
                 (symbol.term,
                  (symbol.factor,
                   (symbol.power,
                    (symbol.atom,
                     (token.STRING, ['docstring'])
                     )))))))))))))))),
     (token.NEWLINE, '')
     ))
</pre></dl>
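
<P>
The pattern above is written by hand.  As a sketch of how the same shape
could be derived from a sample subtree instead, the hypothetical helper
<code>generalize</code> below copies a tuple-form tree and replaces the
text of every <tt class="constant">STRING</tt> token with a variable
marker.  It assumes that every string leaf should become the same
variable, which holds for a docstring statement but not for arbitrary
expressions.  The sample tree is abbreviated, and its non-terminal numbers
are taken from the output shown earlier.

```python
import token

def generalize(tree, var_name='docstring'):
    """Copy a tuple-form tree, replacing STRING token text with a variable."""
    if tree[0] == token.STRING:
        # The one-element list marks a variable for the pattern matcher.
        return (token.STRING, [var_name])
    return (tree[0],) + tuple(
        generalize(child, var_name) if isinstance(child, tuple) else child
        for child in tree[1:]
    )

# Abbreviated sample: most single-child intermediate nodes are omitted.
sample = (264, (265, (266, (3, '"""Some documentation.\n"""')), (4, '')))
pattern = generalize(sample)
```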

<P>
Using the <tt class="function">match()</tt> function with this pattern, extracting the
