      #*------ Limit nesting depth of map()/filter() ------#
      intermed = filter(niceProperty, map(someTransform, lines))
      final = map(otherTransform, intermed)

  Any nesting of successive `filter()` or `map()` calls, however,
  can be reduced to single functions using the proper combinatorial
  HOFs. Therefore, the number of procedural steps needed is almost
  always small. However, the reduction in total lines-of-code is
  partly offset by the lines used for giving names to
  combinatorial functions. Overall, FP style code is usually about
  one-half the length of imperative style equivalents (fewer lines
  generally mean correspondingly fewer bugs).
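
  As a rough illustration of that reduction, the sketch below
  (written for Python 3, where `map()` and `filter()` return
  iterators) collapses two nested `map()` calls and two nested
  `filter()` calls into single passes. The helpers 'compose()' and
  'both()' are minimal stand-ins written for this example, and the
  sample transforms and predicates are hypothetical:

```python
# Minimal stand-ins for combinatorial HOFs (assumed for illustration).
compose = lambda f, g: lambda x: f(g(x))
both = lambda f, g: lambda x: bool(f(x)) & bool(g(x))  # calls f and g

lines = ["  alpha ", "", "# note", "  beta"]

# Two nested map() calls ...
nested = list(map(str.upper, map(str.strip, lines)))
# ... become one map() over a composed function.
single = list(map(compose(str.upper, str.strip), lines))

# Two nested filter() calls ...
non_empty = lambda s: len(s) > 0
not_comment = lambda s: not s.startswith("#")
nested_f = list(filter(not_comment, filter(non_empty, lines)))
# ... become one filter() over a combined predicate.
single_f = list(filter(both(non_empty, not_comment), lines))
```

  The named helpers cost a line each, but every later pipeline
  that reuses them stays a single `map()` or `filter()` call.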

  A nice feature of combinatorial functions is that they can
  provide a complete Boolean algebra for functions that have not
  been called yet (the use of `operator.add` and `operator.mul` in
  'combinatorial.py' is more than accidental, in that sense). For
  example, with a collection of simple values, you might express a
  (complex) relation of multiple truth values as:

      #*---------- Simple Boolean algebra of values ----------#
      satisfied = (this or that) and (foo or bar)

  In the case of text processing on chunks of text, these truth
  values are often the results of predicative functions applied
  to a chunk, for example:

      #*---------- Boolean algebra of return values ----------#
      satisfied = (thisP(s) or thatP(s)) and (fooP(s) or barP(s))

  In an expression like the above one, several predicative
  functions are applied to the same string (or other object), and
  a set of logical relations on the results is evaluated. But
  this expression is itself a logical predicate of the string. For
  naming clarity--and especially if you wish to evaluate the same
  predicate more than once--it is convenient to create an actual
  function expressing the predicate:

      #*------ Boolean algebra of composed functions ------#
      satisfiedP = both(either(thisP,thatP), either(fooP,barP))

  Using a predicative function created with combinatorial
  techniques is the same as using any other function:

      #*------ Use of a compositional Boolean function ------#
      selected = filter(satisfiedP, lines)
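
  'combinatorial.py' itself is not reproduced in this part of the
  chapter, so the following Python 3 sketch uses simplified
  stand-ins for 'both()' and 'either()' to show how a composed
  predicate plugs into `filter()`. The four predicates and the
  sample lines are hypothetical, not part of the module:

```python
# Simplified, nonshortcutting stand-ins (assumptions, not the module's code).
both = lambda f, g: lambda x: bool(f(x)) & bool(g(x))
either = lambda f, g: lambda x: bool(f(x)) | bool(g(x))

# Hypothetical predicates on a line of text.
thisP = lambda s: s.startswith("ERROR")
thatP = lambda s: s.startswith("WARN")
fooP = lambda s: "disk" in s
barP = lambda s: "net" in s

# The predicate is built once, before any line is examined.
satisfiedP = both(either(thisP, thatP), either(fooP, barP))

lines = ["ERROR disk full", "WARN net down", "INFO disk ok", "ERROR cpu hot"]
selected = list(filter(satisfiedP, lines))
print(selected)
```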


  EXERCISE:  More on combinatorial functions
  --------------------------------------------------------------------

  The module 'combinatorial.py' presented above provides some of
  the most commonly useful combinatorial higher-order functions.
  But there is room for enhancement in the brief example. Creating
  a personal or organizational library of useful HOFs is a way to
  improve the reusability of your current text processing
  libraries.

  QUESTIONS:

  1.  Some of the functions defined in 'combinatorial.py' are
      not, strictly speaking, combinatorial.  In a precise sense,
      a combinatorial function should take one or several
      functions as arguments and return one or more function
      objects that "combine" the input arguments.  Identify which
      functions are not "strictly" combinatorial, and determine
      exactly what type of thing each one -does- return.

  2.  The functions 'both()' and 'and_()' do almost the same
      thing.  But they differ in an important, albeit subtle, way.
      'and_()', like the Python operator `and`, uses -shortcutting-
      in its evaluation.  Consider these lines:

      >>> f = lambda n: n**2 > 10
      >>> g = lambda n: 100/n > 10
      >>> and_(f,g)(5)
      1
      >>> both(f,g)(5)
      1
      >>> and_(f,g)(0)
      0
      >>> both(f,g)(0)
      Traceback (most recent call last):
      ...

      The shortcutting 'and_()' can potentially allow the first
      function to act as a "guard" for the second one.  The second
      function never gets called if the first function returns a
      false value on a given argument.

      a. Create a similarly shortcutting combinatorial 'or_()'
         function for your library.

      b. Create general shortcutting functions 'shortcut_all()'
         and 'shortcut_some()' that behave similarly to the
         functions 'all()' and 'some()', respectively.

      c. Describe some situations where nonshortcutting
         combinatorial functions like 'both()', 'all()', or
         'anyof3()' are more desirable than similar shortcutting
         functions.

  3.  The function 'ident()' would appear to be pointless, since
      it simply returns whatever value is passed to it.  In truth,
      'ident()' is an almost indispensable function for a
      combinatorial collection.  Explain the significance of
      'ident()'.

      Hint: Suppose you have a list of lines of text, where some
      of the lines may be empty strings.  What filter can you
      apply to find all the lines that start with a '#'?

  4.  The function 'not_()' might make a nice addition to a
      combinatorial library.  We could define this function as:

      >>> not_ = lambda f: lambda x, f=f: not f(x)

      Explore some situations where a 'not_()' function would aid
      combinatoric programming.

  5.  The function 'apply_each()' is used in 'combinatorial.py'
      to build some other functions.  But the utility of
      'apply_each()' is more general than its supporting role
      might suggest.  A trivial usage of 'apply_each()' might
      look something like:

      >>> apply_each(map(adder_factory, range(5)),(10,))
      [10, 11, 12, 13, 14]

      Explore some situations where 'apply_each()' simplifies
      applying multiple operations to a chunk of text.

  6.  Unlike the functions 'all()' and 'some()', the functions
      'compose()' and 'compose3()' take a fixed number of input
      functions as arguments.  Create a generalized composition
      function that takes a list of input functions, of any
      length, as an argument.

  7.  What other combinatorial higher-order functions that have
      not been discussed here are likely to prove useful in text
      processing? Consider other ways of combining first-order
      functions into useful operations, and add these to your
      library.  What are good names for these enhanced HOFs?
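
  The guard behavior described in question 2 can be sketched in
  Python 3 as follows. The definitions of 'and_()' and 'both()'
  here are simplified stand-ins that only preserve the
  shortcutting/nonshortcutting distinction; they are not taken
  from 'combinatorial.py', and in Python 3 the results are `True`
  and `False` rather than `1` and `0`:

```python
# Stand-ins (assumed): and_() shortcuts, both() always calls f and g.
and_ = lambda f, g: lambda x: f(x) and g(x)
both = lambda f, g: lambda x: bool(f(x)) & bool(g(x))

f = lambda n: n**2 > 10
g = lambda n: 100/n > 10      # raises ZeroDivisionError when n == 0

guarded = and_(f, g)(0)       # f(0) is false, so g is never called
try:
    both(f, g)(0)             # g(0) is evaluated and raises
    unguarded = "no error"
except ZeroDivisionError:
    unguarded = "ZeroDivisionError"

print(guarded, unguarded)
```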


  TOPIC -- Specializing Python Datatypes
  --------------------------------------------------------------------

  Python comes with an excellent collection of standard
  datatypes--Appendix A discusses each built-in type. At the same
  time, an important principle of Python programming makes types
  less important than programmers coming from other languages tend
  to expect. According to Python's "principle of pervasive
  polymorphism" (my own coinage), it is more important what an
  object -does- than what it -is-. Another common way of putting
  the principle is: if it walks like a duck and quacks like a duck,
  treat it like a duck.

  Broadly, the idea behind polymorphism is letting the same
  function or operator work on things of different types. In C++ or
  Java, for example, you might use signature-based method
  overloading to let an operation apply to several types of things
  (acting differently as needed). For example:

      #------------ C++ signature-based polymorphism -----------#
      #include <stdio.h>
      class Print {
      public:
        void print(int i)    { printf("int %d\n", i); }
        void print(double d) { printf("double %f\n", d); }
        void print(float f)  { printf("float %f\n", f); }
      };
      int main() {
        Print *p = new Print();
        p->print(37);      /* --> "int 37" */
        p->print(37.0);    /* --> "double 37.000000" */
        delete p;
        return 0;
      }

  The most direct Python translation of signature-based overloading
  is a function that performs type checks on its argument(s). It is
  simple to write such functions:

      #------- Python "signature-based" polymorphism -----------#
      def Print(x):
          # name the Python 2 type objects explicitly (no wildcard import)
          from types import FloatType, IntType, LongType
          if type(x) is FloatType:  print "float", x
          elif type(x) is IntType:  print "int", x
          elif type(x) is LongType: print "long", x

  Writing signature-based functions, however, is extremely
  un-Pythonic. If you find yourself performing these sorts of
  explicit type checks, you have probably not correctly understood
  the problem you want to solve! What you -should- (usually) be
  interested in is not what type 'x' is, but rather whether 'x'
  can perform the action you need it to perform (regardless of
  what type of thing it is, strictly).

  PYTHONIC POLYMORPHISM:

  Probably the single most common case where pervasive polymorphism
  is useful is in identifying "file-like" objects. There are many
  objects that can do things that files can do, such as those
  created with [urllib], [cStringIO], [zipfile], and by other
  means. Various objects can perform only subsets of what actual
  files can: some can read, others can write, still others can
  seek, and so on. But for many purposes, you have no need to
  exercise every "file-like" capability--it is good enough to make
  sure that a specified object has those capabilities you actually
  need.
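
  A minimal Python 3 sketch of this idea: the hypothetical
  function 'count_lines()' below asks only that its argument have
  a 'read()' method, so it works equally on an in-memory text
  buffer from [io] and on an unrelated class that happens to
  provide 'read()'. Both the function and the class are invented
  for illustration:

```python
import io

def count_lines(src):
    # Only the read() capability is required, not a real file object.
    if not hasattr(src, "read"):
        raise TypeError("src must provide a read() method")
    return len(src.read().splitlines())

class CannedText:
    """Hypothetical non-file object that still 'quacks' like a reader."""
    def read(self):
        return "one\ntwo\nthree\n"

print(count_lines(io.StringIO("a\nb\n")))   # in-memory buffer
print(count_lines(CannedText()))            # arbitrary object with read()
```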

  Here is a typical example. I have a module that uses DOM to work
  with XML documents; I would like users to be able to specify an
  XML source in any of several ways: using the name of an XML file,
  passing a file-like object that contains XML, or indicating an
  already-built DOM object to work with (built with any of several
  XML libraries). Moreover, future users of my module may get their
  XML from novel places I have not even thought of (an RDBMS, over
  sockets, etc.). By looking at what a candidate object can -do-, I
  can just utilize whichever capabilities that object -has-:

      #-------- Python capability-based polymorphism -----------#
      def toDOM(xml_src=None):
          from types import StringType, UnicodeType
          from xml.dom import minidom
          if hasattr(xml_src, 'documentElement'):
              return xml_src    # it is already a DOM object
          elif hasattr(xml_src,'read'):
              # it is something that knows how to read data
              return minidom.parseString(xml_src.read())
          elif type(xml_src) in (StringType, UnicodeType):
              # it is a filename of an XML document
              xml = open(xml_src).read()
              return minidom.parseString(xml)
          else:
              raise ValueError, "Must be initialized with " +\
                    "filename, file-like object, or DOM object"

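  For readers on Python 3, where `StringType` and `UnicodeType` no
  longer exist, one possible adaptation of 'toDOM()' is sketched
  below. Testing `isinstance(xml_src, str)` for the filename case
  is my substitution rather than the original code, and the sample
  XML is arbitrary:

```python
import io
from xml.dom import minidom

def toDOM(xml_src=None):
    if hasattr(xml_src, "documentElement"):
        return xml_src                      # already a DOM object
    elif hasattr(xml_src, "read"):
        # something that knows how to read data
        return minidom.parseString(xml_src.read())
    elif isinstance(xml_src, str):
        # treated as the filename of an XML document
        with open(xml_src) as fh:
            return minidom.parseString(fh.read())
    else:
        raise ValueError("Must be initialized with "
                         "filename, file-like object, or DOM object")

dom = toDOM(io.StringIO("<doc><item/></doc>"))
same = toDOM(dom)                           # a DOM object passes through
print(dom.documentElement.tagName)
```
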
  Even simple-seeming numeric types have varying capabilities. As
  with other objects, you should not usually care about the
  internal representation of an object, but rather about what it
  can do. Of course, as one way to assure that an object has a
  capability, it is often appropriate to coerce it to a type using
  the built-in functions `complex()`, `dict()`, `float()`, `int()`,
  `list()`, `long()`, `str()`, `tuple()` and `unicode()`. All of
  these functions make a good effort to transform anything that
  looks a little bit like the type of thing they name into a true
  instance of it. It is usually not necessary, however, actually to
  transform values to prescribed types; again we can just check
  capabilities.
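
  A few of these coercions, sketched in Python 3 (where `long()`
  and `unicode()` have been folded into `int()` and `str()`); the
  input values are arbitrary examples:

```python
# Each constructor accepts anything that "looks like" its type.
as_int = int("42")                    # string of digits -> 42
as_float = float("3.5")               # numeric string -> 3.5
as_complex = complex("1+2j")          # string -> (1+2j)
as_list = list("abc")                 # any iterable -> ['a', 'b', 'c']
as_tuple = tuple([1, 2, 3])           # list -> (1, 2, 3)
as_dict = dict([("a", 1), ("b", 2)])  # pairs -> {'a': 1, 'b': 2}
as_str = str(3.5)                     # any object -> '3.5'
```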

  For example, suppose that you want to remove the "least
  significant" portion of any number--perhaps because the numbers
  represent measurements of limited accuracy. For whole
  numbers--ints or longs--you might mask out some low-order bits;
  for fractional values you might round to a given precision.
  Rather than testing value types explicitly, you can look for
  numeric capabilities. One common way to test a capability in
  Python is to -try- to do something, and catch any exceptions that
  occur (then try something else). Below is a simple example:

      #----------- Checking what numbers can do ---------------#
      def approx(x):                # int attributes require 2.2+
          if hasattr(x,'__and__'):  # supports bitwise-and
              return x & ~0x0FL
          try:                      # supports real/imag
              return (round(x.real,2)+round(x.imag,2)*1j)
          except AttributeError:
              return round(x,2)

  ENHANCED OBJECTS:

  The principle of pervasive polymorphism matters because Python
  makes it easy to create new objects that behave mostly--but not
  exactly--like basic datatypes.  File-like
  objects were already mentioned as examples; you may or may not
  think of a file object as a datatype precisely.  But even basic
  datatypes like numbers, strings, lists, and dictionaries can be
  easily specialized and/or emulated.

  There are two details to pay attention to when emulating basic
  datatypes.  The most important matter to understand is that the
  capabilities of an object--even those utilized with syntactic
  constructs--are generally implemented by its "magic" methods,
  each named with leading and trailing double underscores.  Any
  object that has the right magic methods can act like a basic
  datatype in those contexts that use the supplied methods.  At
  heart, a basic datatype is just an object with some
  well-optimized versions of the right collection of magic
  methods.
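
  As a small Python 3 sketch of this point, the hypothetical class
  'Lines' below is not a list, yet because it supplies
  `__len__()`, `__getitem__()`, and `__contains__()`, it works
  with `len()`, indexing, iteration, and the `in` operator:

```python
class Lines:
    """Hypothetical sequence-like view of the lines in a chunk of text."""
    def __init__(self, text):
        self._lines = text.splitlines()
    def __len__(self):                 # used by len()
        return len(self._lines)
    def __getitem__(self, index):      # used by obj[i] and by iteration
        return self._lines[index]
    def __contains__(self, item):      # used by the 'in' operator
        return item in self._lines

doc = Lines("spam\neggs\ntoast")
print(len(doc), doc[0], "eggs" in doc)
print([line.upper() for line in doc])
```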

  The second detail concerns exactly how you get at the magic
  methods--or rather, how best to make use of existing
  implementations. There is nothing stopping you from writing your
  own version of any basic datatype, except for the piddling
  details of doing so. However, there are quite a few such details,
