#*------ Limit nesting depth of map()/filter() ------#
intermed = filter(niceProperty, map(someTransform, lines))
final = map(otherTransform, intermed)
Any nesting of successive `filter()` or `map()` calls, however,
can be reduced to single functions using the proper combinatorial
HOFs. Therefore, the number of procedural steps needed stays
quite small. The reduction in total lines of code, though, is
partly offset by the lines used to give names to combinatorial
functions. Overall, FP-style code usually runs about one-half
the length of its imperative-style equivalent (and fewer lines
generally mean correspondingly fewer bugs).
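For instance, two successive `map()` steps collapse into one by
composing their transforms, and two successive `filter()` steps
collapse by conjoining their predicates. A sketch, with
placeholder names, assuming `compose()` and `both()` from
'combinatorial.py':
#*------ Collapsing successive map()/filter() calls ------#
# map(f, map(g, lines))       == map(compose(f, g), lines)
# filter(p, filter(q, lines)) == filter(both(p, q), lines)
shortened = map(compose(otherTransform, someTransform), lines)
kept = filter(both(niceProperty, otherProperty), lines)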
A nice feature of combinatorial functions is that they can
provide a complete Boolean algebra for functions that have not
been called yet (the use of `operator.add` and `operator.mul` in
'combinatorial.py' is more than accidental, in that sense). For
example, with a collection of simple values, you might express a
complex relation of multiple truth values as:
#*---------- Simple Boolean algebra of values ----------#
satisfied = (this or that) and (foo or bar)
In the case of text processing on chunks of text, these truth
values are often the results of predicative functions applied
to a chunk, for example:
#*---------- Boolean algebra of return values ----------#
satisfied = (thisP(s) or thatP(s)) and (fooP(s) or barP(s))
In an expression like the one above, several predicative
functions are applied to the same string (or other object), and
a set of logical relations on the results is evaluated. But
this expression is itself a logical predicate of the string. For
naming clarity--and especially if you wish to evaluate the same
predicate more than once--it is convenient to create an actual
function expressing the predicate:
#*------ Boolean algebra of composed functions ------#
satisfiedP = both(either(thisP,thatP), either(fooP,barP))
Using a predicative function created with combinatorial
techniques is the same as using any other function:
#*------ Use of a compositional Boolean function ------#
selected = filter(satisfiedP, lines)
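For instance, a small self-contained session (these particular
predicates and the test data are invented purely for
illustration):
#*------ Filtering with a composed predicate ------#
>>> thisP = lambda s: s.startswith('#')
>>> thatP = lambda s: s.startswith('//')
>>> fooP = lambda s: s.find('FIXME') >= 0
>>> barP = lambda s: s.find('TODO') >= 0
>>> satisfiedP = both(either(thisP, thatP), either(fooP, barP))
>>> filter(satisfiedP, ['# TODO: cleanup', 'x = 1', '// FIXME later'])
['# TODO: cleanup', '// FIXME later']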
EXERCISE: More on combinatorial functions
--------------------------------------------------------------------
The module 'combinatorial.py' presented above provides some of
the most commonly useful combinatorial higher-order functions.
But there is room for enhancement in the brief example. Creating
a personal or organization library of useful HOFs is a way to
improve the reusability of your current text processing
libraries.
QUESTIONS:
1. Some of the functions defined in 'combinatorial.py' are
not, strictly speaking, combinatorial. In a precise sense,
a combinatorial function should take one or several
functions as arguments and return one or more function
objects that "combine" the input arguments. Identify which
functions are not "strictly" combinatorial, and determine
exactly what type of thing each one -does- return.
2. The functions 'both()' and 'and_()' do almost the same
thing. But they differ in an important, albeit subtle, way.
'and_()', like the Python operator `and`, uses -shortcutting-
in its evaluation. Consider these lines:
>>> f = lambda n: n**2 > 10
>>> g = lambda n: 100/n > 10
>>> and_(f,g)(5)
1
>>> both(f,g)(5)
1
>>> and_(f,g)(0)
0
>>> both(f,g)(0)
Traceback (most recent call last):
...
The shortcutting 'and_()' can potentially allow the first
function to act as a "guard" for the second one. The second
function never gets called if the first function returns a
false value on a given argument.
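For reference, the shortcutting behavior comes directly from
Python's `and` operator; 'and_()' amounts to little more than
(a sketch matching the semantics described above):
#*------ A shortcutting combinator (sketch) ------#
and_ = lambda f, g: lambda x, f=f, g=g: f(x) and g(x)
# `and` short-circuits, so g(x) never runs when f(x) is false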
a. Create a similarly shortcutting combinatorial 'or_()'
function for your library.
b. Create general shortcutting functions 'shortcut_all()'
and 'shortcut_some()' that behave similarly to the
functions 'all()' and 'some()', respectively.
c. Describe some situations where nonshortcutting
combinatorial functions like 'both()', 'all()', or
'anyof3()' are more desirable than similar shortcutting
functions.
3. The function 'ident()' would appear to be pointless, since
it simply returns whatever value is passed to it. In truth,
'ident()' is an almost indispensable function for a
combinatorial collection. Explain the significance of
'ident()'.
Hint: Suppose you have a list of lines of text, where some
of the lines may be empty strings. What filter can you
apply to find all the lines that start with a '#'?
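To see the issue the hint points at, try a naive predicate on
data that includes an empty line (an illustrative sketch):
#*------ Why empty strings complicate filtering ------#
>>> lines = ['# comment', '', 'code()']
>>> filter(lambda s: s[0] == '#', lines)
Traceback (most recent call last):
...
IndexError: string index out of range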
4. The function 'not_()' might make a nice addition to a
combinatorial library. We could define this function as:
>>> not_ = lambda f: lambda x, f=f: not f(x)
Explore some situations where a 'not_()' function would aid
combinatoric programming.
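As a starting point, the most obvious use is inverting a
predicate you already have, e.g. (names invented for
illustration):
#*------ Inverting a predicate with not_() ------#
>>> blankP = lambda s: not s.strip()  # true for whitespace-only
>>> filter(not_(blankP), ['first', '   ', 'last'])
['first', 'last']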
5. The function 'apply_each()' is used in 'combinatorial.py'
to build some other functions. But the utility of
'apply_each()' is more general than its supporting role
might suggest. A trivial usage of 'apply_each()' might
look something like:
>>> apply_each(map(adder_factory, range(5)),(10,))
[10, 11, 12, 13, 14]
Explore some situations where 'apply_each()' simplifies
applying multiple operations to a chunk of text.
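The session above assumes an 'adder_factory()' that is not shown
in this excerpt; one plausible definition is:
#*------ A plausible adder_factory() ------#
adder_factory = lambda n: lambda m, n=n: m + n
# map(adder_factory, range(5)) builds five functions adding 0..4;
# apply_each() then calls each one with the argument tuple (10,)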
6. Unlike the functions 'all()' and 'some()', the functions
'compose()' and 'compose3()' take a fixed number of input
functions as arguments. Create a generalized composition
function that takes a list of input functions, of any
length, as an argument.
7. What other combinatorial higher-order functions that have
not been discussed here are likely to prove useful in text
processing? Consider other ways of combining first-order
functions into useful operations, and add these to your
library. What are good names for these enhanced HOFs?
TOPIC -- Specializing Python Datatypes
--------------------------------------------------------------------
Python comes with an excellent collection of standard
datatypes--Appendix A discusses each built-in type. At the same
time, an important principle of Python programming makes types
less important than programmers coming from other languages tend
to expect. According to Python's "principle of pervasive
polymorphism" (my own coinage), it is more important what an
object -does- than what it -is-. Another common way of putting
the principle is: if it walks like a duck and quacks like a duck,
treat it like a duck.
Broadly, the idea behind polymorphism is letting the same
function or operator work on things of different types. In C++ or
Java, for example, you might use signature-based method
overloading to let an operation apply to several types of things
(acting differently as needed). For example:
#------------ C++ signature-based polymorphism -----------#
#include <stdio.h>
class Print {
public:
    void print(int i)    { printf("int %d\n", i); }
    void print(double d) { printf("double %f\n", d); }
    void print(float f)  { printf("float %f\n", f); }
};
int main() {
    Print *p = new Print();
    p->print(37);      /* --> "int 37" */
    p->print(37.0);    /* --> "double 37.000000" */
    delete p;
    return 0;
}
The most direct Python translation of signature-based overloading
is a function that performs type checks on its argument(s). It is
simple to write such functions:
#------- Python "signature-based" polymorphism -----------#
def Print(x):
    from types import *
    if   type(x) is FloatType: print "float", x
    elif type(x) is IntType:   print "int", x
    elif type(x) is LongType:  print "long", x
Writing signature-based functions, however, is extremely
un-Pythonic. If you find yourself performing these sorts of
explicit type checks, you have probably not correctly understood
the problem you want to solve! What you -should- (usually) be
interested in is not what type 'x' is, but rather whether 'x'
can perform the action you need it to perform (regardless of
what type of thing it strictly is).
PYTHONIC POLYMORPHISM:
Probably the single most common case where pervasive polymorphism
is useful is in identifying "file-like" objects. There are many
objects that can do things that files can do, such as those
created with [urllib], [cStringIO], [zipfile], and by other
means. Various objects can perform only subsets of what actual
files can: some can read, others can write, still others can
seek, and so on. But for many purposes, you have no need to
exercise every "file-like" capability--it is good enough to make
sure that a specified object has those capabilities you actually
need.
Here is a typical example. I have a module that uses DOM to work
with XML documents; I would like users to be able to specify an
XML source in any of several ways: using the name of an XML file,
passing a file-like object that contains XML, or indicating an
already-built DOM object to work with (built with any of several
XML libraries). Moreover, future users of my module may get their
XML from novel places I have not even thought of (an RDBMS, over
sockets, etc.). By looking at what a candidate object can -do-, I
can just utilize whichever capabilities that object -has-:
#-------- Python capability-based polymorphism -----------#
def toDOM(xml_src=None):
    from xml.dom import minidom
    from types import StringType, UnicodeType
    if hasattr(xml_src, 'documentElement'):
        return xml_src   # it is already a DOM object
    elif hasattr(xml_src, 'read'):
        # it is something that knows how to read data
        return minidom.parseString(xml_src.read())
    elif type(xml_src) in (StringType, UnicodeType):
        # it is a filename of an XML document
        xml = open(xml_src).read()
        return minidom.parseString(xml)
    else:
        raise ValueError, "Must be initialized with " +\
                          "filename, file-like object, or DOM object"
Even simple-seeming numeric types have varying capabilities. As
with other objects, you should not usually care about the
internal representation of an object, but rather about what it
can do. Of course, as one way to assure that an object has a
capability, it is often appropriate to coerce it to a type using
the built-in functions `complex()`, `dict()`, `float()`, `int()`,
`list()`, `long()`, `str()`, `tuple()` and `unicode()`. All of
these functions make a good effort to transform anything that
looks a little bit like the type of thing they name into a true
instance of it. It is usually not necessary, however, to
actually transform values to prescribed types; again, we can
simply check capabilities.
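For instance, a quick session:
#*------ Coercion built-ins making a good effort ------#
>>> int("42"), float(7), list("abc"), str(3.0)
(42, 7.0, ['a', 'b', 'c'], '3.0')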
For example, suppose that you want to remove the "least
significant" portion of any number--perhaps because the values
represent measurements of limited accuracy. For whole
numbers--ints or longs--you might mask out some low-order bits;
for fractional values you might round to a given precision.
Rather than testing value types explicitly, you can look for
numeric capabilities. One common way to test a capability in
Python is to -try- to do something, and catch any exceptions that
occur (then try something else). Below is a simple example:
#----------- Checking what numbers can do ---------------#
def approx(x):                   # int attributes require 2.2+
    if hasattr(x, '__and__'):    # supports bitwise-and
        return x & ~0x0FL
    try:                         # supports real/imag
        return round(x.real, 2) + round(x.imag, 2)*1j
    except AttributeError:
        return round(x, 2)
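A quick illustration of the dispatch (values chosen to display
cleanly; the plain-float case assumes the 2.2-era behavior noted
above, where floats lack a '.real' attribute):
#*------ approx() dispatching on capabilities ------#
print approx(4660)        # --> 4656 (int has __and__; bits masked)
print approx(2.75+1.5j)   # --> (2.75+1.5j) (has .real/.imag)
print approx(3.14159)     # --> 3.14 (falls through to round())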
ENHANCED OBJECTS:
The reason that the principle of pervasive polymorphism matters
is that Python makes it easy to create new objects that behave
mostly--but not exactly--like basic datatypes. File-like
objects were already mentioned as examples; you may or may not
think of a file object precisely as a datatype. But even basic
datatypes like numbers, strings, lists, and dictionaries can
easily be specialized and/or emulated.
There are two details to pay attention to when emulating basic
datatypes. The most important matter to understand is that the
capabilities of an object--even those utilized with syntactic
constructs--are generally implemented by its "magic" methods,
each named with leading and trailing double underscores. Any
object that has the right magic methods can act like a basic
datatype in those contexts that use the supplied methods. At
heart, a basic datatype is just an object with some
well-optimized versions of the right collection of magic
methods.
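As a minimal sketch (invented here for illustration, not one of
the listings above), an object participates in a syntactic
context simply by supplying the magic method that context calls:
#*------ Acting like a number via magic methods ------#
class Bounded:
    "Number-like object whose sums are clipped to a maximum"
    def __init__(self, val, max=100):
        self.val, self.max = val, max
    def __add__(self, other):   # implements 'self + other'
        return Bounded(min(self.val + other, self.max), self.max)
    def __repr__(self):         # implements the displayed form
        return 'Bounded(%s)' % self.val

print Bounded(95) + 10          # --> Bounded(100)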
The second detail concerns exactly how you get at the magic
methods--or rather, how best to make use of existing
implementations. There is nothing stopping you from writing your
own version of any basic datatype, except for the piddling
details of doing so. However, there are quite a few such details,