📄 pat5c.htm
字号:
parser or compiler generators are more appropriate.</LI>
<A NAME="auto1056"></A>
<P></P>
<A NAME="auto1057"></A>
<LI><EM>Adding new ways to interpret expressions.</EM>
The Interpreter pattern makes it easier to evaluate an expression in a
new way. For example, you can support pretty printing or
type-checking an expression by defining a new operation on the
expression classes. If you keep creating new ways of interpreting an
expression, then consider using the <A HREF="pat5kfs.htm" TARGET="_mainDisplayFrame">Visitor (331)</A>
pattern to avoid changing the grammar classes.</LI>
</OL>
<A NAME="implementation"></A>
<H2><A HREF="#samplecode"><IMG SRC="gifsb/down3.gif" BORDER=0></A> Implementation</H2>
<A NAME="auto1058"></A>
<P>The Interpreter and <A HREF="pat4cfs.htm" TARGET="_mainDisplayFrame">Composite (163)</A>
patterns share many implementation issues. The following issues
are specific to Interpreter:</P>
<OL>
<A NAME="auto1059"></A>
<LI><EM>Creating the abstract syntax tree.</EM>
The Interpreter pattern doesn't explain how to <EM>create</EM> an
abstract syntax tree. In other words, it doesn't address parsing.
The abstract syntax tree can be created by a table-driven parser, by a
hand-crafted (usually recursive descent) parser, or directly by the
client.</LI>
<A NAME="auto1060"></A>
<P></P>
<A NAME="variable-w-interp"></A>
<LI><EM>Defining the Interpret operation.</EM>
You don't have to define the Interpret operation in the expression
classes. If it's common to create a new interpreter, then it's better
to use the <A HREF="pat5kfs.htm" TARGET="_mainDisplayFrame">Visitor (331)</A> pattern to put Interpret in a
separate "visitor" object. For example, a grammar for a programming
language will have many operations on abstract syntax trees, such as
as type-checking, optimization, code generation, and so on. It will be
more likely to use a visitor to avoid defining these operations on
every grammar class.</LI>
<A NAME="auto1061"></A>
<P></P>
<A NAME="flywt-w-interp"></A>
<A NAME="term-symb-w-flywt"></A>
<LI><EM>Sharing terminal symbols with the Flyweight pattern.</EM>
Grammars whose sentences contain many occurrences of a terminal symbol
might benefit from sharing a single copy of that symbol. Grammars for
computer programs are good examples—each program variable will
appear in many places throughout the code. In the Motivation example,
a sentence can have the terminal symbol <CODE>dog</CODE> (modeled by the
LiteralExpression class) appearing many times.
<A NAME="auto1062"></A>
<P>Terminal nodes generally don't store information about their position
in the abstract syntax tree. Parent nodes pass them whatever context
they need during interpretation. Hence there is a distinction between
shared (intrinsic) state and passed-in (extrinsic) state, and the
<A HREF="pat4ffs.htm" TARGET="_mainDisplayFrame">Flyweight (<As+HREF="#Flyweight">195</A>) pattern applies.</P>
<A NAME="auto1063"></A>
<P>For example, each instance of LiteralExpression for <CODE>dog</CODE>
receives a context containing the substring matched so far. And every
such LiteralExpression does the same thing in its Interpret
operation—it checks whether the next part of the input contains a
<CODE>dog</CODE>---no matter where the instance appears in the tree.</P>
</LI>
</OL>
<A NAME="samplecode"><A>
<H2><A HREF="#knownuses"><IMG SRC="gifsb/down3.gif" BORDER=0></A> Sample Code</H2>
<A NAME="auto1064"></A>
<P>Here are two examples. The first is a complete example in Smalltalk
for checking whether a sequence matches a regular expression. The
second is a C++ program for evaluating Boolean expressions.</P>
<A NAME="Smalltalk_example_in_Interpreter"></A>
<P>The regular expression matcher tests whether a string is in the
language defined by the regular expression. The regular expression is
defined by the following grammar:</P>
<A NAME="auto1065"></A>
<PRE>
expression ::= literal | alternation | sequence | repetition |
'(' expression ')'
alternation ::= expression '|' expression
sequence ::= expression '&' expression
repetition ::= expression 'repeat'
literal ::= 'a' | 'b' | 'c' | ... { 'a' | 'b' | 'c' | ... }*
</PRE>
<A NAME="auto1066"></A>
<P>This grammar is a slight modification of the Motivation example. We
changed the concrete syntax of regular expressions a little, because
symbol "<CODE>*</CODE>" can't be a postfix operation in Smalltalk. So
we use <CODE>repeat</CODE> instead. For example, the regular expression</P>
<A NAME="auto1067"></A>
<PRE>
(('dog ' | 'cat ') repeat & 'weather')
</PRE>
<A NAME="auto1068"></A>
<P>matches the input string "<CODE>dog dog cat weather</CODE>".</P>
<A NAME="auto1069"></A>
<P>To implement the matcher, we define the five classes described on
<A HREF="#auto1006">page 243</A>. The class
<CODE>SequenceExpression</CODE> has instance variables
<CODE>expression1</CODE> and <CODE>expression2</CODE> for its children
in the abstract syntax tree. <CODE>AlternationExpression</CODE>
stores its alternatives in the instance variables
<CODE>alternative1</CODE> and <CODE>alternative2</CODE>, while
<CODE>RepetitionExpression</CODE> holds the expression it repeats in its
<CODE>repetition</CODE> instance variable.
LiteralExpression has a <CODE>components</CODE> instance variable that
holds a list of objects (probably characters). These represent the literal
string that must match the input sequence.</P>
<A NAME="auto1070"></A>
<P>The <CODE>match:</CODE> operation implements an interpreter for the
regular expression. Each of the classes defining the abstract syntax
tree implements this operation. It takes
<CODE>inputState</CODE> as an argument representing the current state
of the matching process, having read part of the input string.</P>
<A NAME="auto1071"></A>
<P>This current state is characterized by a set of input streams
representing the set of inputs that the regular expression could have
accepted so far. (This is roughly equivalent to recording all states
that the equivalent finite state automata would be in, having
recognized the input stream to this point).</P>
<A NAME="auto1072"></A>
<P>The current state is most important to the <CODE>repeat</CODE> operation.
For example, if the regular expression were</P>
<A NAME="auto1073"></A>
<PRE>
'a' repeat
</PRE>
<A NAME="auto1074"></A>
<P>then the interpreter could match "<CODE>a</CODE>", "<CODE>aa</CODE>",
"<CODE>aaa</CODE>", and so on. If it were</P>
<A NAME="auto1075"></A>
<PRE>
'a' repeat & 'bc'
</PRE>
<A NAME="auto1076"></A>
<P>then it could match "<CODE>abc</CODE>", "<CODE>aabc</CODE>",
"<CODE>aaabc</CODE>", and so on. But if the regular expression were</P>
<A NAME="auto1077"></A>
<PRE>
'a' repeat & 'abc'
</PRE>
<A NAME="auto1078"></A>
<P>then matching the input "<CODE>aabc</CODE>" against the subexpression
"<CODE>'a' repeat</CODE>" would yield two input streams, one having matched
one character of the input, and the other having matched two
characters. Only the stream that has accepted one character will
match the remaining "<CODE>abc</CODE>".</P>
<A NAME="seqexp-smalltk"></A>
<P>Now we consider the definitions of <CODE>match:</CODE> for each class
defining the regular expression. The definition for
<CODE>SequenceExpression</CODE> matches each of its subexpressions in
sequence. Usually it will eliminate input streams from its
<CODE>inputState</CODE>.</P>
<A NAME="auto1079"></A>
<PRE>
match: inputState
^ expression2 match: (expression1 match: inputState).
</PRE>
<A NAME="smallaltexp"></A>
<P>An <CODE>AlternationExpression</CODE> will return a state that consists
of the union of states from either alternative. The definition of
<CODE>match:</CODE> for <CODE>AlternationExpression</CODE> is</P>
<A NAME="auto1080"></A>
<PRE>
match: inputState
| finalState |
finalState := alternative1 match: inputState.
finalState addAll: (alternative2 match: inputState).
^ finalState
</PRE>
<A NAME="repeatexp-smalltk"></A>
<P>The <CODE>match:</CODE> operation for <CODE>RepetitionExpression</CODE>
tries to find as many states that could match as possible:</P>
<A NAME="auto1081"></A>
<PRE>
match: inputState
| aState finalState |
aState := inputState.
finalState := inputState copy.
[aState isEmpty]
whileFalse:
[aState := repetition match: aState.
finalState addAll: aState].
^ finalState
</PRE>
<A NAME="auto1082"></A>
<P>Its output state usually contains more states than its input state,
because a <CODE>RepetitionExpression</CODE> can match one, two, or many
occurrences of <CODE>repetition</CODE> on the input state. The output
states represent all these possibilities, allowing subsequent elements
of the regular expression to decide which state is the correct one.</P>
<A NAME="literal-small"></A>
<P>Finally, the definition of <CODE>match:</CODE> for
<CODE>LiteralExpression</CODE> tries to match its components against each
possible input stream. It keeps only those input streams that have a
match:</P>
<A NAME="auto1083"></A>
<PRE>
match: inputState
| finalState tStream |
finalState := Set new.
inputState
do:
[:stream | tStream := stream copy.
(tStream nextAvailable:
components size
) = components
ifTrue: [finalState add: tStream]
].
^ finalState
</PRE>
<A NAME="auto1084"></A>
<P>The <CODE>nextAvailable:</CODE> message advances the input stream. This
is the only <CODE>match:</CODE> operation that advances the stream.
Notice how the state that's returned contains a copy of the input
stream, thereby ensuring that matching a literal never changes the
input stream. This is important because each alternative of an
<CODE>AlternationExpression</CODE> should see identical copies of
the input stream.</P>
<A NAME="abssyntree2"></A>
<P>Now that we've defined the classes that make up an abstract syntax
tree, we can describe how to build it.
Rather than write a parser for regular expressions, we'll define
some operations on the <CODE>RegularExpression</CODE> classes so that
evaluating a Smalltalk expression will produce an abstract syntax tree
for the corresponding regular expression. That lets us use the
built-in Smalltalk compiler as if it were a parser for regular
expressions.</P>
<A NAME="buildabssyn"></A>
<P>To build the abstract syntax tree, we'll need to define
"<CODE>|</CODE>", "<CODE>repeat</CODE>", and "<CODE>&</CODE>" as
operations on <CODE>RegularExpression</CODE>. These operations are
defined in class <CODE>RegularExpression</CODE> like this:</P>
<A NAME="auto1085"></A>
<PRE>
& aNode
^ SequenceExpression new
expression1: self expression2: aNode asRExp
repeat
^ RepetitionExpression new repetition: self
| aNode
^ AlternationExpression new
alternative1: self alternative2: aNode asRExp
asRExp
^ self
</PRE>
<A NAME="auto1086"></A>
<P>The <CODE>asRExp</CODE> operation will convert literals into
<CODE>RegularExpressions</CODE>. These operations are defined in class
<CODE>String</CODE>:</P>
<A NAME="auto1087"></A>
<PRE>
& aNode
^ SequenceExpression new
expression1: self asRExp expression2: aNode asRExp
repeat
^ RepetitionExpression new repetition: self
| aNode
^ AlternationExpression new
alternative1: self asRExp alternative2: aNode asRExp
asRExp
^ LiteralExpression new components: self
</PRE>
<A NAME="smalltkv-use-interp"></A>
<P>If we defined these operations higher up in the class hierarchy
(<CODE>SequenceableCollection</CODE> in Smalltalk-80,
<CODE>IndexedCollection</CODE> in Smalltalk/V), then they would
also be defined for classes such as <CODE>Array</CODE> and
<CODE>OrderedCollection</CODE>. This would let
regular expressions match sequences of any kind of object.</P>
<A NAME="auto1088"></A>
<P>The second example is a system for manipulating and evaluating
Boolean expressions implemented in C++. The terminal symbols in this
language are Boolean variables, that is, the constants
<CODE>true</CODE> and <CODE>false</CODE>. Nonterminal symbols represent
expressions containing the operators <CODE>and</CODE>, <CODE>or</CODE>, and
<CODE>not</CODE>. The grammar is defined as
follows<A NAME="fn1"></A><A HREF="#footnote1"><SUP>1</SUP></A>:</P>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -