📄 sec-monads.html
字号:
> - the input is
being handled completely within the monad.</P
><P
>The lexical analysis function must have the following
type:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer :: (Token -> P a) -> P a</PRE
></TD
></TR
></TABLE
><P
>where <TT
CLASS="LITERAL"
>P</TT
> is the monad type constructor declared
with <TT
CLASS="LITERAL"
>%monad</TT
>, and <TT
CLASS="LITERAL"
>a</TT
> can be replaced by the
parser return type if desired.</P
><P
>You can see from this type that the lexer takes a
<SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>continuation</I
></SPAN
> as an argument. The lexer is to find
the next token, and pass it to this continuation to carry on
with the parse. Obviously, we need to keep track of the input
in the monad somehow, so that the lexer can do something
different each time it's called!</P
><P
>Let's take the exception monad above, and extend it to
add the input string so that we can use it with a threaded
lexer.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>data ParseResult a = Ok a | Failed String
type P a = String -> ParseResult a
thenP :: P a -> (a -> P b) -> P b
m `thenP` k = \s ->
case m s of
Ok a -> k a s
Failed e -> Failed e
returnP :: a -> P a
returnP a = \s -> Ok a
failP :: String -> P a
failP err = \s -> Failed err
catchP :: P a -> (String -> P a) -> P a
catchP m k = \s ->
case m s of
Ok a -> OK a
Failed e -> k e s</PRE
></TD
></TR
></TABLE
><P
>Notice that this isn't a real state monad - the input
string just gets passed around, not returned. Our lexer will
now look something like this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer :: (Token -> P a) -> P a
lexer cont s =
... lexical analysis code ...
cont token s'</PRE
></TD
></TR
></TABLE
><P
>the lexer grabs the continuation and the input string,
finds the next token <TT
CLASS="LITERAL"
>token</TT
>, and passes it together
with the remaining input string <TT
CLASS="LITERAL"
>s'</TT
> to the
continuation.</P
><P
>We can now indicate lexical errors by ignoring the
continuation and calling <TT
CLASS="LITERAL"
>failP "error message" s</TT
>
within the lexer (don't forget to pass the input string to
make the types work out).</P
><P
>This may all seem a bit weird. Why, you ask, doesn't
the lexer just have type <TT
CLASS="LITERAL"
>P Token</TT
>? Well, the
decision was taken to use a continuation purely for
performance reasons. If the lexer must return a token, then
it turns out that the input string must be a real state
object, and the monad has to return it as well as pass it
around. If you want a lexer of type <TT
CLASS="LITERAL"
>P Token</TT
>, then
just define a wrapper to deal with the continuation:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexwrap :: (Token -> P a) -> P a
lexwrap cont = real_lexer `thenP` \token -> cont token</PRE
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SEC-LINE-NUMBERS"
>2.5.3. Line Numbers</A
></H2
><P
>Previous versions of <SPAN
CLASS="APPLICATION"
>Happy</SPAN
> had a
<TT
CLASS="LITERAL"
>%newline</TT
> directive that enabled simple line numbers
to be counted by the parser and referenced in the actions. We
warned you that this facility may go away and be replaced by
something more general, well guess what? :-)</P
><P
>Line numbers can now be dealt with quite
straightforwardly using a monadic parser/lexer combination.
Ok, we have to extend the monad a bit more:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>type LineNumber = Int
type P a = String -> LineNumber -> ParseResult a
getLineNo :: P LineNumber
getLineNo = \s l -> Ok l</PRE
></TD
></TR
></TABLE
><P
>(the rest of the functions in the monad follow by just
adding the extra line number argument in the same way as the
input string). Again, the line number is just passed down,
not returned: this is OK because of the continuation-based
lexer that can change the line number and pass the new one to
the continuation.</P
><P
>The lexer can now update the line number as follows:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer cont s =
case s of
'\n':s -> \line -> lexer cont s (line + 1)
... rest of lexical analysis ...</PRE
></TD
></TR
></TABLE
><P
>It's as simple as that. Take a look at
<SPAN
CLASS="APPLICATION"
>Happy</SPAN
>'s own parser if you have the sources lying
around, it uses a monad just like the one above.</P
><P
>Reporting the line number of a parse error is achieved
by changing <TT
CLASS="LITERAL"
>happyError</TT
> to look something like
this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>happyError :: P a
happyError = getLineNo `thenP` \line ->
failP (show line ++ ": parse error")</PRE
></TD
></TR
></TABLE
><P
>We can also get hold of the line number during parsing,
to put it in the parsed data structure for future reference.
A good way to do this is to have a production in the grammar
that returns the current line number: </P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lineno :: { LineNumber }
: {- empty -} {% getLineNo }</PRE
></TD
></TR
></TABLE
><P
>The semantic value of <TT
CLASS="LITERAL"
>lineno</TT
> is the line
number of the last token read - this will always be the token
directly following the <TT
CLASS="LITERAL"
>lineno</TT
> symbol in the grammar,
since <SPAN
CLASS="APPLICATION"
>Happy</SPAN
> always keeps one lookahead token in
reserve.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SEC-MONAD-SUMMARY"
>2.5.4. Summary</A
></H2
><P
>The types of various functions related to the parser are
dependent on what combination of <TT
CLASS="LITERAL"
>%monad</TT
> and
<TT
CLASS="LITERAL"
>%lexer</TT
> directives are present in the grammar. For
reference, we list those types here. In the following types,
<SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
> is the return type of the
parser. A type containing a type variable indicates that the
specified function must be polymorphic.</P
><P
></P
><UL
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> No <TT
CLASS="LITERAL"
>%monad</TT
> or
<TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse :: [Token] -> <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: [Token] -> a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%monad</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse :: [Token] -> P <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: [Token] -> P a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse :: T <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: T a
lexer :: (Token -> T a) -> T a</PRE
></TD
></TR
></TABLE
>
where the type constructor <TT
CLASS="LITERAL"
>T</TT
> is whatever you want (usually <TT
CLASS="LITERAL"
>T
a = String -> a</TT
>. I'm not sure if this is useful, or even if it works
properly.</P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%monad</TT
> and <TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse :: P <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: P a
lexer :: (Token -> P a) -> P a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
></UL
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="sec-type-signatures.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="happy.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="sec-error.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Type Signatures</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="sec-using.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>The Error Token</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -