📄 sec-monads.html

📁 編譯器像YACC的編譯及語法產生器
💻 HTML
📖 第 1 页 / 共 2 页
字号:
上一页 12
> - the input is
        being handled completely within the monad.</P
><P
>The lexical analysis function must have the following
	type:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer :: (Token -&#62; P a) -&#62; P a</PRE
></TD
></TR
></TABLE
><P
>where <TT
CLASS="LITERAL"
>P</TT
> is the monad type constructor declared
        with <TT
CLASS="LITERAL"
>%monad</TT
>, and <TT
CLASS="LITERAL"
>a</TT
> can be replaced by the
        parser return type if desired.</P
><P
>You can see from this type that the lexer takes a
        <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>continuation</I
></SPAN
> as an argument.  The lexer is to find
        the next token, and pass it to this continuation to carry on
        with the parse.  Obviously, we need to keep track of the input
        in the monad somehow, so that the lexer can do something
        different each time it's called!</P
><P
>Let's take the exception monad above, and extend it to
        add the input string so that we can use it with a threaded
        lexer.</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>data ParseResult a = Ok a | Failed String
type P a = String -&#62; ParseResult a

thenP :: P a -&#62; (a -&#62; P b) -&#62; P b
m `thenP` k = \s -&#62;
   case m s of 
       Ok a -&#62; k a s
	 Failed e -&#62; Failed e

returnP :: a -&#62; P a
returnP a = \s -&#62; Ok a

failP :: String -&#62; P a
failP err = \s -&#62; Failed err

catchP :: P a -&#62; (String -&#62; P a) -&#62; P a
catchP m k = \s -&#62;
   case m s of
      Ok a -&#62; OK a
	Failed e -&#62; k e s</PRE
></TD
></TR
></TABLE
><P
>Notice that this isn't a real state monad - the input
        string just gets passed around, not returned.  Our lexer will
        now look something like this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer :: (Token -&#62; P a) -&#62; P a
lexer cont s = 
    ... lexical analysis code ...
    cont token s'</PRE
></TD
></TR
></TABLE
><P
>the lexer grabs the continuation and the input string,
        finds the next token <TT
CLASS="LITERAL"
>token</TT
>, and passes it together
        with the remaining input string <TT
CLASS="LITERAL"
>s'</TT
> to the
        continuation.</P
><P
>We can now indicate lexical errors by ignoring the
        continuation and calling <TT
CLASS="LITERAL"
>failP "error message" s</TT
>
        within the lexer (don't forget to pass the input string to
        make the types work out).</P
><P
>This may all seem a bit weird.  Why, you ask, doesn't
        the lexer just have type <TT
CLASS="LITERAL"
>P Token</TT
>?  Well, the
        decision was taken to use a continuation purely for
        performance reasons.  If the lexer must return a token, then
        it turns out that the input string must be a real state
        object, and the monad has to return it as well as pass it
        around.  If you want a lexer of type <TT
CLASS="LITERAL"
>P Token</TT
>, then
        just define a wrapper to deal with the continuation:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexwrap :: (Token -&#62; P a) -&#62; P a
lexwrap cont = real_lexer `thenP` \token -&#62; cont token</PRE
></TD
></TR
></TABLE
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SEC-LINE-NUMBERS"
>2.5.3. Line Numbers</A
></H2
><P
>Previous versions of <SPAN
CLASS="APPLICATION"
>Happy</SPAN
> had a
        <TT
CLASS="LITERAL"
>%newline</TT
> directive that enabled simple line numbers
        to be counted by the parser and referenced in the actions.  We
        warned you that this facility may go away and be replaced by
        something more general, well guess what? :-)</P
><P
>Line numbers can now be dealt with quite
        straightforwardly using a monadic parser/lexer combination.
        Ok, we have to extend the monad a bit more:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>type LineNumber = Int
type P a = String -&#62; LineNumber -&#62; ParseResult a

getLineNo :: P LineNumber
getLineNo = \s l -&#62; Ok l</PRE
></TD
></TR
></TABLE
><P
>(the rest of the functions in the monad follow by just
        adding the extra line number argument in the same way as the
        input string).  Again, the line number is just passed down,
        not returned: this is OK because of the continuation-based
        lexer that can change the line number and pass the new one to
        the continuation.</P
><P
>The lexer can now update the line number as follows:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lexer cont s =
  case s of
     '\n':s  -&#62;  \line -&#62; lexer cont s (line + 1)
     ... rest of lexical analysis ...</PRE
></TD
></TR
></TABLE
><P
>It's as simple as that.  Take a look at
        <SPAN
CLASS="APPLICATION"
>Happy</SPAN
>'s own parser if you have the sources lying
        around, it uses a monad just like the one above.</P
><P
>Reporting the line number of a parse error is achieved
        by changing <TT
CLASS="LITERAL"
>happyError</TT
> to look something like
        this:</P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>happyError :: P a
happyError = getLineNo `thenP` \line -&#62; 
             failP (show line ++ ": parse error")</PRE
></TD
></TR
></TABLE
><P
>We can also get hold of the line number during parsing,
        to put it in the parsed data structure for future reference.
        A good way to do this is to have a production in the grammar
        that returns the current line number: </P
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="100%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>lineno :: { LineNumber }
        : {- empty -}      {% getLineNo }</PRE
></TD
></TR
></TABLE
><P
>The semantic value of <TT
CLASS="LITERAL"
>lineno</TT
> is the line
        number of the last token read - this will always be the token
        directly following the <TT
CLASS="LITERAL"
>lineno</TT
> symbol in the grammar,
        since <SPAN
CLASS="APPLICATION"
>Happy</SPAN
> always keeps one lookahead token in
        reserve.</P
></DIV
><DIV
CLASS="SECT2"
><H2
CLASS="SECT2"
><A
NAME="SEC-MONAD-SUMMARY"
>2.5.4. Summary</A
></H2
><P
>The types of various functions related to the parser are
        dependent on what combination of <TT
CLASS="LITERAL"
>%monad</TT
> and
        <TT
CLASS="LITERAL"
>%lexer</TT
> directives are present in the grammar.  For
        reference, we list those types here.  In the following types,
        <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
> is the return type of the
        parser.  A type containing a type variable indicates that the
        specified function must be polymorphic.</P
><P
></P
><UL
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> No <TT
CLASS="LITERAL"
>%monad</TT
> or
	      <TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse      :: [Token] -&#62; <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: [Token] -&#62; a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%monad</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse      :: [Token] -&#62; P <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: [Token] -&#62; P a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse      :: T <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: T a
lexer      :: (Token -&#62; T a) -&#62; T a</PRE
></TD
></TR
></TABLE
>
where the type constructor <TT
CLASS="LITERAL"
>T</TT
> is whatever you want (usually <TT
CLASS="LITERAL"
>T
a = String -&#62; a</TT
>.  I'm not sure if this is useful, or even if it works
properly.</P
></DIV
></LI
><LI
><DIV
CLASS="FORMALPARA"
><P
><B
> with <TT
CLASS="LITERAL"
>%monad</TT
> and <TT
CLASS="LITERAL"
>%lexer</TT
> . </B
><TABLE
BORDER="0"
BGCOLOR="#E0E0E0"
WIDTH="90%"
><TR
><TD
><PRE
CLASS="PROGRAMLISTING"
>parse      :: P <SPAN
CLASS="emphasis"
><I
CLASS="EMPHASIS"
>t</I
></SPAN
>
happyError :: P a
lexer      :: (Token -&#62; P a) -&#62; P a</PRE
></TD
></TR
></TABLE
></P
></DIV
></LI
></UL
></DIV
></DIV
><DIV
CLASS="NAVFOOTER"
><HR
ALIGN="LEFT"
WIDTH="100%"><TABLE
SUMMARY="Footer navigation table"
WIDTH="100%"
BORDER="0"
CELLPADDING="0"
CELLSPACING="0"
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
><A
HREF="sec-type-signatures.html"
ACCESSKEY="P"
>Prev</A
></TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="happy.html"
ACCESSKEY="H"
>Home</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
><A
HREF="sec-error.html"
ACCESSKEY="N"
>Next</A
></TD
></TR
><TR
><TD
WIDTH="33%"
ALIGN="left"
VALIGN="top"
>Type Signatures</TD
><TD
WIDTH="34%"
ALIGN="center"
VALIGN="top"
><A
HREF="sec-using.html"
ACCESSKEY="U"
>Up</A
></TD
><TD
WIDTH="33%"
ALIGN="right"
VALIGN="top"
>The Error Token</TD
></TR
></TABLE
></DIV
></BODY
></HTML
>
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -