⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 bison_10.htm

📁 Lex和Yacc的Manual
💻 HTM
字号:
<HTML><HEAD><!-- This HTML file has been created by texi2html 1.44     from /opt/src/gnu/bison-1.25/bison.texinfo on 30 June 1997 --><TITLE>Bison 1.25 - Handling Context Dependencies</TITLE></HEAD><BODY>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_9.html">previous</A>, <A HREF="bison_11.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.<HR><H1><A NAME="SEC82" HREF="index.html#SEC82">Handling Context Dependencies</A></H1><P>The Bison paradigm is to parse tokens first, then group them into largersyntactic units.  In many languages, the meaning of a token is affected byits context.  Although this violates the Bison paradigm, certain techniques(known as <STRONG>kludges</STRONG>) may enable you to write Bison parsers for suchlanguages.</P><P>(Actually, "kludge" means any technique that gets its job done but isneither clean nor robust.)</P><H2><A NAME="SEC83" HREF="index.html#SEC83">Semantic Info in Token Types</A></H2><P>The C language has a context dependency: the way an identifier is useddepends on what its current meaning is.  For example, consider this:</P><PRE>foo (x);</PRE><P>This looks like a function call statement, but if <CODE>foo</CODE> is a typedefname, then this is actually a declaration of <CODE>x</CODE>.  How can a Bisonparser for C decide how to parse this input?</P><P>The method used in GNU C is to have two different token types,<CODE>IDENTIFIER</CODE> and <CODE>TYPENAME</CODE>.  When <CODE>yylex</CODE> finds anidentifier, it looks up the current declaration of the identifier in orderto decide which token type to return: <CODE>TYPENAME</CODE> if the identifier isdeclared as a typedef, <CODE>IDENTIFIER</CODE> otherwise.</P><P>The grammar rules can then express the context dependency by the choice oftoken type to recognize.  <CODE>IDENTIFIER</CODE> is accepted as an expression,but <CODE>TYPENAME</CODE> is not.  <CODE>TYPENAME</CODE> can start a declaration, but<CODE>IDENTIFIER</CODE> cannot.  In contexts where the meaning of the identifieris <EM>not</EM> significant, such as in declarations that can shadow atypedef name, either <CODE>TYPENAME</CODE> or <CODE>IDENTIFIER</CODE> isaccepted--there is one rule for each of the two token types.</P><P>This technique is simple to use if the decision of which kinds ofidentifiers to allow is made at a place close to where the identifier isparsed.  But in C this is not always so: C allows a declaration toredeclare a typedef name provided an explicit type has been specifiedearlier:</P><PRE>typedef int foo, bar, lose;static foo (bar);        /* redeclare <CODE>bar</CODE> as static variable */static int foo (lose);   /* redeclare <CODE>foo</CODE> as function */</PRE><P>Unfortunately, the name being declared is separated from the declarationconstruct itself by a complicated syntactic structure--the "declarator".</P><P>As a result, the part of Bison parser for C needs to be duplicated, withall the nonterminal names changed: once for parsing a declaration in whicha typedef name can be redefined, and once for parsing a declaration inwhich that can't be done.  Here is a part of the duplication, with actionsomitted for brevity:</P><PRE>initdcl:          declarator maybeasm '='          init        | declarator maybeasm        ;notype_initdcl:          notype_declarator maybeasm '='          init        | notype_declarator maybeasm        ;</PRE><P>Here <CODE>initdcl</CODE> can redeclare a typedef name, but <CODE>notype_initdcl</CODE>cannot.  The distinction between <CODE>declarator</CODE> and<CODE>notype_declarator</CODE> is the same sort of thing.</P><P>There is some similarity between this technique and a lexical tie-in(described next), in that information which alters the lexical analysis ischanged during parsing by other parts of the program.  The difference ishere the information is global, and is used for other purposes in theprogram.  A true lexical tie-in has a special-purpose flag controlled bythe syntactic context.</P><H2><A NAME="SEC84" HREF="index.html#SEC84">Lexical Tie-ins</A></H2><P><A NAME="IDX186"></A></P><P>One way to handle context-dependency is the <STRONG>lexical tie-in</STRONG>: a flagwhich is set by Bison actions, whose purpose is to alter the way tokens areparsed.</P><P>For example, suppose we have a language vaguely like C, but with a specialconstruct <SAMP>`hex (<VAR>hex-expr</VAR>)'</SAMP>.  After the keyword <CODE>hex</CODE> comesan expression in parentheses in which all integers are hexadecimal.  Inparticular, the token <SAMP>`a1b'</SAMP> must be treated as an integer rather thanas an identifier if it appears in that context.  Here is how you can do it:</P><PRE>%{int hexflag;%}%%...expr:   IDENTIFIER        | constant        | HEX '('                { hexflag = 1; }          expr ')'                { hexflag = 0;                   $$ = $4; }        | expr '+' expr                { $$ = make_sum ($1, $3); }        ...        ;constant:          INTEGER        | STRING        ;</PRE><P>Here we assume that <CODE>yylex</CODE> looks at the value of <CODE>hexflag</CODE>; whenit is nonzero, all integers are parsed in hexadecimal, and tokens startingwith letters are parsed as integers if possible.</P><P>The declaration of <CODE>hexflag</CODE> shown in the C declarations section ofthe parser file is needed to make it accessible to the actions (see section <A HREF="bison_6.html#SEC36">The C Declarations Section</A>).  You must also write the code in <CODE>yylex</CODE>to obey the flag.</P><H2><A NAME="SEC85" HREF="index.html#SEC85">Lexical Tie-ins and Error Recovery</A></H2><P>Lexical tie-ins make strict demands on any error recovery rules you have.See section <A HREF="bison_9.html#SEC81">Error Recovery</A>.</P><P>The reason for this is that the purpose of an error recovery rule is toabort the parsing of one construct and resume in some larger construct.For example, in C-like languages, a typical error recovery rule is to skiptokens until the next semicolon, and then start a new statement, like this:</P><PRE>stmt:   expr ';'        | IF '(' expr ')' stmt { ... }        ...        error ';'                { hexflag = 0; }        ;</PRE><P>If there is a syntax error in the middle of a <SAMP>`hex (<VAR>expr</VAR>)'</SAMP>construct, this error rule will apply, and then the action for thecompleted <SAMP>`hex (<VAR>expr</VAR>)'</SAMP> will never run.  So <CODE>hexflag</CODE> wouldremain set for the entire rest of the input, or until the next <CODE>hex</CODE>keyword, causing identifiers to be misinterpreted as integers.</P><P>To avoid this problem the error recovery rule itself clears <CODE>hexflag</CODE>.</P><P>There may also be an error recovery rule that works within expressions.For example, there could be a rule which applies within parenthesesand skips to the close-parenthesis:</P><PRE>expr:   ...        | '(' expr ')'                { $$ = $2; }        | '(' error ')'        ...</PRE><P>If this rule acts within the <CODE>hex</CODE> construct, it is not going to abortthat construct (since it applies to an inner level of parentheses withinthe construct).  Therefore, it should not clear the flag: the rest ofthe <CODE>hex</CODE> construct should be parsed with the flag still in effect.</P><P>What if there is an error recovery rule which might abort out of the<CODE>hex</CODE> construct or might not, depending on circumstances?  There is noway you can write the action to determine whether a <CODE>hex</CODE> construct isbeing aborted or not.  So if you are using a lexical tie-in, you had bettermake sure your error recovery rules are not of this kind.  Each rule mustbe such that you can be sure that it always will, or always won't, have toclear the flag.</P><HR>Go to the <A HREF="bison_1.html">first</A>, <A HREF="bison_9.html">previous</A>, <A HREF="bison_11.html">next</A>, <A HREF="bison_15.html">last</A> section, <A HREF="index.html">table of contents</A>.</BODY></HTML>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -