📄 readme

📁 yacc源码
💻
字号:
	     BTYACC -- backtracking yacc	     ===========================BTYACC was created by Chris Dodd using ideas from manyplaces and lots of code from the Berkeley Yaccdistribution, which is a public domain yacc clone puttogether by the good folks at Berkeley.  This code isdistributed with NO WARRANTEE and is public domain.  It iscertain to contain bugs, which you should report to:	chrisd@collins.comVadim Maslov of Siber Systems <vadik@siber.com> modifiedBTYACC to make it suitable for production environment. Specifically, preprocessing commands %define, %ifdef,%endif, %include were added and many bugs were fixed.See the README.BYACC, NEW_FEATURES, and ACKNOWLEDGEMENTSfiles for more about Berkeley Yacc and other sources ofinfo.		Version 2.1 changes		-------------------		  by Vadim Maslov** Added preprocessor statements:%define <variable> %ifdef <variable>%endifThese statements have the same meaning as in C preprocessor.  They allow to introduce modularity and versioning into grammar building.  Preprocessor statement %include was introduced earlier,in ver. 2.0.						Version 2.0 changes		-------------------		  by Vadim Maslov** Added %include file_namecommand.It acts just like #include in C.Helps a lot when the list of tokens used by this grammar is generated by another program.** Changed 16-bit short numbers to 32-bit int numbers ingrammar tables, so that huge grammar tables (tables thatare larger than 32768 elements) resulting from hugegrammars (Cobol grammar, for instance) can work correctly.You need to have 32-bit integer to index table bigger than32768 elements, 16-bit integer is not enough.The original BtYacc just generated non-working tableslarger than 32768 elements without even notifying aboutthe table overflow.** Make error recovery work correctly when error happenswhile processing nested conflicts. Original BtYacc couldinfinitely cycle in certain situations that involved errorrecovery while in nested conflict.More detailed explanation: when we have nested conflicts(conflict that happens while trial-processing anotherconflict), it leads btyacc into NP-complete searching ofconflict tree. The ultimate goal is YYVALID operator thatselects a particular branch of that tree as a valid one.If no YYVALID is found on the tree, then error recoverytakes over.  The problem with this is that error recoveryis started in the same state context that exists on thelast surveyed branch of the conflict tree.  Sometimes thislast branch may be of zero length and it results inrecovering to exactly the same state as existed beforeentering the conflict. BtYacc cycles then.We solved this problem by memorizing the longest path inthe conflict tree while browsing it. If we ever get intoerror recovery, we restore state that existed on thelongest path.  Effectively we say: if we have an error,let us move forward as far as we possibly could while wewere browsing the conflict tree.** Introduce YYVALID_NESTED operation in addition tosimply YYVALID.  When we have a nested conflict (conflictwhile processing in trial mode for another conflict), wewant to relate YYVALID to a particular level of conflictbeing in trial.Since we mostly anticipate only 2-level nested conflictsYYVALID_NESTED tells the parser to satisfy only theinternal conflict.  Therefore, in 1-level conflictsituation YYVALID_NESTED acts like a regular YYVALID, butin 2-level conflict it is a no-op and the other YYVALIDfor outer conflict will be searched for.** Improved grammar error diagnostics.** Improved handling of situation where /tmp directory ismissing.  Original btyacc just died quietly when /tmpdirectory was missing.  We added code that states theproblem explicitly. While on UNIX /tmp directory is alwayspresent, it may be missing on WIN32 systems, thereforediagnosing this situation is important.	BackTracking changes        --------------------	  by Chriss DoddBTYACC is a modified version of yacc that supportsautomatic backtracking and semantic disambiguation toparse ambiguous grammars, as well as syntactic sugar forinherited attributes (which tend to introduce conflicts).Whenever a btyacc generated parser runs into ashift-reduce or reduce-reduce error in the parse table, itremembers the current parse point (yacc stack and inputstream state), and goes into trial parse mode.  It thencontinues parsing, ignoring most rule actions.  If it runsinto an error (either through the parse table or throughan action calling YYERROR), it backtracks to the mostrecent conflict point and tries a different alternative.If it finds a successful parse (reaches the end of theinput or an action calls YYVALID), it backtracks to thepoint where it first entered trial parse mode, andcontinues with a full parse (executing all actions),following the path of the successful trial.Actions in btyacc come in two flavors -- {}-actions, whichare only executed when not in trial mode, and []-actionswhich are executed regardless of mode.  There are alsoinherited attributes, which look like arguments (they areenclosed in "()") and act like []-actions.What this buys you:* No more lexer feedback hack.  In yacc grammars for C, astandard hack, know as the "lexer feedback hack" is usedto find typedef names.  The lexer uses semanticinformation to decide if any given identifier is atypedef-name or not and returns a special token.  Withbtyacc, you no longer need to do this; the lexer shouldjust always return an identifier.  The btyacc grammar thenneeds a rule of the form:typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]While the hack works adequately well for parsing C, itbecomes a nightmare when you try to parse something likeC++, where treating an ID as a typedef becomes heavilydependent on context.* Easy disambiguation via simple ordering.  Btyacc runsits trials via the rule "try shifting first, then tryreducing by the order that the conflicting rules appear inthe input file".  This means you can deal with semantic adisambiguation rule like:    [1] If it looks like a declaration it is, otherwise    [2] If it looks like an expression it is, otherwise    [3] it is a syntax error	[Ellis & Stroustrup, Annotated C++ Reference Manual, p93]To deal with this, you need only put all the rules fordeclarations before the rules for expressions in thegrammar file.* No extra cost if you do not use it.  Backtracking isonly triggered when the parse hits a shift/reduce orreduce/reduce conflict in the table.  If you have noconflicts in your grammar, there is no extra cost, otherthan some extra code which will never be invoked.* C++ and ANSI C compatible parsers.  The parsers producedby btyacc can be compiled with C++ correctly.  If you"#define" YYSTYPE to be some C++ type with constructor anddestructor, everything will work fine.  My favorite is"#define YYSTYPE SmartPointer", where SmartPointer is asmart pointer type that does garbage collection on thepointed to objects.BTYACC was originally written to make it easy to write aC++ parser (my goal was to be able to use the grammar outof the back of the ARM with as few modifications aspossible).  Anyone who has ever looked at Jim Roskindpublic domain C++ yacc grammar, or the yacc-based grammarused in g++ knows how difficult this is.  BTYACC is veryuseful for parsing any ambiguous grammar, particularlyones that come from trying to merge two (or more) completegrammars.Limitations of the backtracking: Currently, the generatedparser does NO pruning of alternate parsing paths.  Toavoid an exponential explosion of possible paths (andparsing time), you need to manually tell the parser whenit can throw away saved paths using YYVALID.  In pratice,this turns out to be fairly easy to do.  A C++ parser (forexample) can juts put a [YYVALID;] after every completedeclaration and statement rule, corresponding to pruningthe backtracking state after seeing a ';' or '}' -- therewill never be a situation in which it is useful tobacktrack past either of these.Inherited attributes in btyacc:Inherited attributes look a lot like function arguments tonon-terminals, which is what they end up being in arecursive descent parser, but NOT how they are implementedin btyacc.  Basically they are just syntactic sugar forembedded semantic actions and $0, $-1, ... in normal yacc.btyacc gives you two big advantages besides just thesyntax:    1. it does type checking on the inherited attributes,        so you do not have to specify $<type>0 and makes sure        you give the correct number of arguments (inherited        attributes) to every use of a non-terminal.    2. It "collapses" identical actions from that are produced        from inherited attributes.  This eliminates many        potential reduce-reduce conflicts arising from        the inherited attributes.You use inherited attributes by declaring the types of theattributes in the preamble with a type declaration anddeclaring names of the attributes on the lhs of the yaccrule.  You can of course have more than one rule with thesame lhs, and you can even give them different names ineach, but the type and number must be the same.Here is a small example:%type <t1> lhs(<t1>, <t2>)  /* lhs takes two inherited attributes */	   stuff(<t1>, <t2>)%%lhs($i1, $i2) : { $$ = $i1 }	      | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }This is roughly equivalent to the following yacc code:lhs : { $$ = $<t1>-1; }    | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff { $$ = $4; }    ;See the file "test/t2.y" for a longer and more completeexample.  At the current time, the start symbol cannothave any arguments.Variant parsers:Btyacc supports the -S flag to use a different parserskeleton, changing the way that the parser is called andused.  The skeleton "push.skel" is included to produce a"passive" parser that you feed tokens to (rather thanhaving the parser call a seperate yylex routine).  Withpush.skel, yyparse is defined as follows:int yyparse(int token, YYSTYPE yylval)You should call yyparse repeatedly with successive tokensof input.  It returns 0 if more input is needed, 1 for asuccessful parse, and -1 for an unrecoverable parse error.
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -