📄 changes_from_1.33
字号:
ambiguities you are ok. The bad news is that you may have spent time fixing ambiguities that were not real, or used k=2 when ck=2 might have been sufficient, and so on. The following grammar demonstrates the problem: ------------------------------------------------------------ expr : ID ; start : stmt SEMI ; stmt : CASE expr COLON | expr SEMI | plain_stmt ; plain_stmt : ID COLON ; ------------------------------------------------------------ When compiled with k=1 and ck=2 it will report: warning: alts 2 and 3 of the rule itself ambiguous upon { IDENTIFIER }, { COLON } When antlr analyzes "stmt" it computes the first[1] set of all alternatives. It finds an ambiguity between alts 2 and 3 for ID. It then computes the first[2] set for alternatives 2 and 3 to resolve the ambiguity. In computing the first[2] set of "expr" (which is only one token long) it needs to determine what could follow "expr". Under a certain combination of circumstances antlr forgets that it is trying to analyze "stmt" which can only be followed by SEMI and adds to the first[2] set of "expr" the "global" follow set (including "COLON") which could follow "expr" (under other conditions) in the phrase "CASE expr COLON".#146. (Changed in MR11) Option -treport for locating "difficult" alts It can be difficult to determine which alternatives are causing pccts to work hard to resolve an ambiguity. In some cases the ambiguity is successfully resolved after much CPU time so there is no message at all. A rough measure of the amount of work being peformed which is independent of the CPU speed and system load is the number of tnodes created. Using "-info t" gives information about the total number of tnodes created and the peak number of tnodes. Tree Nodes: peak 1300k created 1416k lost 0 It also puts in the generated C or C++ file the number of tnodes created for a rule (at the end of the rule). However this information is not sufficient to locate the alternatives within a rule which are causing the creation of tnodes. Using: antlr -treport 100000 .... causes antlr to list on stdout any alternatives which require the creation of more than 100,000 tnodes, along with the lookahead sets for those alternatives. The following is a trivial case from the ansi.g grammar which shows the format of the report. This report might be of more interest in cases where 1,000,000 tuples were created to resolve the ambiguity. ------------------------------------------------------------------------- There were 0 tuples whose ambiguity could not be resolved by full lookahead There were 157 tnodes created to resolve ambiguity between: Choice 1: statement/2 line 475 file ansi.g Choice 2: statement/3 line 476 file ansi.g Intersection of lookahead[1] sets: IDENTIFIER Intersection of lookahead[2] sets: LPARENTHESIS COLON AMPERSAND MINUS STAR PLUSPLUS MINUSMINUS ONESCOMPLEMENT NOT SIZEOF OCTALINT DECIMALINT HEXADECIMALINT FLOATONE FLOATTWO IDENTIFIER STRING CHARACTER -------------------------------------------------------------------------#145. (Documentation) Generation of Expression Trees Item #99 was misleading because it implied that the optimization for tree expressions was available only for trees created by predicate expressions and neglected to mention that it required the use of "-mrhoist on". The optimization applies to tree expressions created for grammars with k>1 and for predicates with lookahead depth >1. In MR11 the optimized version is always used so the -mrhoist on option need not be specified.#144. (Changed in MR11) Incorrect test for exception group In testing for a rule's exception group the label a pointer is compared against '\0'. The intention is "*pointer". Reported by Jeffrey C. Fried (Jeff@Fried.net).#143. (Changed in MR11) Optional ";" at end of #token statement Fixes problem of: #token X "x" << parser action >> Being confused with: #token X "x" <<lexical action>>#142. (Changed in MR11) class BufFileInput subclass of DLGInputStream Alexey Demakov (demakov@kazbek.ispras.ru) has supplied class BufFileInput derived from DLGInputStream which provides a function lookahead(char *string) to test characters in the input stream more than one character ahead. The default amount of lookahead is specified by the constructor and defaults to 8 characters. This does *not* include the one character of lookahead maintained internally by DLG in member "ch" and which is not available for testing via BufFileInput::lookahead(). This is a useful class for overcoming the one-character-lookahead limitation of DLG without resorting to a lexer capable of backtracking (like flex) which is not integrated with antlr as is DLG. There are no restrictions on copying or using BufFileInput.* except that the authorship and related information must be retained in the source code. The class is located in pccts/h/BufFileInput.* of the kit.#141. (Changed in MR11) ZZDEBUG_CONSUME for ANTLRParser::consume() A debug aid has been added to file ANTLRParser::consume() in file AParser.cpp: #ifdef ZZDEBUG_CONSUME_ACTION zzdebug_consume_action(); #endif Suggested by Sramji Ramanathan (ps@kumaran.com).#140. (Changed in MR11) #pred to define predicates +---------------------------------------------------+ | Note: Assume "-prc on" for this entire discussion | +---------------------------------------------------+ A problem with predicates is that each one is regarded as unique and capable of disambiguating cases where two alternatives have identical lookahead. For example: rule : <<pred(LATEXT(1))>>? A | <<pred(LATEXT(1))>>? A ; will not cause any error messages or warnings to be issued by earlier versions of pccts. To compare the text of the predicates is an incomplete solution. In 1.33MR11 I am introducing the #pred statement in order to solve some problems with predicates. The #pred statement allows one to give a symbolic name to a "predicate literal" or a "predicate expression" in order to refer to it in other predicate expressions or in the rules of the grammar. The predicate literal associated with a predicate symbol is C or C++ code which can be used to test the condition. A predicate expression defines a predicate symbol in terms of other predicate symbols using "!", "&&", and "||". A predicate symbol can be defined in terms of a predicate literal, a predicate expression, or *both*. When a predicate symbol is defined with both a predicate literal and a predicate expression, the predicate literal is used to generate code, but the predicate expression is used to check for two alternatives with identical predicates in both alternatives. Here are some examples of #pred statements: #pred IsLabel <<isLabel(LATEXT(1))>>? #pred IsLocalVar <<isLocalVar(LATEXT(1))>>? #pred IsGlobalVar <<isGlobalVar(LATEXT(1)>>? #pred IsVar <<isVar(LATEXT(1))>>? IsLocalVar || IsGlobalVar #pred IsScoped <<isScoped(LATEXT(1))>>? IsLabel || IsLocalVar I hope that the use of EBNF notation to describe the syntax of the #pred statement will not cause problems for my readers (joke). predStatement : "#pred" CapitalizedName ( "<<predicate_literal>>?" | "<<predicate_literal>>?" predOrExpr | predOrExpr ) ; predOrExpr : predAndExpr ( "||" predAndExpr ) * ; predAndExpr : predPrimary ( "&&" predPrimary ) * ; predPrimary : CapitalizedName | "!" predPrimary | "(" predOrExpr ")" ; What is the purpose of this nonsense ? To understand how predicate symbols help, you need to realize that predicate symbols are used in two different ways with two different goals. a. Allow simplification of predicates which have been combined during predicate hoisting. b. Allow recognition of identical predicates which can't disambiguate alternatives with common lookahead. First we will discuss goal (a). Consider the following rule: rule0: rule1 | ID | ... ; rule1: rule2 | rule3 ; rule2: <<isX(LATEXT(1))>>? ID ; rule3: <<!isX(LATEXT(1)>>? ID ; When the predicates in rule2 and rule3 are combined by hoisting to create a prediction expression for rule1 the result is: if ( LA(1)==ID && ( isX(LATEXT(1) || !isX(LATEXT(1) ) ) { rule1(); ... This is inefficient, but more importantly, can lead to false assumptions that the predicate expression distinguishes the rule1 alternative with some other alternative with lookahead ID. In MR11 one can write: #pred IsX <<isX(LATEXT(1))>>? ... rule2: <<IsX>>? ID ; rule3: <<!IsX>>? ID ; During hoisting MR11 recognizes this as a special case and eliminates the predicates. The result is a prediction expression like the following: if ( LA(1)==ID ) { rule1(); ... Please note that the following cases which appear to be equivalent *cannot* be simplified by MR11 during hoisting because the hoisting logic only checks for a "!" in the predicate action, not in the predicate expression for a predicate symbol. *Not* equivalent and is not simplified during hoisting: #pred IsX <<isX(LATEXT(1))>>? #pred NotX <<!isX(LATEXT(1))>>? ... rule2: <<IsX>>? ID ; rule3: <<NotX>>? ID ; *Not* equivalent and is not simplified during hoisting: #pred IsX <<isX(LATEXT(1))>>? #pred NotX !IsX ... rule2: <<IsX>>? ID ; rule3: <<NotX>>? ID ; Now we will discuss goal (b). When antlr discovers that there is a lookahead ambiguity between two alternatives it attempts to resolve the ambiguity by searching for predicates in both alternatives. In the past any predicate would do, even if the same one appeared in both alternatives: rule: <<p(LATEXT(1))>>? X | <<p(LATEXT(1))>>? X ; The #pred statement is a start towards solving this problem. During ambiguity resolution (*not* predicate hoisting) the predicates for the two alternatives are expanded and compared. Consider the following example: #pred Upper <<isUpper(LATEXT(1))>>? #pred Lower <<isLower(LATEXT(1))>>? #pred Alpha <<isAlpha(LATEXT(1))>>? Upper || Lower rule0: rule1 | <<Alpha>>? ID ; rule1: | rule2 | rule3 ... ; rule2: <<Upper>>? ID; rule3: <<Lower>>? ID; The definition of #pred Alpha expresses: a. to test the predicate use the C code "isAlpha(LATEXT(1))" b. to analyze the predicate use the information that Alpha is equivalent to the union of Upper and Lower, During ambiguity resolution the definition of Alpha is expanded into "Upper || Lower" and compared with the predicate in the other alternative, which is also "Upper || Lower". Because they are identical MR11 will report a problem. ------------------------------------------------------------------------- t10.g, line 5: warning: the predicates used to disambiguate rule rule0 (file t10.g alt 1 line 5 and alt 2 line 6) are identical when compared without context and may have no
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -