📄 lookahead.html
字号:
You can provide the generated parser with some hints to help it outin the non-LL(1) situations that the warning messages bring to yourattention.All such hints are specified using either setting the global LOOKAHEADvalue to a larger value (see below) or by using the LOOKAHEAD(...)construct to provide a local hint.A design decision must be made to determine if Option 1 or Option 2 isthe right one to take. The only advantage of choosing Option 1 isthat it makes your grammar perform better. JavaCC generated parserscan handle LL(1) constructs much faster than other constructs.However, the advantage of choosing Option 2 is that you have a simplergrammar - one that is easier to develop and maintain - one thatfocuses on human-friendliness and not machine-friendliness.Sometimes Option 2 is the only choice - especially in the presence ofuser actions. Suppose Example3.jj contained actions as shown below: void basic_expr() : {} { { initMethodTables(); } <ID> "(" expr() ")" | "(" expr() ")" | "new" <ID> | { initObjectTables(); } <ID> "." <ID> }Since the actions are different, left-factoring cannot be performed.----------------------------------------------------------------4.1. SETTING A GLOBAL LOOKAHEAD SPECIFICATIONYou can set a global LOOKAHEAD specification by using the option"LOOKAHEAD" either from the command line, or at the beginning of thegrammar file in the options section. The value of this option is aninteger which is the number of tokens to look ahead when making choicedecisions. As you may have guessed, the default value of this optionis 1 - which derives the default LOOKAHEAD algorithm described above.Suppose you set the value of this option to 2. Then the LOOKAHEADalgorithm derived from this looks at two tokens (instead of just onetoken) before making a choice decision. Hence, in Example3.jj, choice1 will be taken only if the next two tokens are <ID> and "(", whilechoice 4 will be taken only if the next two tokens are <ID> and ".".Hence, the parser will now work properly for Example3.jj. Similarly,the problem with Example5.jj also goes away since the parser goes intothe (...)* construct only when the next two tokens are "," and <ID>.By setting the global LOOKAHEAD to 2, the parsing algorithmessentially becomes LL(2). Since you can set the global LOOKAHEAD toany value, parsers generated by Java Compiler Compiler are calledLL(k) parsers.----------------------------------------------------------------4.2. SETTING A LOCAL LOOKAHEAD SPECIFICATIONYou can also set a local LOOKAHEAD specification that affects only aspecific choice point. This way, the majority of the grammar canremain LL(1) and hence perform better, while at the same time one getsthe flexibility of LL(k) grammars. Here's how Example3.jj is modifiedwith local LOOKAHEAD to fix the choice ambiguity problem (fileExample8.jj): void basic_expr() : {} { LOOKAHEAD(2) <ID> "(" expr() ")" // Choice 1 | "(" expr() ")" // Choice 2 | "new" <ID> // Choice 3 | <ID> "." <ID> // Choice 4 }Only the first choice (the first condition in the translation below)is affected by the LOOKAHEAD specification. All others continue touse a single token of LOOKAHEAD: if (next 2 tokens are <ID> and "(" ) { choose Choice 1 } else if (next token is "(") { choose Choice 2 } else if (next token is "new") { choose Choice 3 } else if (next token is <ID>) { choose Choice 4 } else { produce an error message }Similarly, Example5.jj can be modified as shown below (fileExample9.jj): void identifier_list() : {} { <ID> ( LOOKAHEAD(2) "," <ID> )* }Note, the LOOKAHEAD specification has to occur inside the (...)* whichis the choice is being made. The translation for this construct isshown below (after the first <ID> has been consumed): while (next 2 tokens are "," and <ID>) { choose the nested expansion (i.e., go into the (...)* construct) consume the "," token consume the <ID> token }----------------------------------------------------------------We strongly discourage you from modifying the global LOOKAHEADdefault. Most grammars are predominantly LL(1), hence you will beunnecessarily degrading performance by converting the entire grammarto LL(k) to facilitate just some portions of the grammar that are notLL(1). If your grammar and input files being parsed are very small,then this is okay.You should also keep in mind that the warning messages JavaCC printswhen it detects ambiguities at choice points (such as the two messagesshown earlier) simply tells you that the specified choice points arenot LL(1). JavaCC does not verify the correctness of your localLOOKAHEAD specification - it assumes you know what you are doing, infact, it really cannot verify the correctness of local LOOKAHEAD's asthe following example of if statements illustrates (fileExample10.jj): void IfStm() : {} { "if" C() S() [ "else" S() ] } void S() : {} { ... | IfStm() }This example is the famous "dangling else" problem. If you have aprogram that looks like: "if C1 if C2 S1 else S2"The "else S2" can be bound to either of the two if statements. Thestandard interpretation is that it is bound to the inner if statement(the one closest to it). The default choice determination algorithmhappens to do the right thing, but it still prints the followingwarning message:Warning: Choice conflict in [...] construct at line 25, column 15. Expansion nested within construct and expansion following construct have common prefixes, one of which is: "else" Consider using a lookahead of 2 or more for nested expansion.To suppress the warning message, you could simply tell JavaCC thatyou know what you are doing as follows: void IfStm() : {} { "if" C() S() [ LOOKAHEAD(1) "else" S() ] }To force lookahead ambiguity checking in such instances, set the optionFORCE_LA_CHECK to true.----------------------------------------------------------------5. SYNTACTIC LOOKAHEADConsider the following production taken from the Java grammar: void TypeDeclaration() : {} { ClassDeclaration() | InterfaceDeclaration() }At the syntactic level, ClassDeclaration can start with any number of"abstract"s, "final"s, and "public"s. While a subsequent semanticcheck will produce error messages for multiple uses of the samemodifier, this does not happen until parsing is completely over.Similarly, InterfaceDeclaration can start with any number of"abstract"s and "public"s.What if the next tokens in the input stream are a very large number of"abstract"s (say 100 of them) followed by "interface"? It is clearthat a fixed amount of LOOKAHEAD (such as LOOKAHEAD(100) for example)will not suffice. One can argue that this is such a weird situationthat it does not warrant any reasonable error message and that it isokay to make the wrong choice in some pathological situations. Butsuppose one wanted to be precise about this.The solution here is to set the LOOKAHEAD to infinity - that is set nobounds on the number of tokens to look ahead. One way to do this isto use a very large integer value (such as the largest possibleinteger) as follows: void TypeDeclaration() : {} { LOOKAHEAD(2147483647) ClassDeclaration() | InterfaceDeclaration() }One can also achieve the same effect with "syntactic LOOKAHEAD". Insyntactic LOOKAHEAD, you specify an expansion to try out and it thatsucceeds, then the following choice is taken. The above example isrewritten using syntactic LOOKAHEAD below: void TypeDeclaration() : {} { LOOKAHEAD(ClassDeclaration()) ClassDeclaration() | InterfaceDeclaration() }Essentially, what this is saying is: if (the tokens from the input stream match ClassDeclaration) { choose ClassDeclaration() } else if (next token matches InterfaceDeclaration) { choose InterfaceDeclaration() } else { produce an error message }The problem with the above syntactic LOOKAHEAD specification is thatthe LOOKAHEAD calculation takes too much time and does a lot ofunnecessary checking. In this case, the LOOKAHEAD calculation canstop as soon as the token "class" is encountered, but thespecification forces the calculation to continue until the end of theclass declaration has been reached - which is rather time consuming.This problem can be solved by placing a shorter expansion to try outin the syntactic LOOKAHEAD specification as in the following example: void TypeDeclaration() : {} { LOOKAHEAD( ( "abstract" | "final" | "public" )* "class" ) ClassDeclaration() | InterfaceDeclaration() }Essentially, what this is saying is: if (the nest set of tokens from the input stream are a sequence of "abstract"s, "final"s, and "public"s followed by a "class") { choose ClassDeclaration() } else if (next token matches InterfaceDeclaration) { choose InterfaceDeclaration() } else { produce an error message }By doing this, you make the choice determination algorithm stop assoon as it sees "class" - i.e., make its decision at the earliestpossible time.You can place a bound on the number of tokens to consume duringsyntactic lookahead as follows: void TypeDeclaration() : {} { LOOKAHEAD(10, ( "abstract" | "final" | "public" )* "class" ) ClassDeclaration() | InterfaceDeclaration() }In this case, the LOOKAHEAD determination is not permitted to go beyond10 tokens. If it reaches this limit and is still successfully matching( "abstract" | "final" | "public" )* "class", then ClassDeclaration isselected.Actually, when such a limit is not specified, it defaults to the largestinteger value (2147483647).----------------------------------------------------------------6. SEMANTIC LOOKAHEADLet us go back to Example1.jj: void Input() : {} { "a" BC() "c" } void BC() : {} { "b" [ "c" ] }Let us suppose that there is a good reason for writing a grammar thisway (maybe the way actions are embedded). As noted earlier, thisgrammar recognizes two string "abc" and "abcc". The problem here isthat the default LL(1) algorithm will choose the [ "c" ] every timeit sees a "c" and therefore "abc" will never be matched. We need tospecify that this choice must be made only when the next token is a"c", and the token following that is not a "c". This is a negativestatement - one that cannot be made using syntactic LOOKAHEAD.We can use semantic LOOKAHEAD for this purpose. With semanticLOOKAHEAD, you can specify any arbitrary boolean expression whoseevaluation determines which choice to take at a choice point. Theabove example can be instrumented with semantic LOOKAHEAD as follows: void BC() : {} { "b" [ LOOKAHEAD( { getToken(1).kind == C && getToken(2).kind != C } ) <C:"c"> ] }First we give the token "c" a label C so that we can refer to it fromthe semantic LOOKAHEAD. The boolean expression essentially states thedesired property. The choice determination decision is therefore: if (next token is "c" and following token is not "c") { choose the nested expansion (i.e., go into the [...] construct) } else { go beyond the [...] construct without entering it. }This example can be rewritten to combine both syntactic and semanticLOOKAHEAD as follows (recognize the first "c" using syntacticLOOKAHEAD and the absence of the second using semantic LOOKAHEAD): void BC() : {} { "b" [ LOOKAHEAD( "c", { getToken(2).kind != C } ) <C:"c"> ] }----------------------------------------------------------------7. GENERAL STRUCTURE OF LOOKAHEADWe've pretty much covered the various aspects of LOOKAHEAD in theprevious sections. A couple of advanced topics follow. However,we shall now present a formal language reference for LOOKAHEAD inJava Compiler Compiler:The general structure of a LOOKAHEAD specification is: LOOKAHEAD( amount, expansion, { boolean_expression } )"amount" specifies the number of tokens to LOOKAHEAD,"expansion"specifies the expansion to use to perform syntactic LOOKAHEAD, and"boolean_expression" is the expression to use for semanticLOOKAHEAD.At least one of the three entries must be present. If more thanone are present, they are separated by commas. The default valuesfor each of these entities is defined below:"amount": - if "expansion is present, this defaults to 2147483647. - otherwise ("boolean_expression" must be present then) this defaults to 0.Note: When "amount" is 0, no syntactic LOOKAHEAD is performed. Also,"amount" does not affect the semantic LOOKAHEAD."expansion":- defaults to the expansion being considered."boolean_expression":- defaults to true.----------------------------------------------------------------8. NESTED EVALUATION OF SEMANTIC LOOKAHEADTBD----------------------------------------------------------------9. JAVACODE PRODUCTIONSTBD----------------------------------------------------------------</PRE><P></BODY></HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -