📄 lookahead.html
字号:
<HTML><!--Copyright 漏 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. hasintellectual property rights relating to technology embodied in the productthat is described in this document. In particular, and without limitation,these intellectual property rights may include one or more of the U.S.patents listed at http://www.sun.com/patents and one or more additionalpatents or pending patent applications in the U.S. and in other countries.U.S. Government Rights - Commercial software. Government users are subjectto the Sun Microsystems, Inc. standard license agreement and applicableprovisions of the FAR and its supplements. Use is subject to license terms.Sun, Sun Microsystems, the Sun logo and Java are trademarks or registeredtrademarks of Sun Microsystems, Inc. in the U.S. and other countries. Thisproduct is covered and controlled by U.S. Export Control laws and may besubject to the export or import laws in other countries. Nuclear, missile,chemical biological weapons or nuclear maritime end uses or end users, whetherdirect or indirect, are strictly prohibited. Export or reexport to countriessubject to U.S. embargo or to entities identified on U.S. export exclusionlists, including, but not limited to, the denied persons and speciallydesignated nationals lists is strictly prohibited.--><HEAD> <title>JavaCC: LOOKAHEAD MiniTutorial</title><!-- Changed by: Michael Van De Vanter, 14-Jan-2003 --></HEAD><BODY bgcolor="#FFFFFF" ><H1>JavaCC [tm]: LOOKAHEAD MiniTutorial</H1><HR><P><STRONG>This minitutorial is under preparation. This tutorial refersto examples that are available in the Lookahead directory under theexamples directory of the release. Currently, this page is a copy ofthe contents of the README file within that directory.</STRONG><P><PRE>This directory contains the tutorial on LOOKAHEAD along with allexamples used in the tutorial.We assume that you have already taken a look at some of the simpleexamples provided in the release before you read this section.1. WHAT IS LOOKAHEAD?The job of a parser is to read an input stream and determine whetheror not the input stream conforms to the grammar.This determination in its most general form can be quite timeconsuming. Consider the following example (file Example1.jj):---------------------------------------------------------------- void Input() : {} { "a" BC() "c" } void BC() : {} { "b" [ "c" ] }In this simple example, it is quite clear that there are exactly twostrings that match the above grammar, namely: abc abccThe general way to perform this match is to walk through the grammarbased on the string as follows. Here, we use "abc" as the inputstring:Step 1. There is only one choice here - the first input charactermust be 'a' - and since that is indeed the case, we are OK.Step 2. We now proceed on to non-terminal BC. Here again, there isonly one choice for the next input character - it must be 'b'. Theinput matches this one too, so we are still OK.Step 3. We now come to a "choice point" in the grammar. We can eithergo inside the [...] and match it, or ignore it altogether. We decideto go inside. So the next input character must be a 'c'. We areagain OK.Step 4. Now we have completed with non-terminal BC and go back tonon-terminal Input. Now the grammar says the next character must beyet another 'c'. But there are no more input characters. So we havea problem.Step 5. When we have such a problem in the general case, we concludethat we may have made a bad choice somewhere. In this case, we madethe bad choice in Step 3. So we retrace our steps back to step 3 andmake another choice and try that. This process is called"backtracking".Step 6. We have now backtracked and made the other choice we couldhave made at Step 3 - namely, ignore the [...]. Now we have completedwith non-terminal BC and go back to non-terminal Input. Now thegrammar says the next character must be yet another 'c'. The nextinput character is a 'c', so we are OK now.Step 7. We realize we have reached the end of the grammar (end ofnon-terminal Input) successfully. This means we have successfullymatched the string "abc" to the grammar.----------------------------------------------------------------As the above example indicates, the general problem of matching aninput with a grammar may result in large amounts of backtracking andmaking new choices and this can consume a lot of time. The amount oftime taken can also be a function of how the grammar is written. Notethat many grammars can be written to cover the same set of inputs - orthe same language (i.e., there can be multiple equivalent grammars forthe same input language).----------------------------------------------------------------For example, the following grammar would speed up the parsing of thesame language as compared to the previous grammar: void Input() : {} { "a" "b" "c" [ "c" ] }while the following grammar slows it down even more since the parserhas to backtrack all the way to the beginning: void Input() : {} { "a" "b" "c" "c" | "a" "b" "c" }One can even have a grammar that looks like the following: void Input() : {} { "a" ( BC1() | BC2() ) } void BC1() : {} { "b" "c" "c" } void BC2() : {} { "b" "c" [ "c" ] }This grammar can match "abcc" in two ways, and is therefore considered"ambiguous".----------------------------------------------------------------The performance hit from such backtracking is unacceptable for mostsystems that include a parser. Hence most parsers do not backtrack inthis general manner (or do not backtrack at all), rather they makedecisions at choice points based on limited information and thencommit to it.Parsers generated by Java Compiler Compiler [tm] make decisions at choicepoints based on some exploration of tokens further ahead in the inputstream, and once they make such a decision, they commit to it. i.e.,No backtracking is performed once a decision is made.The process of exploring tokens further in the input stream is termed"looking ahead" into the input stream - hence our use of the term"LOOKAHEAD".Since some of these decisions may be made with less than perfectinformation (JavaCC [tm] will warn you in these situations, so you don'thave to worry), you need to know something about LOOKAHEAD to makeyour grammar work correctly.The two ways in which you make the choice decisions work properly are:. Modify the grammar to make it simpler.. Insert hints at the more complicated choice points to help the parser make the right choices.----------------------------------------------------------------2. CHOICE POINTS IN JAVACC GRAMMARSThere are 4 different kinds of choice points in JavaCC:. An expansion of the form: ( exp1 | exp2 | ... ). In this case, the generated parser has to somehow determine which of exp1, exp2, etc. to select to continue parsing.. An expansion of the form: ( exp )?. In this case, the generated parser must somehow determine whether to choose exp or to continue beyond the ( exp )? without choosing exp. Note: ( exp )? may also be written as [ exp ].. An expansion of the form ( exp )*. In this case, the generated parser must do the same thing as in the previous case, and furthermore, after each time a successful match of exp (if exp was chosen) is completed, this choice determination must be made again.. An expansion of the form ( exp )+. This is essentially similar to the previous case with a mandatory first match to exp.Remember that token specifications that occur within angularbrackets <...> also have choice points. But these choices are madein different ways and are the subject of a different tutorial.----------------------------------------------------------------3. THE DEFAULT CHOICE DETERMINATION ALGORITHMThe default choice determination algorithm looks ahead 1 token in theinput stream and uses this to help make its choice at choice points.The following examples will describe the default algorithm fully:----------------------------------------------------------------Consider the following grammar (file Example2.jj): void basic_expr() : {} { <ID> "(" expr() ")" // Choice 1 | "(" expr() ")" // Choice 2 | "new" <ID> // Choice 3 }The choice determination algorithm works as follows: if (next token is <ID>) { choose Choice 1 } else if (next token is "(") { choose Choice 2 } else if (next token is "new") { choose Choice 3 } else { produce an error message }----------------------------------------------------------------In the above example, the grammar has been written such that thedefault choice determination algorithm does the right thing. Anotherthing to note is that the choice determination algorithm works in atop to bottom order - if Choice 1 was selected, the other choices arenot even considered. While this is not an issue in this example(except for performance), it will become important later below whenlocal ambiguities require the insertion of LOOKAHEAD hints.Suppose the above grammar was modified to (file Example3.jj): void basic_expr() : {} { <ID> "(" expr() ")" // Choice 1 | "(" expr() ")" // Choice 2 | "new" <ID> // Choice 3 | <ID> "." <ID> // Choice 4 }Then the default algorithm will always choose Choice 1 when the nextinput token is <ID> and never choose Choice 4 even if the tokenfollowing <ID> is a ".". More on this later.You can try running the parser generated from Example3.jj on the input"id1.id2". It will complain that it encountered a "." when it wasexpecting a "(". Note - when you built the parser, it would havegiven you the following warning message:Warning: Choice conflict involving two expansions at line 25, column 3 and line 31, column 3 respectively. A common prefix is: <ID> Consider using a lookahead of 2 for earlier expansion.Essentially, JavaCC is saying it has detected a situation in yourgrammar which may cause the default lookahead algorithm to do strangethings. The generated parser will still work using the defaultlookahead algorithm - except that it may not do what you expect of it.----------------------------------------------------------------Now consider the following example (file Example 4.jj): void identifier_list() : {} { <ID> ( "," <ID> )* }Suppose the first <ID> has already been matched and that the parserhas reached the choice point (the (...)* construct). Here's how thechoice determination algorithm works: while (next token is ",") { choose the nested expansion (i.e., go into the (...)* construct) consume the "," token if (next token is <ID>) consume it, otherwise report error }----------------------------------------------------------------In the above example, note that the choice determination algorithmdoes not look beyond the (...)* construct to make its decision.Suppose there was another production in that same grammar as follows(file Example5.jj): void funny_list() : {} { identifier_list() "," <INT> }When the default algorithm is making a choice at ( "," <ID> )*, itwill always go into the (...)* construct if the next token is a ",".It will do this even when identifier_list was called from funny_listand the token after the "," is an <INT>. Intuitively, the right thingto do in this situation is to skip the (...)* construct and return tofunny_list. More on this later.As a concrete example, suppose your input was "id1, id2, 5", theparser will complain that it encountered a 5 when it was expecting an<ID>. Note - when you built the parser, it would have given you thefollowing warning message:Warning: Choice conflict in (...)* construct at line 25, column 8. Expansion nested within construct and expansion following construct have common prefixes, one of which is: "," Consider using a lookahead of 2 or more for nested expansion.Essentially, JavaCC is saying it has detected a situation in yourgrammar which may cause the default lookahead algorithm to do strangethings. The generated parser will still work using the defaultlookahead algorithm - except that it may not do what you expect of it.----------------------------------------------------------------We have shown you examples of two kinds of choice points in theexamples above - "exp1 | exp2 | ...", and "(exp)*". The other twokinds of choice points - "(exp)+" and "(exp)?" - behave similarly to(exp)* and we will not be providing examples of their use here.----------------------------------------------------------------4. MULTIPLE TOKEN LOOKAHEAD SPECIFICATIONSSo far, we have described the default lookahead algorithm of thegenerated parsers. In the majority of situations, the defaultalgorithm works just fine. In situations where it does not workwell, Java Compiler Compiler provides you with warning messages likethe ones shown above. If you have a grammar that goes throughJava Compiler Compiler without producing any warnings, then thegrammar is a LL(1) grammar. Essentially, LL(1) grammars are thosethat can be handled by top-down parsers (such as those generatedby Java Compiler Compiler) using at most one token of LOOKAHEAD.When you get these warning messages, you can do one of two things.----------------------------------------------------------------Option 1You can modify your grammar so that the warning messages go away.That is, you can attempt to make your grammar LL(1) by making somechanges to it.The following (file Example6.jj) shows how you may change Example3.jjto make it LL(1): void basic_expr() : {} { <ID> ( "(" expr() ")" | "." <ID> ) | "(" expr() ")" | "new" <ID> }What we have done here is to factor the fourth choice into the firstchoice. Note how we have placed their common first token <ID> outsidethe parentheses, and then within the parentheses, we have yet anotherchoice which can now be performed by looking at only one token in theinput stream and comparing it with "(" and ".". This process ofmodifying grammars to make them LL(1) is called "left factoring".The following (file Example7.jj) shows how Example5.jj may be changedto make it LL(1): void funny_list() : {} { <ID> "," ( <ID> "," )* <INT> }Note that this change is somewhat more drastic.----------------------------------------------------------------Option 2
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -