<HTML>
<!--
Copyright © 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has
intellectual property rights relating to technology embodied in the product
that is described in this document. In particular, and without limitation,
these intellectual property rights may include one or more of the U.S.
patents listed at http://www.sun.com/patents and one or more additional
patents or pending patent applications in the U.S. and in other countries.
U.S. Government Rights - Commercial software. Government users are subject
to the Sun Microsystems, Inc. standard license agreement and applicable
provisions of the FAR and its supplements. Use is subject to license terms.
Sun, Sun Microsystems, the Sun logo and Java are trademarks or registered
trademarks of Sun Microsystems, Inc. in the U.S. and other countries. This
product is covered and controlled by U.S. Export Control laws and may be
subject to the export or import laws in other countries. Nuclear, missile,
chemical biological weapons or nuclear maritime end uses or end users, whether
direct or indirect, are strictly prohibited. Export or reexport to countries
subject to U.S. embargo or to entities identified on U.S. export exclusion
lists, including, but not limited to, the denied persons and specially
designated nationals lists is strictly prohibited.
-->
<HEAD>
  <title>JavaCC: TokenManager MiniTutorial</title>
<!-- Changed by: Michael Van De Vanter, 14-Jan-2003 -->
</HEAD>
<BODY bgcolor="#FFFFFF" >
<H1>JavaCC [tm]: TokenManager MiniTutorial</H1>
<PRE>
The JavaCC [tm] lexical specification is organized into a set of "lexical
states". Each lexical state is named with an identifier. There is a
standard lexical state called DEFAULT. The generated token manager is
at any moment in one of these lexical states. When the token manager
is initialized, it starts off in the DEFAULT state, by default.
The starting lexical state can also be specified as a parameter when
constructing a token manager object.

Each lexical state contains an ordered list of regular expressions;
the order is derived from the order of occurrence in the input file.
There are four kinds of regular expressions: SKIP, MORE, TOKEN, and
SPECIAL_TOKEN.

All regular expressions that occur as expansion units in the grammar
are considered to be in the DEFAULT lexical state, and their order of
occurrence is determined by their position in the grammar file.

A token is matched as follows: all regular expressions in the current
lexical state are considered as potential match candidates. The token
manager consumes the maximum number of characters from the input
stream that match one of these regular expressions; that is, it
prefers the longest possible match. If there are multiple longest
matches (of the same length), the regular expression that is matched
is the one that occurs earliest in the grammar file.

As mentioned above, the token manager is in exactly one state at any
moment, and it considers only the regular expressions defined in that
state for matching purposes. After a match, one can specify an action
to be executed as well as a new lexical state to move to. If a new
lexical state is not specified, the token manager remains in the
current state.

The regular expression kind specifies what to do when a regular
expression has been successfully matched:

SKIP:          Simply throw away the matched string (after executing
               any lexical action).
MORE:          Continue (to whatever the next state is), taking the
               matched string along. This string will be a prefix of
               the new matched string.
TOKEN:         Create a token using the matched string and send it to
               the parser (or any caller).
SPECIAL_TOKEN: Create a special token that does not participate in
               parsing.
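
The matching rule and the SKIP, MORE and TOKEN kinds described above
can be modeled in a few lines of plain Java. This is a hypothetical
sketch, not the code JavaCC generates. The class "LongestMatchDemo",
its "Rule" type, the state "STR" and the toy rule set are illustrative
assumptions. The model makes four points. Only the current state's
rules are tried. The longest match wins. An equal-length tie goes to
the rule listed first, so "if" becomes IF rather than ID. MORE text is
carried into the next match. Lexical actions and SPECIAL_TOKEN are
left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of JavaCC-style matching; NOT JavaCC's generated code.
class LongestMatchDemo {
    static final int SKIP = 0, MORE = 1, TOKEN = 2;

    // One regular expression: owning state, kind, optional token name,
    // pattern, and optional ": next" state (null means stay in this state).
    static final class Rule {
        final String state, name, next;
        final int kind;
        final Pattern pattern;
        Rule(String state, int kind, String name, String regex, String next) {
            this.state = state; this.kind = kind; this.name = name;
            this.next = next; this.pattern = Pattern.compile(regex);
        }
    }

    // Toy rules in "grammar file" order; this order breaks ties.
    static final Rule[] RULES = {
        new Rule("DEFAULT", SKIP,  null,     "[ ]+",   null),
        new Rule("DEFAULT", TOKEN, "IF",     "if",     null),
        new Rule("DEFAULT", TOKEN, "ID",     "[a-z]+", null),
        new Rule("DEFAULT", MORE,  null,     "\"",     "STR"),
        new Rule("STR",     TOKEN, "STRLIT", "\"",     "DEFAULT"),
        new Rule("STR",     MORE,  null,     "[^\"]",  null),
    };

    // Returns the emitted tokens as "NAME(text)", separated by spaces.
    static String tokenize(String input) {
        StringBuilder out = new StringBuilder();
        StringBuilder image = new StringBuilder(); // text since the last SKIP/TOKEN
        String state = "DEFAULT";
        int pos = 0;
        while (pos != input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule r : RULES) {
                if (!r.state.equals(state)) continue; // current state's rules only
                Matcher m = r.pattern.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the best so far,
                // so on a tie the earlier rule in RULES is kept.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = r;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) throw new IllegalStateException("no rule matches at offset " + pos);
            image.append(input, pos, pos + bestLen);
            pos += bestLen;
            if (best.next != null) state = best.next; // state change, if specified
            if (best.kind == SKIP) {
                image.setLength(0);                    // discard the text
            } else if (best.kind == TOKEN) {
                out.append(best.name).append('(').append(image).append(") ");
                image.setLength(0);
            }                                          // MORE: keep image as a prefix
        }
        // Reaching the end of input right after a MORE is an error.
        if (image.length() != 0) throw new IllegalStateException("EOF after MORE");
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("if iffy \"a b\"")); // prints: IF(if) ID(iffy) STRLIT("a b")
    }
}
```

The input "if" becomes IF only because the IF rule is listed before
ID. If the two rules were swapped, every keyword would be returned as
an ID. The input "iffy" becomes an ID because the longer match wins
regardless of rule order.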

The mechanism for accessing special tokens is described at the end of
this page.

Whenever the end of file &lt;EOF&gt; is detected, it causes the creation of
an &lt;EOF&gt; token (regardless of the current state of the lexical
analyzer). However, if an &lt;EOF&gt; is detected in the middle of a match
for a regular expression, or immediately after a MORE regular
expression has been matched, an error is reported.

After the regular expression is matched, the lexical action is
executed. All the variables (and methods) declared in the
TOKEN_MGR_DECLS region (see below) are available here for use. In
addition, the variables and methods listed below are also available
for use.

Immediately after this, the token manager changes state to that
specified (if any).

After that, the action specified by the kind of the regular expression
is taken (SKIP, MORE, ...). If the kind is TOKEN, the matched token
is returned. If the kind is SPECIAL_TOKEN, the matched token is saved
to be returned along with the next TOKEN that is matched.

-------------------------------------------------------------------

The following variables are available for use within lexical actions:

1. StringBuffer image (READ/WRITE):

"image" (different from the "image" field of the matched token) is a
StringBuffer variable that contains all the characters that have been
matched since the last SKIP, TOKEN, or SPECIAL_TOKEN. You are free
to make whatever changes you wish to it so long as you do not assign
it to null (since this variable is also used by the generated token
manager). If you make changes to "image", the change is passed on to
subsequent matches (if the current match is a MORE). The content of
"image" *does not* automatically get assigned to the "image" field
of the matched token.
If you wish this to happen, you must explicitly
assign it in a lexical action of a TOKEN or SPECIAL_TOKEN regular
expression.

Example:

&lt;DEFAULT&gt; MORE : { "a" : S1 }

&lt;S1&gt; MORE :
{
  "b"
    { /* ^1 */
      int l = image.length() - 1;
      image.setCharAt(l, Character.toUpperCase(image.charAt(l)));
      /* ^2 */
    }
    : S2
}

&lt;S2&gt; TOKEN :
{
  "cd" { /* ^3 */ x = image; } : DEFAULT
}

In the above example, the values of "image" at the 3 points marked by
^1, ^2, and ^3 are:

At ^1: "ab"
At ^2: "aB"
At ^3: "aBcd"

2. int lengthOfMatch (READ ONLY):

This is the length of the current match (it is not cumulative over
MOREs). See the example below. You should not modify this variable.

Example:

Using the same example as above, the values of "lengthOfMatch" are:

At ^1: 1 (the size of "b")
At ^2: 1 (does not change due to lexical actions)
At ^3: 2 (the size of "cd")

3. int curLexState (READ ONLY):

This is the index of the current lexical state. You should not modify
this variable. Integer constants whose names are those of the lexical
states are generated into the ...Constants file, so you can refer to
lexical states without worrying about their actual index values.

4. inputStream (READ ONLY):

This is an input stream of the appropriate type (one of
ASCII_CharStream, ASCII_UCodeESC_CharStream, UCode_CharStream, or
UCode_UCodeESC_CharStream, depending on the values of the options
UNICODE_INPUT and JAVA_UNICODE_ESCAPE). The stream is currently at
the last character consumed for this match. Methods of inputStream
can be called. For example, getEndLine and getEndColumn can be called
to get the line and column number information for the current match.
inputStream may not be modified.

5. Token matchedToken (READ/WRITE):

This variable may be used only in actions associated with TOKEN and
SPECIAL_TOKEN regular expressions. It is set to the token that will
be returned to the parser. You may change this variable and thereby
cause the changed token to be returned to the parser instead of the
original one. It is here that you can assign the value of variable
"image" to "matchedToken.image".
Typically, that is how your changes to "image" take effect outside
the lexical actions.

Example:

If we modify the last regular expression specification of the
above example to:

&lt;S2&gt; TOKEN :
{
  "cd" { matchedToken.image = image.toString(); } : DEFAULT
}

then the token returned to the parser will have its ".image" field
set to "aBcd". If this assignment were not performed, the ".image"
field would remain "abcd".

6. void SwitchTo(int):

Calling this method switches you to the specified lexical state. This
method may also be called from parser actions (in addition to being
called from lexical actions). However, care must be taken when using
this method to switch states from the parser, since the lexical
analysis could be many tokens ahead of the parser in the presence of
large lookaheads. When you use this method within a lexical action,
you must ensure that it is the last statement executed in the action
(otherwise, strange things could happen). If there is a state change
specified using the ": state" syntax, it overrides all SwitchTo calls;
hence there is no point having a SwitchTo call when there is an
explicit state change specified. In general, calling this method
should be resorted to only when you cannot do it any other way.
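
The override rule for SwitchTo can be made concrete with a small
model. This is a hypothetical sketch, not JavaCC's generated code. The
class "SwitchToDemo", its "afterMatch" method, the state values and
the -1 "no explicit state" marker are illustrative assumptions. The
model relies only on the order described earlier: the lexical action
runs first, and the ": state" target is applied afterwards. As a
result, the explicit target replaces whatever SwitchTo set inside the
action.

```java
// Hypothetical model of SwitchTo versus an explicit ": state" target.
class SwitchToDemo {
    // Stand-ins for the generated lexical-state constants (values assumed).
    static final int DEFAULT = 0, S1 = 1, S2 = 2;
    static final int NO_STATE = -1; // assumed marker: no ": state" was written

    int curLexState = DEFAULT;

    void SwitchTo(int lexState) {
        curLexState = lexState;
    }

    // One match: run the lexical action, then apply the explicit target, if any.
    void afterMatch(Runnable lexicalAction, int explicitNextState) {
        lexicalAction.run();                 // may call SwitchTo(...)
        if (explicitNextState != NO_STATE) {
            curLexState = explicitNextState; // overrides any SwitchTo above
        }
    }

    public static void main(String[] args) {
        SwitchToDemo tm = new SwitchToDemo();
        tm.afterMatch(() -> tm.SwitchTo(S1), NO_STATE);
        System.out.println(tm.curLexState); // prints 1: SwitchTo took effect
        tm.afterMatch(() -> tm.SwitchTo(S1), S2);
        System.out.println(tm.curLexState); // prints 2: ": S2" overrode SwitchTo(S1)
    }
}
```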

Using this method of switching states also causes you to lose some of
the semantic checking that JavaCC does when you use the standard
syntax.

-------------------------------------------------------------------

Lexical actions have access to a set of class-level declarations.
These declarations are introduced within the JavaCC file using the
following syntax:

token_manager_decls ::= "TOKEN_MGR_DECLS" ":" "{" java_declarations_and_code "}"

These declarations are accessible from all lexical actions.

EXAMPLES
--------

Example 1: Comments

SKIP :
{
  "/*" : WithinComment
}

&lt;WithinComment&gt; SKIP :
{
  "*/" : DEFAULT
}

&lt;WithinComment&gt; MORE :
{
  &lt;~[]&gt;
}

Example 2: String literals, with actions to print the length of the
string:

TOKEN_MGR_DECLS :
{
  int stringSize;
}

MORE :
{
  "\"" {stringSize = 0;} : WithinString
}

&lt;WithinString&gt; TOKEN :
{
  &lt;STRLIT: "\""&gt; {System.out.println("Size = " + stringSize);} : DEFAULT
}

&lt;WithinString&gt; MORE :
{
  &lt;~["\n","\r"]&gt; {stringSize++;}
}

HOW SPECIAL TOKENS ARE SENT TO THE PARSER:

Special tokens are like tokens, except that they are permitted to
appear anywhere in the input file (between any two tokens). Special
tokens can be specified in the grammar input file using the reserved
word "SPECIAL_TOKEN" instead of "TOKEN", as in:

SPECIAL_TOKEN :
{
  &lt;SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")&gt;
}

Any regular expression defined to be a SPECIAL_TOKEN may be accessed
in a special manner from user actions in the lexical and grammar
specifications. This allows these tokens to be recovered during
parsing, while at the same time these tokens do not participate in
the parsing.

JavaCC has been bootstrapped to use this feature to automatically
copy relevant comments from the input grammar file into the generated
files.

Details:

The class Token now has an additional field:

   Token specialToken;

This field points to the special token immediately prior to the
current token (special or otherwise).
If the token immediately prior to the
current token is a regular token (and not a special token), then this
field is set to null. The "next" fields of regular tokens continue to
have the same meaning, i.e., they point to the next regular token,
except in the case of the EOF token, where the "next" field is null.
The "next" field of a special token points to the special token
immediately following it. If the token immediately following the
current token is a regular token, the "next" field is set to null.

This is clarified by the following example. Suppose you wish to print
all special tokens prior to the regular token "t" (but only those that
are after the regular token before "t"):

  if (t.specialToken == null) return;
    // The above statement determines that there are no special tokens
    // and returns control to the caller.
  Token tmp_t = t.specialToken;
  while (tmp_t.specialToken != null) tmp_t = tmp_t.specialToken;
    // The above line walks back the special token chain until it
    // reaches the first special token after the previous regular
    // token.
  while (tmp_t != null) {
    System.out.println(tmp_t.image);
    tmp_t = tmp_t.next;
  }
    // The above loop now walks the special token chain in the forward
    // direction, printing them in the process.
</PRE>
</BODY>
</HTML>