allows actions to begin with @samp{%@{} and will consider the
action to be all the text up to the next @samp{%@}} (regardless of
ordinary braces inside the action).

An action consisting solely of a vertical bar ('|') means
"same as the action for the next rule."  See below for an
illustration.

Actions can include arbitrary C code, including @code{return}
statements to return a value to whatever routine called
@samp{yylex()}.  Each time @samp{yylex()} is called it continues
processing tokens from where it last left off until it either
reaches the end of the file or executes a return.

Actions are free to modify @code{yytext} except for lengthening
it (adding characters to its end--these will overwrite
later characters in the input stream).  This however does
not apply when using @samp{%array} (see above); in that case,
@code{yytext} may be freely modified in any way.

Actions are free to modify @code{yyleng} except they should not
do so if the action also includes use of @samp{yymore()} (see
below).

There are a number of special directives which can be
included within an action:

@itemize -
@item
@samp{ECHO} copies @code{yytext} to the scanner's output.

@item
@code{BEGIN} followed by the name of a start condition
places the scanner in the corresponding start
condition (see below).

@item
@code{REJECT} directs the scanner to proceed on to the
"second best" rule which matched the input (or a
prefix of the input).  The rule is chosen as
described above in "How the Input is Matched", and
@code{yytext} and @code{yyleng} set up appropriately.  It may
either be one which matched as much text as the
originally chosen rule but came later in the @code{flex}
input file, or one which matched less text.
For example, the following will both count the words in
the input and call the routine @code{special()} whenever
"frob" is seen:

@example
        int word_count = 0;
%%
frob        special(); REJECT;
[^ \t\n]+   ++word_count;
@end example

Without the @code{REJECT}, any occurrences of "frob" in the input would
not be counted as words, since the scanner normally
executes only one action per token.  Multiple
@code{REJECT}s are allowed, each one finding the next
best choice to the currently active rule.  For
example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:

@example
%%
a        |
ab       |
abc      |
abcd     ECHO; REJECT;
.|\n     /* eat up any unmatched character */
@end example

(The first three rules share the fourth's action
since they use the special '|' action.)  @code{REJECT} is
a particularly expensive feature in terms of
scanner performance; if it is used in @emph{any} of the
scanner's actions it will slow down @emph{all} of the
scanner's matching.  Furthermore, @code{REJECT} cannot be used
with the @samp{-Cf} or @samp{-CF} options (see below).

Note also that unlike the other special actions,
@code{REJECT} is a @emph{branch}; code immediately following it
in the action will @emph{not} be executed.

@item
@samp{yymore()} tells the scanner that the next time it
matches a rule, the corresponding token should be
@emph{appended} onto the current value of @code{yytext} rather
than replacing it.  For example, given the input
"mega-kludge" the following will write
"mega-mega-kludge" to the output:

@example
%%
mega-    ECHO; yymore();
kludge   ECHO;
@end example

First "mega-" is matched and echoed to the output.
Then "kludge" is matched, but the previous "mega-"
is still hanging around at the beginning of @code{yytext}
so the @samp{ECHO} for the "kludge" rule will actually
write "mega-kludge".
@end itemize

Two notes regarding use of @samp{yymore()}.
First, @samp{yymore()}
depends on the value of @code{yyleng} correctly reflecting the
size of the current token, so you must not modify @code{yyleng}
if you are using @samp{yymore()}.  Second, the presence of
@samp{yymore()} in the scanner's action entails a minor
performance penalty in the scanner's matching speed.

@itemize -
@item
@samp{yyless(n)} returns all but the first @var{n} characters of
the current token back to the input stream, where
they will be rescanned when the scanner looks for
the next match.  @code{yytext} and @code{yyleng} are adjusted
appropriately (e.g., @code{yyleng} will now be equal to @var{n}).
For example, on the input "foobar" the
following will write out "foobarbar":

@example
%%
foobar    ECHO; yyless(3);
[a-z]+    ECHO;
@end example

An argument of 0 to @code{yyless} will cause the entire
current input string to be scanned again.  Unless
you've changed how the scanner will subsequently
process its input (using @code{BEGIN}, for example), this
will result in an endless loop.

Note that @code{yyless} is a macro and can only be used in the
@code{flex} input file, not from other source files.

@item
@samp{unput(c)} puts the character @code{c} back onto the input
stream.  It will be the next character scanned.
The following action will take the current token
and cause it to be rescanned enclosed in
parentheses.

@example
@{
int i;
/* Copy yytext because unput() trashes yytext */
char *yycopy = strdup( yytext );
unput( ')' );
for ( i = yyleng - 1; i >= 0; --i )
    unput( yycopy[i] );
unput( '(' );
free( yycopy );
@}
@end example

Note that since each @samp{unput()} puts the given
character back at the @emph{beginning} of the input stream,
pushing back strings must be done back-to-front.

An important potential problem when using @samp{unput()} is that
if you are using @samp{%pointer} (the default), a call to @samp{unput()}
@emph{destroys} the contents of @code{yytext}, starting with its
rightmost character and devouring one character to the left
with each call.
If you need the value of @code{yytext} preserved
after a call to @samp{unput()} (as in the above example), you
must either first copy it elsewhere, or build your scanner
using @samp{%array} instead (see How The Input Is Matched).

Finally, note that you cannot put back @code{EOF} to attempt to
mark the input stream with an end-of-file.

@item
@samp{input()} reads the next character from the input
stream.  For example, the following is one way to
eat up C comments:

@example
%%
"/*"        @{
            register int c;

            for ( ; ; )
                @{
                while ( (c = input()) != '*' &&
                        c != EOF )
                    ;    /* eat up text of comment */

                if ( c == '*' )
                    @{
                    while ( (c = input()) == '*' )
                        ;
                    if ( c == '/' )
                        break;    /* found the end */
                    @}

                if ( c == EOF )
                    @{
                    error( "EOF in comment" );
                    break;
                    @}
                @}
            @}
@end example

(Note that if the scanner is compiled using @samp{C++},
then @samp{input()} is instead referred to as @samp{yyinput()},
in order to avoid a name clash with the @samp{C++} stream
by the name of @code{input}.)

@item
@code{YY_FLUSH_BUFFER}
flushes the scanner's internal buffer so that the next time the scanner
attempts to match a token, it will first refill the buffer using
@code{YY_INPUT} (see The Generated Scanner, below).  This action is
a special case of the more general @samp{yy_flush_buffer()} function,
described below in the section Multiple Input Buffers.

@item
@samp{yyterminate()} can be used in lieu of a return
statement in an action.  It terminates the scanner
and returns a 0 to the scanner's caller, indicating
"all done".  By default, @samp{yyterminate()} is also
called when an end-of-file is encountered.  It is a
It is amacro and may be redefined.@end itemize@node Generated scanner, Start conditions, Actions, Top@section The generated scannerThe output of @code{flex} is the file @file{lex.yy.c}, which containsthe scanning routine @samp{yylex()}, a number of tables used byit for matching tokens, and a number of auxiliary routinesand macros.  By default, @samp{yylex()} is declared as follows:@exampleint yylex()    @{    @dots{} various definitions and the actions in here @dots{}    @}@end example(If your environment supports function prototypes, then itwill be "int yylex( void  )".)   This  definition  may  bechanged by defining the "YY_DECL" macro.  For example, youcould use:@example#define YY_DECL float lexscan( a, b ) float a, b;@end exampleto give the scanning routine the name @code{lexscan}, returning afloat, and taking two floats as arguments.  Note that ifyou give arguments to the scanning routine using aK&R-style/non-prototyped function declaration, you mustterminate the definition with a semi-colon (@samp{;}).Whenever @samp{yylex()} is called, it scans tokens from theglobal input file @code{yyin} (which defaults to stdin).  Itcontinues until it either reaches an end-of-file (at whichpoint it returns the value 0) or one of its actionsexecutes a @code{return} statement.If the scanner reaches an end-of-file, subsequent calls are undefinedunless either @code{yyin} is pointed at a new input file (in which casescanning continues from that file), or @samp{yyrestart()} is called.@samp{yyrestart()} takes one argument, a @samp{FILE *} pointer (whichcan be nil, if you've set up @code{YY_INPUT} to scan from a sourceother than @code{yyin}), and initializes @code{yyin} for scanning fromthat file.  Essentially there is no difference between just assigning@code{yyin} to a new input file or using @samp{yyrestart()} to do so;the latter is available for compatibility with previous versions of@code{flex}, and because it can be used to switch input files in themiddle of scanning.  
It can also be used to throw away the current
input buffer, by calling it with an argument of @code{yyin}; however,
it is better to use @code{YY_FLUSH_BUFFER} (see above).  Note that
@samp{yyrestart()} does @emph{not} reset the start condition to
@code{INITIAL} (see Start Conditions, below).

If @samp{yylex()} stops scanning due to executing a @code{return}
statement in one of the actions, the scanner may then be called
again and it will resume scanning where it left off.

By default (and for purposes of efficiency), the scanner
uses block-reads rather than simple @samp{getc()} calls to read
characters from @code{yyin}.  The nature of how it gets its input
can be controlled by defining the @code{YY_INPUT} macro.
@code{YY_INPUT}'s calling sequence is
@samp{YY_INPUT(buf,result,max_size)}.  Its action is to place
up to @var{max_size} characters in the character array @var{buf} and
return in the integer variable @var{result} either the number of
characters read or the constant @code{YY_NULL} (0 on Unix
systems) to indicate EOF.  The default @code{YY_INPUT} reads from
the global file-pointer @code{yyin}.

A sample definition of @code{YY_INPUT} (in the definitions
section of the input file):

@example
%@{
#define YY_INPUT(buf,result,max_size) \
    @{ \
    int c = getchar(); \
    result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
    @}
%@}
@end example

This definition will change the input processing to occur
one character at a time.

When the scanner receives an end-of-file indication from
@code{YY_INPUT}, it then checks the @samp{yywrap()} function.  If
@samp{yywrap()} returns false (zero), then it is assumed that the
function has gone ahead and set up @code{yyin} to point to
another input file, and scanning continues.  If it returns
true (non-zero), then the scanner terminates, returning 0
to its caller.
Note that in either case, the start
condition remains unchanged; it does @emph{not} revert to @code{INITIAL}.

If you do not supply your own version of @samp{yywrap()}, then you
must either use @samp{%option noyywrap} (in which case the scanner
behaves as though @samp{yywrap()} returned 1), or you must link with
@samp{-lfl} to obtain the default version of the routine, which always
returns 1.

Three routines are available for scanning from in-memory
buffers rather than files: @samp{yy_scan_string()},
@samp{yy_scan_bytes()}, and @samp{yy_scan_buffer()}.  See the discussion
of them below in the section Multiple Input Buffers.

The scanner writes its @samp{ECHO} output to the @code{yyout} global
(default, stdout), which may be redefined by the user
simply by assigning it to some other @code{FILE} pointer.

@node Start conditions, Multiple buffers, Generated scanner, Top
@section Start conditions

@code{flex} provides a mechanism for conditionally activating
rules.  Any rule whose pattern is prefixed with "<sc>"
will only be active when the scanner is in the start
condition named "sc".  For example,

@example
<STRING>[^"]*        @{ /* eat up the string body ... */
            @dots{}
            @}
@end example

@noindent
will be active only when the scanner is in the "STRING"
start condition, and

@example
<INITIAL,STRING,QUOTE>\.        @{ /* handle an escape ... */
            @dots{}
            @}
@end example

@noindent
will be active only when the current start condition is
either "INITIAL", "STRING", or "QUOTE".

Start conditions are declared in the definitions (first)
section of the input using unindented lines beginning with
either @samp{%s} or @samp{%x} followed by a list of names.  The former
declares @emph{inclusive} start conditions, the latter @emph{exclusive}
start conditions.  A start condition is activated using
the @code{BEGIN} action.
Until the next @code{BEGIN} action is
executed, rules with the given start condition will be active
and rules with other start conditions will be inactive.
If the start condition is @emph{inclusive}, then rules with no
start conditions at all will also be active.  If it is
@emph{exclusive}, then @emph{only} rules qualified with the start
condition will be active.  A set of rules contingent on the
same exclusive start condition describes a scanner which is
independent of any of the other rules in the @code{flex} input.
Because of this, exclusive start conditions make it easy
to specify "mini-scanners" which scan portions of the
input that are syntactically different from the rest
(e.g., comments).

If the distinction between inclusive and exclusive start
conditions is still a little vague, here's a simple
example illustrating the connection between the two.  The set
of rules:

@example
%s example
%%
<example>foo   do_something();
bar            something_else();
@end example

@noindent
is equivalent to

@example
%x example
%%