.B [:upper:]
and
.B [:lower:]
are equivalent to
.B [:alpha:].
.PP
Some notes on patterns:
.IP -
A negated character class such as the example "[^A-Z]"
above
.I will match a newline
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]"). This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched.
Matching newlines means that a pattern like [^"]* can match the entire
input unless there's another quote in the input.
.IP -
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern and, like '/' and '$',
cannot be grouped inside parentheses. A '^' which does not occur at
the beginning of a rule or a '$' which does not occur at the end of
a rule loses its special properties and is treated as a normal character.
.IP
The following are illegal:
.nf

    foo/bar$
    <sc1>foo<sc2>bar

.fi
Note that the first of these can be written "foo/bar\\n".
.IP
The following will result in '$' or '^' being treated as a normal character:
.nf

    foo|(bar$)
    foo|^bar

.fi
If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
could be used (the special '|' action is explained below):
.nf

    foo      |
    bar$     /* action goes here */

.fi
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.SH HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input looking
for strings which match any of its patterns. If it finds more than
one match, it takes the one matching the most text (for trailing
context rules, this includes the length of the trailing part, even
though it will then be returned to the input).
If it finds two
or more matches of the same length, the
rule listed first in the
.I flex
input file is chosen.
.PP
Once the match is determined, the text corresponding to the match
(called the
.I token)
is made available in the global character pointer
.B yytext,
and its length in the global integer
.B yyleng.
The
.I action
corresponding to the matched pattern is then executed (a more
detailed description of actions follows), and then the remaining
input is scanned for another match.
.PP
If no match is found, then the
.I default rule
is executed: the next character in the input is considered matched and
copied to the standard output. Thus, the simplest legal
.I flex
input is:
.nf

    %%

.fi
which generates a scanner that simply copies its input (one character
at a time) to its output.
.PP
Note that
.B yytext
can be defined in two different ways: either as a character
.I pointer
or as a character
.I array.
You can control which definition
.I flex
uses by including one of the special directives
.B %pointer
or
.B %array
in the first (definitions) section of your flex input. The default is
.B %pointer,
unless you use the
.B -l
lex compatibility option, in which case
.B yytext
will be an array.
The advantage of using
.B %pointer
is substantially faster scanning and no buffer overflow when matching
very large tokens (unless you run out of dynamic memory). The disadvantage
is that you are restricted in how your actions can modify
.B yytext
(see the next section), and calls to the
.B unput()
function destroy the present contents of
.B yytext,
which can be a considerable porting headache when moving between different
.I lex
versions.
.PP
The advantage of
.B %array
is that you can then modify
.B yytext
to your heart's content, and calls to
.B unput()
do not destroy
.B yytext
(see below).
Furthermore, existing
.I lex
programs sometimes access
.B yytext
externally using declarations of the form:
.nf

    extern char yytext[];

.fi
This definition is erroneous when used with
.B %pointer,
but correct for
.B %array.
.PP
.B %array
defines
.B yytext
to be an array of
.B YYLMAX
characters, which defaults to a fairly large value. You can change
the size by simply #define'ing
.B YYLMAX
to a different value in the first section of your
.I flex
input. As mentioned above, with
.B %pointer
yytext grows dynamically to accommodate large tokens. While this means your
.B %pointer
scanner can accommodate very large tokens (such as matching entire blocks
of comments), bear in mind that each time the scanner must resize
.B yytext
it also must rescan the entire token from the beginning, so matching such
tokens can prove slow.
.B yytext
presently does
.I not
dynamically grow if a call to
.B unput()
results in too much text being pushed back; instead, a run-time error results.
.PP
Also note that you cannot use
.B %array
with C++ scanner classes
(the
.B c++
option; see below).
.SH ACTIONS
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement. The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action. If the
action is empty, then when the pattern is matched the input token
is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of "zap me" from its input:
.nf

    %%
    "zap me"

.fi
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.PP
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.nf

    %%
    [ \\t]+        putchar( ' ' );
    [ \\t]+$       /* ignore this token */

.fi
.PP
If the action contains a '{', then the action spans till the balancing '}'
is found, and the action may cross multiple lines.
.I flex
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.B %{
and will consider the action to be all the text up to the next
.B %}
(regardless of ordinary braces inside the action).
.PP
An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule." See below for an illustration.
.PP
Actions can include arbitrary C code, including
.B return
statements to return a value to whatever routine called
.B yylex().
Each time
.B yylex()
is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.
.PP
Actions are free to modify
.B yytext
except for lengthening it (adding
characters to its end--these will overwrite later characters in the
input stream). This however does not apply when using
.B %array
(see above); in that case,
.B yytext
may be freely modified in any way.
.PP
Actions are free to modify
.B yyleng
except they should not do so if the action also includes use of
.B yymore()
(see below).
.PP
There are a number of special directives which can be included within
an action:
.IP -
.B ECHO
copies yytext to the scanner's output.
.IP -
.B BEGIN
followed by the name of a start condition places the scanner in the
corresponding start condition (see below).
.IP -
.B REJECT
directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input).
The rule is chosen as described
above in "How the Input is Matched", and
.B yytext
and
.B yyleng
are set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.I flex
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine special() whenever "frob" is seen:
.nf

            int word_count = 0;
    %%

    frob        special(); REJECT;
    [^ \\t\\n]+   ++word_count;

.fi
Without the
.B REJECT,
any occurrences of "frob" in the input would not be counted as words, since the
scanner normally executes only one action per token.
Multiple
.B REJECT's
are allowed, each one finding the next best choice to the currently
active rule. For example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:
.nf

    %%
    a        |
    ab       |
    abc      |
    abcd     ECHO; REJECT;
    .|\\n     /* eat up any unmatched character */

.fi
(The first three rules share the fourth's action since they use
the special '|' action.)
.B REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in
.I any
of the scanner's actions it will slow down
.I all
of the scanner's matching. Furthermore,
.B REJECT
cannot be used with the
.I -Cf
or
.I -CF
options (see below).
.IP
Note also that unlike the other special actions,
.B REJECT
is a
.I branch;
code immediately following it in the action will
.I not
be executed.
.IP -
.B yymore()
tells the scanner that the next time it matches a rule, the corresponding
token should be
.I appended
onto the current value of
.B yytext
rather than replacing it. For example, given the input "mega-kludge"
the following will write "mega-mega-kludge" to the output:
.nf

    %%
    mega-    ECHO; yymore();
    kludge   ECHO;

.fi
First "mega-" is matched and echoed to the output.
Then "kludge"
is matched, but the previous "mega-" is still hanging around at the
beginning of
.B yytext
so the
.B ECHO
for the "kludge" rule will actually write "mega-kludge".
.PP
Two notes regarding use of
.B yymore().
First,
.B yymore()
depends on the value of
.I yyleng
correctly reflecting the size of the current token, so you must not
modify
.I yyleng
if you are using
.B yymore().
Second, the presence of
.B yymore()
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.IP -
.B yyless(n)
returns all but the first
.I n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.B yytext
and
.B yyleng
are adjusted appropriately (e.g.,
.B yyleng
will now be equal to
.IR n ).
For example, on the input "foobar" the following will write out
"foobarbar":
.nf

    %%
    foobar    ECHO; yyless(3);
    [a-z]+    ECHO;

.fi
An argument of 0 to
.B yyless
will cause the entire current input string to be scanned again. Unless you've
changed how the scanner will subsequently process its input (using
.B BEGIN,
for example), this will result in an endless loop.
.PP
Note that
.B yyless
is a macro and can only be used in the flex input file, not from
other source files.
.IP -
.B unput(c)
puts the character
.I c
back onto the input stream.
It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.nf

    {
    int i;
    /* Copy yytext because unput() trashes yytext */
    char *yycopy = strdup( yytext );
    unput( ')' );
    for ( i = yyleng - 1; i >= 0; --i )
        unput( yycopy[i] );
    unput( '(' );
    free( yycopy );
    }

.fi
Note that since each
.B unput()
puts the given character back at the
.I beginning
of the input stream, pushing back strings must be done back-to-front.
.PP
An important potential problem when using
.B unput()
is that if you are using
.B %pointer
(the default), a call to
.B unput()
.I destroys
the contents of
.I yytext,
starting with its rightmost character and devouring one character to
the left with each call. If you need the value of yytext preserved
after a call to
.B unput()
(as in the above example),
you must either first copy it elsewhere, or build your scanner using
.B %array
instead (see How The Input Is Matched).
.PP
Finally, note that you cannot put back
.B EOF
to attempt to mark the input stream with an end-of-file.
.IP -
.B input()
reads the next character from the input stream. For example,
the following is one way to eat up C comments:
.nf

    %%
    "/*"        {
                register int c;

                for ( ; ; )
                    {
                    while ( (c = input()) != '*' &&
                            c != EOF )
                        ;    /* eat up text of comment */

                    if ( c == '*' )
                        {
                        while ( (c = input()) == '*' )
                            ;
                        if ( c == '/' )
                            break;    /* found the end */
                        }

                    if ( c == EOF )
                        {
                        error( "EOF in comment" );
                        break;
                        }
                    }
                }

.fi
(Note that if the scanner is compiled using
.B C++,
then
.B input()
is instead referred to as
.B yyinput(),
in order to avoid a name clash with the
.B C++
stream by the name of
.I input.)
.IP -
.B YY_FLUSH_BUFFER
flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token, it will
first refill the buffer using
.B YY_INPUT
(see The Generated Scanner, below).
This action is a special case
of the more general
.B yy_flush_buffer()
function, described below in the section Multiple Input Buffers.
.IP -
.B yyterminate()
can be used in lieu of a return statement in an action. It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
By default,
.B yyterminate()
is also called when an end-of-file is encountered. It is a macro and
may be redefined.
.SH THE GENERATED SCANNER
The output of
.I flex