class (e.g., "[^A-Z\n]"). This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched. Matching newlines
means that a pattern like [^"]* can match an entire input
(overflowing the scanner's input buffer) unless there's another
quote in the input.
- A rule can have at most one instance of trailing context (the '/'
operator or the '$' operator). The start condition, '^', and
"<<EOF>>" patterns can only occur at the beginning of a pattern;
like '/' and '$', they cannot be grouped inside parentheses. A '^'
that does not occur at the beginning of a rule, or a '$' that does
not occur at the end of a rule, loses its special properties and is
treated as a normal character.
The following are illegal:
foo/bar$
<sc1>foo<sc2>bar
Note that the first of these can be written "foo/bar\n".
26 May 1990 6
FLEX(1) Minix Programmer's Manual FLEX(1)
The following will result in '$' or '^' being treated as a normal
character:
foo|(bar$)
foo|^bar
If what's wanted is a "foo" or a bar-followed-by-a-newline, the
following could be used (the special '|' action is explained below):
foo |
bar$ /* action goes here */
A similar trick will work for matching a foo or a bar-at-the-
beginning-of-a-line.
HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input looking for
strings which match any of its patterns. If it finds more than one
match, it takes the one matching the most text (for trailing context
rules, this includes the length of the trailing part, even though it will
then be returned to the input). If it finds two or more matches of the
same length, the rule listed first in the flex input file is chosen.
Once the match is determined, the text corresponding to the match (called
the token) is made available in the global character pointer yytext, and
its length in the global integer yyleng. The action corresponding to the
matched pattern is then executed (a more detailed description of actions
follows), and then the remaining input is scanned for another match.
If no match is found, then the default rule is executed: the next
character in the input is considered matched and copied to the standard
output. Thus, the simplest legal flex input is:
%%
which generates a scanner that simply copies its input (one character at
a time) to its output.
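
The selection rule described above (longest match wins, and the rule
listed first wins a tie) can be sketched in plain C. This is only a toy
model under stated assumptions: each rule is a hand-written function
that reports how many leading characters it matches, and the names
matcher_fn, match_if, match_lower and pick_rule are hypothetical. A real
flex scanner drives generated tables rather than trying rules one by
one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy "rule": returns how many leading characters of input it matches,
 * or 0 for no match. All names here are illustrative, not part of flex. */
typedef size_t (*matcher_fn)(const char *input);

/* Hypothetical rule 0: the literal keyword "if". */
size_t match_if(const char *s)
{
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Hypothetical rule 1: [a-z]+ (assumes an ASCII-like character set). */
size_t match_lower(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        ++n;
    return n;
}

/* Returns the index of the winning rule, or -1 if no rule matches (the
 * case in which flex's default rule would copy one character to the
 * output). *match_len receives the length of the winning match. */
int pick_rule(const char *input, const matcher_fn rules[], int n_rules,
              size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;

    assert(match_len != NULL);
    for (int i = 0; i < n_rules; ++i) {
        size_t len = rules[i](input);
        /* Only a strictly longer match replaces the current best, so
         * the rule listed first wins when lengths are equal. */
        if (len > best_len) {
            best = i;
            best_len = len;
        }
    }
    *match_len = best_len;
    return best;
}
```

With the rules ordered { match_if, match_lower }, the input "if(" selects
rule 0 (both rules match two characters, and the earlier rule wins),
while "iffy" selects rule 1 because it matches four characters.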
ACTIONS
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement. The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action. If the
action is empty, then when the pattern is matched the input token is
simply discarded. For example, here is the specification for a program
which deletes all occurrences of "zap me" from its input:
%%
"zap me"
(It will copy all other characters in the input to the output since they
will be matched by the default rule.)
Here is a program which compresses multiple blanks and tabs down to a
single blank, and throws away whitespace found at the end of a line:
%%
[ \t]+ putchar( ' ' );
[ \t]+$ /* ignore this token */
If the action contains a '{', then the action extends until the balancing
'}' is found, and may span multiple lines. flex knows about C strings
and comments and won't be fooled by braces found within them. It also
allows actions to begin with %{, in which case the action is all the text
up to the next %} (regardless of ordinary braces inside the action).
An action consisting solely of a vertical bar ('|') means "same as the
action for the next rule." See below for an illustration.
Actions can include arbitrary C code, including return statements to
return a value to whatever routine called yylex(). Each time yylex() is
called it continues processing tokens from where it last left off until
it either reaches the end of the file or executes a return. Once it
reaches an end-of-file, however, any subsequent call to yylex() will
simply return immediately, unless yyrestart() is first called (see
below).
Actions are not allowed to modify yytext or yyleng.
There are a number of special directives which can be included within an
action:
- ECHO copies yytext to the scanner's output.
- BEGIN followed by the name of a start condition places the scanner
in the corresponding start condition (see below).
- REJECT directs the scanner to proceed on to the "second best" rule
which matched the input (or a prefix of the input). The rule is
chosen as described above in "How the Input is Matched", and yytext
and yyleng set up appropriately. It may either be one which matched
as much text as the originally chosen rule but came later in the
flex input file, or one which matched less text. For example, the
following will both count the words in the input and call the
routine special() whenever "frob" is seen:
int word_count = 0;
%%
frob special(); REJECT;
[^ \t\n]+ ++word_count;
Without the REJECT, any occurrence of "frob" in the input would not
be counted as a word, since the scanner normally executes only one
action per token. Multiple REJECTs are allowed, each one finding
the next best choice to the currently active rule. For example,
when the following scanner scans the token "abcd", it will write
"abcdabcaba" to the output:
%%
a |
ab |
abc |
abcd ECHO; REJECT;
.|\n /* eat up any unmatched character */
(The first three rules share the fourth's action since they use the
special '|' action.) REJECT is a particularly expensive feature in
terms of scanner performance; if it is used in any of the scanner's
actions it will slow down all of the scanner's matching.
Furthermore, REJECT cannot be used with the -f or -F options (see
below).
Note also that unlike the other special actions, REJECT is a branch;
code immediately following it in the action will not be executed.
- yymore() tells the scanner that the next time it matches a rule, the
corresponding token should be appended onto the current value of
yytext rather than replacing it. For example, given the input
"mega-kludge" the following will write "mega-mega-kludge" to the
output:
%%
mega- ECHO; yymore();
kludge ECHO;
First "mega-" is matched and echoed to the output. Then "kludge" is
matched, but the previous "mega-" is still hanging around at the
beginning of yytext so the ECHO for the "kludge" rule will actually
write "mega-kludge". The presence of yymore() in the scanner's
action entails a minor performance penalty in the scanner's matching
speed.
- yyless(n) returns all but the first n characters of the current
token back to the input stream, where they will be rescanned when
the scanner looks for the next match. yytext and yyleng are
adjusted appropriately (e.g., yyleng will now be equal to n). For
example, on the input "foobar" the following will write out
"foobarbar":
%%
foobar ECHO; yyless(3);
[a-z]+ ECHO;
An argument of 0 to yyless will cause the entire current input
string to be scanned again. Unless you've changed how the scanner
will subsequently process its input (using BEGIN, for example), this
will result in an endless loop.
- unput(c) puts the character c back onto the input stream. It will
be the next character scanned. The following action will take the
current token and cause it to be rescanned enclosed in parentheses.
    {
    int i;
    unput( ')' );
    for ( i = yyleng - 1; i >= 0; --i )
        unput( yytext[i] );
    unput( '(' );
    }
Note that since each unput() puts the given character back at the
beginning of the input stream, pushing back strings must be done
back-to-front.
- input() reads the next character from the input stream. For
example, the following is one way to eat up C comments:
    %%
    "/*"        {
                register int c;

                for ( ; ; )
                    {
                    while ( (c = input()) != '*' &&
                            c != EOF )
                        ; /* eat up text of comment */

                    if ( c == '*' )
                        {
                        while ( (c = input()) == '*' )
                            ;
                        if ( c == '/' )
                            break; /* found the end */
                        }

                    if ( c == EOF )
                        {
                        error( "EOF in comment" );
                        break;
                        }
                    }
                }
(Note that if the scanner is compiled using C++, then input() is
instead referred to as yyinput(), in order to avoid a name clash
with the C++ stream by the name of input.)
- yyterminate() can be used in lieu of a return statement in an
action. It terminates the scanner and returns a 0 to the scanner's
caller, indicating "all done". Subsequent calls to the scanner will
immediately return unless preceded by a call to yyrestart() (see
below). By default, yyterminate() is also called when an end-of-
file is encountered. It is a macro and may be redefined.
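
The back-to-front rule noted for unput() above can be seen in a small
stand-alone model of a push-back stack. This is a sketch under stated
assumptions, not flex's input buffer: the names toy_stream, toy_unput,
toy_input and toy_unput_parenthesized are hypothetical, and -1 stands in
for EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PUSHBACK_MAX 64

/* Toy input stream: a push-back stack sitting in front of a fixed
 * string. It models only the ordering behaviour of unput()/input(). */
struct toy_stream {
    const char *rest;            /* unread part of the original input */
    char pushed[PUSHBACK_MAX];   /* pushed-back characters; top is last */
    size_t n_pushed;
};

/* Like unput(c): the character becomes the next one to be read. */
void toy_unput(struct toy_stream *s, char c)
{
    assert(s->n_pushed < PUSHBACK_MAX);
    s->pushed[s->n_pushed++] = c;
}

/* Like input(): pushed-back characters are read before the rest. */
int toy_input(struct toy_stream *s)
{
    if (s->n_pushed > 0)
        return (unsigned char)s->pushed[--s->n_pushed];
    if (*s->rest != '\0')
        return (unsigned char)*s->rest++;
    return -1;                   /* stands in for EOF */
}

/* Push back "(" token ")" so that it is read in exactly that order:
 * the closing parenthesis goes in first and the opening one last. */
void toy_unput_parenthesized(struct toy_stream *s, const char *token)
{
    size_t len = strlen(token);

    toy_unput(s, ')');
    for (size_t i = len; i > 0; --i)
        toy_unput(s, token[i - 1]);
    toy_unput(s, '(');
}
```

Starting from a stream whose remaining input is " tail", calling
toy_unput_parenthesized(&s, "abc") and then draining the stream with
toy_input() yields "(abc) tail". Pushing the token front-to-back instead
would make it come out reversed.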
THE GENERATED SCANNER
The output of flex is the file lex.yy.c, which contains the scanning
routine yylex(), a number of tables used by it for matching tokens, and a
number of auxiliary routines and macros. By default, yylex() is declared
as follows:
int yylex()
{
... various definitions and the actions in here ...
}
(If your environment supports function prototypes, then it will be "int
yylex( void )".) This definition may be changed by redefining the
"YY_DECL" macro. For example, you could use:
#undef YY_DECL
#define YY_DECL float lexscan( a, b ) float a, b;
to give the scanning routine the name lexscan, returning a float, and
taking two floats as arguments. Note that if you give arguments to the
scanning routine using a K&R-style/non-prototyped function declaration,
you must terminate the definition with a semi-colon (;).
Whenever yylex() is called, it scans tokens from the global input file
yyin (which defaults to stdin). It continues until it either reaches an
end-of-file (at which point it returns the value 0) or one of its actions
executes a return statement. In the former case, when called again the
scanner will immediately return unless yyrestart() is called to point
yyin at the new input file. (yyrestart() takes one argument, a FILE *
pointer.) In the latter case (i.e., when an action executes a return),
the scanner may then be called again and it will resume scanning where it
left off.
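
The calling contract described above (keep calling until the scanner
returns 0, and restart it before scanning new input) can be illustrated
with a stand-in scanner written by hand. Everything here is an
assumption made for illustration: toy_lex() and toy_restart() are
hypothetical substitutes for yylex() and yyrestart(), and a C string
stands in for the FILE * that yyin would be. The token code TOY_WORD is
likewise invented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOY_EOF = 0, TOY_WORD = 1 };

static const char *toy_src = "";  /* stands in for yyin */
static int toy_done = 0;          /* set once end of input has been seen */

/* Stand-in for yyrestart(): point the scanner at new input. */
void toy_restart(const char *new_src)
{
    assert(new_src != NULL);
    toy_src = new_src;
    toy_done = 0;
}

/* Stand-in for yylex(): resumes where the previous call left off,
 * consumes one whitespace-delimited word, and returns TOY_WORD (like an
 * action executing a return). Returns 0 at end of input. */
int toy_lex(void)
{
    if (toy_done)
        return TOY_EOF;           /* returns at once until restarted */

    while (isspace((unsigned char)*toy_src))
        ++toy_src;
    if (*toy_src == '\0') {
        toy_done = 1;
        return TOY_EOF;
    }
    while (*toy_src != '\0' && !isspace((unsigned char)*toy_src))
        ++toy_src;
    return TOY_WORD;
}

/* Typical caller: restart on the new input, then loop until 0. */
int count_words(const char *text)
{
    int n = 0;

    toy_restart(text);
    while (toy_lex() != TOY_EOF)
        ++n;
    return n;
}
```

A caller of a real flex scanner would follow the same shape, with
yylex() in the loop condition and yyrestart() given a FILE * for the
next input file.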