"[xyz]\"foo"
the literal string: [xyz]"foo
\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
then the ANSI-C interpretation of \X.
Otherwise, a literal 'X' (used to escape
operators such as '*')
\0 a NUL character (ASCII code 0)
\123 the character with octal value 123
\x2a the character with hexadecimal value 2a
(r) match an r; parentheses are used to override
precedence (see below)
Version 2.5 Last change: April 1995 6
FLEX(1) USER COMMANDS FLEX(1)
rs the regular expression r followed by the
regular expression s; called "concatenation"
r|s either an r or an s
r/s an r but only if it is followed by an s. The
text matched by s is included when determining
whether this rule is the "longest match",
but is then returned to the input before
the action is executed. So the action only
sees the text matched by r. This type
of pattern is called "trailing context".
(There are some combinations of r/s that flex
cannot match correctly; see notes in the
Deficiencies / Bugs section below regarding
"dangerous trailing context".)
^r an r, but only at the beginning of a line (i.e.,
when just starting to scan, or right after a
newline has been scanned).
r$ an r, but only at the end of a line (i.e., just
before a newline). Equivalent to "r/\n".
Note that flex's notion of "newline" is exactly
whatever the C compiler used to compile flex
interprets '\n' as; in particular, on some DOS
systems you must either filter out \r's in the
input yourself, or explicitly use r/\r\n for "r$".
<s>r an r, but only in start condition s (see
below for discussion of start conditions)
<s1,s2,s3>r
same, but in any of start conditions s1,
s2, or s3
<*>r an r in any start condition, even an exclusive one.
<<EOF>> an end-of-file
<s1,s2><<EOF>>
an end-of-file when in start condition s1 or s2
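The trailing-context entry (r/s) in the list above can be illustrated with a small C sketch. This is not flex's implementation: the helper match_trailing_context() and the struct tc_match are hypothetical names, and r and s are taken to be literal strings rather than regular expressions to keep the example short. It shows the two lengths involved: the one used to decide the "longest match" (r plus s) and the one the action sees (r only).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper types, not part of flex. */
struct tc_match {
    size_t token_len;  /* what the action sees: length of r only        */
    size_t rule_len;   /* length used for longest-match: r followed by s */
};

/* Try to match r/s at the start of input, with r and s as literal
 * strings (an assumption made to keep the sketch short).
 * Returns 1 and fills *m on success, 0 if r is not followed by s. */
int match_trailing_context(const char *input, const char *r, const char *s,
                           struct tc_match *m)
{
    size_t rlen = strlen(r);
    size_t slen = strlen(s);

    if (strncmp(input, r, rlen) != 0)
        return 0;                     /* r itself does not match */
    if (strncmp(input + rlen, s, slen) != 0)
        return 0;                     /* r matched, but s does not follow */

    m->token_len = rlen;              /* s is returned to the input */
    m->rule_len = rlen + slen;        /* but s counts when comparing rules */
    return 1;
}
```

For the input "foobar" with r = "foo" and s = "bar", the sketch reports a token of length 3 and a rule length of 6; the text "bar" would remain in the input for the next match.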
Note that inside of a character class, all regular expres-
sion operators lose their special meaning except escape
('\') and the character class operators, '-', ']', and, at
the beginning of the class, '^'.
The regular expressions listed above are grouped according
to precedence, from highest precedence at the top to lowest
at the bottom. Those grouped together have equal pre-
cedence. For example,
foo|bar*
is the same as
(foo)|(ba(r*))
since the '*' operator has higher precedence than concatena-
tion, and concatenation higher than alternation ('|'). This
pattern therefore matches either the string "foo" or the
string "ba" followed by zero-or-more r's. To match "foo" or
zero-or-more "bar"'s, use:
foo|(bar)*
and to match zero-or-more "foo"'s-or-"bar"'s:
(foo|bar)*
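The three patterns above can be tried out with a short C sketch. It uses the POSIX regcomp()/regexec() functions as a stand-in for flex, on the assumption that POSIX extended regular expressions give '*', concatenation and '|' the same relative precedence for these particular patterns. The helper matches_whole() is a hypothetical name, and it anchors the pattern so that the whole string must match.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches pattern, 0 if it does not,
 * and -1 if the pattern is too long or fails to compile.
 * POSIX extended regular expressions stand in for flex patterns here. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* Wrap the pattern so it must cover the entire string. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this helper, "foo|bar*" accepts "barrr" but rejects "barbar", "foo|(bar)*" accepts "barbar" but rejects "foofoo", and "(foo|bar)*" accepts "foobarfoo".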
In addition to characters and ranges of characters, charac-
ter classes can also contain character class expressions.
These are expressions enclosed inside [: and :] delimiters
(which themselves must appear between the '[' and ']' of the
character class; other elements may occur inside the charac-
ter class, too). The valid expressions are:
[:alnum:] [:alpha:] [:blank:]
[:cntrl:] [:digit:] [:graph:]
[:lower:] [:print:] [:punct:]
[:space:] [:upper:] [:xdigit:]
These expressions all designate a set of characters
equivalent to the corresponding standard C isXXX function.
For example, [:alnum:] designates those characters for which
isalnum() returns true - i.e., any alphabetic or numeric.
Some systems don't provide isblank(), so flex defines
[:blank:] as a blank or a tab.
For example, the following character classes are all
equivalent:
[[:alnum:]]
[[:alpha:][:digit:]]
[[:alpha:]0-9]
[a-zA-Z0-9]
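The equivalence of these classes can be checked with the isXXX functions they are defined in terms of. The following is a minimal sketch; the function name alnum_classes_agree() is hypothetical, and it assumes the default "C" locale and an ASCII-compatible character set (in other locales, isalpha() may accept characters outside a-z and A-Z).

```c
#include <ctype.h>

/* Returns 1 if, for every unsigned char value, the different ways of
 * describing "alphanumeric" agree, and 0 otherwise.
 * Assumes the default "C" locale and an ASCII-compatible charset. */
int alnum_classes_agree(void)
{
    int c;

    for (c = 0; c <= 255; c++) {
        /* [[:alnum:]] */
        int by_alnum = isalnum(c) != 0;
        /* [[:alpha:][:digit:]] */
        int by_union = isalpha(c) != 0 || isdigit(c) != 0;
        /* [[:alpha:]0-9] */
        int by_mixed = isalpha(c) != 0 || (c >= '0' && c <= '9');
        /* [a-zA-Z0-9] */
        int by_range = (c >= 'a' && c <= 'z') ||
                       (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9');

        if (by_alnum != by_union || by_union != by_mixed ||
            by_mixed != by_range)
            return 0;
    }
    return 1;
}
```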
If your scanner is case-insensitive (the -i flag), then
[:upper:] and [:lower:] are equivalent to [:alpha:].
Some notes on patterns:
- A negated character class such as the example "[^A-Z]"
above will match a newline unless "\n" (or an
equivalent escape sequence) is one of the characters
explicitly present in the negated character class
(e.g., "[^A-Z\n]"). This is unlike how many other reg-
ular expression tools treat negated character classes,
but unfortunately the inconsistency is historically
entrenched. Matching newlines means that a pattern
like [^"]* can match the entire input unless there's
another quote in the input.
- A rule can have at most one instance of trailing con-
text (the '/' operator or the '$' operator). The start
condition, '^', and "<<EOF>>" patterns can only occur
at the beginning of a pattern and, like '/' and '$',
cannot be grouped inside parentheses. A '^'
which does not occur at the beginning of a rule or a
'$' which does not occur at the end of a rule loses its
special properties and is treated as a normal charac-
ter.
The following are illegal:
foo/bar$
<sc1>foo<sc2>bar
Note that the first of these can be written
"foo/bar\n".
The following will result in '$' or '^' being treated
as a normal character:
foo|(bar$)
foo|^bar
If what's wanted is a "foo" or a bar-followed-by-a-
newline, the following could be used (the special '|'
action is explained below):
foo |
bar$ /* action goes here */
A similar trick will work for matching a foo or a bar-
at-the-beginning-of-a-line.
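The first note above, on negated character classes matching newlines, can be illustrated in plain C. This is only a sketch of the matching behaviour, not flex code: the two function names are hypothetical, and strcspn() is used to compute how much text a pattern such as [^"]* or [^"\n]* would cover at the start of a string.

```c
#include <stddef.h>
#include <string.h>

/* Length of the text that [^"]* would match at the start of s:
 * everything up to the next double quote, newlines included. */
size_t span_not_quote(const char *s)
{
    return strcspn(s, "\"");
}

/* Length of the text that [^"\n]* would match at the start of s:
 * the span stops at a double quote or at a newline. */
size_t span_not_quote_or_newline(const char *s)
{
    return strcspn(s, "\"\n");
}
```

For the input "one\ntwo\"rest", the first function spans 7 characters (across the newline), while the second stops after 3. If the input contains no further quote, the first span covers all of it.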
HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input
looking for strings which match any of its patterns. If it
finds more than one match, it takes the one matching the
most text (for trailing context rules, this includes the
length of the trailing part, even though it will then be
returned to the input). If it finds two or more matches of
the same length, the rule listed first in the flex input
file is chosen.
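The selection rule described above (longest match wins, ties go to the rule listed first) can be sketched in C as follows. This is an illustration rather than flex's actual algorithm, which works from a combined automaton; choose_rule() is a hypothetical name, and a length of 0 is assumed to mean "this rule did not match".

```c
#include <stddef.h>

/* lens[i] is the number of characters rule i matched at the current
 * input position, with rules given in file order.  For trailing
 * context rules this would include the trailing part.  A length of 0
 * is taken to mean "no match" (a simplifying assumption).
 * Returns the index of the chosen rule, or -1 if no rule matched. */
int choose_rule(const size_t *lens, size_t nrules)
{
    int best = -1;
    size_t best_len = 0;
    size_t i;

    for (i = 0; i < nrules; i++) {
        /* Strictly longer only: on a tie the earlier rule is kept. */
        if (lens[i] > best_len) {
            best_len = lens[i];
            best = (int)i;
        }
    }
    return best;
}
```

Given match lengths {3, 5, 5, 2}, the sketch picks rule 1: it ties with rule 2 on length but is listed first.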
Once the match is determined, the text corresponding to the
match (called the token) is made available in the global
character pointer yytext, and its length in the global
integer yyleng. The action corresponding to the matched pat-
tern is then executed (a more detailed description of
actions follows), and then the remaining input is scanned
for another match.
If no match is found, then the default rule is executed: the
next character in the input is considered matched and copied
to the standard output. Thus, the simplest legal flex input
is:
%%
which generates a scanner that simply copies its input (one
character at a time) to its output.
Note that yytext can be defined in two different ways:
either as a character pointer or as a character array. You
can control which definition flex uses by including one of
the special directives %pointer or %array in the first
(definitions) section of your flex input. The default is
%pointer, unless you use the -l lex compatibility option, in
which case yytext will be an array. The advantage of using
%pointer is substantially faster scanning and no buffer
overflow when matching very large tokens (unless you run out
of dynamic memory). The disadvantage is that you are res-
tricted in how your actions can modify yytext (see the next
section), and calls to the unput() function destroy the
present contents of yytext, which can be a considerable
porting headache when moving between different lex versions.
The advantage of %array is that you can then modify yytext
to your heart's content, and calls to unput() do not destroy
yytext (see below). Furthermore, existing lex programs
sometimes access yytext externally using declarations of the
form:
extern char yytext[];
This definition is erroneous when used with %pointer, but
correct for %array.
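The reason the array declaration is wrong for %pointer is that an array and a pointer are different kinds of object in C. The sketch below uses made-up names (demo_array, demo_pointer, DEMO_YYLMAX) rather than yytext itself, and the size is an arbitrary value, not flex's default. With %pointer, an external declaration of the form "extern char *yytext;" would be the matching one.

```c
#include <stddef.h>

#define DEMO_YYLMAX 8192  /* arbitrary size for the demo, not flex's default */

char  demo_array[DEMO_YYLMAX];     /* the shape yytext has under %array   */
char *demo_pointer = demo_array;   /* the shape yytext has under %pointer */

/* An array object holds the characters themselves ... */
size_t array_object_size(void)
{
    return sizeof demo_array;
}

/* ... while a pointer object holds only an address.  Declaring it as
 * "extern char name[];" in another file would make the compiler treat
 * the bytes of that address as if they were the token text. */
size_t pointer_object_size(void)
{
    return sizeof demo_pointer;
}
```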
%array defines yytext to be an array of YYLMAX characters,
which defaults to a fairly large value. You can change the
size by simply #define'ing YYLMAX to a different value in
the first section of your flex input. As mentioned above,
with %pointer yytext grows dynamically to accommodate large
tokens. While this means your %pointer scanner can accommo-
date very large tokens (such as matching entire blocks of
comments), bear in mind that each time the scanner must
resize yytext it also must rescan the entire token from the
beginning, so matching such tokens can prove slow. yytext
presently does not dynamically grow if a call to unput()
results in too much text being pushed back; instead, a run-
time error results.
Also note that you cannot use %array with C++ scanner
classes (the c++ option; see below).
ACTIONS
Each pattern in a rule has a corresponding action, which can
be any arbitrary C statement. The pattern ends at the first
non-escaped whitespace character; the remainder of the line
is its action. If the action is empty, then when the pat-
tern is matched the input token is simply discarded. For
example, here is the specification for a program which
deletes all occurrences of "zap me" from its input:
%%
"zap me"
(It will copy all other characters in the input to the out-
put since they will be matched by the default rule.)
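For comparison, the behaviour of this two-line scanner can be written out by hand in C. This is a sketch of what the generated scanner does, not its actual code; the function zap() is a hypothetical name and works on strings rather than on the standard input and output.

```c
#include <stddef.h>
#include <string.h>

/* Copies in to out, dropping every occurrence of "zap me".
 * out must have room for strlen(in) + 1 bytes. */
void zap(const char *in, char *out)
{
    static const char target[] = "zap me";
    const size_t tlen = sizeof target - 1;

    while (*in != '\0') {
        if (strncmp(in, target, tlen) == 0) {
            in += tlen;        /* empty action: the token is discarded */
        } else {
            *out++ = *in++;    /* default rule: copy one character */
        }
    }
    *out = '\0';
}
```

Applied to "please zap me now", this yields "please  now" (with both surrounding blanks kept).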
Here is a program which compresses multiple blanks and tabs
down to a single blank, and throws away whitespace found at
the end of a line:
%%
[ \t]+ putchar( ' ' );
[ \t]+$ /* ignore this token */
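The effect of these two rules can also be written out by hand in C. This is a sketch of the scanner's behaviour, not generated code; squeeze_blanks() is a hypothetical name and works on strings instead of the standard streams. A run of blanks and tabs directly before a newline is dropped, because the second rule (whose match length includes the trailing newline) is the longer match; any other run becomes a single blank.

```c
#include <stddef.h>

static int is_blank_or_tab(char c)
{
    return c == ' ' || c == '\t';
}

/* Copies in to out, compressing each run of blanks and tabs to one
 * blank and dropping runs that sit directly before a newline.
 * out must have room for strlen(in) + 1 bytes. */
void squeeze_blanks(const char *in, char *out)
{
    while (*in != '\0') {
        if (is_blank_or_tab(*in)) {
            while (is_blank_or_tab(*in))
                in++;              /* consume the whole run: [ \t]+ */
            if (*in != '\n')
                *out++ = ' ';      /* first rule: putchar( ' ' ) */
            /* otherwise [ \t]+$ wins and nothing is emitted */
        } else {
            *out++ = *in++;        /* default rule: copy one character */
        }
    }
    *out = '\0';
}
```

Note that a run of blanks at the very end of the input, with no newline after it, is not at the "end of a line" in the sense of '$', so the sketch emits a single blank for it.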
If the action contains a '{', then the action spans till the
balancing '}' is found, and the action may cross multiple
lines. flex knows about C strings and comments and won't be
fooled by braces found within them, but also allows actions
to begin with %{ and will consider the action to be all the
text up to the next %} (regardless of ordinary braces inside
the action).
An action consisting solely of a vertical bar ('|') means
"same as the action for the next rule." See below for an
illustration.
Actions can include arbitrary C code, including return
statements to return a value to whatever routine called
yylex(). Each time yylex() is called it continues processing
tokens from where it last left off until it either reaches
the end of the file or executes a return.
Actions are free to modify yytext except for lengthening it
(adding characters to its end--these will overwrite later
characters in the input stream). This however does not
apply when using %array (see above); in that case, yytext
may be freely modified in any way.
Actions are free to modify yyleng except they should not do
so if the action also includes use of yymore() (see below).
There are a number of special directives which can be
included within an action:
- ECHO copies yytext to the scanner's output.
- BEGIN followed by the name of a start condition places
the scanner in the corresponding start condition (see
below).
- REJECT directs the scanner to proceed on to the "second
best" rule which matched the input (or a prefix of the
input). The rule is chosen as described above in "How
the Input is Matched", and yytext and yyleng set up
appropriately. It may either be one which matched as