.TH FLEX 1 "26 May 1990" "Version 2.3"
.SH NAME
flex - fast lexical analyzer generator
.SH SYNOPSIS
.B flex
.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
.I [filename ...]
.SH DESCRIPTION
.I flex
is a tool for generating
.I scanners:
programs which recognize lexical patterns in text.
.I flex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate. The description is in
the form of pairs
of regular expressions and C code, called
.I rules. flex
generates as output a C source file,
.B lex.yy.c,
which defines a routine
.B yylex().
This file is compiled and linked with the
.B -lfl
library to produce an executable. When the executable is run,
it analyzes its input for occurrences
of the regular expressions. Whenever it finds one, it executes
the corresponding C code.
.LP
For full documentation, see
.B flexdoc(1).
This manual entry is intended for use as a quick reference.
.SH OPTIONS
.I flex
has the following options:
.TP
.B -b
Generate backtracking information to
.I lex.backtrack.
This is a list of scanner states which require backtracking
and the input characters on which they do so. By adding rules one
can remove backtracking states. If all backtracking states
are eliminated and
.B -f
or
.B -F
is used, the generated scanner will run faster.
.TP
.B -c
is a do-nothing, deprecated option included for POSIX compliance.
.IP
.B NOTE:
in previous releases of
.I flex
.B -c
specified table-compression options. This functionality is
now given by the
.B -C
flag. To ease the impact of this change, when
.I flex
encounters
.B -c,
it currently issues a warning message and assumes that
.B -C
was desired instead. In the future this "promotion" of
.B -c
to
.B -C
will go away in the name of full POSIX compliance (unless
the POSIX meaning is removed first).
.TP
.B -d
makes the generated scanner run in
.I debug
mode. Whenever a pattern is recognized and the global
.B yy_flex_debug
is non-zero (which is the default), the scanner will
write to
.I stderr
a line of the form:
.nf
--accepting rule at line 53 ("the matched text")
.fi
The line number refers to the location of the rule in the file
defining the scanner (i.e., the file that was fed to flex). Messages
are also generated when the scanner backtracks, accepts the
default rule, reaches the end of its input buffer (or encounters
a NUL; the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
.TP
.B -f
specifies (take your pick)
.I full table
or
.I fast scanner.
No table compression is done. The result is large but fast.
This option is equivalent to
.B -Cf
(see below).
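To illustrate what an uncompressed ("full") table is, here is a minimal hand-written C sketch; it is not the code flex generates. It stores a DFA for the single pattern [0-9]+ as a two-dimensional array with one row per state and one column per possible input character, so scanning costs exactly one table look-up per character. The names nxt, build_table() and match_digits() and the state numbering are hypothetical.

```c
#include <stddef.h>

/* Hypothetical full-table DFA for the single pattern [0-9]+ (not flex output).
   State 0 is the start state, 1 means "one or more digits seen" (accepting),
   and 2 is a dead state from which no match can continue. */
enum { START = 0, DIGITS = 1, DEAD = 2, NSTATES = 3, NCHARS = 256 };

/* One entry for every (state, character) pair: no compression at all. */
static int nxt[NSTATES][NCHARS];

static void build_table(void)
{
    for (int s = 0; s < NSTATES; s++)
        for (int c = 0; c < NCHARS; c++)
            nxt[s][c] = DEAD;             /* default: no transition */
    for (int c = '0'; c <= '9'; c++) {
        nxt[START][c] = DIGITS;
        nxt[DIGITS][c] = DIGITS;
    }
}

/* Length of the longest prefix of text matching [0-9]+, or 0 if none.
   Each input character costs exactly one look-up: nxt[state][c]. */
static size_t match_digits(const char *text)
{
    static int built = 0;
    int state = START;
    size_t last_accept = 0;

    if (!built) {
        build_table();
        built = 1;
    }
    for (size_t i = 0; text[i] != 0; i++) {
        state = nxt[state][(unsigned char) text[i]];
        if (state == DEAD)
            break;
        if (state == DIGITS)
            last_accept = i + 1;          /* remember the longest match so far */
    }
    return last_accept;
}
```

For example, match_digits("123abc") returns 3. A real scanner has many more states, each with a full row, which is why this representation is large.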
.TP
.B -i
instructs
.I flex
to generate a
.I case-insensitive
scanner. The case of letters given in the
.I flex
input patterns will
be ignored, and tokens in the input will be matched regardless of case. The
matched text given in
.I yytext
retains its original case (i.e., it is not folded).
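The following C sketch illustrates only the observable behavior of a case-insensitive match, not how flex implements it: characters are compared after case folding, but the input is never modified, so the matched text keeps the case it had in the input. The function match_keyword_nocase() is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: does input begin with keyword, ignoring case?
   Returns the length of the match, or 0 if there is none. Only the
   comparison is case-folded; input itself is never modified, so the
   matched text keeps the case it had in the input (as yytext does). */
static size_t match_keyword_nocase(const char *input, const char *keyword)
{
    size_t i;

    for (i = 0; keyword[i] != 0; i++) {
        unsigned char ic = (unsigned char) input[i];
        unsigned char kc = (unsigned char) keyword[i];
        if (ic == 0 || tolower(ic) != tolower(kc))
            return 0;                     /* input ended or characters differ */
    }
    return i;
}
```

For instance, with input "WhIlE(1)" and keyword "while" it returns 5, and the first five characters of the input still read WhIlE.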
.TP
.B -n
is another do-nothing, deprecated option included only for
POSIX compliance.
.TP
.B -p
generates a performance report to stderr. The report
consists of comments regarding features of the
.I flex
input file which will cause a loss of performance in the resulting scanner.
.TP
.B -s
causes the
.I default rule
(that unmatched scanner input is echoed to
.I stdout)
to be suppressed. If the scanner encounters input that does not
match any of its rules, it aborts with an error.
.TP
.B -t
instructs
.I flex
to write the scanner it generates to standard output instead
of
.B lex.yy.c.
.TP
.B -v
specifies that
.I flex
should write to
.I stderr
a summary of statistics regarding the scanner it generates.
.TP
.B -F
specifies that the
.ul
fast
scanner table representation should be used. This representation is
about as fast as the full table representation
.ul
(-f),
and for some sets of patterns will be considerably smaller (and for
others, larger). See
.B flexdoc(1)
for details.
.IP
This option is equivalent to
.B -CF
(see below).
.TP
.B -I
instructs
.I flex
to generate an
.I interactive
scanner, that is, a scanner which stops immediately rather than
looking ahead if it knows
that the currently scanned text cannot be part of a longer rule's match.
Again, see
.B flexdoc(1)
for details.
.IP
Note that
.B -I
cannot be used in conjunction with
.I full
or
.I fast tables,
i.e., the
.B -f, -F, -Cf,
or
.B -CF
flags.
.TP
.B -L
instructs
.I flex
not to generate
.B #line
directives in
.B lex.yy.c.
The default is to generate such directives so error
messages in the actions will be correctly
located with respect to the original
.I flex
input file, and not to
the fairly meaningless line numbers of
.B lex.yy.c.
.TP
.B -T
makes
.I flex
run in
.I trace
mode. It will generate a lot of messages to
.I stdout
concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata. This option is mostly for use in maintaining
.I flex.
.TP
.B -8
instructs
.I flex
to generate an 8-bit scanner.
On some sites, this is the default. On others, the default
is 7-bit characters. To see which is the case, check the verbose
.B (-v)
output for "equivalence classes created". If the denominator of
the number shown is 128, then by default
.I flex
generates 7-bit scanners. If it is 256, then the default is
8-bit scanners.
.TP
.B -C[efmF]
controls the degree of table compression.
.IP
.B -Ce
directs
.I flex
to construct
.I equivalence classes,
i.e., sets of characters
which have identical lexical properties.
Equivalence classes usually give
dramatic reductions in the final table/object file sizes (typically
a factor of 2-5) and are pretty cheap performance-wise (one array
look-up per character scanned).
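A minimal C sketch of the idea, assuming a 7-bit character set and assuming the rules' character classes are already available as membership tables (the array layout and the function compute_ecs() are hypothetical, and flex's own algorithm differs in detail): two characters fall into the same equivalence class exactly when every class in the rules either contains both or contains neither.

```c
#include <stdbool.h>

enum { NCHARS = 128 };    /* assume a 7-bit character set for this sketch */

/* Partition the character set into equivalence classes.
   class_has[k][c] is true if character c is in the rules' k-th character
   class (a hypothetical input format). Characters c and d share an
   equivalence class exactly when class_has[k][c] == class_has[k][d] for
   every k. Fills ec[c] with a class number and returns the class count. */
static int compute_ecs(bool class_has[][NCHARS], int nclasses, int ec[NCHARS])
{
    int rep[NCHARS];          /* one representative character per class */
    int necs = 0;

    for (int c = 0; c < NCHARS; c++) {
        int e;
        for (e = 0; e < necs; e++) {
            int k;
            for (k = 0; k < nclasses; k++)
                if (class_has[k][c] != class_has[k][rep[e]])
                    break;
            if (k == nclasses)    /* same membership in every class */
                break;
        }
        if (e == necs)            /* matches no existing class: start one */
            rep[necs++] = c;
        ec[c] = e;
    }
    return necs;
}
```

With the classes [0-9], [a-z] and x, the 128 characters collapse into 4 equivalence classes (digits, x, the other lowercase letters, and everything else). A scanner would then index its transition table with ec[c] rather than c, which is the one extra array look-up per character scanned.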
.IP
.B -Cf
specifies that the
.I full
scanner tables should be generated -
.I flex
should not compress the
tables by taking advantage of similar transition functions for
different states.
.IP
.B -CF
specifies that the alternate fast scanner representation (described in
.B flexdoc(1))
should be used.
.IP
.B -Cm
directs
.I flex
to construct
.I meta-equivalence classes,
which are sets of equivalence classes (or characters, if equivalence
classes are not being used) that are commonly used together. Meta-equivalence
classes are often a big win when using compressed tables, but they
have a moderate performance impact (one or two "if" tests and one
array look-up per character scanned).
.IP
A lone
.B -C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.IP
The options
.B -Cf
or
.B -CF
and
.B -Cm
do not make sense together - there is no opportunity for meta-equivalence
classes if the table is not being compressed. Otherwise the options
may be freely mixed.
.IP
The default setting is
.B -Cem,
which specifies that
.I flex
should generate equivalence classes
and meta-equivalence classes. This setting provides the highest
degree of table compression. You can trade smaller tables for
faster-executing scanners; the following ordering generally
holds:
.nf
slowest & smallest
-Cem
-Cm
-Ce
-C
-C{f,F}e
-C{f,F}
fastest & largest
.fi
.IP
.B -C
options are not cumulative; whenever the flag is encountered, the
previous -C settings are forgotten.
.TP
.B -Sskeleton_file
overrides the default skeleton file from which
.I flex
constructs its scanners. You'll never need this option unless you are doing
.I flex
maintenance or development.
.SH SUMMARY OF FLEX REGULAR EXPRESSIONS
The patterns in the input are written using an extended set of regular
expressions. These are:
.nf
x          match the character 'x'
\&.          any character except newline
[xyz]      a "character class"; in this case, the pattern
             matches either an 'x', a 'y', or a 'z'
[abj-oZ]   a "character class" with a range in it; matches
             an 'a', a 'b', any letter from 'j' through 'o',
             or a 'Z'
[^A-Z]     a "negated character class", i.e., any character
             but those in the class. In this case, any
             character EXCEPT an uppercase letter.
[^A-Z\\n]   any character EXCEPT an uppercase letter or
             a newline
r*         zero or more r's, where r is any regular expression
r+         one or more r's
r?         zero or one r's (that is, "an optional r")
r{2,5}     anywhere from two to five r's
r{2,}      two or more r's
r{4}       exactly 4 r's
{name}     the expansion of the "name" definition
             (see flexdoc(1))
"[xyz]\\"foo"
             the literal string: [xyz]"foo
\\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
             then the ANSI-C interpretation of \\X.
             Otherwise, a literal 'X' (used to escape
             operators such as '*')
\\123       the character with octal value 123
\\x2a       the character with hexadecimal value 2a
(r)        match an r; parentheses are used to override
             precedence (see below)

rs         the regular expression r followed by the
             regular expression s; called "concatenation"

r|s        either an r or an s

r/s        an r but only if it is followed by an s. The
             s is not part of the matched text. This type
             of pattern is called "trailing context".
^r         an r, but only at the beginning of a line
r$         an r, but only at the end of a line. Equivalent
             to "r/\\n".

<s>r       an r, but only in start condition s (see
             below for discussion of start conditions)
<s1,s2,s3>r
             same, but in any of start conditions s1,
             s2, or s3

<<EOF>>    an end-of-file
<s1,s2><<EOF>>
             an end-of-file when in start condition s1 or s2
.fi
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
.LP
Some notes on patterns:
.IP -
Negated character classes
.I match newlines
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]").
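A small C sketch of this rule, limited to classes of the form [^lo-hi] with an optional explicit newline (the function in_negated_range() and the NEWLINE constant are hypothetical): a negated class is the complement of the characters it lists, so newline is matched unless it is listed.

```c
#include <stdbool.h>

enum { NEWLINE = 10 };    /* ASCII line feed, written as a number here */

/* Hypothetical membership test for a negated class [^lo-hi], optionally
   with newline listed explicitly as well. A negated class matches every
   character it does not list, so newline matches unless it is listed. */
static bool in_negated_range(int c, int lo, int hi, bool lists_newline)
{
    bool listed = (c >= lo && c <= hi) || (lists_newline && c == NEWLINE);
    return !listed;
}
```

So in_negated_range(NEWLINE, 'A', 'Z', false) is true, mirroring [^A-Z], while passing true for lists_newline, mirroring the class that also lists newline, makes it false.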