FLEX(1) USER COMMANDS FLEX(1)
NAME
flex - fast lexical analyzer generator
SYNOPSIS
flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix
-Sskeleton] [--help --version] [filename ...]
OVERVIEW
This manual describes flex, a tool for generating programs
that perform pattern-matching on text. The manual includes
both tutorial and reference sections:
Description
    a brief overview of the tool
Some Simple Examples
Format Of The Input File
Patterns
    the extended regular expressions used by flex
How The Input Is Matched
    the rules for determining what has been matched
Actions
    how to specify what to do when a pattern is matched
The Generated Scanner
    details regarding the scanner that flex produces;
    how to control the input source
Start Conditions
    introducing context into your scanners, and
    managing "mini-scanners"
Multiple Input Buffers
    how to manipulate multiple input sources; how to
    scan from strings instead of files
End-of-file Rules
    special rules for matching the end of the input
Miscellaneous Macros
    a summary of macros available to the actions
Values Available To The User
    a summary of values available to the actions
Interfacing With Yacc
    connecting flex scanners together with yacc parsers
Version 2.5 Last change: April 1995 1
Options
    flex command-line options, and the "%option"
    directive
Performance Considerations
    how to make your scanner go as fast as possible
Generating C++ Scanners
    the (experimental) facility for generating C++
    scanner classes
Incompatibilities With Lex And POSIX
    how flex differs from AT&T lex and the POSIX lex
    standard
Diagnostics
    those error messages produced by flex (or scanners
    it generates) whose meanings might not be apparent
Files
    files used by flex
Deficiencies / Bugs
    known problems with flex
See Also
    other documentation, related tools
Author
    includes contact information
DESCRIPTION
flex is a tool for generating scanners: programs which
recognize lexical patterns in text. flex reads the given
input files, or its standard input if no file names are
given, for a description of a scanner to generate. The
description is in the form of pairs of regular expressions
and C code, called rules. flex generates as output a C
source file, lex.yy.c, which defines a routine yylex(). This
file is compiled and linked with the -lfl library to produce
an executable. When the executable is run, it analyzes its
input for occurrences of the regular expressions. Whenever
it finds one, it executes the corresponding C code.
SOME SIMPLE EXAMPLES
First, some simple examples to give the flavor of how one
uses flex. The following flex input specifies a scanner
which, whenever it encounters the string "username", replaces
it with the user's login name:
%%
username printf( "%s", getlogin() );
By default, any text not matched by a flex scanner is copied
to the output, so the net effect of this scanner is to copy
its input file to its output with each occurrence of
"username" expanded. In this input, there is just one rule.
"username" is the pattern and the "printf" is the action.
The "%%" marks the beginning of the rules.
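For comparison, the effect of this one-rule scanner can be
sketched by hand in plain C. This is only an illustration of
the behavior described above, not the code flex generates.
The function name expand_username() and its parameters are
hypothetical; in particular the login name is passed in as an
argument rather than obtained from getlogin().

```c
#include <string.h>

/* Hand-written sketch of the one-rule scanner: copy `in` to `out`,
 * replacing each occurrence of "username" with `login`.
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small.
 * The name and signature are illustrative, not part of flex. */
int expand_username(const char *in, const char *login,
                    char *out, size_t outsz)
{
    const char *pat = "username";
    size_t patlen = strlen(pat);
    size_t loglen = strlen(login);
    size_t used = 0;

    if (outsz == 0)
        return -1;

    while (*in != '\0') {
        if (strncmp(in, pat, patlen) == 0) {
            /* the rule matched: emit the replacement text */
            if (used + loglen >= outsz)
                return -1;
            memcpy(out + used, login, loglen);
            used += loglen;
            in += patlen;
        } else {
            /* default action: copy the unmatched character */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *in++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

The else branch plays the role of the default rule: text that
no pattern matches is passed through unchanged.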
Here's another simple example:
        int num_lines = 0, num_chars = 0;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main( void )
{
    yylex();
    printf( "# of lines = %d, # of chars = %d\n",
            num_lines, num_chars );
    return 0;
}
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts). The first line declares two
globals, "num_lines" and "num_chars", which are accessible
both inside yylex() and in the main() routine declared after
the second "%%". There are two rules, one which matches a
newline ("\n") and increments both the line count and the
character count, and one which matches any character other
than a newline (indicated by the "." regular expression).
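The bookkeeping done by the two rules can also be written out
by hand, which may help in seeing why the newline rule
increments both counters. The sketch below is an assumption
for illustration only: count_lines_chars() is a hypothetical
helper that works on a NUL-terminated string instead of
reading yyin.

```c
/* Hand-written sketch of the two counting rules:
 *   \n   ++num_lines; ++num_chars;
 *   .    ++num_chars;
 * Works on a C string, so counting stops at the first NUL byte. */
void count_lines_chars(const char *text, int *num_lines, int *num_chars)
{
    *num_lines = 0;
    *num_chars = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (*p == '\n') {
            /* first rule: a newline is both a line and a character */
            ++*num_lines;
            ++*num_chars;
        } else {
            /* second rule: any character other than a newline */
            ++*num_chars;
        }
    }
}
```

Between them the two rules cover every input character, so
nothing falls through to the default action and the scanner
produces no output while it runs.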
A somewhat more complicated example:
/* scanner for a toy Pascal-like language */
%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}
DIGIT    [0-9]
ID       [a-z][a-z0-9]*
%%
{DIGIT}+    {
            printf( "An integer: %s (%d)\n", yytext,
                    atoi( yytext ) );
            }
{DIGIT}+"."{DIGIT}*        {
            printf( "A float: %s (%g)\n", yytext,
                    atof( yytext ) );
            }
if|then|begin|end|procedure|function        {
            printf( "A keyword: %s\n", yytext );
            }
{ID}        printf( "An identifier: %s\n", yytext );
"+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
"{"[^}\n]*"}"     /* eat up one-line comments */
[ \t\n]+          /* eat up whitespace */
.           printf( "Unrecognized character: %s\n", yytext );
%%
int main( int argc, char **argv )
{
    ++argv, --argc;  /* skip over program name */
    if ( argc > 0 )
        yyin = fopen( argv[0], "r" );
    else
        yyin = stdin;
    yylex();
    return 0;
}
This is the beginning of a simple scanner for a language
like Pascal. It identifies different types of tokens and
reports on what it has seen.
The details of this example will be explained in the
following sections.
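As a rough preview of how the keyword and identifier rules
interact, the sketch below classifies one complete word by
hand. It assumes the usual flex behavior of preferring the
longest match and, among matches of equal length, the rule
listed first, so "if" is a keyword while "iffy" is an
identifier. The names token_kind and classify_word() are
hypothetical, and the letter test assumes an ASCII-like
character set.

```c
#include <stddef.h>
#include <string.h>

enum token_kind { TOK_KEYWORD, TOK_IDENTIFIER, TOK_OTHER };

/* Returns 1 if all of s matches [a-z][a-z0-9]*, else 0.
 * Assumes 'a'..'z' are contiguous, as in ASCII. */
static int is_id(const char *s)
{
    if (*s < 'a' || *s > 'z')
        return 0;
    for (++s; *s != '\0'; ++s) {
        int lower = (*s >= 'a' && *s <= 'z');
        int digit = (*s >= '0' && *s <= '9');
        if (!lower && !digit)
            return 0;
    }
    return 1;
}

/* Classify a whole word the way the keyword and {ID} rules would.
 * The keyword rule is listed first, so it wins when both rules
 * match the same text. */
enum token_kind classify_word(const char *lexeme)
{
    static const char *const keywords[] = {
        "if", "then", "begin", "end", "procedure", "function"
    };
    size_t n = sizeof keywords / sizeof keywords[0];

    for (size_t i = 0; i < n; ++i)
        if (strcmp(lexeme, keywords[i]) == 0)
            return TOK_KEYWORD;
    return is_id(lexeme) ? TOK_IDENTIFIER : TOK_OTHER;
}
```

Because ID allows only lower-case letters and digits, a word
such as "Begin" is matched by neither rule as a whole.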
FORMAT OF THE INPUT FILE
The flex input file consists of three sections, separated by
a line with just %% in it:
definitions
%%
rules
%%
user code
The definitions section contains declarations of simple name
definitions to simplify the scanner specification, and
declarations of start conditions, which are explained in a
later section.
Name definitions have the form:
name definition
The "name" is a word beginning with a letter or an
underscore ('_') followed by zero or more letters, digits,
'_', or '-' (dash). The definition is taken to begin at the
first non-white-space character following the name and to
continue to the end of the line. The definition can
subsequently be referred to using "{name}", which will
expand to "(definition)". For example,
DIGIT [0-9]
ID [a-z][a-z0-9]*
defines "DIGIT" to be a regular expression which matches a
single digit, and "ID" to be a regular expression which
matches a letter followed by zero-or-more letters-or-digits.
A subsequent reference to
{DIGIT}+"."{DIGIT}*
is identical to
([0-9])+"."([0-9])*
and matches one-or-more digits followed by a '.' followed by
zero-or-more digits.
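To make the meaning of the expanded pattern concrete, here is
a hand-written C sketch of a matcher for
([0-9])+"."([0-9])*. It is an illustration only and is
unrelated to the table-driven code that flex generates; the
name match_float() is hypothetical.

```c
#include <stddef.h>

static int is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Returns the length of the prefix of s matched by
 * ([0-9])+"."([0-9])*, or 0 if s does not start with a match. */
size_t match_float(const char *s)
{
    size_t i = 0;

    while (is_digit(s[i]))      /* ([0-9])+ : one or more digits */
        ++i;
    if (i == 0)
        return 0;
    if (s[i] != '.')            /* "."      : a literal dot */
        return 0;
    ++i;
    while (is_digit(s[i]))      /* ([0-9])* : zero or more digits */
        ++i;
    return i;
}
```

For instance, match_float("12.") returns 3 because the digits
after the dot are optional, while match_float(".5") returns 0
because at least one digit must come before the dot.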
The rules section of the flex input contains a series of
rules of the form:
pattern action
where the pattern must be unindented and the action must
begin on the same line.
See below for a further description of patterns and actions.
Finally, the user code section is simply copied to lex.yy.c
verbatim. It is used for companion routines which call or
are called by the scanner. The presence of this section is
optional; if it is missing, the second %% in the input file
may be skipped, too.
In the definitions and rules sections, any indented text or
text enclosed in %{ and %} is copied verbatim to the output
(with the %{}'s removed). The %{}'s must appear unindented
on lines by themselves.
In the rules section, any indented or %{} text appearing
before the first rule may be used to declare variables which
are local to the scanning routine and (after the
declarations) code which is to be executed whenever the
scanning routine is entered. Other indented or %{} text in
the rules section is still copied to the output, but its
meaning is not well-defined and it may well cause
compile-time errors (this feature is present for POSIX
compliance; see below for other such features).
In the definitions section (but not in the rules section),
an unindented comment (i.e., a line beginning with "/*") is
also copied verbatim to the output up to the next "*/".
PATTERNS
The patterns in the input are written using an extended set
of regular expressions. These are:
x          match the character 'x'
.          any character (byte) except newline
[xyz]      a "character class"; in this case, the pattern
           matches either an 'x', a 'y', or a 'z'
[abj-oZ]   a "character class" with a range in it; matches
           an 'a', a 'b', any letter from 'j' through 'o',
           or a 'Z'
[^A-Z]     a "negated character class", i.e., any character
           but those in the class. In this case, any
           character EXCEPT an uppercase letter.
[^A-Z\n]   any character EXCEPT an uppercase letter or
           a newline
r*         zero or more r's, where r is any regular expression
r+         one or more r's
r?         zero or one r's (that is, "an optional r")
r{2,5}     anywhere from two to five r's
r{2,}      two or more r's
r{4}       exactly 4 r's
{name}     the expansion of the "name" definition
           (see above)