FLEX(1) Minix Programmer's Manual FLEX(1)
NAME
flexdoc - fast lexical analyzer generator
SYNOPSIS
flex [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] [filename ...]
DESCRIPTION
flex is a tool for generating scanners: programs which recognize lexical
patterns in text. flex reads the given input files, or its standard
input if no file names are given, for a description of a scanner to
generate. The description is in the form of pairs of regular expressions
and C code, called rules. flex generates as output a C source file,
lex.yy.c, which defines a routine yylex(). This file is compiled and
linked with the -lfl library to produce an executable. When the
executable is run, it analyzes its input for occurrences of the regular
expressions. Whenever it finds one, it executes the corresponding C
code.
SOME SIMPLE EXAMPLES
First, some simple examples to give the flavor of how one uses flex. The
following flex input specifies a scanner which, whenever it encounters the
string "username", replaces it with the user's login name:
%%
username printf( "%s", getlogin() );
By default, any text not matched by a flex scanner is copied to the
output, so the net effect of this scanner is to copy its input file to
its output with each occurrence of "username" expanded. In this input,
there is just one rule. "username" is the pattern and the "printf" is
the action. The "%%" marks the beginning of the rules.
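To make the net effect concrete, the following is a hand-written C sketch
of what the generated scanner does for this single rule: unmatched
characters are copied through, and each match is replaced. It is not flex
output. The name expand_username and the idea of passing the login name as
a parameter (rather than calling getlogin()) are assumptions made so the
sketch can be tried in isolation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of flex: copy `in` to `out`, replacing each
 * occurrence of "username" with `login`.  Characters that match no rule are
 * echoed unchanged, mirroring the scanner's default behavior.  Output is
 * truncated to fit `cap` bytes and NUL-terminated whenever cap > 0. */
void expand_username(const char *in, const char *login,
                     char *out, size_t cap)
{
    static const char pat[] = "username";
    const size_t patlen = sizeof pat - 1;
    size_t used = 0;

    if (cap == 0)
        return;

    while (*in != '\0') {
        if (strncmp(in, pat, patlen) == 0) {
            /* the rule's action: emit the login name */
            for (const char *p = login; *p != '\0' && used + 1 < cap; ++p)
                out[used++] = *p;
            in += patlen;
        } else {
            /* default behavior: copy one unmatched character */
            if (used + 1 < cap)
                out[used++] = *in;
            ++in;
        }
    }
    out[used] = '\0';
}
```

Calling expand_username( "hi username!", "alice", buf, sizeof buf ) leaves
"hi alice!" in buf; input without the string "username" is copied unchanged.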
Here's another simple example:
        int num_lines = 0, num_chars = 0;
%%
\n      ++num_lines; ++num_chars;
.       ++num_chars;
%%
int main( void )
        {
        yylex();
        printf( "# of lines = %d, # of chars = %d\n",
                num_lines, num_chars );
        return 0;
        }
This scanner counts the number of characters and the number of lines in
26 May 1990 1
its input (it produces no output other than the final report on the
counts). The first line declares two globals, "num_lines" and
"num_chars", which are accessible both inside yylex() and in the main()
routine declared after the second "%%". There are two rules, one which
matches a newline ("\n") and increments both the line count and the
character count, and one which matches any character other than a newline
(indicated by the "." regular expression).
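The two rules can be read as the two branches of a loop over the input.
The sketch below is a hand-written stand-in for the generated scanner, not
flex output. The names struct counts and count_text are hypothetical.

```c
#include <stddef.h>

struct counts {
    size_t lines;   /* number of newlines seen */
    size_t chars;   /* number of characters seen, newlines included */
};

/* Hypothetical equivalent of the two-rule scanner applied to a string. */
struct counts count_text(const char *text)
{
    struct counts c = { 0, 0 };

    for (; *text != '\0'; ++text) {
        if (*text == '\n') {
            /* first rule: a newline bumps both counters */
            ++c.lines;
            ++c.chars;
        } else {
            /* second rule: "." matches any other single character */
            ++c.chars;
        }
    }
    return c;
}
```

For the input "ab\ncd\n" this gives 2 lines and 6 characters, since each
newline is counted by the first rule as a character as well as a line.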
A somewhat more complicated example:
/* scanner for a toy Pascal-like language */
%{
/* need this for the calls to atoi() and atof() below */
#include <stdlib.h>
%}
DIGIT [0-9]
ID [a-z][a-z0-9]*
%%
{DIGIT}+ {
printf( "An integer: %s (%d)\n", yytext,
atoi( yytext ) );
}
{DIGIT}+"."{DIGIT}* {
printf( "A float: %s (%g)\n", yytext,
atof( yytext ) );
}
if|then|begin|end|procedure|function {
printf( "A keyword: %s\n", yytext );
}
{ID} printf( "An identifier: %s\n", yytext );
"+"|"-"|"*"|"/" printf( "An operator: %s\n", yytext );
"{"[^}\n]*"}" /* eat up one-line comments */
[ \t\n]+ /* eat up whitespace */
. printf( "Unrecognized character: %s\n", yytext );
%%
int main( int argc, char **argv )
    {
    ++argv, --argc;  /* skip over program name */
    if ( argc > 0 )
        yyin = fopen( argv[0], "r" );
    else
        yyin = stdin;
    yylex();
    return 0;
    }
This is the beginning of a simple scanner for a language like Pascal.
It identifies different types of tokens and reports on what it has seen.
The details of this example will be explained in the following sections.
FORMAT OF THE INPUT FILE
The flex input file consists of three sections, separated by a line with
just %% in it:
definitions
%%
rules
%%
user code
The definitions section contains declarations of simple name definitions
to simplify the scanner specification, and declarations of start
conditions, which are explained in a later section.
Name definitions have the form:
name definition
The "name" is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash). The
definition is taken to begin at the first non-white-space character
following the name and to continue to the end of the line. The definition
can subsequently be referred to using "{name}", which will expand to
"(definition)". For example,
DIGIT [0-9]
ID [a-z][a-z0-9]*
defines "DIGIT" to be a regular expression which matches a single digit,
and "ID" to be a regular expression which matches a letter followed by
zero-or-more letters-or-digits. A subsequent reference to
{DIGIT}+"."{DIGIT}*
is identical to
([0-9])+"."([0-9])*
and matches one-or-more digits followed by a '.' followed by zero-or-more
digits.
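The substitution described above is purely textual, and can be sketched in
C as follows. This is an illustration, not flex's own implementation: the
names struct name_def and expand_names are hypothetical, and the sketch
deliberately ignores quoting and character classes, which a real
implementation would have to respect.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct name_def {
    const char *name;        /* e.g. "DIGIT" */
    const char *definition;  /* e.g. "[0-9]" */
};

/* Append up to n bytes of s to out, leaving room for the terminator. */
static size_t append(char *out, size_t used, size_t cap,
                     const char *s, size_t n)
{
    for (size_t i = 0; i < n && used + 1 < cap; ++i)
        out[used++] = s[i];
    return used;
}

/* Hypothetical helper: rewrite each "{name}" in pattern as "(definition)".
 * A '{' not followed by a letter or '_' (as in "r{2,5}") is left alone,
 * as is a reference to a name that has no definition. */
void expand_names(const char *pattern, const struct name_def *defs,
                  size_t ndefs, char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return;

    while (*pattern != '\0') {
        const char *close = NULL;
        const struct name_def *hit = NULL;

        if (pattern[0] == '{' &&
            (isalpha((unsigned char)pattern[1]) || pattern[1] == '_'))
            close = strchr(pattern, '}');

        if (close != NULL) {
            size_t len = (size_t)(close - pattern) - 1;
            for (size_t i = 0; i < ndefs; ++i) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pattern + 1, len) == 0) {
                    hit = &defs[i];
                    break;
                }
            }
        }

        if (hit != NULL) {
            used = append(out, used, cap, "(", 1);
            used = append(out, used, cap, hit->definition,
                          strlen(hit->definition));
            used = append(out, used, cap, ")", 1);
            pattern = close + 1;
        } else {
            used = append(out, used, cap, pattern, 1);
            ++pattern;
        }
    }
    out[used] = '\0';
}
```

The parentheses matter because of operator precedence: if a definition
such as "a|b" were pasted in without them, a reference followed by "c"
would read as "a|bc" instead of "(a|b)c".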
The rules section of the flex input contains a series of rules of the
form:
pattern action
where the pattern must be unindented and the action must begin on the
same line.
See below for a further description of patterns and actions.
Finally, the user code section is simply copied to lex.yy.c verbatim. It
is used for companion routines which call or are called by the scanner.
The presence of this section is optional; if it is missing, the second %%
in the input file may be skipped, too.
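The three-section layout can be illustrated by a small C routine that
locates the section boundaries in a scanner description held in memory. It
is a simplified sketch, not how flex itself reads its input: it treats only
a line consisting of exactly "%%" as a separator, and the names struct
flex_sections and split_sections are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct flex_sections {
    const char *definitions;  /* start of the input */
    const char *rules;        /* text after the first "%%" line, or NULL */
    const char *user_code;    /* text after the second "%%" line, or NULL */
};

/* True when the line starting at `line` is exactly "%%". */
static int is_separator(const char *line)
{
    return line[0] == '%' && line[1] == '%' &&
           (line[2] == '\n' || line[2] == '\0');
}

/* Hypothetical helper: find where the rules and user code sections begin.
 * The returned pointers point into `text`; nothing is copied. */
struct flex_sections split_sections(const char *text)
{
    struct flex_sections s = { text, NULL, NULL };
    const char *line = text;

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        const char *next = (nl != NULL) ? nl + 1 : line + strlen(line);

        if (is_separator(line)) {
            if (s.rules == NULL) {
                s.rules = next;
            } else {
                s.user_code = next;
                break;  /* everything after the second "%%" is user code */
            }
        }
        line = next;
    }
    return s;
}
```

When the input has no second "%%" line, user_code stays NULL, which
corresponds to the case where the optional user code section is omitted.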
In the definitions and rules sections, any indented text or text enclosed
in %{ and %} is copied verbatim to the output (with the %{}'s removed).
The %{}'s must appear unindented on lines by themselves.
In the rules section, any indented or %{} text appearing before the first
rule may be used to declare variables which are local to the scanning
routine and (after the declarations) code which is to be executed
whenever the scanning routine is entered. Other indented or %{} text in
the rule section is still copied to the output, but its meaning is not
well-defined and it may well cause compile-time errors (this feature is
present for POSIX compliance; see below for other such features).
In the definitions section, an unindented comment (i.e., a line beginning
with "/*") is also copied verbatim to the output up to the next "*/".
Also, any line in the definitions section beginning with '#' is ignored,
though this style of comment is deprecated and may go away in the future.
PATTERNS
The patterns in the input are written using an extended set of regular
expressions. These are:
x          match the character 'x'
.          any character except newline
[xyz]      a "character class"; in this case, the pattern
           matches either an 'x', a 'y', or a 'z'
[abj-oZ]   a "character class" with a range in it; matches
           an 'a', a 'b', any letter from 'j' through 'o',
           or a 'Z'
[^A-Z]     a "negated character class", i.e., any character
           but those in the class. In this case, any
           character EXCEPT an uppercase letter.
[^A-Z\n]   any character EXCEPT an uppercase letter or
           a newline
r*         zero or more r's, where r is any regular expression
r+         one or more r's
r?         zero or one r's (that is, "an optional r")
r{2,5}     anywhere from two to five r's
r{2,}      two or more r's
r{4}       exactly 4 r's
{name}     the expansion of the "name" definition
           (see above)
"[xyz]\"foo"
           the literal string: [xyz]"foo
\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
           then the ANSI-C interpretation of \X.
           Otherwise, a literal 'X' (used to escape
           operators such as '*')
\123       the character with octal value 123
\x2a       the character with hexadecimal value 2a
(r)        match an r; parentheses are used to override
           precedence (see below)

rs         the regular expression r followed by the
           regular expression s; called "concatenation"

r|s        either an r or an s

r/s        an r but only if it is followed by an s. The
           s is not part of the matched text. This type
           of pattern is called "trailing context".
^r         an r, but only at the beginning of a line
r$         an r, but only at the end of a line. Equivalent
           to "r/\n".

<s>r       an r, but only in start condition s (see
           below for discussion of start conditions)
<s1,s2,s3>r
           same, but in any of start conditions s1,
           s2, or s3

<<EOF>>    an end-of-file
<s1,s2><<EOF>>
           an end-of-file when in start condition s1 or s2
The regular expressions listed above are grouped according to precedence,
from highest precedence at the top to lowest at the bottom. Those
grouped together have equal precedence. For example,
foo|bar*
is the same as
(foo)|(ba(r*))
since the '*' operator has higher precedence than concatenation, and
concatenation higher than alternation ('|'). This pattern therefore
matches either the string "foo" or the string "ba" followed by zero-or-
more r's. To match "foo" or zero-or-more "bar"'s, use:
foo|(bar)*
and to match zero-or-more "foo"'s-or-"bar"'s:
(foo|bar)*
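These three groupings can be checked experimentally. The sketch below uses
the POSIX regcomp() and regexec() routines as a stand-in, on the assumption
that POSIX extended regular expressions give '*', concatenation, and '|'
the same relative precedence; flex does not use these routines itself. The
helper name matches_whole is hypothetical, and the "^(" and ")$" wrapped
around each pattern in the usage note are POSIX anchors added only to force
a whole-string match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches the POSIX extended
 * regular expression `pattern`, 0 if it does not, and -1 if the pattern
 * fails to compile.  Callers anchor the pattern themselves. */
int matches_whole(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, "^(foo|bar*)$" accepts "foo", "ba", and "barrr" but
rejects "barbar", while "^(foo|(bar)*)$" accepts "barbar", and
"^(foo|bar)*$" accepts a mixture such as "foobarfoo".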
Some notes on patterns:
- A negated character class such as the example "[^A-Z]" above will
match a newline unless "\n" (or an equivalent escape sequence) is
one of the characters explicitly present in the negated character