scanner which expands include files (the @samp{<<EOF>>} feature
is discussed below):
@example
/* the "incl" state is used for picking up the name
* of an include file
*/
%x incl
%@{
#define MAX_INCLUDE_DEPTH 10
YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
int include_stack_ptr = 0;
%@}
%%
include BEGIN(incl);
[a-z]+ ECHO;
[^a-z\n]*\n? ECHO;
<incl>[ \t]* /* eat the whitespace */
<incl>[^ \t\n]+ @{ /* got the include file name */
        if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
            @{
            fprintf( stderr, "Includes nested too deeply" );
            exit( 1 );
            @}
        include_stack[include_stack_ptr++] =
            YY_CURRENT_BUFFER;
        yyin = fopen( yytext, "r" );
        if ( ! yyin )
            error( @dots{} );
        yy_switch_to_buffer(
            yy_create_buffer( yyin, YY_BUF_SIZE ) );
        BEGIN(INITIAL);
        @}
<<EOF>> @{
        if ( --include_stack_ptr < 0 )
            @{
            yyterminate();
            @}
        else
            @{
            yy_delete_buffer( YY_CURRENT_BUFFER );
            yy_switch_to_buffer(
                include_stack[include_stack_ptr] );
            @}
        @}
@end example
Three routines are available for setting up input buffers
for scanning in-memory strings instead of files. All of
them create a new input buffer for scanning the string,
and return a corresponding @code{YY_BUFFER_STATE} handle (which
you should delete with @samp{yy_delete_buffer()} when done with
it). They also switch to the new buffer using
@samp{yy_switch_to_buffer()}, so the next call to @samp{yylex()} will
start scanning the string.
@table @samp
@item yy_scan_string(const char *str)
scans a NUL-terminated string.
@item yy_scan_bytes(const char *bytes, int len)
scans @code{len} bytes (including possibly NUL's) starting
at location @var{bytes}.
@end table
Note that both of these functions create and scan a @emph{copy}
of the string or bytes. (This may be desirable, since
@samp{yylex()} modifies the contents of the buffer it is
scanning.) You can avoid the copy by using:
@table @samp
@item yy_scan_buffer(char *base, yy_size_t size)
which scans in place the buffer starting at @var{base},
consisting of @var{size} bytes, the last two bytes of
which @emph{must} be @code{YY_END_OF_BUFFER_CHAR} (ASCII NUL).
These last two bytes are not scanned; thus,
scanning consists of @samp{base[0]} through @samp{base[size-3]},
inclusive.
If you fail to set up @var{base} in this manner (i.e.,
forget the final two @code{YY_END_OF_BUFFER_CHAR} bytes),
then @samp{yy_scan_buffer()} returns a nil pointer instead
of creating a new input buffer.
The type @code{yy_size_t} is an integral type to which you
can cast an integer expression reflecting the size
of the buffer.
@end table
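The following is a minimal sketch of both approaches, assuming it is
placed in the user code section of the scanner file so that the
@code{yy_*} names are visible. The helper names @samp{scan_copy()} and
@samp{scan_in_place()} are hypothetical, as is the choice to discard
the tokens that @samp{yylex()} returns.
@example
#include <stdlib.h>
#include <string.h>

/* scan a copy of a NUL-terminated string */
void scan_copy( const char *str )
    @{
    YY_BUFFER_STATE buf = yy_scan_string( str );

    while ( yylex() != 0 )
        ;   /* a real caller would use the tokens */

    yy_delete_buffer( buf );
    @}

/* build a buffer in the layout yy_scan_buffer() expects,
 * then scan it in place
 */
int scan_in_place( const char *text, size_t len )
    @{
    YY_BUFFER_STATE buf;
    char *base = malloc( len + 2 );

    if ( ! base )
        return -1;

    memcpy( base, text, len );
    /* the two trailing end-of-buffer bytes are mandatory */
    base[len] = base[len + 1] = YY_END_OF_BUFFER_CHAR;

    buf = yy_scan_buffer( base, (yy_size_t) (len + 2) );
    if ( ! buf )
        @{
        free( base );
        return -1;
        @}

    while ( yylex() != 0 )
        ;

    yy_delete_buffer( buf );
    free( base );   /* assumed to remain the caller's to free */
    return 0;
    @}
@end example
Both helpers rely on @samp{yylex()} returning zero at end of input,
which is what the default @samp{yyterminate()} action does.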
@node End-of-file rules, Miscellaneous, Multiple buffers, Top
@section End-of-file rules
The special rule "<<EOF>>" indicates actions which are to
be taken when an end-of-file is encountered and yywrap()
returns non-zero (i.e., indicates no further files to
process). The action must finish by doing one of four
things:
@itemize -
@item
assigning @code{yyin} to a new input file (in previous
versions of flex, after doing the assignment you
had to call the special action @code{YY_NEW_FILE}; this is
no longer necessary);
@item
executing a @code{return} statement;
@item
executing the special @samp{yyterminate()} action;
@item
or, switching to a new buffer using
@samp{yy_switch_to_buffer()} as shown in the example
above.
@end itemize
<<EOF>> rules may not be used with other patterns; they
may only be qualified with a list of start conditions. If
an unqualified <<EOF>> rule is given, it applies to @emph{all}
start conditions which do not already have <<EOF>>
actions. To specify an <<EOF>> rule for only the initial
start condition, use
@example
<INITIAL><<EOF>>
@end example
These rules are useful for catching things like unclosed
comments. An example:
@example
%x quote
%%
@dots{}other rules for dealing with quotes@dots{}
<quote><<EOF>>   @{
         error( "unterminated quote" );
         yyterminate();
         @}
<<EOF>>  @{
         if ( *++filelist )
             yyin = fopen( *filelist, "r" );
         else
             yyterminate();
         @}
@end example
@node Miscellaneous, User variables, End-of-file rules, Top
@section Miscellaneous macros
The macro @code{YY_USER_ACTION} can be defined to provide an
action which is always executed prior to the matched
rule's action. For example, it could be #define'd to call
a routine to convert yytext to lower-case. When
@code{YY_USER_ACTION} is invoked, the variable @code{yy_act} gives the
number of the matched rule (rules are numbered starting
with 1). Suppose you want to profile how often each of
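your rules is matched. The following would do the trick (the
trailing semicolon makes the macro a complete statement, since it
is expanded directly in front of the rule's action):
@example
#define YY_USER_ACTION ++ctr[yy_act];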
@end example
where @code{ctr} is an array to hold the counts for the different
rules. Note that the macro @code{YY_NUM_RULES} gives the total number
of rules (including the default rule, even if you use @samp{-s}), so
a correct declaration for @code{ctr} is:
@example
int ctr[YY_NUM_RULES];
@end example
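A complete scanner built around this idea might look like the sketch
below. The three rules and the @samp{main()} routine are illustrative
only. The array is given one spare element here, on the cautious
assumption that @code{yy_act} can take every value from 1 through
@code{YY_NUM_RULES}.
@example
%@{
#include <stdio.h>
static int ctr[YY_NUM_RULES + 1];   /* slot 0 is unused */
#define YY_USER_ACTION ++ctr[yy_act];
%@}
%option noyywrap
%%
[a-z]+    /* rule 1: words */
[0-9]+    /* rule 2: numbers */
.|\n      /* rule 3: everything else */
%%
int main( void )
    @{
    int i;

    yylex();   /* scan all of the input */

    for ( i = 1; i <= YY_NUM_RULES; ++i )
        fprintf( stderr, "rule %d matched %d times\n",
                 i, ctr[i] );

    return 0;
    @}
@end example
Because the third rule matches any single character, the default rule
should never fire in this sketch and its counter should stay at zero.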
The macro @code{YY_USER_INIT} may be defined to provide an action
which is always executed before the first scan (and before
the scanner's internal initializations are done). For
example, it could be used to call a routine to read in a
data table or open a logging file.
The macro @samp{yy_set_interactive(is_interactive)} can be used
to control whether the current buffer is considered
@emph{interactive}. An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed
interactive to avoid problems due to waiting to fill
buffers (see the discussion of the @samp{-I} flag below). A
non-zero value in the macro invocation marks the buffer as
interactive, a zero value as non-interactive. Note that
use of this macro overrides @samp{%option always-interactive} or
@samp{%option never-interactive} (see Options below).
@samp{yy_set_interactive()} must be invoked prior to beginning to
scan the buffer that is (or is not) to be considered
interactive.
The macro @samp{yy_set_bol(at_bol)} can be used to control
whether the current buffer's scanning context for the next
token match is done as though at the beginning of a line.
A non-zero macro argument makes rules anchored with
'^' active, while a zero argument makes '^' rules inactive.
The macro @samp{YY_AT_BOL()} returns true if the next token
scanned from the current buffer will have '^' rules
active, false otherwise.
In the generated scanner, the actions are all gathered in
one large switch statement and separated using @code{YY_BREAK},
which may be redefined. By default, it is simply a
"break", to separate each rule's action from the following
rule's. Redefining @code{YY_BREAK} allows, for example, C++
users to #define YY_BREAK to do nothing (while being very
careful that every rule ends with a "break" or a
"return"!) to avoid unreachable-statement warnings: when a
rule's action ends with "return", the @code{YY_BREAK} that
follows it can never be reached.
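A sketch of that arrangement follows. The token name
@code{TOK_NUMBER} is borrowed from the @code{yacc} interface example
in this manual, and the catch-all last rule is an assumption added so
that the scanner's built-in default rule, whose action cannot be
edited, is never reached.
@example
%@{
#include "y.tab.h"
#define YY_BREAK   /* empty: each action supplies its own exit */
%@}
%%
[0-9]+     return TOK_NUMBER;
[ \t\n]+   break;
.          return yytext[0];
@end example
With @code{YY_BREAK} empty, an action that ends in neither "break"
nor "return" falls through into the next rule's action, so every rule
in such a scanner needs to be checked by hand.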
@node User variables, YACC interface, Miscellaneous, Top
@section Values available to the user
This section summarizes the various values available to
the user in the rule actions.
@itemize -
@item
@samp{char *yytext} holds the text of the current token.
It may be modified but not lengthened (you cannot
append characters to the end).
If the special directive @samp{%array} appears in the
first section of the scanner description, then
@code{yytext} is instead declared @samp{char yytext[YYLMAX]},
where @code{YYLMAX} is a macro definition that you can
redefine in the first section if you don't like the
default value (generally 8KB). Using @samp{%array}
results in somewhat slower scanners, but the value
of @code{yytext} becomes immune to calls to @samp{input()} and
@samp{unput()}, which potentially destroy its value when
@code{yytext} is a character pointer. The opposite of
@samp{%array} is @samp{%pointer}, which is the default.
You cannot use @samp{%array} when generating C++ scanner
classes (the @samp{-+} flag).
@item
@samp{int yyleng} holds the length of the current token.
@item
@samp{FILE *yyin} is the file which by default @code{flex} reads
from. It may be redefined but doing so only makes
sense before scanning begins or after an EOF has
been encountered. Changing it in the midst of
scanning will have unexpected results since @code{flex}
buffers its input; use @samp{yyrestart()} instead. Once
scanning terminates because an end-of-file has been
seen, you can point @code{yyin} at the new input file and
then call the scanner again to continue scanning.
@item
@samp{void yyrestart( FILE *new_file )} may be called to
point @code{yyin} at the new input file. The switch-over
to the new file is immediate (any previously
buffered-up input is lost). Note that calling
@samp{yyrestart()} with @code{yyin} as an argument thus throws
away the current input buffer and continues
scanning the same input file.
@item
@samp{FILE *yyout} is the file to which @samp{ECHO} actions are
done. It can be reassigned by the user.
@item
@code{YY_CURRENT_BUFFER} returns a @code{YY_BUFFER_STATE} handle
to the current buffer.
@item
@code{YY_START} returns an integer value corresponding to
the current start condition. You can subsequently
use this value with @code{BEGIN} to return to that start
condition.
@end itemize
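As an illustration of @code{YY_START}, the sketch below remembers
which start condition was active when a C-style comment began and
returns to it afterwards. The start condition @samp{comment} and the
variable @samp{comment_caller} are names chosen for this example only.
@example
%x comment
%@{
static int comment_caller;   /* condition to return to */
%@}
%%
"/*"            @{
                comment_caller = YY_START;
                BEGIN(comment);
                @}
<comment>"*/"   BEGIN(comment_caller);
<comment>.|\n   /* discard the comment body */
@end example
Saving the value this way is mainly useful when comments can begin in
more than one start condition; with only @code{INITIAL} in play,
@samp{BEGIN(INITIAL)} would do the same job.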
@node YACC interface, Options, User variables, Top
@section Interfacing with @code{yacc}
One of the main uses of @code{flex} is as a companion to the @code{yacc}
parser-generator. @code{yacc} parsers expect to call a routine
named @samp{yylex()} to find the next input token. The routine
is supposed to return the type of the next token as well
as putting any associated value in the global @code{yylval}. To
use @code{flex} with @code{yacc}, one specifies the @samp{-d} option to @code{yacc} to
instruct it to generate the file @file{y.tab.h} containing
definitions of all the @samp{%tokens} appearing in the @code{yacc} input.
This file is then included in the @code{flex} scanner. For
example, if one of the tokens is "TOK_NUMBER", part of the
scanner might look like:
@example
%@{
#include "y.tab.h"
%@}
%%
[0-9]+ yylval = atoi( yytext ); return TOK_NUMBER;
@end example
@node Options, Performance, YACC interface, Top
@section Options
@code{flex} has the following options:
@table @samp
@item -b
Generate backing-up information to @file{lex.backup}.
This is a list of scanner states which require
backing up and the input characters on which they
do so. By adding rules one can remove backing-up
states. If @emph{all} backing-up states are eliminated
and @samp{-Cf} or @samp{-CF} is used, the generated scanner will
run faster (see the @samp{-p} flag). Only users who wish
to squeeze every last cycle out of their scanners
need worry about this option. (See the section on
Performance Considerations below.)
@item -c
is a do-nothing, deprecated option included for
POSIX compliance.
@item -d
makes the generated scanner run in @dfn{debug} mode.
Whenever a pattern is recognized and the global
@code{yy_flex_debug} is non-zero (which is the default),
the scanner will write to @code{stderr} a line of the
form:
@example
--accepting rule at line 53 ("the matched text")
@end example
The line number refers to the location of the rule
in the file defining the scanner (i.e., the file
that was fed to flex). Messages are also generated
when the scanner backs up, accepts the default
rule, reaches the end of its input buffer (or
encounters a NUL; at this point, the two look the
same as far as the scanner's concerned), or reaches
an end-of-file.
@item -f
specifies @dfn{fast scanner}. No table compression is
done and stdio is bypassed. The result is large
but fast. This option is equivalent to @samp{-Cfr} (see
below).
@item -h
generates a "help" summary of @code{flex}'s options to
@code{stdout} and then exits. @samp{-?} and @samp{--help} are synonyms
for @samp{-h}.
@item -i
instructs @code{flex} to generate a @emph{case-insensitive}
scanner. The case of letters given in the @code{flex} input
patterns will be ignored, and tokens in the input
will be matched regardless of case. The matched
text given in @code{yytext} will have the preserved case
(i.e., it will not be folded).
@item -l
turns on maximum compatibility with the original
AT&T @code{lex} implementation. Note that this does not
mean @emph{full} compatibility. Use of this option costs
a considerable amount of performance, and it cannot
be used with the @samp{-+}, @samp{-f}, @samp{-F}, @samp{-Cf}, or @samp{-CF} options.
For details on the compatibilities it provides, see
the section "Incompatibilities With Lex And POSIX"
below. This option also results in the name
@code{YY_FLEX_LEX_COMPAT} being #define'd in the generated
scanner.
@item -n
is another do-nothing, deprecated option included
only for POSIX compliance.
@item -p
generates a performance report to