start conditions at all will also be active. If it is
@emph{exclusive}, then @emph{only} rules qualified with the start
condition will be active. A set of rules contingent on the
same exclusive start condition describes a scanner which is
independent of any of the other rules in the @code{flex} input.
Because of this, exclusive start conditions make it easy
to specify "mini-scanners" which scan portions of the
input that are syntactically different from the rest
(e.g., comments).
If the distinction between inclusive and exclusive start
conditions is still a little vague, here's a simple
example illustrating the connection between the two. The set
of rules:
@example
%s example
%%
<example>foo do_something();
bar something_else();
@end example
@noindent
is equivalent to
@example
%x example
%%
<example>foo do_something();
<INITIAL,example>bar something_else();
@end example
Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern
in the second example wouldn't be active (i.e., couldn't match) when
in start condition @samp{example}. If we just used @samp{<example>}
to qualify @samp{bar}, though, then it would only be active in
@samp{example} and not in @code{INITIAL}, while in the first example
it's active in both, because in the first example the @samp{example}
start condition is an @emph{inclusive} (@samp{%s}) start condition.
Also note that the special start-condition specifier @samp{<*>}
matches every start condition. Thus, the above example
could also have been written:
@example
%x example
%%
<example>foo do_something();
<*>bar something_else();
@end example
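The activation rules described above can be summarized as a small C
model. This is only an illustrative sketch, not how @code{flex}
implements start conditions; the names @code{sc_kind}, @code{rule}, and
@code{rule_active} are hypothetical, and @code{INITIAL} is assumed to
behave like an inclusive condition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a start condition is declared with %s or %x. */
enum sc_kind { SC_INCLUSIVE, SC_EXCLUSIVE };

/* A rule lists the start conditions in its <...> prefix.
 * n_quals == 0 models a rule with no prefix; any_sc models <*>. */
struct rule {
    const int *quals;
    size_t n_quals;
    bool any_sc;
};

/* Returns true if rule `r` can match while the scanner is in start
 * condition `current`, which was declared with kind `kind`. */
bool rule_active(const struct rule *r, int current, enum sc_kind kind)
{
    if (r->any_sc)
        return true;                    /* <*> matches everywhere */
    for (size_t i = 0; i < r->n_quals; i++)
        if (r->quals[i] == current)
            return true;                /* explicitly qualified */
    /* Unqualified rules stay active only in inclusive conditions. */
    return r->n_quals == 0 && kind == SC_INCLUSIVE;
}
```

In this model the unqualified @samp{bar} rule is active in an inclusive
@samp{example} condition but not in an exclusive one, which is why the
@samp{%x} versions above need an explicit qualifier.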
The default rule (to @samp{ECHO} any unmatched character) remains
active in start conditions. It is equivalent to:
@example
<*>.|\n     ECHO;
@end example
@samp{BEGIN(0)} returns to the original state where only the
rules with no start conditions are active. This state can
also be referred to as the start-condition "INITIAL", so
@samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}. (The
parentheses around the start condition name are not required but
are considered good style.)
@code{BEGIN} actions can also be given as indented code at the
beginning of the rules section. For example, the
following will cause the scanner to enter the "SPECIAL" start
condition whenever @samp{yylex()} is called and the global
variable @code{enter_special} is true:
@example
        int enter_special;
%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);
<SPECIAL>blahblahblah
@dots{}more rules follow@dots{}
@end example
To illustrate the uses of start conditions, here is a
scanner which provides two different interpretations of a
string like "123.456". By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the
integer "456". But if the string is preceded earlier in
the line by the string "expect-floats" it will treat it as
a single token, the floating-point number 123.456:
@example
%@{
#include <stdlib.h>     /* atof(), atoi() */
%@}
%s expect
%%
expect-floats BEGIN(expect);
<expect>[0-9]+"."[0-9]+      @{
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            @}
<expect>\n           @{
            /* that's the end of the line, so
             * we need another "expect-floats"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            @}
[0-9]+ @{
            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            @}
"." printf( "found a dot\n" );
@end example
Here is a scanner which recognizes (and discards) C
comments while maintaining a count of the current input line.
@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
@end example
This scanner goes to a bit of trouble to match as much
text as possible with each rule. In general, when
attempting to write a high-speed scanner, try to match as
much as possible in each rule, as doing so reduces per-token overhead.
Note that start condition names are really integer values
and can be stored as such. Thus, the above could be
extended in the following fashion:
@example
%x comment foo
%%
        int line_num = 1;
        int comment_caller;

"/*"         @{
             comment_caller = INITIAL;
             BEGIN(comment);
             @}

@dots{}

<foo>"/*"    @{
             comment_caller = foo;
             BEGIN(comment);
             @}

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(comment_caller);
@end example
Furthermore, you can access the current start condition
using the integer-valued @code{YY_START} macro. For example, the
above assignments to @code{comment_caller} could instead be
written
@example
comment_caller = YY_START;
@end example
Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that
is what's used by AT&T @code{lex}).
Note that start conditions do not have their own
name-space; %s's and %x's declare names in the same fashion as
#define's.
Finally, here's an example of how to match C-style quoted
strings using exclusive start conditions, including
expanded escape sequences (but not including checking for
a string that's too long):
@example
%x str
%%
        char string_buf[MAX_STR_CONST];
        char *string_buf_ptr;

\"      string_buf_ptr = string_buf; BEGIN(str);

<str>\"        @{ /* saw closing quote - all done */
        BEGIN(INITIAL);
        *string_buf_ptr = '\0';
        /* return string constant token type and
         * value to parser
         */
        @}

<str>\n        @{
        /* error - unterminated string constant */
        /* generate error message */
        @}

<str>\\[0-7]@{1,3@} @{
        /* octal escape sequence */
        unsigned int result;

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                @{
                /* error, constant is out-of-bounds */
                @}
        else
                *string_buf_ptr++ = result;
        @}

<str>\\[0-9]+ @{
        /* generate error - bad escape sequence; something
         * like '\48' or '\0777777'
         */
        @}

<str>\\n  *string_buf_ptr++ = '\n';
<str>\\t  *string_buf_ptr++ = '\t';
<str>\\r  *string_buf_ptr++ = '\r';
<str>\\b  *string_buf_ptr++ = '\b';
<str>\\f  *string_buf_ptr++ = '\f';

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

<str>[^\\\n\"]+        @{
        char *yptr = yytext;

        while ( *yptr )
                *string_buf_ptr++ = *yptr++;
        @}
@end example
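The octal-escape handling in the scanner above can also be factored
into a plain C helper, which makes the range check easier to test in
isolation. The following is a minimal sketch under that assumption;
@code{decode_octal_escape} is a hypothetical name and is not part of
@code{flex}.

```c
#include <stdio.h>

/* Hypothetical helper: decode the text of an octal escape such as
 * "\101" (a backslash followed by one to three octal digits).
 * Returns the value 0..255, or -1 if the text is not an octal
 * escape or the value does not fit in one byte. */
int decode_octal_escape(const char *text)
{
    unsigned int value;

    /* %o expects an unsigned int; the width limits it to 3 digits. */
    if (text[0] != '\\' || sscanf(text + 1, "%3o", &value) != 1)
        return -1;

    if (value > 0xff)
        return -1;      /* e.g. "\777" is out of bounds */

    return (int) value;
}
```

An action could then call such a helper with @code{yytext} and append
the result to the string buffer only when it is not negative.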
Often, such as in some of the examples above, you wind up
writing a whole bunch of rules all preceded by the same
start condition(s). Flex makes this a little easier and
cleaner by introducing a notion of start condition @dfn{scope}.
A start condition scope is begun with:
@example
<SCs>@{
@end example
@noindent
where SCs is a list of one or more start conditions.
Inside the start condition scope, every rule automatically
has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which
matches the initial @samp{@{}. So, for example,
@example
<ESC>@{
"\\n" return '\n';
"\\r" return '\r';
"\\f" return '\f';
"\\0" return '\0';
@}
@end example
@noindent
is equivalent to:
@example
<ESC>"\\n" return '\n';
<ESC>"\\r" return '\r';
<ESC>"\\f" return '\f';
<ESC>"\\0" return '\0';
@end example
Start condition scopes may be nested.
Three routines are available for manipulating stacks of
start conditions:
@table @samp
@item void yy_push_state(int new_state)
pushes the current start condition onto the top of
the start condition stack and switches to @var{new_state}
as though you had used @samp{BEGIN new_state} (recall that
start condition names are also integers).
@item void yy_pop_state()
pops the top of the stack and switches to it via
@code{BEGIN}.
@item int yy_top_state()
returns the top of the stack without altering the
stack's contents.
@end table
The start condition stack grows dynamically and so has no
built-in size limitation. If memory is exhausted, program
execution aborts.
To use start condition stacks, your scanner must include a
@samp{%option stack} directive (see Options below).
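The behavior of the three routines can be pictured with a small
stand-alone C model. This is only a sketch of the semantics described
above, not the code @code{flex} generates; @code{sc_stack},
@code{sc_push}, @code{sc_pop}, and @code{sc_top} are hypothetical
names, and aborting on an empty stack is an assumption of the model.

```c
#include <stdlib.h>

/* Hypothetical model of a start condition stack. */
struct sc_stack {
    int *items;        /* saved start conditions */
    size_t depth;      /* number of saved entries */
    size_t capacity;   /* allocated slots */
    int current;       /* models the scanner's current start condition */
};

/* Models yy_push_state(): save the current condition, then switch. */
void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 8;
        int *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL)
            abort();   /* grow dynamically; abort if memory runs out */
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Models yy_pop_state(): switch back to the most recently saved one. */
void sc_pop(struct sc_stack *s)
{
    if (s->depth == 0)
        abort();       /* assumption: popping an empty stack is fatal */
    s->current = s->items[--s->depth];
}

/* Models yy_top_state(): peek without altering the stack. */
int sc_top(const struct sc_stack *s)
{
    if (s->depth == 0)
        abort();
    return s->items[s->depth - 1];
}
```

Starting from a zero-initialized @code{struct sc_stack}, a push followed
by a pop leaves @code{current} at its original value, which is the
property that makes the stack useful for nested constructs.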
@node Multiple buffers, End-of-file rules, Start conditions, Top
@section Multiple input buffers
Some scanners (such as those which support "include"
files) require reading from several input streams. As
@code{flex} scanners do a large amount of buffering, one cannot
control where the next input will be read from by simply
writing a @code{YY_INPUT} which is sensitive to the scanning
context. @code{YY_INPUT} is only called when the scanner reaches
the end of its buffer, which may be a long time after
scanning a statement such as an "include" which requires
switching the input source.
To negotiate these sorts of problems, @code{flex} provides a
mechanism for creating and switching between multiple
input buffers. An input buffer is created by using:
@example
YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
@end example
@noindent
which takes a @code{FILE} pointer and a size and creates a buffer
associated with the given file and large enough to hold
@var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the
size). It returns a @code{YY_BUFFER_STATE} handle, which may
then be passed to other routines (see below). The
@code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct}
@code{yy_buffer_state} structure, so you may safely initialize
@code{YY_BUFFER_STATE} variables to @samp{((YY_BUFFER_STATE) 0)} if you
wish, and also refer to the opaque structure in order to
correctly declare input buffers in source files other than
that of your scanner. Note that the @code{FILE} pointer in the
call to @code{yy_create_buffer} is only used as the value of @code{yyin}
seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer
uses @code{yyin}, then you can safely pass a null @code{FILE} pointer to
@code{yy_create_buffer}. You select a particular buffer to scan
from using:
@example
void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
@end example
@noindent
which switches the scanner's input buffer so that subsequent tokens
will come from @var{new_buffer}. Note that
@samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set
things up for continued scanning, instead of opening a new
file and pointing @code{yyin} at it. Note also that switching
input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()}
does @emph{not} change the start condition.
@example
void yy_delete_buffer( YY_BUFFER_STATE buffer )
@end example
@noindent
is used to reclaim the storage associated with a buffer.
You can also clear the current contents of a buffer using:
@example
void yy_flush_buffer( YY_BUFFER_STATE buffer )
@end example
This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using @code{YY_INPUT}.
@samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()},
provided for compatibility with the C++ use of @code{new} and @code{delete}
for creating and destroying dynamic objects.
Finally, the @code{YY_CURRENT_BUFFER} macro returns a
@code{YY_BUFFER_STATE} handle to the current buffer.
Here is an example of using these features for writing a