start conditions at all will also be active.  If it is
@emph{exclusive}, then @emph{only} rules qualified with the start
condition will be active.  A set of rules contingent on the
same exclusive start condition describes a scanner which is
independent of any of the other rules in the @code{flex} input.
Because of this, exclusive start conditions make it easy
to specify "mini-scanners" which scan portions of the
input that are syntactically different from the rest
(e.g., comments).

If the distinction between inclusive and exclusive start
conditions is still a little vague, here's a simple
example illustrating the connection between the two.  The set
of rules:

@example
%s example
%%

<example>foo   do_something();

bar            something_else();
@end example

@noindent
is equivalent to

@example
%x example
%%

<example>foo   do_something();

<INITIAL,example>bar    something_else();
@end example

Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern
in the second example wouldn't be active (i.e., couldn't match) when
in start condition @samp{example}.  If we just used @samp{<example>}
to qualify @samp{bar}, though, then it would only be active in
@samp{example} and not in @code{INITIAL}, while in the first example
it's active in both, because in the first example the @samp{example}
start condition is an @emph{inclusive} (@samp{%s}) start condition.

Also note that the special start-condition specifier @samp{<*>}
matches every start condition.  Thus, the above example
could also have been written:

@example
%x example
%%

<example>foo   do_something();

<*>bar    something_else();
@end example

The default rule (to @samp{ECHO} any unmatched character) remains
active in start conditions.  It is equivalent to:

@example
<*>.|\n     ECHO;
@end example

@samp{BEGIN(0)} returns to the original state where only the
rules with no start conditions are active.  This state can
also be referred to as the start-condition "INITIAL", so
@samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}.  (The
parentheses around the start condition name are not required but
are considered good style.)

@code{BEGIN} actions can also be given as indented code at the
beginning of the rules section.  For example, the
following will cause the scanner to enter the "SPECIAL" start
condition whenever @samp{yylex()} is called and the global
variable @code{enter_special} is true:

@example
        int enter_special;

%x SPECIAL
%%
        if ( enter_special )
            BEGIN(SPECIAL);

<SPECIAL>blahblahblah
@dots{}more rules follow@dots{}
@end example

To illustrate the uses of start conditions, here is a
scanner which provides two different interpretations of a
string like "123.456".  By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the
integer "456".  But if the string is preceded earlier in
the line by the string "expect-floats" it will treat it as
a single token, the floating-point number 123.456:

@example
%@{
#include <stdlib.h>   /* atof(), atoi() */
%@}
%s expect

%%
expect-floats        BEGIN(expect);

<expect>[0-9]+"."[0-9]+      @{
            printf( "found a float, = %f\n",
                    atof( yytext ) );
            @}
<expect>\n           @{
            /* that's the end of the line, so
             * we need another "expect-floats"
             * before we'll recognize any more
             * numbers
             */
            BEGIN(INITIAL);
            @}

[0-9]+      @{

            printf( "found an integer, = %d\n",
                    atoi( yytext ) );
            @}

"."         printf( "found a dot\n" );
@end example

Here is a scanner which recognizes (and discards) C
comments while maintaining a count of the current input line.

@example
%x comment
%%
        int line_num = 1;

"/*"         BEGIN(comment);

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(INITIAL);
@end example

This scanner goes to a bit of trouble to match as much
text as possible with each rule.  In general, when
attempting to write a high-speed scanner, try to match as
much as possible in each rule, as it's a big win.

Note that start condition names are really integer values
and can be stored as such.  Thus, the above could be
extended in the following fashion:

@example
%x comment foo
%%
        int line_num = 1;
        int comment_caller;

"/*"         @{
             comment_caller = INITIAL;
             BEGIN(comment);
             @}

@dots{}

<foo>"/*"    @{
             comment_caller = foo;
             BEGIN(comment);
             @}

<comment>[^*\n]*        /* eat anything that's not a '*' */
<comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
<comment>\n             ++line_num;
<comment>"*"+"/"        BEGIN(comment_caller);
@end example

Furthermore, you can access the current start condition
using the integer-valued @code{YY_START} macro.  For example, the
above assignments to @code{comment_caller} could instead be
written

@example
comment_caller = YY_START;
@end example

Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that
is what's used by AT&T @code{lex}).

Note that start conditions do not have their own
name-space; %s's and %x's declare names in the same fashion as
#define's.
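
For instance, the following sketch (a hypothetical fragment, not taken
from any real scanner) would be expected to fail to compile, because
@samp{%x comment} already defines @code{comment} as a preprocessor
macro with an integer value:

@example
%x comment
%%
        int comment = 0;   /* clashes with the start condition name */
@end example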

Finally, here's an example of how to match C-style quoted
strings using exclusive start conditions, including
expanded escape sequences (but not including checking for
a string that's too long):

@example
%x str

%%
        char string_buf[MAX_STR_CONST];
        char *string_buf_ptr;

\"      string_buf_ptr = string_buf; BEGIN(str);

<str>\"        @{ /* saw closing quote - all done */
        BEGIN(INITIAL);
        *string_buf_ptr = '\0';
        /* return string constant token type and
         * value to parser
         */
        @}

<str>\n        @{
        /* error - unterminated string constant */
        /* generate error message */
        @}

<str>\\[0-7]@{1,3@} @{
        /* octal escape sequence */
        unsigned int result;   /* "%o" expects an unsigned int */

        (void) sscanf( yytext + 1, "%o", &result );

        if ( result > 0xff )
                @{
                /* error, constant is out-of-bounds */
                @}

        *string_buf_ptr++ = result;
        @}

<str>\\[0-9]+ @{
        /* generate error - bad escape sequence; something
         * like '\48' or '\0777777'
         */
        @}

<str>\\n  *string_buf_ptr++ = '\n';
<str>\\t  *string_buf_ptr++ = '\t';
<str>\\r  *string_buf_ptr++ = '\r';
<str>\\b  *string_buf_ptr++ = '\b';
<str>\\f  *string_buf_ptr++ = '\f';

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1];

<str>[^\\\n\"]+        @{
        char *yptr = yytext;

        while ( *yptr )
                *string_buf_ptr++ = *yptr++;
        @}
@end example

Often, such as in some of the examples above, you wind up
writing a whole bunch of rules all preceded by the same
start condition(s).  Flex makes this a little easier and
cleaner by introducing a notion of start condition @dfn{scope}.
A start condition scope is begun with:

@example
<SCs>@{
@end example

@noindent
where SCs is a list of one or more start conditions.
Inside the start condition scope, every rule automatically
has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which
matches the initial @samp{@{}.  So, for example,

@example
<ESC>@{
    "\\n"   return '\n';
    "\\r"   return '\r';
    "\\f"   return '\f';
    "\\0"   return '\0';
@}
@end example

@noindent
is equivalent to:

@example
<ESC>"\\n"  return '\n';
<ESC>"\\r"  return '\r';
<ESC>"\\f"  return '\f';
<ESC>"\\0"  return '\0';
@end example

Start condition scopes may be nested.

Three routines are available for manipulating stacks of
start conditions:

@table @samp
@item void yy_push_state(int new_state)
pushes the current start condition onto the top of
the start condition stack and switches to @var{new_state}
as though you had used @samp{BEGIN new_state} (recall that
start condition names are also integers).

@item void yy_pop_state()
pops the top of the stack and switches to it via
@code{BEGIN}.

@item int yy_top_state()
returns the top of the stack without altering the
stack's contents.
@end table

The start condition stack grows dynamically and so has no
built-in size limitation.  If memory is exhausted, program
execution aborts.

To use start condition stacks, your scanner must include a
@samp{%option stack} directive (see Options below).
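
As a minimal sketch of how the stack routines can replace the
@code{comment_caller} bookkeeping shown earlier (the @samp{foo} start
condition is the same illustrative placeholder used in that example,
and the fragment has not been tested against any particular
@code{flex} release):

@example
%option stack
%x comment foo
%%
        int line_num = 1;

<INITIAL,foo>"/*"       yy_push_state(comment);

<comment>@{
    [^*\n]*         /* eat anything that's not a '*' */
    "*"+[^*/\n]*    /* eat up '*'s not followed by '/'s */
    \n              ++line_num;
    "*"+"/"         yy_pop_state();  /* back to the caller */
@}
@end example

@noindent
Here @samp{yy_push_state(comment)} remembers whichever start condition
was active when the comment began, and @samp{yy_pop_state()} restores it,
so no separate variable is needed.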

@node Multiple buffers, End-of-file rules, Start conditions, Top
@section Multiple input buffers

Some scanners (such as those which support "include"
files) require reading from several input streams.  As
@code{flex} scanners do a large amount of buffering, one cannot
control where the next input will be read from by simply
writing a @code{YY_INPUT} which is sensitive to the scanning
context.  @code{YY_INPUT} is only called when the scanner reaches
the end of its buffer, which may be a long time after
scanning a statement such as an "include" which requires
switching the input source.

To negotiate these sorts of problems, @code{flex} provides a
mechanism for creating and switching between multiple
input buffers.  An input buffer is created by using:

@example
YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
@end example

@noindent
which takes a @code{FILE} pointer and a size and creates a buffer
associated with the given file and large enough to hold
@var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the
size).  It returns a @code{YY_BUFFER_STATE} handle, which may
then be passed to other routines (see below).  The
@code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct}
@code{yy_buffer_state} structure, so you may safely initialize
@code{YY_BUFFER_STATE} variables to @samp{((YY_BUFFER_STATE) 0)} if you
wish, and also refer to the opaque structure in order to
correctly declare input buffers in source files other than
that of your scanner.  Note that the @code{FILE} pointer in the
call to @code{yy_create_buffer} is only used as the value of @code{yyin}
seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer
uses @code{yyin}, then you can safely pass a null @code{FILE} pointer to
@code{yy_create_buffer}.  You select a particular buffer to scan
from using:

@example
void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
@end example

@noindent
This switches the scanner's input buffer so subsequent tokens
will come from @var{new_buffer}.  Note that
@samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set
things up for continued scanning, instead of opening a new
file and pointing @code{yyin} at it.  Note also that switching
input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()}
does @emph{not} change the start condition.

@example
void yy_delete_buffer( YY_BUFFER_STATE buffer )
@end example

@noindent
is used to reclaim the storage associated with a buffer.
You can also clear the current contents of a buffer using:

@example
void yy_flush_buffer( YY_BUFFER_STATE buffer )
@end example

This function discards the buffer's contents, so the next time the
scanner attempts to match a token from the buffer, it will first fill
the buffer anew using @code{YY_INPUT}.

@samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()},
provided for compatibility with the C++ use of @code{new} and @code{delete}
for creating and destroying dynamic objects.

Finally, the @code{YY_CURRENT_BUFFER} macro returns a
@code{YY_BUFFER_STATE} handle to the current buffer.

Here is an example of using these features for writing a