            /* return string constant token type and
             * value to parser
             */
            }

    <str>\\n        {
            /* error - unterminated string constant */
            /* generate error message */
            }

    <str>\\\\[0-7]{1,3} {
            /* octal escape sequence */
            unsigned int result;

            (void) sscanf( yytext + 1, "%o", &result );

            if ( result > 0xff )
                    {
                    /* error, constant is out-of-bounds */
                    }
            else
                    *string_buf_ptr++ = result;
            }

    <str>\\\\[0-9]+ {
            /* generate error - bad escape sequence; something
             * like '\\48' or '\\0777777'
             */
            }

    <str>\\\\n  *string_buf_ptr++ = '\\n';
    <str>\\\\t  *string_buf_ptr++ = '\\t';
    <str>\\\\r  *string_buf_ptr++ = '\\r';
    <str>\\\\b  *string_buf_ptr++ = '\\b';
    <str>\\\\f  *string_buf_ptr++ = '\\f';

    <str>\\\\(.|\\n)  *string_buf_ptr++ = yytext[1];

    <str>[^\\\\\\n\\"]+        {
            char *yptr = yytext;

            while ( *yptr )
                    *string_buf_ptr++ = *yptr++;
            }

.fi
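.PP
The octal-escape rule above leaves the range check to the action.  As a
minimal sketch of that check in isolation, the following helper decodes
the digits that follow the backslash and rejects anything that does not
fit in one byte.  The name
.I decode_octal_escape
and its return convention are hypothetical and not part of
.I flex.
.nf

```c
/* Hypothetical helper, not part of flex: decode the one to three octal
 * digits that follow the backslash of an escape sequence.  Returns the
 * decoded value (0..255), or -1 if there are no digits, more than three
 * digits, a non-octal character, or a value too large for one byte. */
int decode_octal_escape(const char *digits)
{
    int value = 0;
    int n = 0;

    while (digits[n] != 0) {
        if (n == 3 || digits[n] < '0' || digits[n] > '7')
            return -1;
        value = value * 8 + (digits[n] - '0');
        n++;
    }
    if (n == 0 || value > 0xff)
        return -1;
    return value;
}
```

.fi
In a rule action it could be called as
.I decode_octal_escape(yytext + 1),
with a negative result routed to the error message.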
.PP
As some of the examples above show, you often end up writing many
rules that are all preceded by the same start condition(s).  Flex
makes this a little easier and cleaner by introducing a notion of
start condition
.I scope.
A start condition scope is begun with:
.nf

    <SCs>{

.fi
where
.I SCs
is a list of one or more start conditions.  Inside the start condition
scope, every rule automatically has the prefix
.I <SCs>
applied to it, until a
.I '}'
which matches the initial
.I '{'.
So, for example,
.nf

    <ESC>{
        "\\\\n"   return '\\n';
        "\\\\r"   return '\\r';
        "\\\\f"   return '\\f';
        "\\\\0"   return '\\0';
    }

.fi
is equivalent to:
.nf

    <ESC>"\\\\n"  return '\\n';
    <ESC>"\\\\r"  return '\\r';
    <ESC>"\\\\f"  return '\\f';
    <ESC>"\\\\0"  return '\\0';

.fi
Start condition scopes may be nested.
.PP
Three routines are available for manipulating stacks of start conditions:
.TP
.B void yy_push_state(int new_state)
pushes the current start condition onto the top of the start condition
stack and switches to
.I new_state
as though you had used
.B BEGIN new_state
(recall that start condition names are also integers).
.TP
.B void yy_pop_state()
pops the top of the stack and switches to it via
.B BEGIN.
.TP
.B int yy_top_state()
returns the top of the stack without altering the stack's contents.
.PP
The start condition stack grows dynamically and so has no built-in
size limitation.  If memory is exhausted, program execution aborts.
.PP
To use start condition stacks, your scanner must include a
.B %option stack
directive (see Options below).
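.PP
The behavior of the three stack routines can be modeled in plain C.
The sketch below is only an illustration of the semantics described
above, not the code that
.I flex
generates; the type
.I struct sc_stack
and the functions
.I sc_push, sc_pop
and
.I sc_top
are hypothetical names.
.nf

```c
#include <stdlib.h>

/* Illustrative model only; these names are hypothetical, not flex's. */
struct sc_stack {
    int *items;      /* saved start conditions, oldest first */
    size_t depth;    /* number of saved entries */
    size_t capacity; /* allocated slots */
    int current;     /* active start condition (what BEGIN sets) */
};

/* Like yy_push_state(): save the current condition, then switch. */
void sc_push(struct sc_stack *s, int new_state)
{
    if (s->depth == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 8;
        int *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL)
            abort();         /* memory exhausted: execution aborts */
        s->items = p;
        s->capacity = cap;
    }
    s->items[s->depth++] = s->current;
    s->current = new_state;
}

/* Like yy_pop_state(): switch back to the most recently saved condition. */
void sc_pop(struct sc_stack *s)
{
    if (s->depth == 0)
        abort();             /* this sketch treats an empty pop as fatal */
    s->current = s->items[--s->depth];
}

/* Like yy_top_state(): report the saved condition without popping it. */
int sc_top(const struct sc_stack *s)
{
    if (s->depth == 0)
        abort();
    return s->items[s->depth - 1];
}
```

.fi
A zero-initialized
.I struct sc_stack
starts in condition 0 with an empty stack; release
.I items
with
.I free()
when finished.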
.SH MULTIPLE INPUT BUFFERS
Some scanners (such as those which support "include" files)
require reading from several input streams.  As
.I flex
scanners do a large amount of buffering, one cannot control
where the next input will be read from by simply writing a
.B YY_INPUT
which is sensitive to the scanning context.
.B YY_INPUT
is only called when the scanner reaches the end of its buffer, which
may be a long time after scanning a statement such as an "include"
which requires switching the input source.
.PP
To negotiate these sorts of problems,
.I flex
provides a mechanism for creating and switching between multiple
input buffers.  An input buffer is created by using:
.nf

    YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )

.fi
which takes a
.I FILE
pointer and a size and creates a buffer associated with the given
file and large enough to hold
.I size
characters (when in doubt, use
.B YY_BUF_SIZE
for the size).  It returns a
.B YY_BUFFER_STATE
handle, which may then be passed to other routines (see below).  The
.B YY_BUFFER_STATE
type is a pointer to an opaque
.B struct yy_buffer_state
structure, so you may safely initialize YY_BUFFER_STATE variables to
.B ((YY_BUFFER_STATE) 0)
if you wish, and also refer to the opaque structure in order to
correctly declare input buffers in source files other than that
of your scanner.  Note that the
.I FILE
pointer in the call to
.B yy_create_buffer
is only used as the value of
.I yyin
seen by
.B YY_INPUT;
if you redefine
.B YY_INPUT
so it no longer uses
.I yyin,
then you can safely pass a nil
.I FILE
pointer to
.B yy_create_buffer.
You select a particular buffer to scan from using:
.nf

    void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )

.fi
This switches the scanner's input buffer so subsequent tokens will
come from
.I new_buffer.
Note that
.B yy_switch_to_buffer()
may be used by yywrap() to set things up for continued scanning, instead
of opening a new file and pointing
.I yyin
at it.  Note also that switching input sources via either
.B yy_switch_to_buffer()
or
.B yywrap()
does
.I not
change the start condition.
.PP
The routine
.nf

    void yy_delete_buffer( YY_BUFFER_STATE buffer )

.fi
is used to reclaim the storage associated with a buffer.
.RB ( buffer
can be nil, in which case the routine does nothing.)
You can also clear the current contents of a buffer using:
.nf

    void yy_flush_buffer( YY_BUFFER_STATE buffer )

.fi
This function discards the buffer's contents,
so the next time the scanner attempts to match a token from the
buffer, it will first fill the buffer anew using
.B YY_INPUT.
.PP
.B yy_new_buffer()
is an alias for
.B yy_create_buffer(),
provided for compatibility with the C++ use of
.I new
and
.I delete
for creating and destroying dynamic objects.
.PP
Finally, the
.B YY_CURRENT_BUFFER
macro returns a
.B YY_BUFFER_STATE
handle to the current buffer.
.PP
Here is an example of using these features for writing a scanner
which expands include files (the
.B <<EOF>>
feature is discussed below):
.nf

    /* the "incl" state is used for picking up the name
     * of an include file
     */
    %x incl

    %{
    #define MAX_INCLUDE_DEPTH 10
    YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
    int include_stack_ptr = 0;
    %}

    %%
    include             BEGIN(incl);

    [a-z]+              ECHO;
    [^a-z\\n]*\\n?        ECHO;

    <incl>[ \\t]*      /* eat the whitespace */
    <incl>[^ \\t\\n]+   { /* got the include file name */
            if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                {
                fprintf( stderr, "Includes nested too deeply\\n" );
                exit( 1 );
                }

            include_stack[include_stack_ptr++] =
                YY_CURRENT_BUFFER;

            yyin = fopen( yytext, "r" );

            if ( ! yyin )
                error( ... );

            yy_switch_to_buffer(
                yy_create_buffer( yyin, YY_BUF_SIZE ) );

            BEGIN(INITIAL);
            }

    <<EOF>> {
            if ( --include_stack_ptr < 0 )
                {
                yyterminate();
                }

            else
                {
                yy_delete_buffer( YY_CURRENT_BUFFER );
                yy_switch_to_buffer(
                     include_stack[include_stack_ptr] );
                }
            }

.fi
Three routines are available for setting up input buffers for
scanning in-memory strings instead of files.  All of them create
a new input buffer for scanning the string, and return a corresponding
.B YY_BUFFER_STATE
handle (which you should delete with
.B yy_delete_buffer()
when done with it).  They also switch to the new buffer using
.B yy_switch_to_buffer(),
so the next call to
.B yylex()
will start scanning the string.
.TP
.B yy_scan_string(const char *str)
scans a NUL-terminated string.
.TP
.B yy_scan_bytes(const char *bytes, int len)
scans
.I len
bytes (including possibly NUL's)
starting at location
.I bytes.
.PP
Note that both of these functions create and scan a
.I copy
of the string or bytes.  (This may be desirable, since
.B yylex()
modifies the contents of the buffer it is scanning.)  You can avoid the
copy by using:
.TP
.B yy_scan_buffer(char *base, yy_size_t size)
which scans in place the buffer starting at
.I base,
consisting of
.I size
bytes, the last two bytes of which
.I must
be
.B YY_END_OF_BUFFER_CHAR
(ASCII NUL).
These last two bytes are not scanned; thus, scanning
consists of
.B base[0]
through
.B base[size-3],
inclusive.
.IP
If you fail to set up
.I base
in this manner (i.e., forget the final two
.B YY_END_OF_BUFFER_CHAR
bytes), then
.B yy_scan_buffer()
returns a nil pointer instead of creating a new input buffer.
.IP
The type
.B yy_size_t
is an integral type to which you can cast an integer expression
reflecting the size of the buffer.
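.PP
Preparing a buffer in the layout that
.B yy_scan_buffer()
expects is easy to get wrong, so here is a minimal sketch of it in
plain C.  The helper
.I make_scan_buffer
is hypothetical and not part of
.I flex;
it assumes that
.B YY_END_OF_BUFFER_CHAR
is NUL, as stated above.
.nf

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of flex: copy text into a freshly
 * allocated buffer that ends with the two NUL bytes required by
 * yy_scan_buffer(), and store the total size (text length plus two)
 * in *size_out.  Returns NULL if allocation fails. */
char *make_scan_buffer(const char *text, size_t *size_out)
{
    size_t len = strlen(text);
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;
    memcpy(base, text, len);
    base[len] = 0;           /* first end-of-buffer byte */
    base[len + 1] = 0;       /* second end-of-buffer byte */
    *size_out = len + 2;
    return base;
}
```

.fi
Inside a scanner, the result would then be handed over with something
like
.I yy_scan_buffer(base, (yy_size_t) size),
and the caller remains responsible for freeing
.I base
once the buffer has been deleted.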
.SH END-OF-FILE RULES
The special rule "<<EOF>>" indicates
actions which are to be taken when an end-of-file is
encountered and yywrap() returns non-zero (i.e., indicates
no further files to process).  The action must finish
by doing one of four things:
.IP -
assigning
.I yyin
to a new input file (in previous versions of flex, after doing the
assignment you had to call the special action
.B YY_NEW_FILE;
this is no longer necessary);
.IP -
executing a
.I return
statement;
.IP -
executing the special
.B yyterminate()
action;
.IP -
or, switching to a new buffer using
.B yy_switch_to_buffer()
as shown in the example above.
.PP
<<EOF>> rules may not be used with other
patterns; they may only be qualified with a list of start
conditions.  If an unqualified <<EOF>> rule is given, it
applies to
.I all
start conditions which do not already have <<EOF>> actions.  To
specify an <<EOF>> rule for only the initial start condition, use
.nf

    <INITIAL><<EOF>>

.fi
.PP
These rules are useful for catching things like unclosed comments.
An example:
.nf

    %x quote
    %%

    ...other rules for dealing with quotes...

    <quote><<EOF>>   {
             error( "unterminated quote" );
             yyterminate();
             }
    <<EOF>>  {
             if ( *++filelist )
                 yyin = fopen( *filelist, "r" );
             else
                yyterminate();
             }

.fi
.SH MISCELLANEOUS MACROS
The macro
.B YY_USER_ACTION
can be defined to provide an action
which is always executed prior to the matched rule's action.  For example,
it could be #define'd to call a routine to convert yytext to lower-case.
When
.B YY_USER_ACTION
is invoked, the variable
.I yy_act
gives the number of the matched rule (rules are numbered starting with 1).
Suppose you want to profile how often each of your rules is matched.  The
following would do the trick:
.nf

    #define YY_USER_ACTION ++ctr[yy_act]

.fi
where
.I ctr
is an array to hold the counts for the different rules.  Note that
the macro
.B YY_NUM_RULES
gives the total number of rules (including the default rule, even if
you use
.B \-s),
so a correct declaration for
.I ctr
is:
.nf

    int ctr[YY_NUM_RULES];

.fi
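.PP
To show what such a profiling hook accumulates, here is a small
stand-alone model in plain C.  Everything in it is a stand-in:
.I NUM_RULES
plays the role of
.B YY_NUM_RULES,
and
.I count_match
does what the macro above would do on each match.  Because rule numbers
start at 1, this sketch sizes the array one larger than the rule count
so that the highest rule number is a valid index; whether your scanner
needs that extra slot is an assumption to check against the generated
code.
.nf

```c
#define NUM_RULES 3   /* stand-in for YY_NUM_RULES (hypothetical value) */

/* One slot per rule number 1..NUM_RULES; slot 0 stays unused. */
static unsigned long ctr[NUM_RULES + 1];

/* Stand-in for the effect of "++ctr[yy_act]" on each match;
 * out-of-range rule numbers are ignored in this sketch. */
void count_match(int yy_act)
{
    if (yy_act >= 1 && yy_act <= NUM_RULES)
        ++ctr[yy_act];
}

/* Return the rule number matched most often, or 0 if nothing has
 * matched yet.  Ties go to the lower rule number. */
int busiest_rule(void)
{
    int best = 0;
    int r;

    for (r = 1; r <= NUM_RULES; r++)
        if (ctr[r] > ctr[best])
            best = r;
    return best;
}
```

.fi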
.PP
The macro
.B YY_USER_INIT
may be defined to provide an action which is always executed before
the first scan (and before the scanner's internal initializations are done).
For example, it could be used to call a routine to read
in a data table or open a logging file.
.PP
The macro
.B yy_set_interactive(is_interactive)
can be used to control whether the current buffer is considered
.I interactive.
An interactive buffer is processed more slowly,
but must be used when the scanner's input source is indeed
interactive to avoid problems due to waiting to fill buffers
(see the discussion of the
.B \-I
flag below).  A non-zero value
in the macro invocation marks the buffer as interactive, a zero
value as non-interactive.  Note that use of this macro overrides
.B %option always-interactive
or
.B %option never-interactive
(see Options below).
.B yy_set_interactive()
must be invoked prior to beginning to scan the buffer that is
(or is not) to be considered interactive.
.PP
The macro
.B yy_set_bol(at_bol)
can be used to control whether the current buffer's scanning
context for the next token match is done as though at the
beginning of a line.  A non-zero macro argument makes rules anchored with
'^' active, while a zero argument makes '^' rules inactive.
.PP
The macro
.B YY_AT_BOL()
returns true if the next token scanned from the current buffer
will have '^' rules active, false otherwise.
.PP
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.B YY_BREAK,
which may be redefined.  By default, it is simply a "break", to separate
each rule's action from the following rule's.
Redefining
.B YY_BREAK
allows, for example, C++ users to
#define YY_BREAK to do nothing (while being very careful that every
rule ends with a "break" or a "return"!) to avoid suffering from
unreachable statement warnings where because a rule's action ends with
"return", the