    %}
    %s expect

    %%
    expect-floats        BEGIN(expect);

    <expect>[0-9]+"."[0-9]+      {
                printf( "found a float, = %f\\n",
                        atof( yytext ) );
                }
    <expect>\\n           {
                /* that's the end of the line, so
                 * we need another "expect-floats"
                 * before we'll recognize any more
                 * numbers
                 */
                BEGIN(INITIAL);
                }

    [0-9]+      {
                printf( "found an integer, = %d\\n",
                        atoi( yytext ) );
                }

    "."         printf( "found a dot\\n" );

.fi
Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"         BEGIN(comment);

    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
Note that start condition names are really integer values and
can be stored as such.  Thus, the above could be extended in the
following fashion:
.nf

    %x comment foo
    %%
            int line_num = 1;
            int comment_caller;

    "/*"         {
                 comment_caller = INITIAL;
                 BEGIN(comment);
                 }

    ...

    <foo>"/*"    {
                 comment_caller = foo;
                 BEGIN(comment);
                 }

    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(comment_caller);

.fi
One can then implement a "stack" of start conditions using an
array of integers.  (It is likely that such stacks will become
a full-fledged
.I flex
feature in the future.)  Note, though, that
start conditions do not have their own name-space; %s's and %x's
declare names in the same fashion as #define's.
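.LP
As a rough illustration of the "stack" idea, the following C sketch
keeps start conditions in an array of integers.  The names
SC_STACK_DEPTH, sc_push() and sc_pop() are hypothetical and are not
part of
.I flex;
the sketch only assumes, as stated above, that start conditions are
plain integer values.
.nf

```c
/* Hypothetical names: SC_STACK_DEPTH, sc_stack, sc_push, sc_pop.
 * Start conditions are plain integers, so an int array suffices.
 * In a real scanner the values would be INITIAL, comment, foo, ...
 * and the result of sc_pop() would be passed to BEGIN().
 */
#define SC_STACK_DEPTH 16

static int sc_stack[SC_STACK_DEPTH];
static int sc_stack_ptr = 0;

/* Save a start condition; returns 0 on overflow, 1 otherwise. */
static int sc_push( int condition )
    {
    if ( sc_stack_ptr >= SC_STACK_DEPTH )
        return 0;

    sc_stack[sc_stack_ptr++] = condition;
    return 1;
    }

/* Return the most recently saved start condition, or
 * fallback if the stack is empty.
 */
static int sc_pop( int fallback )
    {
    if ( sc_stack_ptr <= 0 )
        return fallback;

    return sc_stack[--sc_stack_ptr];
    }
```

.fi
In an action, one would call sc_push() with the condition being left
(in the same way comment_caller is set above) before the BEGIN, and
later write BEGIN(sc_pop(INITIAL)) to return to it.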
.SH MULTIPLE INPUT BUFFERS
Some scanners (such as those which support "include" files)
require reading from several input streams.  As
.I flex
scanners do a large amount of buffering, one cannot control
where the next input will be read from by simply writing a
.B YY_INPUT
which is sensitive to the scanning context.
.B YY_INPUT
is only called when the scanner reaches the end of its buffer, which
may be a long time after scanning a statement such as an "include"
which requires switching the input source.
.LP
To negotiate these sorts of problems,
.I flex
provides a mechanism for creating and switching between multiple
input buffers.  An input buffer is created by using:
.nf

    YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )

.fi
which takes a
.I FILE
pointer and a size and creates a buffer associated with the given
file and large enough to hold
.I size
characters (when in doubt, use
.B YY_BUF_SIZE
for the size).  It returns a
.B YY_BUFFER_STATE
handle, which may then be passed to other routines:
.nf

    void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )

.fi
switches the scanner's input buffer so subsequent tokens will
come from
.I new_buffer.
Note that
.B yy_switch_to_buffer()
may be used by yywrap() to set things up for continued scanning, instead
of opening a new file and pointing
.I yyin
at it.
.nf

    void yy_delete_buffer( YY_BUFFER_STATE buffer )

.fi
is used to reclaim the storage associated with a buffer.
.LP
.B yy_new_buffer()
is an alias for
.B yy_create_buffer(),
provided for compatibility with the C++ use of
.I new
and
.I delete
for creating and destroying dynamic objects.
.LP
Finally, the
.B YY_CURRENT_BUFFER
macro returns a
.B YY_BUFFER_STATE
handle to the current buffer.
.LP
Here is an example of using these features for writing a scanner
which expands include files (the
.B <<EOF>>
feature is discussed below):
.nf

    /* the "incl" state is used for picking up the name
     * of an include file
     */
    %x incl

    %{
    #define MAX_INCLUDE_DEPTH 10
    YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
    int include_stack_ptr = 0;
    %}

    %%
    include             BEGIN(incl);

    [a-z]+              ECHO;
    [^a-z\\n]*\\n?        ECHO;

    <incl>[ \\t]*      /* eat the whitespace */
    <incl>[^ \\t\\n]+   { /* got the include file name */
            if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
                {
                fprintf( stderr, "Includes nested too deeply\\n" );
                exit( 1 );
                }

            include_stack[include_stack_ptr++] =
                YY_CURRENT_BUFFER;

            yyin = fopen( yytext, "r" );

            if ( ! yyin )
                error( ... );

            yy_switch_to_buffer(
                yy_create_buffer( yyin, YY_BUF_SIZE ) );

            BEGIN(INITIAL);
            }

    <<EOF>> {
            if ( --include_stack_ptr < 0 )
                {
                yyterminate();
                }

            else
                yy_switch_to_buffer(
                     include_stack[include_stack_ptr] );
            }

.fi
.SH END-OF-FILE RULES
The special rule "<<EOF>>" indicates
actions which are to be taken when an end-of-file is
encountered and yywrap() returns non-zero (i.e., indicates
no further files to process).  The action must finish
by doing one of four things:
.IP -
the special
.B YY_NEW_FILE
action, if
.I yyin
has been pointed at a new file to process;
.IP -
a
.I return
statement;
.IP -
the special
.B yyterminate()
action;
.IP -
or, switching to a new buffer using
.B yy_switch_to_buffer()
as shown in the example above.
.LP
<<EOF>> rules may not be used with other
patterns; they may only be qualified with a list of start
conditions.  If an unqualified <<EOF>> rule is given, it
applies to
.I all
start conditions which do not already have <<EOF>> actions.  To
specify an <<EOF>> rule for only the initial start condition, use
.nf

    <INITIAL><<EOF>>

.fi
.LP
These rules are useful for catching things like unclosed comments.
An example:
.nf

    %x quote
    %%

    ...other rules for dealing with quotes...

    <quote><<EOF>>   {
             error( "unterminated quote" );
             yyterminate();
             }
    <<EOF>>  {
             if ( *++filelist )
                 {
                 yyin = fopen( *filelist, "r" );
                 YY_NEW_FILE;
                 }
             else
                yyterminate();
             }

.fi
.SH MISCELLANEOUS MACROS
The macro
.B YY_USER_ACTION
can be redefined to provide an action
which is always executed prior to the matched rule's action.  For example,
it could be #define'd to call a routine to convert yytext to lower-case.
.LP
The macro
.B YY_USER_INIT
may be redefined to provide an action which is always executed before
the first scan (and before the scanner's internal initializations are done).
For example, it could be used to call a routine to read
in a data table or open a logging file.
.LP
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.B YY_BREAK,
which may be redefined.  By default, it is simply a "break", to separate
each rule's action from the following rule's.
Redefining
.B YY_BREAK
allows, for example, C++ users to
#define YY_BREAK to do nothing (while being very careful that every
rule ends with a "break" or a "return"!) to avoid suffering from
unreachable statement warnings where, because a rule's action ends with
"return", the
.B YY_BREAK
is inaccessible.
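.LP
The following C sketch models how these two macros fit into a
switch of rule actions.  It is not the code
.I flex
generates: run_action(), lowercase() and the rule numbers are
hypothetical stand-ins, chosen only to show YY_USER_ACTION running
before each action and an empty YY_BREAK following a "return".
.nf

```c
#include <ctype.h>

/* Hypothetical stand-ins: lowercase(), run_action() and the rule
 * numbers are for illustration only; a real scanner gets its
 * dispatch switch from flex.
 */
static void lowercase( char *s )
    {
    for ( ; *s; ++s )
        *s = (char) tolower( (unsigned char) *s );
    }

/* Executed before every rule's action. */
#define YY_USER_ACTION lowercase( text )

/* Defined to nothing, so every action below must itself end
 * with a "return" (or "break").
 */
#define YY_BREAK

/* Models the switch of actions: one case per rule, each action
 * preceded by YY_USER_ACTION and followed by YY_BREAK.
 */
static int run_action( int rule, char *text )
    {
    switch ( rule )
        {
        case 1:
            YY_USER_ACTION;
            return 100;
            YY_BREAK

        case 2:
            YY_USER_ACTION;
            return 200;
            YY_BREAK
        }

    return -1;      /* no such rule; text is left untouched */
    }
```

.fi
Because each action here ends with "return", the empty YY_BREAK leaves
no unreachable "break" behind for the compiler to warn about.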
.SH INTERFACING WITH YACC
One of the main uses of
.I flex
is as a companion to the
.I yacc
parser-generator.
.I yacc
parsers expect to call a routine named
.B yylex()
to find the next input token.  The routine is supposed to
return the type of the next token as well as putting any associated
value in the global
.B yylval.
To use
.I flex
with
.I yacc,
one specifies the
.B -d
option to
.I yacc
to instruct it to generate the file
.B y.tab.h
containing definitions of all the
.B %tokens
appearing in the
.I yacc
input.  This file is then included in the
.I flex
scanner.  For example, if one of the tokens is "TOK_NUMBER",
part of the scanner might look like:
.nf

    %{
    #include "y.tab.h"
    %}

    %%

    [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;

.fi
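.LP
To make the calling convention concrete, here is a small C sketch of
the parser's side of the exchange.  The yylex() below is a hand-written
stand-in for a generated scanner, and the token value 258, the
.I input
pointer and sum_numbers() are assumptions made for the sketch rather
than anything
.I yacc
or
.I flex
provides.
.nf

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-ins for what yacc -d would put in y.tab.h
 * and for the flex-generated yylex(); the token value 258 and
 * the input pointer are assumptions made for this sketch.
 */
#define TOK_NUMBER 258

int yylval;
static const char *input = "";

/* Hand-written substitute for the generated scanner: skips
 * non-digits, returns TOK_NUMBER with the value in yylval,
 * or 0 at end of input.
 */
int yylex( void )
    {
    char *end;

    while ( *input && ! isdigit( (unsigned char) *input ) )
        ++input;

    if ( ! *input )
        return 0;

    yylval = (int) strtol( input, &end, 10 );
    input = end;
    return TOK_NUMBER;
    }

/* What a parser does with the pair: call yylex() for the token
 * type, then read the associated value from yylval.
 */
int sum_numbers( const char *text )
    {
    int total = 0;

    input = text;
    while ( yylex() == TOK_NUMBER )
        total += yylval;

    return total;
    }
```

.fi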
.SH TRANSLATION TABLE
In the name of POSIX compliance,
.I flex
supports a
.I translation table
for mapping input characters into groups.
The table is specified in the first section, and its format looks like:
.nf

    %t
    1        abcd
    2        ABCDEFGHIJKLMNOPQRSTUVWXYZ
    52       0123456789
    6        \\t\\ \\n
    %t

.fi
This example specifies that the characters 'a', 'b', 'c', and 'd'
are to all be lumped into group #1, upper-case letters
in group #2, digits in group #52, tabs, blanks, and newlines into
group #6, and
.I
no other characters will appear in the patterns.
The group numbers are actually disregarded by
.I flex;
.B %t
serves, though, to lump characters together.  Given the above
table, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0".
They both say, "match any character in group #1, followed by
zero-or-more pairs of characters
from group #2, followed by a character from group #52."  Thus
.B %t
provides a crude way for introducing equivalence classes into
the scanner specification.
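.LP
The equivalence of the two example patterns can be checked with a
short C sketch.  The helpers group_of() and same_pattern() are
hypothetical, only the three groups used by the example are modeled,
and every character outside the table is simply treated as a pattern
operator.
.nf

```c
#include <string.h>

/* Hypothetical helper names (group_of, same_pattern).  Only the
 * three groups used by the example patterns are modeled.
 */
static int group_of( char c )
    {
    if ( c != 0 && strchr( "abcd", c ) )
        return 1;
    if ( c != 0 && strchr( "ABCDEFGHIJKLMNOPQRSTUVWXYZ", c ) )
        return 2;
    if ( c != 0 && strchr( "0123456789", c ) )
        return 52;

    return 0;       /* not in the table */
    }

/* Two patterns are equivalent under the table if they have the
 * same length and, position by position, either both characters
 * belong to the same group or both are the same character outside
 * the table (an operator such as a parenthesis or a star).
 */
static int same_pattern( const char *p, const char *q )
    {
    for ( ; *p && *q; ++p, ++q )
        {
        int gp = group_of( *p ), gq = group_of( *q );

        if ( gp != gq || ( gp == 0 && *p != *q ) )
            return 0;
        }

    return *p == *q;    /* true only if both strings ended */
    }
```

.fi
With these definitions, same_pattern( "a(AA)*5", "d(ZQ)*0" ) is
non-zero, while same_pattern( "a(AA)*5", "A(AA)*5" ) is zero because
the first characters fall into different groups.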
.LP
Note that the
.B -i
option (see below) coupled with the equivalence classes which
.I flex
automatically generates take care of virtually all the instances
when one might consider using
.B %t.
But what the hell, it's there if you want it.
.SH OPTIONS
.I flex
has the following options:
.TP
.B -b
Generate backtracking information to
.I lex.backtrack.
This is a list of scanner states which require backtracking
and the input characters on which they do so.  By adding rules one
can remove backtracking states.  If all backtracking states
are eliminated and
.B -f
or
.B -F
is used, the generated scanner will run faster (see the
.B -p
flag).  Only users who wish to squeeze every last cycle out of their
scanners need worry about this option.  (See the section on PERFORMANCE
CONSIDERATIONS below.)
.TP
.B -c
is a do-nothing, deprecated option included for POSIX compliance.
.IP
.B NOTE:
in previous releases of
.I flex
.B -c
specified table-compression options.  This functionality is
now given by the
.B -C
flag.  To ease the impact of this change, when
.I flex
encounters
.B -c,
it currently issues a warning message and assumes that
.B -C
was desired instead.  In the future this "promotion" of
.B -c
to
.B -C
will go away in the name of full POSIX compliance (unless
the POSIX meaning is removed first).
.TP
.B -d
makes the generated scanner run in
.I debug
mode.  Whenever a pattern is recognized and the global
.B yy_flex_debug
is non-zero (which is the default),
the scanner will write to
.I stderr
a line of the form:
.nf

    --accepting rule at line 53 ("the matched text")

.fi
The line number refers to the location of the rule in the file
defining the scanner (i.e., the file that was fed to flex).  Messages
are also generated when the scanner backtracks, accepts the
default rule, reaches the end of its input buffer (or encounters
a NUL; at this point, the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
.TP
.B -f
specifies (take your pick)
.I full table
or
.I fast scanner.
No table compression is done.  The result is large but fast.
This option is equivalent to
.B -Cf
(see below).
.TP
.B -i
instructs
.I flex
to generate a
.I case-insensitive
scanner.  The case of letters given in the
.I flex
input patterns will
be ignored, and tokens in the input will be matched regardless of case.  The
matched text given in
.I yytext
will retain its original case (i.e., it will not be folded).
.TP
.B -n
is another do-nothing, deprecated option included only for
POSIX compliance.
.TP
.B -p
generates a performance report to stderr.  The report
consists of comments regarding features of the
.I flex
input file which will cause a loss of performance in the resulting scanner.
Note that the use of
.I REJECT