.fi
(Note that if the scanner is compiled using
.B C++,
then
.B input()
is instead referred to as
.B yyinput(),
in order to avoid a name clash with the
.B C++
stream by the name of
.I input.)
.IP -
.B YY_FLUSH_BUFFER
flushes the scanner's internal buffer
so that the next time the scanner attempts to match a token, it will
first refill the buffer using
.B YY_INPUT
(see The Generated Scanner, below).  This action is a special case
of the more general
.B yy_flush_buffer()
function, described below in the section Multiple Input Buffers.
.IP -
.B yyterminate()
can be used in lieu of a return statement in an action.  It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
By default,
.B yyterminate()
is also called when an end-of-file is encountered.  It is a macro and
may be redefined.
.SH THE GENERATED SCANNER
The output of
.I flex
is the file
.B lex.yy.c,
which contains the scanning routine
.B yylex(),
a number of tables used by it for matching tokens, and a number
of auxiliary routines and macros.  By default,
.B yylex()
is declared as follows:
.nf

    int yylex()
        {
        ... various definitions and the actions in here ...
        }

.fi
(If your environment supports function prototypes, then it will
be "int yylex( void )".)  This definition may be changed by defining
the "YY_DECL" macro.  For example, you could use:
.nf

    #define YY_DECL float lexscan( a, b ) float a, b;

.fi
to give the scanning routine the name
.I lexscan,
returning a float, and taking two floats as arguments.  Note that
if you give arguments to the scanning routine using a
K&R-style/non-prototyped function declaration, you must terminate
the definition with a semi-colon (;).
.PP
Whenever
.B yylex()
is called, it scans tokens from the global input file
.I yyin
(which defaults to stdin).  It continues until it either reaches
an end-of-file (at which point it returns the value 0) or
one of its actions executes a
.I return
statement.
.PP
If the scanner reaches an end-of-file, the behavior of subsequent calls is undefined
unless either
.I yyin
is pointed at a new input file (in which case scanning continues from
that file), or
.B yyrestart()
is called.
.B yyrestart()
takes one argument, a
.B FILE *
pointer (which can be NULL, if you've set up
.B YY_INPUT
to scan from a source other than
.I yyin),
and initializes
.I yyin
for scanning from that file.  Essentially there is no difference between
just assigning
.I yyin
to a new input file or using
.B yyrestart()
to do so; the latter is available for compatibility with previous versions
of
.I flex,
and because it can be used to switch input files in the middle of scanning.
It can also be used to throw away the current input buffer, by calling
it with an argument of
.I yyin;
but it is better to use
.B YY_FLUSH_BUFFER
(see above).
Note that
.B yyrestart()
does
.I not
reset the start condition to
.B INITIAL
(see Start Conditions, below).
.PP
If
.B yylex()
stops scanning due to executing a
.I return
statement in one of the actions, the scanner may then be called again and it
will resume scanning where it left off.
.PP
By default (and for purposes of efficiency), the scanner uses
block-reads rather than simple
.I getc()
calls to read characters from
.I yyin.
The nature of how it gets its input can be controlled by defining the
.B YY_INPUT
macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
action is to place up to
.I max_size
characters in the character array
.I buf
and return in the integer variable
.I result
either the
number of characters read or the constant YY_NULL (0 on Unix systems)
to indicate EOF.  The default YY_INPUT reads from the
global file-pointer "yyin".
.PP
A sample definition of YY_INPUT (in the definitions
section of the input file):
.nf

    %{
    #define YY_INPUT(buf,result,max_size) \\
        { \\
        int c = getchar(); \\
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
        }
    %}

.fi
This definition will change the input processing to occur
one character at a time.
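.PP
The same calling sequence can also be served from memory.  The
following is a minimal sketch, not part of
.I flex
itself: the names
.I src_text, set_source
and
.I fill_from_string
are hypothetical, and YY_NULL is defined here only so that the
fragment compiles on its own (a generated scanner already supplies it).
.nf

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define YY_NULL 0   /* assumption: normally supplied by the generated scanner */

static const char *src_text = "";   /* hypothetical: text still to be scanned */

/* Hypothetical helper: start reading from a new NUL-terminated string. */
static void set_source(const char *text)
{
    src_text = text;
}

/* Copy up to max_size characters into buf.  Returns the number of
 * characters copied, or YY_NULL once the text is exhausted. */
static int fill_from_string(char *buf, int max_size)
{
    assert(max_size > 0);

    size_t left = strlen(src_text);
    size_t n = left < (size_t)max_size ? left : (size_t)max_size;

    if (n == 0)
        return YY_NULL;
    memcpy(buf, src_text, n);
    src_text += n;
    return (int)n;
}

/* Same shape as the YY_INPUT calling sequence described above. */
#define YY_INPUT(buf, result, max_size) ((result) = fill_from_string((buf), (max_size)))
```

.fi
In a real scanner the helper and the macro would sit in the definitions
section between %{ and %}, and the block-sized copy keeps the efficiency
advantage over reading one character at a time.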
.PP
When the scanner receives an end-of-file indication from YY_INPUT,
it then checks the
.B yywrap()
function.  If
.B yywrap()
returns false (zero), then it is assumed that the
function has gone ahead and set up
.I yyin
to point to another input file, and scanning continues.  If it returns
true (non-zero), then the scanner terminates, returning 0 to its
caller.  Note that in either case, the start condition remains unchanged;
it does
.I not
revert to
.B INITIAL.
.PP
If you do not supply your own version of
.B yywrap(),
then you must either use
.B %option noyywrap
(in which case the scanner behaves as though
.B yywrap()
returned 1), or you must link with
.B \-lfl
to obtain the default version of the routine, which always returns 1.
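.PP
As an illustration, here is a minimal sketch of a user-supplied
.B yywrap()
that steps through a list of input files.  The names
.I set_input_files
and
.I next_file
are hypothetical, and
.I yyin
is defined here only so that the fragment compiles on its own; in a
real scanner the generated
.B lex.yy.c
defines it.
.nf

```c
#include <assert.h>
#include <stdio.h>

FILE *yyin;   /* assumption: in a real scanner this comes from lex.yy.c */

/* Hypothetical: NULL-terminated list of file names not yet scanned. */
static const char **next_file;

/* Hypothetical helper: register the files to scan, in order. */
void set_input_files(const char **names)
{
    assert(names != NULL);
    next_file = names;
}

/* Return 0 after pointing yyin at the next readable file,
 * or 1 when no files are left (the scanner then terminates). */
int yywrap(void)
{
    while (next_file != NULL && *next_file != NULL) {
        FILE *f = fopen(*next_file++, "r");

        if (f != NULL) {
            if (yyin != NULL && yyin != stdin)
                fclose(yyin);
            yyin = f;
            return 0;
        }
        /* Unreadable file: skip it and try the next name. */
    }
    return 1;
}
```

.fi
Skipping unreadable files silently is a choice made for brevity in this
sketch; a production scanner would more likely report the failure.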
.PP
Three routines are available for scanning from in-memory buffers rather
than files:
.B yy_scan_string(), yy_scan_bytes(),
and
.B yy_scan_buffer().
See the discussion of them below in the section Multiple Input Buffers.
.PP
The scanner writes its
.B ECHO
output to the
.I yyout
global (by default, stdout), which the user may redirect simply
by assigning it some other
.B FILE
pointer.
.SH START CONDITIONS
.I flex
provides a mechanism for conditionally activating rules.  Any rule
whose pattern is prefixed with "<sc>" will only be active when
the scanner is in the start condition named "sc".  For example,
.nf

    <STRING>[^"]*        { /* eat up the string body ... */
                ...
                }

.fi
will be active only when the scanner is in the "STRING" start
condition, and
.nf

    <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
                ...
                }

.fi
will be active only when the current start condition is
either "INITIAL", "STRING", or "QUOTE".
.PP
Start conditions
are declared in the definitions (first) section of the input
using unindented lines beginning with either
.B %s
or
.B %x
followed by a list of names.
The former declares
.I inclusive
start conditions, the latter
.I exclusive
start conditions.  A start condition is activated using the
.B BEGIN
action.  Until the next
.B BEGIN
action is executed, rules with the given start
condition will be active and
rules with other start conditions will be inactive.
If the start condition is
.I inclusive,
then rules with no start conditions at all will also be active.
If it is
.I exclusive,
then
.I only
rules qualified with the start condition will be active.
A set of rules contingent on the same exclusive start condition
describes a scanner which is independent of any of the other rules in the
.I flex
input.  Because of this,
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different
from the rest (e.g., comments).
.PP
If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two.  The set of rules:
.nf

    %s example
    %%

    <example>foo   do_something();

    bar            something_else();

.fi
is equivalent to
.nf

    %x example
    %%

    <example>foo   do_something();

    <INITIAL,example>bar    something_else();

.fi
Without the
.B <INITIAL,example>
qualifier, the
.I bar
pattern in the second example wouldn't be active (i.e., couldn't match)
when in start condition
.B example.
If we just used
.B <example>
to qualify
.I bar,
though, then it would only be active in
.B example
and not in
.B INITIAL,
while in the first example it's active in both, because in the first
example the
.B example
start condition is an
.I inclusive
.B (%s)
start condition.
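.PP
The activation rule described above can be modelled in a few lines of C.
This is only a sketch of the rule as stated in this section, not of how
.I flex
implements it internally; the names
.I rule_active, sc_mask
and the SC_ constants are hypothetical.
.nf

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbering of the start conditions in the example above. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1, SC_COUNT };

/* sc_mask has bit i set if the rule is qualified with start condition i;
 * a mask of 0 means the rule has no start-condition qualifier at all.
 * exclusive[i] is true if condition i was declared with %x, false for %s
 * (INITIAL behaves as an inclusive condition). */
static bool rule_active(unsigned sc_mask, int current, const bool exclusive[])
{
    assert(current >= 0 && current < SC_COUNT);

    if (sc_mask == 0)
        return !exclusive[current];   /* unqualified rules need an inclusive condition */
    return ((sc_mask >> current) & 1u) != 0;
}
```

.fi
With this model, the unqualified
.I bar
rule (mask 0) is active in
.B example
when it is declared with
.B %s
but not when it is declared with
.B %x,
which is why the second version needs the explicit
.B <INITIAL,example>
qualifier.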
.PP
Also note that the special start-condition specifier
.B <*>
matches every start condition.  Thus, the above example could also
have been written:
.nf

    %x example
    %%

    <example>foo   do_something();

    <*>bar    something_else();

.fi
.PP
The default rule (to
.B ECHO
any unmatched character) remains active in start conditions.  It
is equivalent to:
.nf

    <*>.|\\n     ECHO;

.fi
.PP
.B BEGIN(0)
returns to the original state where only the rules with
no start conditions are active.  This state can also be
referred to as the start-condition "INITIAL", so
.B BEGIN(INITIAL)
is equivalent to
.B BEGIN(0).
(The parentheses around the start condition name are not required but
are considered good style.)
.PP
.B BEGIN
actions can also be given as indented code at the beginning
of the rules section.  For example, the following will cause
the scanner to enter the "SPECIAL" start condition whenever
.B yylex()
is called and the global variable
.I enter_special
is true:
.nf

            int enter_special;

    %x SPECIAL
    %%
            if ( enter_special )
                BEGIN(SPECIAL);

    <SPECIAL>blahblahblah
    ...more rules follow...

.fi
.PP
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like "123.456".  By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the integer "456".
But if the string is preceded earlier in the line by the string
"expect-floats"
it will treat it as a single token, the floating-point number
123.456:
.nf

    %{
    #include <stdlib.h>
    %}
    %s expect

    %%
    expect-floats        BEGIN(expect);

    <expect>[0-9]+"."[0-9]+      {
                printf( "found a float, = %f\\n",
                        atof( yytext ) );
                }
    <expect>\\n           {
                /* that's the end of the line, so
                 * we need another "expect-floats"
                 * before we'll recognize any more
                 * numbers
                 */
                BEGIN(INITIAL);
                }

    [0-9]+      {
                printf( "found an integer, = %d\\n",
                        atoi( yytext ) );
                }

    "."         printf( "found a dot\\n" );

.fi
Here is a scanner which recognizes (and discards) C comments while
maintaining a count of the current input line.
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"         BEGIN(comment);

    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
This scanner goes to a bit of trouble to match as much
text as possible with each rule.  In general, when attempting to write
a high-speed scanner, try to match as much as possible in each rule, as
it's a big win.
.PP
Note that start-condition names are really integer values and
can be stored as such.  Thus, the above could be extended in the
following fashion:
.nf

    %x comment foo
    %%
            int line_num = 1;
            int comment_caller;

    "/*"         {
                 comment_caller = INITIAL;
                 BEGIN(comment);
                 }

    ...

    <foo>"/*"    {
                 comment_caller = foo;
                 BEGIN(comment);
                 }

    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(comment_caller);

.fi
Furthermore, you can access the current start condition using
the integer-valued
.B YY_START
macro.  For example, the above assignments to
.I comment_caller
could instead be written
.nf

    comment_caller = YY_START;

.fi
Flex provides
.B YYSTATE
as an alias for
.B YY_START
(since that is what's used by AT&T
.I lex).
.PP
Note that start conditions do not have their own name-space; %s's and %x's
declare names in the same fashion as #define's.
.PP
Finally, here's an example of how to match C-style quoted strings using
exclusive start conditions, including expanded escape sequences (but
not including checking for a string that's too long):
.nf

    %x str

    %%
            char string_buf[MAX_STR_CONST];
            char *string_buf_ptr;


    \\"      string_buf_ptr = string_buf; BEGIN(str);

    <str>\\"        { /* saw closing quote - all done */
            BEGIN(INITIAL);
            *string_buf_ptr = '\\0';
