.LP
Actions can include arbitrary C code, including
.B return
statements to return a value to whatever routine called
.B yylex().
Each time
.B yylex()
is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.  Once it reaches an end-of-file,
however, any subsequent call to
.B yylex()
will simply return immediately, unless
.B yyrestart()
is first called (see below).
.LP
Actions are not allowed to modify yytext or yyleng.
.LP
There are a number of special directives which can be included within
an action:
.IP -
.B ECHO
copies yytext to the scanner's output.
.IP -
.B BEGIN
followed by the name of a start condition places the scanner in the
corresponding start condition (see below).
.IP -
.B REJECT
directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input).  The rule is chosen as described
above in "How the Input is Matched", and
.B yytext
and
.B yyleng
are set up appropriately.
It may either be one which matched as much text
as the originally chosen rule but came later in the
.I flex
input file, or one which matched less text.
For example, the following will both count the
words in the input and call the routine special() whenever "frob" is seen:
.nf

            int word_count = 0;
    %%

    frob        special(); REJECT;
    [^ \\t\\n]+   ++word_count;

.fi
Without the
.B REJECT,
any occurrences of "frob" in the input would not be counted as words, since the
scanner normally executes only one action per token.
Multiple uses of
.B REJECT
are allowed, each one finding the next best choice to the currently
active rule.  For example, when the following scanner scans the token
"abcd", it will write "abcdabcaba" to the output:
.nf

    %%
    a        |
    ab       |
    abc      |
    abcd     ECHO; REJECT;
    .|\\n     /* eat up any unmatched character */

.fi
(The first three rules share the fourth's action since they use
the special '|' action.)
.B REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in
.I any
of the scanner's actions it will slow down
.I all
of the scanner's matching.  Furthermore,
.B REJECT
cannot be used with the
.I -f
or
.I -F
options (see below).
.IP
Note also that unlike the other special actions,
.B REJECT
is a
.I branch;
code immediately following it in the action will
.I not
be executed.
.IP -
.B yymore()
tells the scanner that the next time it matches a rule, the corresponding
token should be
.I appended
onto the current value of
.B yytext
rather than replacing it.  For example, given the input "mega-kludge"
the following will write "mega-mega-kludge" to the output:
.nf

    %%
    mega-    ECHO; yymore();
    kludge   ECHO;

.fi
First "mega-" is matched and echoed to the output.  Then "kludge"
is matched, but the previous "mega-" is still hanging around at the
beginning of
.B yytext
so the
.B ECHO
for the "kludge" rule will actually write "mega-kludge".
The presence of
.B yymore()
in the scanner's action entails a minor performance penalty in the
scanner's matching speed.
.IP -
.B yyless(n)
returns all but the first
.I n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.B yytext
and
.B yyleng
are adjusted appropriately (e.g.,
.B yyleng
will now be equal to
.I n
).  For example, on the input "foobar" the following will write out
"foobarbar":
.nf

    %%
    foobar    ECHO; yyless(3);
    [a-z]+    ECHO;

.fi
An argument of 0 to
.B yyless
will cause the entire current input string to be scanned again.  Unless you've
changed how the scanner will subsequently process its input (using
.B BEGIN,
for example), this will result in an endless loop.
.IP -
.B unput(c)
puts the character
.I c
back onto the input stream.  It will be the next character scanned.
The following action will take the current token and cause it
to be rescanned enclosed in parentheses.
.nf

    {
    int i;
    unput( ')' );
    for ( i = yyleng - 1; i >= 0; --i )
        unput( yytext[i] );
    unput( '(' );
    }

.fi
Note that since each
.B unput()
puts the given character back at the
.I beginning
of the input stream, pushing back strings must be done back-to-front.
.IP -
.B input()
reads the next character from the input stream.  For example,
the following is one way to eat up C comments:
.nf

    %%
    "/*"        {
                register int c;

                for ( ; ; )
                    {
                    while ( (c = input()) != '*' &&
                            c != EOF )
                        ;    /* eat up text of comment */

                    if ( c == '*' )
                        {
                        while ( (c = input()) == '*' )
                            ;
                        if ( c == '/' )
                            break;    /* found the end */
                        }

                    if ( c == EOF )
                        {
                        error( "EOF in comment" );
                        break;
                        }
                    }
                }

.fi
(Note that if the scanner is compiled using
.B C++,
then
.B input()
is instead referred to as
.B yyinput(),
in order to avoid a name clash with the
.B C++
stream by the name of
.I input.)
.IP -
.B yyterminate()
can be used in lieu of a return statement in an action.  It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
Subsequent calls to the scanner will immediately return unless preceded
by a call to
.B yyrestart()
(see below).
By default,
.B yyterminate()
is also called when an end-of-file is encountered.  It is a macro and
may be redefined.
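.LP
The back-to-front rule described for
.B unput()
above can be checked with a small stand-alone model.
This is only a sketch: it assumes the pushback behaves like a
last-in, first-out stack, and the names model_unput(), model_input(),
push_parenthesized() and drain() are hypothetical helpers, not
routines provided by the scanner.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's pushback: a small LIFO stack.
 * None of these names are flex routines. */
#define MODEL_MAX 64

static char model_stack[MODEL_MAX];
static size_t model_top = 0;

/* Push one character so that it becomes the next one read. */
static void model_unput(char c)
{
    if (model_top < MODEL_MAX)
        model_stack[model_top++] = c;
}

/* Read the next character, or return -1 when nothing is pushed back. */
static int model_input(void)
{
    return model_top > 0 ? (unsigned char) model_stack[--model_top] : -1;
}

/* Push back "(token)" the way the unput() action does: last character first. */
static void push_parenthesized(const char *token)
{
    int i;

    model_unput(')');
    for (i = (int) strlen(token) - 1; i >= 0; --i)
        model_unput(token[i]);
    model_unput('(');
}

/* Drain the pushed-back characters into out, which is NUL-terminated. */
static void drain(char *out, size_t out_size)
{
    size_t n = 0;
    int c;

    while (n + 1 < out_size && (c = model_input()) != -1)
        out[n++] = (char) c;
    out[n] = 0;
}
```

.fi
In this model, calling push_parenthesized() with a token and then
drain() reads the token back in its original order, enclosed in
parentheses.  Pushing the characters front-to-back instead would
reverse the token.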
.SH THE GENERATED SCANNER
The output of
.I flex
is the file
.B lex.yy.c,
which contains the scanning routine
.B yylex(),
a number of tables used by it for matching tokens, and a number
of auxiliary routines and macros.  By default,
.B yylex()
is declared as follows:
.nf

    int yylex()
        {
        ... various definitions and the actions in here ...
        }

.fi
(If your environment supports function prototypes, then it will
be "int yylex( void )".)  This definition may be changed by redefining
the "YY_DECL" macro.  For example, you could use:
.nf

    #undef YY_DECL
    #define YY_DECL float lexscan( a, b ) float a, b;

.fi
to give the scanning routine the name
.I lexscan,
returning a float, and taking two floats as arguments.  Note that
if you give arguments to the scanning routine using a
K&R-style/non-prototyped function declaration, you must terminate
the definition with a semi-colon (;).
.LP
Whenever
.B yylex()
is called, it scans tokens from the global input file
.I yyin
(which defaults to stdin).  It continues until it either reaches
an end-of-file (at which point it returns the value 0) or
one of its actions executes a
.I return
statement.
In the former case, when called again the scanner will immediately
return unless
.B yyrestart()
is called to point
.I yyin
at the new input file.  (
.B yyrestart()
takes one argument, a
.B FILE *
pointer.)
In the latter case (i.e., when an action
executes a return), the scanner may then be called again and it
will resume scanning where it left off.
.LP
By default (and for purposes of efficiency), the scanner uses
block-reads rather than simple
.I getc()
calls to read characters from
.I yyin.
The nature of how it gets its input can be controlled by redefining the
.B YY_INPUT
macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
action is to place up to
.I max_size
characters in the character array
.I buf
and return in the integer variable
.I result
either the
number of characters read or the constant YY_NULL (0 on Unix systems)
to indicate EOF.  The default YY_INPUT reads from the
global file-pointer "yyin".
.LP
A sample redefinition of YY_INPUT (in the definitions
section of the input file):
.nf

    %{
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \\
        { \\
        int c = getchar(); \\
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
        }
    %}

.fi
This definition will change the input processing to occur
one character at a time.
.LP
You can also add things like keeping track of the
input line number this way, but don't expect your scanner to
go very fast.
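.LP
As a further illustration of the YY_INPUT calling sequence, here is a
sketch that feeds the scanner from a string held in memory rather than
from a file.  The names mem_source_set() and mem_read() are
hypothetical helpers introduced for this example, and YY_NULL is
defined here only so that the fragment compiles on its own (a
generated scanner already defines it).
.nf

```c
#include <stddef.h>
#include <string.h>

#ifndef YY_NULL
#define YY_NULL 0    /* assumption: the Unix value given in the text */
#endif

/* Hypothetical in-memory source; none of these names belong to flex. */
static const char *mem_src = "";
static size_t mem_pos = 0;
static size_t mem_len = 0;

/* Point the source at a NUL-terminated string and rewind it. */
static void mem_source_set(const char *s)
{
    mem_src = s;
    mem_pos = 0;
    mem_len = strlen(s);
}

/* Copy up to max_size characters into buf.  Return the number copied,
 * or YY_NULL once the string is exhausted. */
static int mem_read(char *buf, int max_size)
{
    size_t n = mem_len - mem_pos;

    if (max_size <= 0)
        return YY_NULL;
    if (n > (size_t) max_size)
        n = (size_t) max_size;
    memcpy(buf, mem_src + mem_pos, n);
    mem_pos += n;
    return n > 0 ? (int) n : YY_NULL;
}

/* Same calling sequence as in the text: YY_INPUT(buf,result,max_size). */
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) ((result) = mem_read((buf), (max_size)))
```

.fi
In a scanner these definitions would presumably sit inside the %{ and %}
of the definitions section, with mem_source_set() called before the
first call to the scanning routine.  Unlike the one-character version,
this sketch hands over as many characters as the scanner asks for, so it
keeps the benefit of block reads.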
.LP
When the scanner receives an end-of-file indication from YY_INPUT,
it then checks the
.B yywrap()
function.  If
.B yywrap()
returns false (zero), then it is assumed that the
function has gone ahead and set up
.I yyin
to point to another input file, and scanning continues.  If it returns
true (non-zero), then the scanner terminates, returning 0 to its
caller.
.LP
The default
.B yywrap()
always returns 1.  Presently, to redefine it you must first
"#undef yywrap", as it is currently implemented as a macro.  As indicated
by the hedging in the previous sentence, it may be changed to
a true function in the near future.
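.LP
The contract described above (return zero after setting up more input,
non-zero when there is none) can be sketched as a routine that steps
through a list of file names.  This is a stand-alone illustration under
stated assumptions: the yyin declared here is only a stand-in for the
scanner's own global, and set_pending_files() and next_file_wrap() are
hypothetical names.
.nf

```c
#include <stdio.h>

/* Stand-in for the scanner's global; a real scanner defines yyin itself. */
static FILE *yyin = NULL;

/* Hypothetical NULL-terminated list of input files still to be scanned. */
static const char **pending_files = NULL;

/* Install the list of files to scan after the current one. */
static void set_pending_files(const char **names)
{
    pending_files = names;
}

/* yywrap()-style hook: return 0 after pointing yyin at the next file
 * that can be opened, or 1 when no files remain. */
static int next_file_wrap(void)
{
    if (yyin != NULL && yyin != stdin) {
        fclose(yyin);
        yyin = NULL;
    }
    while (pending_files != NULL && *pending_files != NULL) {
        yyin = fopen(*pending_files++, "r");
        if (yyin != NULL)
            return 0;    /* more input: scanning continues */
    }
    return 1;            /* nothing left: the scanner terminates */
}
```

.fi
To hook such a routine into a scanner one would, per the text above,
first "#undef yywrap" and then define yywrap to call it.  Files that
cannot be opened are skipped silently here; a real scanner would
probably want to report them.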
.LP
The scanner writes its
.B ECHO
output to the
.I yyout
global (default, stdout), which may be redefined by the user simply
by assigning it to some other
.B FILE
pointer.
.SH START CONDITIONS
.I flex
provides a mechanism for conditionally activating rules.  Any rule
whose pattern is prefixed with "<sc>" will only be active when
the scanner is in the start condition named "sc".  For example,
.nf

    <STRING>[^"]*        { /* eat up the string body ... */
                ...
                }

.fi
will be active only when the scanner is in the "STRING" start
condition, and
.nf

    <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
                ...
                }

.fi
will be active only when the current start condition is
either "INITIAL", "STRING", or "QUOTE".
.LP
Start conditions
are declared in the definitions (first) section of the input
using unindented lines beginning with either
.B %s
or
.B %x
followed by a list of names.
The former declares
.I inclusive
start conditions, the latter
.I exclusive
start conditions.  A start condition is activated using the
.B BEGIN
action.  Until the next
.B BEGIN
action is executed, rules with the given start
condition will be active and
rules with other start conditions will be inactive.
If the start condition is
.I inclusive,
then rules with no start conditions at all will also be active.
If it is
.I exclusive,
then
.I only
rules qualified with the start condition will be active.
A set of rules contingent on the same exclusive start condition
describes a scanner which is independent of any of the other rules in the
.I flex
input.  Because of this,
exclusive start conditions make it easy to specify "mini-scanners"
which scan portions of the input that are syntactically different
from the rest (e.g., comments).
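.LP
The activation rules just described can be summarized as a small
predicate.  The following is a hypothetical model written for
illustration, not code taken from the generated scanner: start
conditions are represented by integer ids, with 0 playing the role of
the state in which no start condition has been entered.
.nf

```c
#include <stddef.h>

/* Hypothetical model of a start condition: an id plus its kind. */
enum sc_kind { SC_INCLUSIVE, SC_EXCLUSIVE };

struct start_cond {
    int id;               /* 0 stands for the original state */
    enum sc_kind kind;
};

/* rule_scs lists the start conditions a rule is prefixed with;
 * n_scs == 0 means the rule has no <sc> prefix at all.
 * Returns 1 if the rule is active in the current start condition. */
static int rule_is_active(struct start_cond current,
                          const int *rule_scs, size_t n_scs)
{
    size_t i;

    /* Unprefixed rules: active in the original state and in any
     * inclusive start condition, inactive in an exclusive one. */
    if (n_scs == 0)
        return current.id == 0 || current.kind == SC_INCLUSIVE;

    /* Prefixed rules: active only if the current condition is listed. */
    for (i = 0; i < n_scs; ++i)
        if (rule_scs[i] == current.id)
            return 1;
    return 0;
}
```

.fi
With this model, an unprefixed rule is reported active under an
inclusive condition and inactive under an exclusive one, which is the
whole difference between the two kinds.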
.LP
If the distinction between inclusive and exclusive start conditions
is still a little vague, here's a simple example illustrating the
connection between the two.  The set of rules:
.nf

    %s example
    %%
    <example>foo           /* do something */
    bar                    /* do something else */

.fi
is equivalent to
.nf

    %x example
    %%
    <example>foo           /* do something */
    <INITIAL,example>bar   /* do something else */

.fi
.LP
The default rule (to
.B ECHO
any unmatched character) remains active in start conditions.
.LP
.B BEGIN(0)
returns to the original state where only the rules with
no start conditions are active.  This state can also be
referred to as the start-condition "INITIAL", so
.B BEGIN(INITIAL)
is equivalent to
.B BEGIN(0).
(The parentheses around the start condition name are not required but
are considered good style.)
.LP
.B BEGIN
actions can also be given as indented code at the beginning
of the rules section.  For example, the following will cause
the scanner to enter the "SPECIAL" start condition whenever
.I yylex()
is called and the global variable
.I enter_special
is true:
.nf

            int enter_special;

    %x SPECIAL
    %%
            if ( enter_special )
                BEGIN(SPECIAL);

    <SPECIAL>blahblahblah
    ...more rules follow...

.fi
.LP
To illustrate the uses of start conditions,
here is a scanner which provides two different interpretations
of a string like "123.456".  By default it will treat it
as three tokens, the integer "123", a dot ('.'), and the integer "456".
But if the string is preceded earlier in the line by the string
"expect-floats"
it will treat it as a single token, the floating-point number
123.456:
.nf

    %{
    #include <math.h>
