to keep the matched text as long as possible.  Note that
.I adding
rules does
.I not
slow down the scanner!  The speed of the scanner is independent
of the number of rules or (modulo the considerations given at the
beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.
.LP
A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line
and with no other extraneous characters, and recognize all the
keywords.  A natural first approach is:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    .|\\n     /* it's not a keyword */

.fi
To eliminate the backtracking, introduce a catch-all rule:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
Now, if it's guaranteed that there's exactly one word per line,
then we can reduce the total number of matches by a half by
merging in the recognition of newlines with that of the other
tokens:
.nf

    %%
    asm\\n    |
    auto\\n   |
    break\\n  |
    ... etc ...
    volatile\\n |
    while\\n  /* it's a keyword */

    [a-z]+\\n |
    .|\\n     /* it's not a keyword */

.fi
One has to be careful here, as we have now reintroduced backtracking
into the scanner.  In particular, while
.I we
know that there will never be any characters in the input stream
other than letters or newlines,
.I flex
can't figure this out, and it will plan for possibly needing backtracking
when it has scanned a token like "auto" and then the next character
is something other than a newline or a letter.  Previously it would
then just match the "auto" rule and be done, but now it has no "auto"
rule, only an "auto\\n" rule.  To eliminate the possibility of backtracking,
we could either duplicate all rules but without final newlines, or,
since we never expect to encounter such an input and therefore don't
care how it's classified, we can introduce one more catch-all rule, this
one which doesn't include a newline:
.nf

    %%
    asm\\n    |
    auto\\n   |
    break\\n  |
    ... etc ...
    volatile\\n |
    while\\n  /* it's a keyword */

    [a-z]+\\n |
    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
Compiled with
.B -Cf,
this is about as fast as one can get a
.I flex 
scanner to go for this particular problem.
.LP
A final note:
.I flex
is slow when matching NUL's, particularly when a token contains
multiple NUL's.
It's best to write rules which match
.I short
amounts of text if it's anticipated that the text will often include NUL's.
.SH INCOMPATIBILITIES WITH LEX AND POSIX
.I flex
is a rewrite of the Unix
.I lex
tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which
are of concern to those who wish to write scanners acceptable
to either implementation.  At present, the POSIX
.I lex
draft is
very close to the original
.I lex
implementation, so some of these
incompatibilities are also in conflict with the POSIX draft.  But
the intent is that except as noted below,
.I flex
as it presently stands will
ultimately be POSIX conformant (i.e., that those areas of conflict with
the POSIX draft will be resolved in
.I flex's
favor).  Please bear in
mind that all the comments which follow are with regard to the POSIX
.I draft
standard of Summer 1989, and not the final document (or subsequent
drafts); they are included so
.I flex
users can be aware of the standardization issues and those areas where
.I flex
may in the near future undergo changes incompatible with
its current definition.
.LP
.I flex
is fully compatible with
.I lex
with the following exceptions:
.IP -
The undocumented
.I lex
scanner internal variable
.B yylineno
is not supported.  It is difficult to support this variable efficiently,
since it requires examining every character scanned and reexamining
the characters when the scanner backs up.
Things get more complicated when the end of buffer or file is reached or a
NUL is scanned (since the scan must then be restarted with the proper line
number count), or the user uses the yyless(), unput(), or REJECT actions,
or the multiple input buffer functions.
.IP
The fix is to add rules which, upon seeing a newline, increment
yylineno.  This is usually an easy process, though it can be a drag if some
of the patterns can match multiple newlines along with other characters.
.IP
yylineno is not part of the POSIX draft.
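.IP
The newline-counting fix mentioned above can be sketched in C as follows.
This is only an illustration under stated assumptions: the helper
count_newlines() and the counter my_lineno are hypothetical names, not
part of flex, and the sketch assumes that every rule whose pattern can
match a newline calls the helper from its action.
.nf

```c
#include <assert.h>
#include <stddef.h>

enum { LINE_FEED = 10 };    /* ASCII code of the newline character */

/* Hypothetical line counter kept up to date by the scanner's actions. */
int my_lineno = 1;

/* Count the newlines in one matched token and advance my_lineno.
 * Returns the number of newlines found in the token. */
int count_newlines(const char *text, int leng)
{
    int i, found = 0;

    assert(text != NULL && leng >= 0);
    for (i = 0; i < leng; ++i)
        if (text[i] == LINE_FEED)
            ++found;
    my_lineno += found;
    return found;
}
```

.fi
.IP
An action of the form "count_newlines( yytext, yyleng );" would then
cover patterns that match several newlines along with other characters.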
.IP -
The
.B input()
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.  If
.B input()
encounters an end-of-file the normal
.B yywrap()
processing is done.  A ``real'' end-of-file is returned by
.B input()
as
.I EOF.
.IP
Input is instead controlled by redefining the
.B YY_INPUT
macro.
.IP
The
.I flex
restriction that
.B input()
cannot be redefined is in accordance with the POSIX draft, but
.B YY_INPUT
has not yet been accepted into the draft (and probably won't; it looks
like the draft will simply not specify any way of controlling the
scanner's input other than by making an initial assignment to
.I yyin).
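.IP
As a hedged sketch of such a redefinition, the following makes a scanner
take its input from an in-memory string.  It assumes the calling sequence
YY_INPUT(buf,result,max_size) and that a result of 0 signals end-of-file;
the names my_input, my_input_pos, and my_read() are hypothetical and
chosen only for this example.
.nf

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory input source; not part of flex. */
const char *my_input = "auto while";
size_t my_input_pos = 0;

/* Copy up to max_size unread characters into buf.
 * Returns the number copied, or 0 once the input is exhausted. */
int my_read(char *buf, int max_size)
{
    size_t left, n;

    assert(buf != NULL && max_size > 0);
    left = strlen(my_input) - my_input_pos;
    n = left < (size_t) max_size ? left : (size_t) max_size;
    memcpy(buf, my_input + my_input_pos, n);
    my_input_pos += n;
    return (int) n;
}

/* Assumed calling sequence: YY_INPUT(buf,result,max_size). */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) ((result) = my_read((buf), (max_size)))
```

.fi
.IP
In a scanner, code like this would go inside %{ and %} in the
definitions section so that it precedes the generated scanning routine.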
.IP -
.I flex
scanners do not use stdio for input.  Because of this, when writing an
interactive scanner one must explicitly call fflush() on the
stream associated with the terminal after writing out a prompt.
With
.I lex
such writes are automatically flushed since
.I lex
scanners use
.B getchar()
for their input.  Also, when writing interactive scanners with
.I flex,
the
.B -I
flag must be used.
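.IP
A minimal sketch of the explicit flush, assuming the prompt is written
with stdio to a stream attached to the terminal; the function name
show_prompt() is hypothetical.
.nf

```c
#include <assert.h>
#include <stdio.h>

/* Write a prompt and flush it so that it is visible before the
 * scanner blocks waiting for input.
 * Returns 0 on success and EOF on failure. */
int show_prompt(FILE *term, const char *prompt)
{
    assert(term != NULL && prompt != NULL);
    if (fputs(prompt, term) == EOF)
        return EOF;
    return fflush(term);
}
```

.fi
.IP
A call such as "show_prompt( stdout, "> " );" would be made just before
each call to the scanner.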
.IP -
.I flex
scanners are not as reentrant as
.I lex
scanners.  In particular, if you have an interactive scanner and
an interrupt handler which long-jumps out of the scanner, and
the scanner is subsequently called again, you may get the following
message:
.nf

    fatal flex scanner internal error--end of buffer missed

.fi
To reenter the scanner, first use
.nf

    yyrestart( yyin );

.fi
.IP -
.B output()
is not supported.
Output from the
.B ECHO
macro is done to the file-pointer
.I yyout
(default
.I stdout).
.IP
The POSIX draft mentions that an
.B output()
routine exists but currently gives no details as to what it does.
.IP -
.I lex
does not support exclusive start conditions (%x), though they
are in the current POSIX draft.
.IP -
When definitions are expanded,
.I flex
encloses them in parentheses.
With lex, the following:
.nf

    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      printf( "Found it\\n" );
    %%

.fi
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is associated with
"[A-Z0-9]*".  With
.I flex,
the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
Note that because of this, the
.B ^, $, <s>, /,
and
.B <<EOF>>
operators cannot be used in a
.I flex
definition.
.IP
The POSIX draft interpretation is the same as
.I flex's.
.IP -
To specify a character class which matches anything but a right bracket (']'),
in
.I lex
one can use "[^]]" but with
.I flex
one must use "[^\\]]".  The latter works with
.I lex,
too.
.IP -
The
.I lex
.B %r
(generate a Ratfor scanner) option is not supported.  It is not part
of the POSIX draft.
.IP -
If you are providing your own yywrap() routine, you must include a
"#undef yywrap" in the definitions section (section 1).  Note that
the "#undef" will have to be enclosed in %{}'s.
.IP
The POSIX draft
specifies that yywrap() is a function and this is very unlikely to change; so
.I flex
users are warned
that
.B yywrap()
is likely to be changed to a function in the near future.
.IP -
After a call to
.B unput(),
.I yytext
and
.I yyleng
are undefined until the next token is matched.  This is not the case with
.I lex
or the present POSIX draft.
.IP -
The precedence of the
.B {}
(numeric range) operator is different.
.I lex
interprets "abc{1,3}" as "match one, two, or
three occurrences of 'abc'", whereas
.I flex
interprets it as "match 'ab'
followed by one, two, or three occurrences of 'c'".  The latter is
in agreement with the current POSIX draft.
.IP -
The precedence of the
.B ^
operator is different.
.I lex
interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
or 'bar' anywhere", whereas
.I flex
interprets it as "match either 'foo' or 'bar' if they come at the beginning
of a line".  The latter is in agreement with the current POSIX draft.
.IP -
To refer to yytext outside of the scanner source file,
the correct definition with
.I flex
is "extern char *yytext" rather than "extern char yytext[]".
This is contrary to the current POSIX draft but a point on which
.I flex
will not be changing, as the array representation entails a
serious performance penalty.  It is hoped that the POSIX draft will
be emended to support the
.I flex
variety of declaration (as this is a fairly painless change to
require of
.I lex
users).
.IP -
.I yyin
is
.I initialized
by
.I lex
to be
.I stdin;
.I flex,
on the other hand,
initializes
.I yyin
to NULL
and then
.I assigns
it to
.I stdin
the first time the scanner is called, providing
.I yyin
has not already been assigned to a non-NULL value.  The difference is
subtle, but the net effect is that with
.I flex
scanners,
.I yyin
does not have a valid value until the scanner has been called.
.IP -
The special table-size declarations such as
.B %a
supported by
.I lex
are not required by
.I flex
scanners;
.I flex
ignores them.
.IP -
The name
.B FLEX_SCANNER
is #define'd so scanners may be written for use with either
.I flex
or
.I lex.
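.IP
A small sketch of such a test, assuming the code is compiled as part of
the generated scanner (for example, from the definitions section of the
input file); the function name scanner_is_flex() is hypothetical.
.nf

```c
/* Returns 1 when compiled as part of a flex-generated scanner, which
 * defines FLEX_SCANNER, and 0 otherwise (for example under lex). */
int scanner_is_flex(void)
{
#ifdef FLEX_SCANNER
    return 1;
#else
    return 0;
#endif
}
```

.fi
.IP
The same #ifdef can guard calls that exist only in one implementation,
such as yyrestart().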
.LP
The following
.I flex
features are not included in
.I lex
or the POSIX draft standard:
.nf

    yyterminate()
    <<EOF>>
    YY_DECL
    #line directives
    %{}'s around actions
    yyrestart()
    comments beginning with '#' (deprecated)
    multiple actions on a line

.fi
This last feature refers to the fact that with
.I flex
you can put multiple actions on the same line, separated with
semi-colons, while with
.I lex,
the following
.nf

    foo    handle_foo(); ++num_foos_seen;

.fi
is (rather surprisingly) truncated to
.nf

    foo    handle_foo();

.fi
.I flex
does not truncate the action.  Actions that are not enclosed in
braces are simply terminated at the end of the line.
.SH DIAGNOSTICS
.I reject_used_but_not_detected undefined
or
.I yymore_used_but_not_detected undefined -
These errors can occur at compile time.  They indicate that the
scanner uses
.B REJECT
or
.B yymore()
but that
.I flex
failed to notice the fact, meaning that
.I flex
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow you snuck some in (via a #include
file, for example).  Make an explicit reference to the action in your
.I flex
input file.  (Note that previously
.I flex
supported a
.B %used/%unused
mechanism for dealing with this problem; this feature is still supported
but now deprecated, and will go away soon unless the author hears from
people who can argue compellingly that they need it.)
.LP
.I flex scanner jammed -
a scanner compiled with
.B -s
has encountered an input string which wasn't matched by
any of its rules.
.LP
.I flex input buffer overflowed -
a scanner rule matched a string long enough to overflow the
scanner's internal input buffer (16K bytes by default - controlled by
.B YY_BUF_SIZE
in "flex.skel".  Note that to redefine this macro, you must first
.B #undef
it).
.LP
.I scanner requires -8 flag -
Your scanner specification includes recognizing 8-bit characters and
you did not specify the -8 flag (and your site has not installed flex
with -8 as the default).
.LP
.I
fatal flex scanner internal error--end of buffer missed -
This can occur in a scanner which is reentered after a long-jump
has jumped out of (or over) the scanner's activation frame.  Before
reentering the scanner, use:
.nf

    yyrestart( yyin );

.fi
.LP
.I too many %t classes! -
You managed to put every single character into its own %t class.
.I flex
requires that at least one of the classes share characters.