.TH FLEX 1 "April 1995" "Version 2.5"
.SH NAME
flex \- fast lexical analyzer generator
.SH SYNOPSIS
.B flex
.B [\-bcdfhilnpstvwBFILTV78+? \-C[aefFmr] \-ooutput \-Pprefix \-Sskeleton]
.B [\-\-help \-\-version]
.I [filename ...]
.SH OVERVIEW
This manual describes
.I flex,
a tool for generating programs that perform pattern-matching on text.  The
manual includes both tutorial and reference sections:
.nf

    Description
        a brief overview of the tool

    Some Simple Examples

    Format Of The Input File

    Patterns
        the extended regular expressions used by flex

    How The Input Is Matched
        the rules for determining what has been matched

    Actions
        how to specify what to do when a pattern is matched

    The Generated Scanner
        details regarding the scanner that flex produces;
        how to control the input source

    Start Conditions
        introducing context into your scanners, and
        managing "mini-scanners"

    Multiple Input Buffers
        how to manipulate multiple input sources; how to
        scan from strings instead of files

    End-of-file Rules
        special rules for matching the end of the input

    Miscellaneous Macros
        a summary of macros available to the actions

    Values Available To The User
        a summary of values available to the actions

    Interfacing With Yacc
        connecting flex scanners together with yacc parsers

    Options
        flex command-line options, and the "%option"
        directive

    Performance Considerations
        how to make your scanner go as fast as possible

    Generating C++ Scanners
        the (experimental) facility for generating C++
        scanner classes

    Incompatibilities With Lex And POSIX
        how flex differs from AT&T lex and the POSIX lex
        standard

    Diagnostics
        those error messages produced by flex (or scanners
        it generates) whose meanings might not be apparent

    Files
        files used by flex

    Deficiencies / Bugs
        known problems with flex

    See Also
        other documentation, related tools

    Author
        includes contact information

.fi
.SH DESCRIPTION
.I flex
is a tool for generating
.I scanners:
programs which recognize lexical patterns in text.
.I flex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.  The description is in
the form of pairs
of regular expressions and C code, called
.I rules.
.I flex
generates as output a C source file,
.B lex.yy.c,
which defines a routine
.B yylex().
This file is compiled and linked with the
.B \-lfl
library to produce an executable.  When the executable is run,
it analyzes its input for occurrences
of the regular expressions.  Whenever it finds one, it executes
the corresponding C code.
.SH SOME SIMPLE EXAMPLES
.PP
First some simple examples to get the flavor of how one uses
.I flex.
The following
.I flex
input specifies a scanner which, whenever it encounters the string
"username", replaces it with the user's login name:
.nf

    %%
    username    printf( "%s", getlogin() );

.fi
By default, any text not matched by a
.I flex
scanner
is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence
of "username" expanded.
In this input, there is just one rule.  "username" is the
.I pattern
and the "printf" is the
.I action.
The "%%" marks the beginning of the rules.
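.PP
For comparison, the C function below sketches by hand what this one-rule
scanner does to a string.  It is an illustration under stated assumptions,
not code that
.I flex
generates: the name
.B expand_username()
is hypothetical, and the login name is passed in as a parameter instead of
being obtained from
.B getlogin(),
so that the behavior can be checked without a login session.  Text that
does not match "username" is copied through unchanged, just as the default
rule copies unmatched input.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy IN to OUT, replacing every occurrence of
 * "username" with LOGIN.  Unmatched text is copied unchanged, like
 * flex's default rule.  OUT holds OUTSIZE bytes and is always
 * NUL-terminated (output is truncated if it does not fit).
 * Returns the number of bytes written, excluding the NUL. */
size_t expand_username( const char *in, const char *login,
                        char *out, size_t outsize )
{
    const char *pattern = "username";
    size_t patlen = strlen( pattern );
    size_t loginlen = strlen( login );
    size_t n = 0;

    if ( outsize == 0 )
        return 0;

    while ( *in )
        {
        if ( strncmp( in, pattern, patlen ) == 0 )
            {
            /* the rule matched: emit what the action would print */
            for ( size_t i = 0; i < loginlen && n + 1 < outsize; ++i )
                out[n++] = login[i];
            in += patlen;
            }
        else
            {
            /* no rule matched: echo one character, as the default rule does */
            if ( n + 1 < outsize )
                out[n++] = *in;
            ++in;
            }
        }

    out[n] = 0;     /* NUL-terminate */
    return n;
}
```

.fi
For example, expanding "hi username!" with the login name "vern" yields
"hi vern!".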
.PP
Here's another simple example:
.nf

            int num_lines = 0, num_chars = 0;

    %%
    \\n      ++num_lines; ++num_chars;
    .       ++num_chars;

    %%
    int main( void )
            {
            yylex();
            printf( "# of lines = %d, # of chars = %d\\n",
                    num_lines, num_chars );
            return 0;
            }

.fi
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts).  The first line
declares two globals, "num_lines" and "num_chars", which are accessible
both inside
.B yylex()
and in the
.B main()
routine declared after the second "%%".  There are two rules: one
which matches a newline ("\\n") and increments both the line count and
the character count, and one which matches any character other than
a newline (indicated by the "." regular expression) and increments
only the character count.
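.PP
The two rules can also be read as a plain C loop.  The following is a
hand-written sketch, not
.I flex
output: the names
.B count_lines_chars()
and
.B struct counts
are hypothetical, the input is a string rather than
.B yyin,
and an ASCII character set is assumed for the newline test.
.nf

```c
#include <stddef.h>

/* Hypothetical result type holding the scanner's two counters. */
struct counts {
    int num_lines;
    int num_chars;
};

/* Hypothetical stand-in for the two-rule scanner, reading a string. */
struct counts count_lines_chars( const char *text )
{
    enum { NEWLINE = 10 };          /* ASCII line feed (newline) */
    struct counts c = { 0, 0 };

    for ( ; *text; ++text )
        {
        if ( *text == NEWLINE )
            {
            ++c.num_lines;          /* first rule: a newline */
            ++c.num_chars;
            }
        else
            ++c.num_chars;          /* "." rule: anything but newline */
        }
    return c;
}
```

.fi
For the six-character input "ab", newline, "cd", newline, it reports 2
lines and 6 characters.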
.PP
A somewhat more complicated example:
.nf

    /* scanner for a toy Pascal-like language */

    %{
    /* need this for the calls to atof() and atoi() below */
    #include <stdlib.h>
    %}

    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*

    %%

    {DIGIT}+    {
                printf( "An integer: %s (%d)\\n", yytext,
                        atoi( yytext ) );
                }

    {DIGIT}+"."{DIGIT}*        {
                printf( "A float: %s (%g)\\n", yytext,
                        atof( yytext ) );
                }

    if|then|begin|end|procedure|function        {
                printf( "A keyword: %s\\n", yytext );
                }

    {ID}        printf( "An identifier: %s\\n", yytext );

    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );

    "{"[^}\\n]*"}"     /* eat up one-line comments */

    [ \\t\\n]+          /* eat up whitespace */

    .           printf( "Unrecognized character: %s\\n", yytext );

    %%

    int main( int argc, char **argv )
        {
        ++argv, --argc;  /* skip over program name */
        if ( argc > 0 )
                {
                yyin = fopen( argv[0], "r" );
                if ( ! yyin )
                        {
                        perror( argv[0] );
                        return 1;
                        }
                }
        else
                yyin = stdin;

        yylex();
        return 0;
        }

.fi
This is the beginning of a simple scanner for a language like
Pascal.  It identifies different types of
.I tokens
and reports on what it has seen.
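.PP
When more than one rule matches the same (longest) amount of text,
.I flex
uses the rule listed first, which is why "if" is reported as a keyword
rather than an identifier even though {ID} also matches it.  The sketch
below classifies one already-isolated token by trying the example's rules
in order.  It is hypothetical code, not part of the generated scanner: the
names
.B classify()
and
.B enum token_kind
are invented for illustration, ASCII is assumed, and the comment,
whitespace and "Unrecognized character" rules are left out.
.nf

```c
#include <string.h>

/* Token categories reported by the toy Pascal scanner
 * (the enum and its names are hypothetical). */
enum token_kind { TK_INTEGER, TK_FLOAT, TK_KEYWORD, TK_IDENTIFIER,
                  TK_OPERATOR, TK_UNKNOWN };

/* Length of the run of characters in [0-9] at the start of S. */
static size_t digits( const char *s )
{
    size_t n = 0;
    while ( s[n] >= '0' && s[n] <= '9' )
        ++n;
    return n;
}

/* Classify one complete token by trying the rules in the order they
 * appear in the example.  Every rule that matches the whole token
 * matches the same length, so the rule listed first wins. */
enum token_kind classify( const char *tok )
{
    static const char *const keywords[] = {
        "if", "then", "begin", "end", "procedure", "function"
    };
    size_t len = strlen( tok );
    size_t d = digits( tok );

    if ( d > 0 && d == len )                     /* {DIGIT}+ */
        return TK_INTEGER;

    if ( d > 0 && tok[d] == '.' &&               /* {DIGIT}+"."{DIGIT}* */
         digits( tok + d + 1 ) == len - d - 1 )
        return TK_FLOAT;

    for ( size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i )
        if ( strcmp( tok, keywords[i] ) == 0 )   /* if|then|...|function */
            return TK_KEYWORD;

    if ( tok[0] >= 'a' && tok[0] <= 'z' )        /* {ID}: [a-z][a-z0-9]* */
        {
        size_t i = 1;
        while ( ( tok[i] >= 'a' && tok[i] <= 'z' ) ||
                ( tok[i] >= '0' && tok[i] <= '9' ) )
            ++i;
        if ( i == len )
            return TK_IDENTIFIER;
        }

    if ( len == 1 && strchr( "+-*/", tok[0] ) != NULL )  /* operators */
        return TK_OPERATOR;

    return TK_UNKNOWN;
}
```

.fi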
.PP
The details of this example will be explained in the following
sections.
.SH FORMAT OF THE INPUT FILE
The
.I flex
input file consists of three sections, separated by a line with just
.B %%
in it:
.nf

    definitions
    %%
    rules
    %%
    user code

.fi
The
.I definitions
section contains declarations of simple
.I name
definitions to simplify the scanner specification, and declarations of
.I start conditions,
which are explained in a later section.
.PP
Name definitions have the form:
.nf

    name definition

.fi
The "name" is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash).
The definition is taken to begin at the first non-white-space character
following the name and to continue to the end of the line.
The definition can subsequently be referred to using "{name}", which
will expand to "(definition)".  For example,
.nf

    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*

.fi
defines "DIGIT" to be a regular expression which matches a
single digit, and
"ID" to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.nf

    {DIGIT}+"."{DIGIT}*

.fi
is identical to
.nf

    ([0-9])+"."([0-9])*

.fi
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.
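.PP
The expansion is textual, and the added parentheses matter: an operator
that follows a reference applies to the whole definition, so if a name
stood for "ab", then "{name}*" would mean "(ab)*" rather than "ab*".
A minimal C sketch of the substitution follows.  The helper
.B expand_name()
is hypothetical, handles one name at a time, and does not special-case
quoted strings or character classes as
.I flex
does.
.nf

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy PATTERN to OUT, replacing each "{NAME}"
 * with "(DEFINITION)".  OUT holds OUTSIZE bytes, including the NUL.
 * Returns 0 on success, -1 if NAME is too long or OUT is too small. */
int expand_name( const char *pattern, const char *name,
                 const char *definition, char *out, size_t outsize )
{
    char ref[64];                       /* holds "{NAME}" */
    size_t reflen = strlen( name ) + 2;
    size_t used = 0;

    if ( reflen >= sizeof ref || outsize == 0 )
        return -1;
    snprintf( ref, sizeof ref, "{%s}", name );

    out[0] = 0;
    while ( *pattern )
        {
        int w;

        if ( strncmp( pattern, ref, reflen ) == 0 )
            {
            /* a reference to the name: substitute "(definition)" */
            w = snprintf( out + used, outsize - used, "(%s)", definition );
            pattern += reflen;
            }
        else
            {
            /* any other character is copied unchanged */
            w = snprintf( out + used, outsize - used, "%c", *pattern );
            ++pattern;
            }

        if ( w < 0 || (size_t) w >= outsize - used )
            return -1;                  /* OUT is too small */
        used += (size_t) w;
        }
    return 0;
}
```

.fi
Applied to the name DIGIT with the definition [0-9], it turns
{DIGIT}+"."{DIGIT}* into ([0-9])+"."([0-9])*, the same expansion shown above.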
.PP
The
.I rules
section of the
.I flex
input contains a series of rules of the form:
.nf

    pattern   action

.fi
where the pattern must be unindented and the action must begin
on the same line.
.PP
See below for a further description of patterns and actions.
.PP
Finally, the user code section is simply copied to
.B lex.yy.c
verbatim.
It is used for companion routines which call or are called
by the scanner.  This section is optional;
if it is missing, the second
.B %%
in the input file may be skipped, too.
.PP
In the definitions and rules sections, any
.I indented
text or text enclosed in
.B %{
and
.B %}
is copied verbatim to the output (with the %{}'s removed).
The %{}'s must appear unindented on lines by themselves.
.PP
In the rules section,
any indented or %{} text appearing before the
first rule may be used to declare variables
which are local to the scanning routine and (after the declarations)
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.I POSIX
compliance; see below for other such features).
.PP
In the definitions section (but not in the rules section),
an unindented comment (i.e., a line
beginning with "/*") is also copied verbatim to the output up
to the next "*/".
.SH PATTERNS
The patterns in the input are written using an extended set of regular
expressions.  These are:
.nf

    x          match the character 'x'
    .          any character (byte) except newline
    [xyz]      a "character class"; in this case, the pattern
                 matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches
                 an 'a', a 'b', any letter from 'j' through 'o',
                 or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character
                 but those in the class.  In this case, any
                 character EXCEPT an uppercase letter.
    [^A-Z\\n]   any character EXCEPT an uppercase letter or
                 a newline
    r*         zero or more r's, where r is any regular expression
    r+         one or more r's
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    r{4}       exactly 4 r's
    {name}     the expansion of the "name" definition
               (see above)
    "[xyz]\\"foo"
               the literal string: [xyz]"foo
    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                 then the ANSI-C interpretation of \\X.
                 Otherwise, a literal 'X' (used to escape
                 operators such as '*')
    \\0         a NUL character (ASCII code 0)
    \\123       the character with octal value 123
    \\x2a       the character with hexadecimal value 2a
    (r)        match an r; parentheses are used to override
                 precedence (see below)


    rs         the regular expression r followed by the
                 regular expression s; called "concatenation"


    r|s        either an r or an s


    r/s        an r but only if it is followed by an s.  The
                 text matched by s is included when determining
                 whether this rule is the "longest match",
                 but is then returned to the input before
                 the action is executed.  So the action only
                 sees the text matched by r.  This type
                 of pattern is called "trailing context".
                 (There are some combinations of r/s that flex
                 cannot match correctly; see notes in the
                 Deficiencies / Bugs section below regarding
                 "dangerous trailing context".)
    ^r         an r, but only at the beginning of a line (i.e.,
                 when just starting to scan, or right after a
                 newline has been scanned).
    r$         an r, but only at the end of a line (i.e., just
                 before a newline).  Equivalent to "r/\\n".

               Note that flex's notion of "newline" is exactly
               whatever the C compiler used to compile flex
               interprets '\\n' as; in particular, on some DOS
               systems you must either filter out \\r's in the
               input yourself, or explicitly use r/\\r\\n for "r$".


    <s>r       an r, but only in start condition s (see
                 below for discussion of start conditions)
    <s1,s2,s3>r
               same, but in any of start conditions s1,
                 s2, or s3
    <*>r       an r in any start condition, even an exclusive one.


    <<EOF>>    an end-of-file
    <s1,s2><<EOF>>
               an end-of-file when in start condition s1 or s2

.fi
Note that inside of a character class, all regular expression operators
lose their special meaning except escape ('\\') and the character class
operators, '-', ']', and, at the beginning of the class, '^'.
.PP
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.  For example,
.nf

    foo|bar*

.fi
is the same as
.nf

    (foo)|(ba(r*))

.fi
since the '*' operator has higher precedence than concatenation,
and concatenation higher than alternation ('|').  This pattern
therefore matches
.I either
the string "foo"
.I or
the string "ba" followed by zero-or-more r's.
To match "foo" or zero-or-more "bar"'s, use:
.nf

    foo|(bar)*

.fi
and to match zero-or-more "foo"'s-or-"bar"'s:
.nf

    (foo|bar)*

.fi
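.PP
POSIX extended regular expressions give '*', concatenation and '|' the
same relative precedence, so on a POSIX system the grouping described
above can be checked from C with
.B regcomp()
and
.B regexec().
This is a sketch only: the helper
.B full_match()
is hypothetical, and POSIX EREs lack several
.I flex
features (quoted strings, {name} expansion, trailing context, start
conditions), so they approximate
.I flex
patterns only for operators like the ones used here.  The helper wraps the
pattern as ^(...)$ so that only a match of the whole string counts.
.nf

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the POSIX extended regular
 * expression PATTERN matches the whole of TEXT, 0 if it does not,
 * and -1 if PATTERN is too long or fails to compile. */
int full_match( const char *pattern, const char *text )
{
    char anchored[256];
    regex_t re;
    int n, rc;

    /* wrap as ^( ... )$ so that only a match of all of TEXT counts */
    n = snprintf( anchored, sizeof anchored, "^(%s)$", pattern );
    if ( n < 0 || (size_t) n >= sizeof anchored )
        return -1;

    if ( regcomp( &re, anchored, REG_EXTENDED | REG_NOSUB ) != 0 )
        return -1;

    rc = regexec( &re, text, 0, NULL, 0 );
    regfree( &re );
    return rc == 0 ? 1 : 0;
}
```

.fi
With it, full_match("foo|bar*", "barrr") and full_match("foo|(bar)*",
"barbar") both return 1, while full_match("foo|bar*", "barbar") returns 0,
since the '*' in "foo|bar*" binds only to the final "r".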
.PP
In addition to characters and ranges of characters, character classes
can also contain character class
.I expressions.
These are expressions enclosed inside
.B [:
and
.B :]
delimiters (which themselves must appear between the '[' and ']' of the
character class; other elements may occur inside the character class, too).
The valid expressions are:
.nf

    [:alnum:] [:alpha:] [:blank:]
    [:cntrl:] [:digit:] [:graph:]
    [:lower:] [:print:] [:punct:]
    [:space:] [:upper:] [:xdigit:]

.fi
These expressions all designate a set of characters equivalent to
the corresponding standard C
.B isXXX
function.  For example,
.B [:alnum:]
designates those characters for which
.B isalnum()
returns true, i.e., any alphabetic or numeric character.
Some systems don't provide
.B isblank(),