FLEX(1)                   Minix Programmer's Manual                    FLEX(1)


NAME
     flexdoc - fast lexical analyzer generator

SYNOPSIS
     flex [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] [filename ...]

DESCRIPTION
     flex is a tool for generating scanners: programs which recognize  lexical
     patterns  in  text.   flex  reads  the given input files, or its standard
     input if no file names are given, for  a  description  of  a  scanner  to
     generate.  The description is in the form of pairs of regular expressions
     and C code, called rules. flex generates  as  output  a  C  source  file,
     lex.yy.c,  which  defines  a  routine  yylex(). This file is compiled and
     linked with  the  -lfl  library  to  produce  an  executable.   When  the
     executable  is  run, it analyzes its input for occurrences of the regular
     expressions.  Whenever it finds one,  it  executes  the  corresponding  C
     code.

SOME SIMPLE EXAMPLES

     First, some simple examples to give the flavor of how flex is used.   The
     following flex input specifies a scanner which, whenever  it  encounters
     the string "username", replaces it with the user's login name:

         %%
         username    printf( "%s", getlogin() );

     By default, any text not matched by a  flex  scanner  is  copied  to  the
     output,  so  the  net effect of this scanner is to copy its input file to
     its output with each occurrence of "username" expanded.  In  this  input,
     there  is  just  one rule.  "username" is the pattern and the "printf" is
     the action. The "%%" marks the beginning of the rules.
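
     To make the default rule concrete, the following is a hand-written C
     sketch of what this one-rule scanner does. It is only an illustration
     and not the code flex generates; the function name expand_username and
     its login parameter (standing in for the result of getlogin()) are
     assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of flex: copy `in` to `out`, replacing
 * each occurrence of "username" with `login`.  Text that does not match
 * is copied through unchanged, mimicking the scanner's default rule.
 * Returns the length written, or (size_t)-1 if `out` is too small.
 */
size_t expand_username(const char *in, const char *login,
                       char *out, size_t outsz)
{
    static const char pat[] = "username";
    const size_t patlen = sizeof pat - 1;
    const size_t loginlen = strlen(login);
    size_t n = 0;

    while (*in != '\0') {
        if (strncmp(in, pat, patlen) == 0) {
            /* the rule matched: perform its "action" */
            if (n + loginlen >= outsz)
                return (size_t)-1;
            memcpy(out + n, login, loginlen);
            n += loginlen;
            in += patlen;
        } else {
            /* no rule matched: the default action echoes one character */
            if (n + 1 >= outsz)
                return (size_t)-1;
            out[n++] = *in++;
        }
    }
    if (n >= outsz)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

     Calling expand_username( "hi username!", "ast", buf, sizeof buf ) with a
     sufficiently large buf leaves "hi ast!" in buf.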

     Here's another simple example:

             int num_lines = 0, num_chars = 0;

         %%
         \n    ++num_lines; ++num_chars;
         .     ++num_chars;

         %%
         int main()
             {
             yylex();
             printf( "# of lines = %d, # of chars = %d\n",
                     num_lines, num_chars );
             return 0;
             }

     This scanner counts the number of characters and the number of  lines  in
     its input (it produces no output other  than  the  final  report  on  the
     counts).    The   first   line  declares  two  globals,  "num_lines"  and
     "num_chars", which are accessible both inside yylex() and in  the  main()
     routine  declared  after the second "%%".  There are two rules, one which
     matches a newline ("\n") and increments  both  the  line  count  and  the
     character count, and one which matches any character other than a newline
     (indicated by the "." regular expression).

     A somewhat more complicated example:

         /* scanner for a toy Pascal-like language */

         %{
         /* need this for the calls to atoi() and atof() below */
         #include <stdlib.h>
         %}

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

         %%

         {DIGIT}+    {
                     printf( "An integer: %s (%d)\n", yytext,
                             atoi( yytext ) );
                     }

         {DIGIT}+"."{DIGIT}*        {
                     printf( "A float: %s (%g)\n", yytext,
                             atof( yytext ) );
                     }

         if|then|begin|end|procedure|function        {
                     printf( "A keyword: %s\n", yytext );
                     }

         {ID}        printf( "An identifier: %s\n", yytext );

         "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );

         "{"[^}\n]*"}"     /* eat up one-line comments */

         [ \t\n]+          /* eat up whitespace */

         .           printf( "Unrecognized character: %s\n", yytext );

         %%

         int main( int argc, char **argv )
             {
             ++argv, --argc;  /* skip over program name */
             if ( argc > 0 )
                     yyin = fopen( argv[0], "r" );
             else
                     yyin = stdin;

             yylex();
             return 0;
             }

     This is the beginning of a simple scanner for  a  language  like  Pascal.
     It identifies different types of tokens and reports on what it has seen.

     The details of this example will be explained in the following sections.

FORMAT OF THE INPUT FILE
     The flex input file consists of three sections, separated by a line  with
     just %% in it:

         definitions
         %%
         rules
         %%
         user code

     The definitions section contains declarations of simple name  definitions
     to   simplify  the  scanner  specification,  and  declarations  of  start
     conditions, which are explained in a later section.

     Name definitions have the form:

         name definition

     The "name" is a word beginning with  a  letter  or  an  underscore  ('_')
     followed  by  zero  or  more  letters,  digits,  '_', or '-' (dash).  The
     definition is taken to  begin  at  the  first  non-white-space  character
     following the name and continuing to the end of the line.  The definition
     can subsequently be referred to using  "{name}",  which  will  expand  to
     "(definition)".  For example,

         DIGIT    [0-9]
         ID       [a-z][a-z0-9]*

     defines "DIGIT" to be a regular expression which matches a single  digit,
     and  "ID"  to  be a regular expression which matches a letter followed by
     zero-or-more letters-or-digits.  A subsequent reference to

         {DIGIT}+"."{DIGIT}*

     is identical to

         ([0-9])+"."([0-9])*

     and matches one-or-more digits followed by a '.' followed by zero-or-more
     digits.
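
     The expansion step described above can be sketched in C as a simple
     textual substitution. This is an illustration under stated assumptions,
     not flex's actual implementation: the struct name_def type and the
     expand_names function are hypothetical, and the sketch ignores quoting
     and escapes, so a '{' inside a quoted string would be treated like any
     other.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type for illustration; not a flex internal. */
struct name_def {
    const char *name;        /* e.g. "DIGIT" */
    const char *definition;  /* e.g. "[0-9]" */
};

/*
 * Expand each "{name}" in `pattern` to "(definition)" using `defs`.
 * Brace groups that do not name a definition (such as "{2,5}") are
 * copied unchanged.  Returns 0 on success, -1 if `out` is too small.
 */
int expand_names(const char *pattern, const struct name_def *defs,
                 size_t ndefs, char *out, size_t outsz)
{
    size_t n = 0;

    while (*pattern != '\0') {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const char *def = NULL;

        if (close != NULL) {
            /* look up the text between the braces in the table */
            size_t len = (size_t)(close - pattern - 1);
            for (size_t i = 0; i < ndefs; i++) {
                if (strlen(defs[i].name) == len &&
                    strncmp(defs[i].name, pattern + 1, len) == 0) {
                    def = defs[i].definition;
                    break;
                }
            }
        }
        if (def != NULL) {
            /* known name: emit the definition wrapped in parentheses */
            size_t dlen = strlen(def);
            if (n + dlen + 2 >= outsz)
                return -1;
            out[n++] = '(';
            memcpy(out + n, def, dlen);
            n += dlen;
            out[n++] = ')';
            pattern = close + 1;
        } else {
            /* anything else is copied through one character at a time */
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *pattern++;
        }
    }
    if (n >= outsz)
        return -1;
    out[n] = '\0';
    return 0;
}
```

     The parentheses matter whenever a definition contains a low-precedence
     operator. For instance, if a name X were defined as a|b, then {X}c
     would expand to (a|b)c rather than a|bc, which would match something
     different.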

     The rules section of the flex input contains a series  of  rules  of  the
     form:

         pattern   action

     where the pattern must be unindented and the action  must  begin  on  the
     same line.

     See below for a further description of patterns and actions.

     Finally, the user code section is simply copied to lex.yy.c verbatim.  It
     is  used  for companion routines which call or are called by the scanner.
     The presence of this section is optional; if it is missing, the second %%
     in the input file may be skipped, too.

     In the definitions and rules sections, any indented text or text enclosed
     in  %{  and %} is copied verbatim to the output (with the %{}'s removed).
     The %{}'s must appear unindented on lines by themselves.

     In the rules section, any indented or %{} text appearing before the first
     rule  may  be  used  to declare variables which are local to the scanning
     routine and (after  the  declarations)  code  which  is  to  be  executed
     whenever  the scanning routine is entered.  Other indented or %{} text in
     the rule section is still copied to the output, but its  meaning  is  not
     well-defined  and  it may well cause compile-time errors (this feature is
     present for POSIX compliance; see below for other such features).

     In the definitions section, an unindented comment (i.e., a line beginning
     with  "/*")  is  also  copied verbatim to the output up to the next "*/".
     Also, any line in the definitions section beginning with '#' is  ignored,
     though this style of comment is deprecated and may go away in the future.

PATTERNS
     The patterns in the input are written using an extended  set  of  regular
     expressions.  These are:

         x          match the character 'x'
         .          any character except newline
         [xyz]      a "character class"; in this case, the pattern
                      matches either an 'x', a 'y', or a 'z'
         [abj-oZ]   a "character class" with a range in it; matches
                      an 'a', a 'b', any letter from 'j' through 'o',
                      or a 'Z'
         [^A-Z]     a "negated character class", i.e., any character
                      but those in the class.  In this case, any
                      character EXCEPT an uppercase letter.
         [^A-Z\n]   any character EXCEPT an uppercase letter or
                      a newline
         r*         zero or more r's, where r is any regular expression
         r+         one or more r's
         r?         zero or one r's (that is, "an optional r")
         r{2,5}     anywhere from two to five r's
         r{2,}      two or more r's
         r{4}       exactly 4 r's
         {name}     the expansion of the "name" definition
                    (see above)
         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \x.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)


         rs         the regular expression r followed by the
                      regular expression s; called "concatenation"


         r|s        either an r or an s


         r/s        an r but only if it is followed by an s.  The
                      s is not part of the matched text.  This type
                      of pattern is called "trailing context".
         ^r         an r, but only at the beginning of a line
         r$         an r, but only at the end of a line.  Equivalent
                      to "r/\n".


         <s>r       an r, but only in start condition s (see
                    below for discussion of start conditions)
         <s1,s2,s3>r
                    same, but in any of start conditions s1,
                    s2, or s3


         <<EOF>>    an end-of-file
         <s1,s2><<EOF>>
                    an end-of-file when in start condition s1 or s2


                                 26 May 1990                                 5

     The regular expressions listed above are grouped according to precedence,
     from  highest  precedence  at  the  top  to  lowest at the bottom.  Those
     grouped together have equal precedence.  For example,

         foo|bar*

     is the same as

         (foo)|(ba(r*))

     since the '*' operator has  higher  precedence  than  concatenation,  and
     concatenation  higher  than  alternation  ('|').   This pattern therefore
     matches either the string "foo" or the string "ba" followed  by  zero-or-
     more r's.  To match "foo" or zero-or-more "bar"'s, use:

         foo|(bar)*

     and to match zero-or-more "foo"'s-or-"bar"'s:

         (foo|bar)*
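
     The three groupings above can be checked experimentally. The sketch
     below uses the POSIX regcomp() and regexec() functions as a stand-in
     for flex's own pattern matcher, on the assumption (true for '*',
     concatenation, and '|') that POSIX extended regular expressions share
     the precedence rules described here. The helper name matches_whole is
     hypothetical, and flex itself does not use regcomp().

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: return 1 if the POSIX extended regular expression
 * `pattern` matches all of `text`, 0 if it does not, -1 on error.
 * The pattern is wrapped as ^(pattern)$ so that only whole-string
 * matches count.
 */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int rc;

    rc = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (rc < 0 || (size_t)rc >= sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

     With this helper, matches_whole( "foo|bar*", "barrr" ) succeeds but
     matches_whole( "foo|bar*", "barbar" ) fails, while
     matches_whole( "foo|(bar)*", "barbar" ) and
     matches_whole( "(foo|bar)*", "foobarfoo" ) both succeed.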


     Some notes on patterns:

     -    A negated character class such as the example  "[^A-Z]"  above  will
          match  a  newline  unless "\n" (or an equivalent escape sequence) is
          one of the characters explicitly present in  the  negated  character