          class  (e.g.,  "[^A-Z\n]").   This  is unlike how many other regular
          expression tools treat negated character classes, but  unfortunately
          the  inconsistency  is  historically  entrenched.  Matching newlines
          means  that  a  pattern  like  [^"]*  can  match  an  entire   input
          (overflowing  the  scanner's  input  buffer)  unless there's another
          quote in the input.

     -    A rule can have at most one instance of trailing  context  (the  '/'
          operator  or  the  '$'  operator).   The  start  condition, '^', and
          "<<EOF>>" patterns can only occur at the  beginning  of  a  pattern,
          and, like '/' and '$', cannot be grouped inside
          parentheses.  A '^' which does not occur at the beginning of a  rule
          or a '$' which does not occur at the end of a rule loses its special
          properties and is treated as a normal character.

          The following are illegal:

              foo/bar$
              <sc1>foo<sc2>bar

          Note that the first of these can be written "foo/bar\n".




                                 26 May 1990                                 6



FLEX(1)                   Minix Programmer's Manual                    FLEX(1)


          The following will result in '$' or '^' being treated  as  a  normal
          character:

              foo|(bar$)
              foo|^bar

          If what's wanted is a  "foo"  or  a  bar-followed-by-a-newline,  the
          following could be used (the special '|' action is explained below):

              foo      |
              bar$     /* action goes here */

          A similar trick will work  for  matching  a  foo  or  a  bar-at-the-
          beginning-of-a-line.

HOW THE INPUT IS MATCHED
     When the generated scanner is run, it  analyzes  its  input  looking  for
     strings  which  match  any  of  its  patterns.  If it finds more than one
     match, it takes the one matching the  most  text  (for  trailing  context
     rules, this includes the length of the trailing part, even though it will
     then be returned to the input).  If it finds two or more matches  of  the
     same length, the rule listed first in the flex input file is chosen.

     Once the match is determined, the text corresponding to the match (called
     the  token) is made available in the global character pointer yytext, and
     its length in the global integer yyleng. The action corresponding to  the
     matched  pattern is then executed (a more detailed description of actions
     follows), and then the remaining input is scanned for another match.

     If no match is found,  then  the  default  rule  is  executed:  the  next
     character  in  the input is considered matched and copied to the standard
     output.  Thus, the simplest legal flex input is:

         %%

     which generates a scanner that simply copies its input (one character  at
     a time) to its output.

ACTIONS
     Each pattern in a rule has a  corresponding  action,  which  can  be  any
     arbitrary  C  statement.   The  pattern  ends  at  the  first non-escaped
     whitespace character; the remainder of the line is its  action.   If  the
     action  is  empty,  then  when  the pattern is matched the input token is
     simply discarded.  For example, here is the specification for  a  program
     which deletes all occurrences of "zap me" from its input:

         %%
         "zap me"

     (It will copy all other characters in the input to the output since they
     will be matched by the default rule.)

     Here is a program which compresses multiple blanks and  tabs  down  to  a
     single blank, and throws away whitespace found at the end of a line:

         %%
         [ \t]+        putchar( ' ' );
         [ \t]+$       /* ignore this token */


     If the action contains a '{', then the action spans until the balancing
     '}'  is found, and the action may cross multiple lines.  flex knows about
     C strings and comments and won't be fooled by braces found  within  them,
     but  also allows actions to begin with %{ and will consider the action to
     be all the text up to the next %} (regardless of ordinary  braces  inside
     the action).

     An action consisting solely of a vertical bar ('|') means  "same  as  the
     action for the next rule."  See below for an illustration.

     Actions can include arbitrary C  code,  including  return  statements  to
     return  a  value to whatever routine called yylex(). Each time yylex() is
     called it continues processing tokens from where it last left  off  until
     it  either  reaches  the  end  of the file or executes a return.  Once it
     reaches an end-of-file, however, any subsequent call to yylex() will
     simply return immediately, unless yyrestart() is first called (see
     below).

     Actions are not allowed to modify yytext or yyleng.

     There are a number of special directives which can be included within  an
     action:

     -    ECHO copies yytext to the scanner's output.

     -    BEGIN followed by the name of a start condition places  the  scanner
          in the corresponding start condition (see below).

     -    REJECT directs the scanner to proceed on to the "second  best"  rule
          which  matched  the  input  (or a prefix of the input).  The rule is
          chosen as described above in "How the Input is Matched", and yytext
          and yyleng are set up appropriately.  It may be either one which
          matched as much text as the originally chosen rule but came later in
          the flex input file, or one which matched less text.  For example, the
          following will both count the  words  in  the  input  and  call  the
          routine special() whenever "frob" is seen:

                      int word_count = 0;
              %%
              frob        special(); REJECT;
              [^ \t\n]+   ++word_count;

          Without the REJECT, any "frob"'s in the input would not  be  counted
          as  words,  since  the scanner normally executes only one action per
          token.  Multiple REJECTs are allowed, each one finding the next
          best  choice  to  the  currently active rule.  For example, when the
          following scanner scans the token "abcd", it will write "abcdabcaba"
          to the output:

              %%
              a        |
              ab       |
              abc      |
              abcd     ECHO; REJECT;
              .|\n     /* eat up any unmatched character */

          (The first three rules share the fourth's action since they use  the
          special  '|' action.)  REJECT is a particularly expensive feature in
          terms of scanner performance; if it is used in any of the scanner's
          actions it will slow down all of the scanner's matching.
          Furthermore, REJECT cannot be used with the -f or  -F  options  (see
          below).

          Note also that unlike the other special actions, REJECT is a branch;
          code immediately following it in the action will not be executed.

     -    yymore() tells the scanner that the next time it matches a rule, the
          corresponding  token  should  be  appended onto the current value of
          yytext rather than replacing  it.   For  example,  given  the  input
          "mega-kludge"  the  following  will  write "mega-mega-kludge" to the
          output:

              %%
              mega-    ECHO; yymore();
              kludge   ECHO;

          First "mega-" is matched and echoed to the output.  Then "kludge" is
          matched,  but  the  previous  "mega-" is still hanging around at the
          beginning of yytext so the ECHO for the "kludge" rule will  actually
          write  "mega-kludge".   The  presence  of  yymore() in the scanner's
          action entails a minor performance penalty in the scanner's matching
          speed.

     -    yyless(n) returns all but the first  n  characters  of  the  current
          token  back  to  the input stream, where they will be rescanned when
          the scanner looks  for  the  next  match.   yytext  and  yyleng  are
          adjusted appropriately (e.g., yyleng will now be equal to n).  For
          example,  on  the  input  "foobar"  the  following  will  write  out
          "foobarbar":

              %%
              foobar    ECHO; yyless(3);
              [a-z]+    ECHO;

          An argument of 0 to yyless  will  cause  the  entire  current  input
          string  to  be scanned again.  Unless you've changed how the scanner
          will subsequently process its input (using BEGIN, for example), this
          will result in an endless loop.

     -    unput(c) puts the character c back onto the input stream.   It  will
          be  the  next character scanned.  The following action will take the
          current token and cause it to be rescanned enclosed in parentheses.

              {
              int i;
              unput( ')' );
              for ( i = yyleng - 1; i >= 0; --i )
                  unput( yytext[i] );
              unput( '(' );
              }

          Note that since each unput() puts the given character  back  at  the
          beginning  of  the  input  stream, pushing back strings must be done
          back-to-front.

     -    input() reads  the  next  character  from  the  input  stream.   For
          example, the following is one way to eat up C comments:

              %%
              "/*"        {
                          register int c;

                          for ( ; ; )
                              {
                              while ( (c = input()) != '*' &&
                                      c != EOF )
                                  ;    /* eat up text of comment */

                              if ( c == '*' )
                                  {
                                  while ( (c = input()) == '*' )
                                      ;
                                  if ( c == '/' )
                                      break;    /* found the end */
                                  }

                              if ( c == EOF )
                                  {
                                  error( "EOF in comment" );
                                  break;
                                  }
                              }
                          }

          (Note that if the scanner is compiled using  C++,  then  input()  is
          instead  referred  to  as  yyinput(), in order to avoid a name clash
          with the C++ stream by the name of input.)

     -    yyterminate() can be used in  lieu  of  a  return  statement  in  an
          action.   It terminates the scanner and returns a 0 to the scanner's
          caller, indicating "all done".  Subsequent calls to the scanner will
          immediately  return  unless  preceded  by a call to yyrestart() (see
          below).  By default, yyterminate() is also called  when  an  end-of-
          file is encountered.  It is a macro and may be redefined.

THE GENERATED SCANNER
     The output of flex is the file  lex.yy.c,  which  contains  the  scanning
     routine yylex(), a number of tables used by it for matching tokens, and a
     number of auxiliary routines and macros.  By default, yylex() is declared
     as follows:

         int yylex()
             {
             ... various definitions and the actions in here ...
             }

     (If your environment supports function prototypes, then it will  be  "int
     yylex(  void  )".)   This  definition  may  be  changed by redefining the
     "YY_DECL" macro.  For example, you could use:

         #undef YY_DECL
         #define YY_DECL float lexscan( a, b ) float a, b;

     to give the scanning routine the name lexscan,  returning  a  float,  and
     taking  two  floats as arguments.  Note that if you give arguments to the
     scanning routine using a K&R-style/non-prototyped  function  declaration,
     you must terminate the definition with a semi-colon (;).

     Whenever yylex() is called, it scans tokens from the  global  input  file
     yyin  (which defaults to stdin).  It continues until it either reaches an
     end-of-file (at which point it returns the value 0) or one of its actions
     executes  a  return statement.  In the former case, when called again the
     scanner will immediately return unless yyrestart()  is  called  to  point
     yyin at the new input file.  (yyrestart() takes one argument, a FILE *
     pointer.)  In the latter case (i.e., when an action executes  a  return),
     the scanner may then be called again and it will resume scanning where it
     left off.




