         "[xyz]\"foo"
                    the literal string: [xyz]"foo
         \X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                      then the ANSI-C interpretation of \X.
                      Otherwise, a literal 'X' (used to escape
                      operators such as '*')
         \0         a NUL character (ASCII code 0)
         \123       the character with octal value 123
         \x2a       the character with hexadecimal value 2a
         (r)        match an r; parentheses are used to override
                      precedence (see below)



Version 2.5          Last change: April 1995                    6






FLEX(1)                  USER COMMANDS                    FLEX(1)



         rs         the regular expression r followed by the
                      regular expression s; called "concatenation"


         r|s        either an r or an s


         r/s        an r but only if it is followed by an s.  The
                      text matched by s is included when determining
                      whether this rule is the "longest match",
                      but is then returned to the input before
                      the action is executed.  So the action only
                      sees the text matched by r.  This type
                      of pattern is called "trailing context".
                      (There are some combinations of r/s that flex
                      cannot match correctly; see notes in the
                      Deficiencies / Bugs section below regarding
                      "dangerous trailing context".)
         ^r         an r, but only at the beginning of a line (i.e.,
                      when just starting to scan, or right after a
                      newline has been scanned).
         r$         an r, but only at the end of a line (i.e., just
                      before a newline).  Equivalent to "r/\n".

                    Note that flex's notion of "newline" is exactly
                    whatever the C compiler used to compile flex
                    interprets '\n' as; in particular, on some DOS
                    systems you must either filter out \r's in the
                    input yourself, or explicitly use r/\r\n for "r$".


         <s>r       an r, but only in start condition s (see
                      below for discussion of start conditions)
         <s1,s2,s3>r
                    same, but in any of start conditions s1,
                      s2, or s3
         <*>r       an r in any start condition, even an exclusive one.


         <<EOF>>    an end-of-file
         <s1,s2><<EOF>>
                    an end-of-file when in start condition s1 or s2

     Note that inside of a character class, all  regular  expres-
     sion  operators  lose  their  special  meaning except escape
     ('\') and the character class operators, '-', ']',  and,  at
     the beginning of the class, '^'.

     The regular expressions listed above are  grouped  according
     to  precedence, from highest precedence at the top to lowest
     at the bottom.   Those  grouped  together  have  equal  pre-
     cedence.  For example,



         foo|bar*

     is the same as

         (foo)|(ba(r*))

     since the '*' operator has higher precedence than concatena-
     tion, and concatenation higher than alternation ('|').  This
     pattern therefore matches either the  string  "foo"  or  the
     string "ba" followed by zero-or-more r's.  To match "foo" or
     zero-or-more "bar"'s, use:

         foo|(bar)*

     and to match zero-or-more "foo"'s-or-"bar"'s:

         (foo|bar)*
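
     As a minimal sketch (the printf message is illustrative
     only), the grouped form can be tried in a small scanner.
     It uses '+' rather than '*' so that the rule never matches
     the empty string, and it assumes the scanner is linked with
     -lfl to supply main() and yywrap():

         %%
         (foo|bar)+    printf( "run: %s\n", yytext );
         .|\n          /* discard everything else */

     Given the input "foobarfoo", the first rule matches all
     nine characters as one token.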


     In addition to characters and ranges of characters,  charac-
     ter  classes  can  also contain character class expressions.
     These are expressions enclosed inside [: and  :]  delimiters
     (which themselves must appear between the '[' and ']' of the
     character class; other elements may occur inside the charac-
     ter class, too).  The valid expressions are:

         [:alnum:] [:alpha:] [:blank:]
         [:cntrl:] [:digit:] [:graph:]
         [:lower:] [:print:] [:punct:]
         [:space:] [:upper:] [:xdigit:]

     These  expressions  all  designate  a  set   of   characters
     equivalent  to  the corresponding standard C isXXX function.
     For example, [:alnum:] designates those characters for which
     isalnum() returns true, i.e., any alphabetic or numeric
     character.  Some systems don't provide isblank(), so flex
     defines [:blank:] as a blank or a tab.

     For  example,  the  following  character  classes  are   all
     equivalent:

         [[:alnum:]]
         [[:alpha:][:digit:]]
         [[:alpha:]0-9]
         [a-zA-Z0-9]

     If your scanner is  case-insensitive  (the  -i  flag),  then
     [:upper:] and [:lower:] are equivalent to [:alpha:].
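
     A sketch of how these expressions might be combined in
     rules (the labels printed are illustrative, and the scanner
     is assumed to be linked with -lfl):

         %%
         [[:alpha:]_][[:alnum:]_]*   printf( "name: %s\n", yytext );
         [[:digit:]]+                printf( "number: %s\n", yytext );
         [[:space:]]+                /* skip whitespace */

     Any character not covered by these rules falls through to
     the default rule and is copied to the output.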

     Some notes on patterns:

     -    A negated character class such as the example  "[^A-Z]"
          above   will   match  a  newline  unless  "\n"  (or  an
          equivalent escape sequence) is one  of  the  characters
          explicitly  present  in  the  negated  character  class
          (e.g., "[^A-Z\n]").  This is unlike how many other reg-
          ular  expression tools treat negated character classes,
          but unfortunately  the  inconsistency  is  historically
          entrenched.   Matching  newlines  means  that a pattern
          like [^"]* can match the entire  input  unless  there's
          another quote in the input.

     -    A rule can have at most one instance of  trailing  con-
          text (the '/' operator or the '$' operator).  The start
          condition, '^', and "<<EOF>>" patterns can  only  occur
          at the beginning of a pattern, and, as well as with '/'
          and '$', cannot be grouped inside parentheses.   A  '^'
          which  does  not  occur at the beginning of a rule or a
          '$' which does not occur at the end of a rule loses its
          special  properties  and is treated as a normal charac-
          ter.

          The following are illegal:

              foo/bar$
              <sc1>foo<sc2>bar

          Note that the first of these can be written
          "foo/bar\n".

          The following will result in '$' or '^'  being  treated
          as a normal character:

              foo|(bar$)
              foo|^bar

          If what's wanted is a  "foo"  or  a  bar-followed-by-a-
          newline,  the  following could be used (the special '|'
          action is explained below):

              foo      |
              bar$     /* action goes here */

          A similar trick will work for matching a foo or a  bar-
          at-the-beginning-of-a-line.
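
     The notes above can be sketched in one small scanner
     (assumed to be linked with -lfl; the messages are
     illustrative).  The first rule lists "\n" in its negated
     class so that a string cannot run past the end of a line,
     and the '|' action lets "foo" share the action of "^bar":

         %%
         \"[^"\n]*\"   printf( "string: %s\n", yytext );
         foo           |
         ^bar          printf( "foo or leading bar: %s\n", yytext );
         .|\n          /* discard anything else */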

HOW THE INPUT IS MATCHED
     When the generated scanner is run,  it  analyzes  its  input
     looking  for strings which match any of its patterns.  If it
     finds more than one match, it takes  the  one  matching  the
     most  text  (for  trailing  context rules, this includes the
     length of the trailing part, even though  it  will  then  be
     returned  to the input).  If it finds two or more matches of
     the same length, the rule listed first  in  the  flex  input
     file is chosen.

     Once the match is determined, the text corresponding to  the
     match  (called  the  token)  is made available in the global
     character pointer yytext,  and  its  length  in  the  global
     integer yyleng. The action corresponding to the matched pat-
     tern is  then  executed  (a  more  detailed  description  of
     actions  follows),  and  then the remaining input is scanned
     for another match.

     If no match is found, then the default rule is executed: the
     next character in the input is considered matched and copied
     to the standard output.  Thus, the simplest legal flex input
     is:

         %%

     which generates a scanner that simply copies its input  (one
     character at a time) to its output.
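
     Both selection rules can be seen in a sketch like the
     following (assumed to be linked with -lfl; the messages
     are illustrative):

         %%
         if            printf( "keyword: %s\n", yytext );
         [a-z]+        printf( "name: %s\n", yytext );
         [ \t\n]+      /* skip whitespace */

     On the input "if iffy", both rules match "if" with length
     2, so the rule listed first wins and "keyword: if" is
     printed; for "iffy" the second rule matches more text, so
     "name: iffy" is printed.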

     Note that yytext can  be  defined  in  two  different  ways:
     either  as  a character pointer or as a character array. You
     can control which definition flex uses by including  one  of
     the  special  directives  %pointer  or  %array  in the first
     (definitions) section of your flex input.   The  default  is
     %pointer, unless you use the -l lex compatibility option, in
     which case yytext will be an array.  The advantage of  using
     %pointer  is  substantially  faster  scanning  and no buffer
     overflow when matching very large tokens (unless you run out
     of  dynamic  memory).  The disadvantage is that you are res-
     tricted in how your actions can modify yytext (see the  next
     section), and calls to the unput() function destroy the
     present contents of yytext,  which  can  be  a  considerable
     porting headache when moving between different lex versions.

     The advantage of %array is that you can then  modify  yytext
     to your heart's content, and calls to unput() do not destroy
     yytext (see  below).   Furthermore,  existing  lex  programs
     sometimes access yytext externally using declarations of the
     form:
         extern char yytext[];
     This definition is erroneous when used  with  %pointer,  but
     correct for %array.
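
     As a sketch of the external-access case (the file name
     helper.c and the function show_token() are hypothetical),
     a scanner built with %array could share yytext with
     separately compiled code.  The scanner input might be:

         %{
         void show_token( void );   /* defined in helper.c */
         %}
         %array
         %%
         [a-z]+    show_token();

     and helper.c:

         #include <stdio.h>

         extern char yytext[];      /* correct only with %array */

         void show_token( void )
             {
             printf( "token: %s\n", yytext );
             }

     With %pointer, the declaration in helper.c would instead
     need to be "extern char *yytext;".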

     %array defines yytext to be an array of  YYLMAX  characters,
     which  defaults to a fairly large value.  You can change the
     size by simply #define'ing YYLMAX to a  different  value  in
     the  first  section of your flex input.  As mentioned above,
     with %pointer yytext grows dynamically to accommodate  large
     tokens.  While this means your %pointer scanner can accommo-
     date very large tokens (such as matching  entire  blocks  of
     comments),  bear  in  mind  that  each time the scanner must
     resize yytext it also must rescan the entire token from  the
     beginning,  so  matching such tokens can prove slow.  yytext
     presently does not dynamically grow if  a  call  to  unput()
     results  in too much text being pushed back; instead, a run-
     time error results.

     Also note that  you  cannot  use  %array  with  C++  scanner
     classes (the c++ option; see below).

ACTIONS
     Each pattern in a rule has a corresponding action, which can
     be any arbitrary C statement.  The pattern ends at the first
     non-escaped whitespace character; the remainder of the  line
     is  its  action.  If the action is empty, then when the pat-
     tern is matched the input token is  simply  discarded.   For
     example,  here  is  the  specification  for  a program which
     deletes all occurrences of "zap me" from its input:

         %%
         "zap me"

     (It will copy all other characters in the input to the  out-
     put since they will be matched by the default rule.)

     Here is a program which compresses multiple blanks and  tabs
     down  to a single blank, and throws away whitespace found at
     the end of a line:

         %%
         [ \t]+        putchar( ' ' );
         [ \t]+$       /* ignore this token */


     If the action contains a '{', then the action spans till the
     balancing  '}'  is  found, and the action may cross multiple
     lines.  flex knows about C strings and comments and won't be
     fooled  by braces found within them, but also allows actions
     to begin with %{ and will consider the action to be all  the
     text up to the next %} (regardless of ordinary braces inside
     the action).

     An action consisting solely of a vertical  bar  ('|')  means
     "same  as  the  action for the next rule."  See below for an
     illustration.

     Actions can  include  arbitrary  C  code,  including  return
     statements  to  return  a  value  to whatever routine called
     yylex(). Each time yylex() is called it continues processing
     tokens  from  where it last left off until it either reaches
     the end of the file or executes a return.
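
     A sketch of a scanner whose actions return to the caller
     (the token codes NUMBER and WORD and the variable
     number_value are hypothetical, and yywrap() is defined
     here only so that the example does not depend on -lfl):

         %{
         #include <stdlib.h>
         enum { NUMBER = 258, WORD };
         int number_value;
         %}
         %%
         [0-9]+      {
                     number_value = atoi( yytext );
                     return NUMBER;
                     }
         [a-z]+      return WORD;
         [ \t\n]+    /* no return: keep scanning */
         %%
         int yywrap( void )
             {
             return 1;   /* no further input files */
             }

         int main( void )
             {
             int tok;

             while ( (tok = yylex()) != 0 )
                 printf( "token %d: %s\n", tok, yytext );
             return 0;
             }

     Each call to yylex() resumes where the previous one
     returned; the loop ends when yylex() returns 0 at
     end-of-file.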





     Actions are free to modify yytext except for lengthening  it
     (adding  characters  to  its end--these will overwrite later
     characters in the input  stream).   This  however  does  not
     apply  when  using  %array (see above); in that case, yytext
     may be freely modified in any way.
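
     For example, an action may rewrite the token in place
     without changing its length.  The following sketch
     (assumed to be linked with -lfl) upper-cases each run of
     lower-case letters and should work with either %pointer or
     %array:

         %{
         #include <ctype.h>
         %}
         %%
         [a-z]+    {
                   int i;

                   for ( i = 0; i < yyleng; ++i )
                       yytext[i] = toupper( (unsigned char) yytext[i] );
                   ECHO;
                   }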

     Actions are free to modify yyleng except they should not  do
     so if the action also includes use of yymore() (see below).

     There are a  number  of  special  directives  which  can  be
     included within an action:

     -    ECHO copies yytext to the scanner's output.

     -    BEGIN followed by the name of a start condition  places
          the  scanner  in the corresponding start condition (see
          below).

     -    REJECT directs the scanner to proceed on to the "second
          best"  rule which matched the input (or a prefix of the
          input).  The rule is chosen as described above in  "How
          the  Input  is  Matched",  and yytext and yyleng set up
          appropriately.  It may either be one which  matched  as