FLEX(1)                  USER COMMANDS                    FLEX(1)

Version 2.5          Last change: April 1995

         rs         the regular expression r followed by the
                      regular expression s; called "concatenation"

         r|s        either an r or an s

         r/s        an r but only if it is followed by an s.  The
                      text matched by s is included when determining
                      whether this rule is the "longest match",
                      but is then returned to the input before
                      the action is executed.  So the action only
                      sees the text matched by r.  This type
                      of pattern is called "trailing context".
                      (There are some combinations of r/s that flex
                      cannot match correctly; see notes in the
                      Deficiencies / Bugs section below regarding
                      "dangerous trailing context".)
         ^r         an r, but only at the beginning of a line (i.e.,
                      when just starting to scan, or right after a
                      newline has been scanned).
         r$         an r, but only at the end of a line (i.e., just
                      before a newline).  Equivalent to "r/\n".

                    Note that flex's notion of "newline" is exactly
                    whatever the C compiler used to compile flex
                    interprets '\n' as; in particular, on some DOS
                    systems you must either filter out \r's in the
                    input yourself, or explicitly use r/\r\n for "r$".

         <s>r       an r, but only in start condition s (see
                      below for discussion of start conditions)
         <s1,s2,s3>r
                    same, but in any of start conditions s1,
                      s2, or s3
         <*>r       an r in any start condition, even an exclusive one.

         <<EOF>>    an end-of-file
         <s1,s2><<EOF>>
                    an end-of-file when in start condition s1 or s2

     Note that inside of a character class, all regular expression
     operators lose their special meaning except escape ('\') and
     the character class operators, '-', ']', and, at the beginning
     of the class, '^'.

     The regular expressions listed above are grouped according to
     precedence, from highest precedence at the top to lowest at
     the bottom.  Those grouped together have equal precedence.
     For example,

         foo|bar*

     is the same as

         (foo)|(ba(r*))

     since the '*' operator has higher precedence than
     concatenation, and concatenation higher than alternation
     ('|').  This pattern therefore matches either the string "foo"
     or the string "ba" followed by zero-or-more r's.  To match
     "foo" or zero-or-more "bar"'s, use:

         foo|(bar)*

     and to match zero-or-more "foo"'s-or-"bar"'s:

         (foo|bar)*

     In addition to characters and ranges of characters, character
     classes can also contain character class expressions.  These
     are expressions enclosed inside [: and :] delimiters (which
     themselves must appear between the '[' and ']' of the
     character class; other elements may occur inside the character
     class, too).  The valid expressions are:

         [:alnum:] [:alpha:] [:blank:]
         [:cntrl:] [:digit:] [:graph:]
         [:lower:] [:print:] [:punct:]
         [:space:] [:upper:] [:xdigit:]

     These expressions all designate a set of characters equivalent
     to the corresponding standard C isXXX function.  For example,
     [:alnum:] designates those characters for which isalnum()
     returns true - i.e., any alphabetic or numeric.
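
     As a rough illustration of the correspondence between
     [:alnum:] and isalnum() (this is not part of flex itself), the
     C sketch below counts the 7-bit characters on which isalnum()
     and the explicit range a-z, A-Z, 0-9 disagree.  It assumes an
     ASCII character set and the default "C" locale; the helper
     names in_explicit_alnum() and count_alnum_mismatches() are
     invented for this example.

```c
#include <ctype.h>

/* Hypothetical helper: membership test for the explicit class
 * [a-zA-Z0-9].  Assumes ASCII, where each range is contiguous. */
static int in_explicit_alnum(int c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Hypothetical helper: count the 7-bit characters for which the
 * explicit class and isalnum() give different answers. */
static int count_alnum_mismatches(void)
{
    int c;
    int mismatches = 0;

    for (c = 0; c < 128; c++) {
        /* isalnum() returns any nonzero value for "true", so
         * normalize it to 0 or 1 before comparing. */
        if (in_explicit_alnum(c) != (isalnum(c) != 0))
            mismatches++;
    }
    return mismatches;
}
```

     Under the stated assumptions count_alnum_mismatches() returns
     0.  With a different character set or locale the two sets may
     differ, which is one reason to prefer the named class.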
     Some systems don't provide isblank(), so flex defines
     [:blank:] as a blank or a tab.

     For example, the following character classes are all
     equivalent:

         [[:alnum:]]
         [[:alpha:][:digit:]]
         [[:alpha:]0-9]
         [a-zA-Z0-9]

     If your scanner is case-insensitive (the -i flag), then
     [:upper:] and [:lower:] are equivalent to [:alpha:].

     Some notes on patterns:

     -    A negated character class such as the example "[^A-Z]"
          above will match a newline unless "\n" (or an equivalent
          escape sequence) is one of the characters explicitly
          present in the negated character class (e.g.,
          "[^A-Z\n]").  This is unlike how many other regular
          expression tools treat negated character classes, but
          unfortunately the inconsistency is historically
          entrenched.  Matching newlines means that a pattern like
          [^"]* can match the entire input unless there's another
          quote in the input.

     -    A rule can have at most one instance of trailing context
          (the '/' operator or the '$' operator).  The start
          condition, '^', and "<<EOF>>" patterns can only occur at
          the beginning of a pattern, and, like '/' and '$', cannot
          be grouped inside parentheses.  A '^' which does not
          occur at the beginning of a rule or a '$' which does not
          occur at the end of a rule loses its special properties
          and is treated as a normal character.

          The following are illegal:

              foo/bar$
              <sc1>foo<sc2>bar

          Note that the first of these can be written "foo/bar\n".

          The following will result in '$' or '^' being treated as
          a normal character:

              foo|(bar$)
              foo|^bar

          If what's wanted is a "foo" or a
          bar-followed-by-a-newline, the following could be used
          (the special '|' action is explained below):

              foo      |
              bar$     /* action goes here */

          A similar trick will work for matching a foo or a
          bar-at-the-beginning-of-a-line.

HOW THE INPUT IS MATCHED
     When the generated scanner is run, it analyzes its input
     looking for strings which match any of its patterns.  If it
     finds more than one match, it takes the one matching the most
     text (for trailing context rules, this includes the length of
     the trailing part, even though it will then be returned to the
     input).  If it finds two or more matches of the same length,
     the rule listed first in the flex input file is chosen.

     Once the match is determined, the text corresponding to the
     match (called the token) is made available in the global
     character pointer yytext, and its length in the global integer
     yyleng.  The action corresponding to the matched pattern is
     then executed (a more detailed description of actions
     follows), and then the remaining input is scanned for another
     match.

     If no match is found, then the default rule is executed: the
     next character in the input is considered matched and copied
     to the standard output.  Thus, the simplest legal flex input
     is:

         %%

     which generates a scanner that simply copies its input (one
     character at a time) to its output.

     Note that yytext can be defined in two different ways: either
     as a character pointer or as a character array.
     You can control which definition flex uses by including one of
     the special directives %pointer or %array in the first
     (definitions) section of your flex input.  The default is
     %pointer, unless you use the -l lex compatibility option, in
     which case yytext will be an array.  The advantage of using
     %pointer is substantially faster scanning and no buffer
     overflow when matching very large tokens (unless you run out
     of dynamic memory).  The disadvantage is that you are
     restricted in how your actions can modify yytext (see the next
     section), and calls to the unput() function destroy the
     present contents of yytext, which can be a considerable
     porting headache when moving between different lex versions.

     The advantage of %array is that you can then modify yytext to
     your heart's content, and calls to unput() do not destroy
     yytext (see below).  Furthermore, existing lex programs
     sometimes access yytext externally using declarations of the
     form:

         extern char yytext[];

     This definition is erroneous when used with %pointer, but
     correct for %array.

     %array defines yytext to be an array of YYLMAX characters,
     which defaults to a fairly large value.  You can change the
     size by simply #define'ing YYLMAX to a different value in the
     first section of your flex input.  As mentioned above, with
     %pointer yytext grows dynamically to accommodate large tokens.
     While this means your %pointer scanner can accommodate very
     large tokens (such as matching entire blocks of comments),
     bear in mind that each time the scanner must resize yytext it
     also must rescan the entire token from the beginning, so
     matching such tokens can prove slow.
     yytext presently does not dynamically grow if a call to
     unput() results in too much text being pushed back; instead, a
     run-time error results.

     Also note that you cannot use %array with C++ scanner classes
     (the c++ option; see below).

ACTIONS
     Each pattern in a rule has a corresponding action, which can
     be any arbitrary C statement.  The pattern ends at the first
     non-escaped whitespace character; the remainder of the line is
     its action.  If the action is empty, then when the pattern is
     matched the input token is simply discarded.  For example,
     here is the specification for a program which deletes all
     occurrences of "zap me" from its input:

         %%
         "zap me"

     (It will copy all other characters in the input to the output
     since they will be matched by the default rule.)

     Here is a program which compresses multiple blanks and tabs
     down to a single blank, and throws away whitespace found at
     the end of a line:

         %%
         [ \t]+        putchar( ' ' );
         [ \t]+$       /* ignore this token */

     If the action contains a '{', then the action spans till the
     balancing '}' is found, and the action may cross multiple
     lines.  flex knows about C strings and comments and won't be
     fooled by braces found within them, but also allows actions to
     begin with %{ and will consider the action to be all the text
     up to the next %} (regardless of ordinary braces inside the
     action).

     An action consisting solely of a vertical bar ('|') means
     "same as the action for the next rule."  See below for an
     illustration.

     Actions can include arbitrary C code, including return
     statements to return a value to whatever routine called
     yylex().
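
     To make the behavior of the blank-compressing scanner shown
     earlier in this section concrete, here is a hand-written C
     sketch of roughly what it does.  This is not code generated by
     flex; squeeze_blanks() is a hypothetical name, and the sketch
     assumes the whole input is held in memory as a NUL-terminated
     string with an output buffer at least as large as the input.
     A run of blanks and tabs directly before a newline is dropped
     because the "[ \t]+$" rule counts its trailing newline toward
     the match length and therefore wins as the longest match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mimicking the two-rule scanner:
 *     [ \t]+     putchar( ' ' );
 *     [ \t]+$    (ignored)
 * 'out' must have room for at least strlen(in) + 1 bytes. */
static void squeeze_blanks(const char *in, char *out)
{
    size_t i = 0;   /* read position in 'in'   */
    size_t o = 0;   /* write position in 'out' */

    assert(in != NULL && out != NULL);

    while (in[i] != '\0') {
        if (in[i] == ' ' || in[i] == '\t') {
            /* Consume the whole run, as [ \t]+ would. */
            while (in[i] == ' ' || in[i] == '\t')
                i++;
            /* Run followed by a newline: the '$' rule applies and
             * the run is discarded.  Otherwise emit one blank. */
            if (in[i] != '\n')
                out[o++] = ' ';
        } else {
            /* Stand-in for the default rule: copy one character. */
            out[o++] = in[i++];
        }
    }
    out[o] = '\0';
}
```

     Note that whitespace at the very end of the input, with no
     newline after it, is reduced to a single blank rather than
     removed, because '$' only matches just before a newline.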
     Each time yylex() is called it continues processing tokens
     from where it last left off until it either reaches the end of
     the file or executes a return.

     Actions are free to modify yytext except for lengthening it
     (adding characters to its end--these will overwrite later
     characters in the input stream).  This however does not apply
     when using %array (see above); in that case, yytext may be
     freely modified in any way.

     Actions are free to modify yyleng except they should not do so
     if the action also includes use of yymore() (see below).

     There are a number of special directives which can be included
     within an action:

     -    ECHO copies yytext to the scanner's output.

     -    BEGIN followed by the name of a start condition places
          the scanner in the corresponding start condition (see
          below).

     -    REJECT directs the scanner to proceed on to the "second
          best" rule which matched the input (or a prefix of the
          input).  The rule is chosen as described above in "How
          the Input is Matched", and yytext and yyleng set up
          appropriately.  It may either be one which matched as
          much text as the originally chosen rule but came later in
          the flex input file, or one which matched less text.  For
          example, the following will both count the words in the
          input and call the routine special() whenever "frob" is
          seen:

                      int word_count = 0;
              %%

              frob        special(); REJECT;
              [^ \t\n]+   ++word_count;

          Without the REJECT, any "frob"'s in the input would not
          be counted as words, since the scanner normally executes
          only one action per token.
          Multiple REJECT's are allowed, each one finding the next
          best choice to the currently active rule.  For example,
          when the following scanner scans the token "abcd", it
          will write "abcdabcaba" to the output:

              %%
              a        |