-Ce directs flex to construct equivalence classes, i.e., sets of
characters which have identical lexical properties (for example, if
the only appearance of digits in the flex input is in the character
class "[0-9]" then the digits '0', '1', ..., '9' will all be put in
the same equivalence class). Equivalence classes usually give
dramatic reductions in the final table/object file sizes (typically
a factor of 2-5) and are pretty cheap performance-wise (one array
look-up per character scanned).
-Cf specifies that the full scanner tables should be generated -
flex should not compress the tables by taking advantage of similar
transition functions for different states.
-CF specifies that the alternate fast scanner representation
(described above under the -F flag) should be used.
-Cm directs flex to construct meta-equivalence classes, which are
sets of equivalence classes (or characters, if equivalence classes
are not being used) that are commonly used together. Meta-
equivalence classes are often a big win when using compressed
tables, but they have a moderate performance impact (one or two "if"
tests and one array look-up per character scanned).
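The per-character cost of these options can be sketched in C. The following is a hypothetical illustration (not flex's actual table layout) of how -Ce's equivalence classes add the single array look-up described above, shrinking the transition table from one column per character to one column per class:

```c
#include <assert.h>

/* Hypothetical sketch (not flex's real tables): with -Ce, each input
 * character is first mapped through an equivalence-class array, so the
 * transition table needs one column per class instead of one per
 * character -- at the cost of one extra look-up per character. */

enum { N_CLASSES = 3 };          /* e.g. digits, lowercase, everything else */

static int ec[256];              /* character -> equivalence class */

static void build_ec(void)
{
    int c;
    for (c = 0; c < 256; c++)
        ec[c] = 0;               /* class 0: "other" */
    for (c = '0'; c <= '9'; c++)
        ec[c] = 1;               /* all digits share one class */
    for (c = 'a'; c <= 'z'; c++)
        ec[c] = 2;               /* so do all lowercase letters */
}

/* A full table would be next[n_states][256]; with classes it is
 * next[n_states][N_CLASSES] -- a large reduction in width. */
static int next_state(const int next[][N_CLASSES], int state, int c)
{
    return next[state][ec[c]];   /* the one extra array look-up */
}
```

With -Cm, a further map of the same kind sits between the class number and the compressed-table index, which is where the extra look-up and "if" tests mentioned above come from.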
A lone -C specifies that the scanner tables should be compressed but
neither equivalence classes nor meta-equivalence classes should be
used.
The options -Cf or -CF and -Cm do not make sense together - there is
no opportunity for meta-equivalence classes if the table is not
being compressed. Otherwise the options may be freely mixed.
The default setting is -Cem, which specifies that flex should
generate equivalence classes and meta-equivalence classes. This
setting provides the highest degree of table compression. You can
trade off faster-executing scanners at the cost of larger tables
with the following generally being true:
slowest & smallest
-Cem
-Cm
-Ce
-C
-C{f,F}e
-C{f,F}
fastest & largest
26 May 1990 23
FLEX(1) Minix Programmer's Manual FLEX(1)
Note that scanners with the smallest tables are usually generated
and compiled the quickest, so during development you will usually
want to use the default, maximal compression.
-Cfe is often a good compromise between speed and size for
production scanners.
-C options are not cumulative; whenever the flag is encountered, the
previous -C settings are forgotten.
-Sskeleton_file
overrides the default skeleton file from which flex constructs its
scanners. You'll never need this option unless you are doing flex
maintenance or development.
PERFORMANCE CONSIDERATIONS
The main design goal of flex is that it generate high-performance
scanners. It has been optimized for dealing well with large sets of
rules. Aside from the effects of table compression on scanner speed
outlined above, there are a number of options/actions which degrade
performance. These are, from most expensive to least:
REJECT
pattern sets that require backtracking
arbitrary trailing context
'^' beginning-of-line operator
yymore()
with the first three all being quite expensive and the last two being
quite cheap.
REJECT should be avoided at all costs when performance is important. It
is a particularly expensive option.
Getting rid of backtracking is messy and often may be an enormous amount
of work for a complicated scanner. In principle, one begins by using the
-b flag to generate a lex.backtrack file. For example, on the input
%%
foo return TOK_KEYWORD;
foobar return TOK_KEYWORD;
the file looks like:
State #6 is non-accepting -
associated rule line numbers:
2 3
out-transitions: [ o ]
jam-transitions: EOF [ \001-n p-\177 ]
State #8 is non-accepting -
associated rule line numbers:
3
out-transitions: [ a ]
jam-transitions: EOF [ \001-` b-\177 ]
State #9 is non-accepting -
associated rule line numbers:
3
out-transitions: [ r ]
jam-transitions: EOF [ \001-q s-\177 ]
Compressed tables always backtrack.
The first few lines tell us that there's a scanner state in which it can
make a transition on an 'o' but not on any other character, and that in
that state the currently scanned text does not match any rule. The state
occurs when trying to match the rules found at lines 2 and 3 in the input
file. If the scanner is in that state and then reads something other
than an 'o', it will have to backtrack to find a rule which is matched.
With a bit of headscratching one can see that this must be the state it's
in when it has seen "fo". When this has happened, if anything other than
another 'o' is seen, the scanner will have to back up to simply match the
'f' (by the default rule).
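This back-up behaviour can be sketched in C. The following is a hypothetical illustration (not the code flex generates) of a scanner for just the "foo"/"foobar" rules above: it records the position of the last accepting state as it advances, and when it jams it falls back to that position — the re-reading of the intervening characters is what makes backtracking costly:

```c
#include <assert.h>

/* Hypothetical sketch of back-up (not code flex emits): find the longest
 * match of "foo" or "foobar" at the start of s, with the default rule
 * matching a single character.  Returns the length matched: 6, 3, or 1. */
static int longest_match(const char *s)
{
    static const char kw[] = "foobar";
    int i, last_accept = 1;           /* default rule: match one char */

    if (s[0] != 'f')
        return 1;
    for (i = 1; i < 6 && s[i] == kw[i]; i++)
        if (i == 2)                   /* "foo" seen: an accepting state */
            last_accept = 3;
    if (i == 6)
        return 6;                     /* "foobar" matched outright */
    return last_accept;               /* jammed: back up to last accept */
}
```

On input "fox" the sketch reaches the "fo" state (State #6 above), jams on the 'x', and backs up to match just the 'f' by the default rule; on "fooba " it jams after "fooba" (State #9) and backs up to accept "foo".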
The comment regarding State #8 indicates there's a problem when "foob"
has been scanned. Indeed, on any character other than a 'b', the scanner
will have to back up to accept "foo". Similarly, the comment for State
#9 concerns when "fooba" has been scanned.
The final comment reminds us that there's no point going to all the
trouble of removing backtracking from the rules unless we're using -f or
-F, since there's no performance gain doing so with compressed scanners.
The way to remove the backtracking is to add "error" rules:
%%
foo return TOK_KEYWORD;
foobar return TOK_KEYWORD;
fooba |
foob |
fo {
/* false alarm, not really a keyword */
return TOK_ID;
}
Eliminating backtracking among a list of keywords can also be done using
a "catch-all" rule:
%%
foo return TOK_KEYWORD;
foobar return TOK_KEYWORD;
[a-z]+ return TOK_ID;
This is usually the best solution when appropriate.
Backtracking messages tend to cascade. With a complicated set of rules
it's not uncommon to get hundreds of messages. If one can decipher them,
though, it often only takes a dozen or so rules to eliminate the
backtracking (though it's easy to make a mistake and have an error rule
accidentally match a valid token. A possible future flex feature will be
to automatically add rules to eliminate backtracking).
Variable trailing context (where both the leading and trailing parts do
not have a fixed length) entails almost the same performance loss as
REJECT (i.e., substantial). So when possible a rule like:
%%
mouse|rat/(cat|dog) run();
is better written:
%%
mouse/cat|dog run();
rat/cat|dog run();
or as
%%
mouse|rat/cat run();
mouse|rat/dog run();
Note that here the special '|' action does not provide any savings, and
can even make things worse (see BUGS in flex(1)).
Another area where the user can increase a scanner's performance (and one
that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run. This is because with
long tokens the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go through the
additional work of setting up the scanning environment (e.g., yytext) for
the action. Recall the scanner for C comments:
%x comment
%%
int line_num = 1;
"/*" BEGIN(comment);
<comment>[^*\n]*
<comment>"*"+[^*/\n]*
<comment>\n ++line_num;
<comment>"*"+"/" BEGIN(INITIAL);
This could be sped up by writing it as:
%x comment
%%
int line_num = 1;
"/*" BEGIN(comment);
<comment>[^*\n]*
<comment>[^*\n]*\n ++line_num;
<comment>"*"+[^*/\n]*
<comment>"*"+[^*/\n]*\n ++line_num;
<comment>"*"+"/" BEGIN(INITIAL);
Now instead of each newline requiring the processing of another action,
recognizing the newlines is "distributed" over the other rules to keep
the matched text as long as possible. Note that adding rules does not
slow down the scanner! The speed of the scanner is independent of the
number of rules or (modulo the considerations given at the beginning of
this section) how complicated the rules are with regard to operators such
as '*' and '|'.
A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and with
no other extraneous characters, and recognize all the keywords. A
natural first approach is:
%%
asm |
auto |
break |
... etc ...
volatile |
while /* it's a keyword */
.|\n /* it's not a keyword */
To eliminate the back-tracking, introduce a catch-all rule:
%%
asm |
auto |
break |
... etc ...
volatile |
while /* it's a keyword */
[a-z]+ |
.|\n /* it's not a keyword */
Now, if it's guaranteed that there's exactly one word per line, then we
can reduce the total number of matches by half by merging in the
recognition of newlines with that of the other tokens:
%%
asm\n |
auto\n |
break\n |
... etc ...
volatile\n |
while\n /* it's a keyword */
[a-z]+\n |
.|\n /* it's not a keyword */
One has to be careful here, as we have now reintroduced backtracking into
the scanner. In particular, while we know that there will never be any
characters in the input stream other than letters or newlines, flex can't
figure this out, and it will plan for possibly needing backtracking when
it has scanned a token like "auto" and then the next character is
something other than a newline or a letter. Previously it would then
just match the "auto" rule and be done, but now it has no "auto" rule,
only an "auto\n" rule. To eliminate the possibility of backtracking, we
could either duplicate all rules but without final newlines, or, since we
never expect to encounter such an input and therefore don't care how it