📄 flexdoc.1

and the next two lines give the date when the scanner was created
and a summary of the flags which were in effect.
.TP
.B -F
specifies that the
.ul
fast
scanner table representation should be used.  This representation is
about as fast as the full table representation
.ul
(-f),
and for some sets of patterns will be considerably smaller (and for
others, larger).  In general, if the pattern set contains both "keywords"
and a catch-all, "identifier" rule, such as in the set:
.nf

    "case"    return TOK_CASE;
    "switch"  return TOK_SWITCH;
    ...
    "default" return TOK_DEFAULT;
    [a-z]+    return TOK_ID;

.fi
then you're better off using the full table representation.  If only
the "identifier" rule is present and you then use a hash table or some such
to detect the keywords, you're better off using
.ul
-F.
.IP
This option is equivalent to
.B -CF
(see below).
.TP
.B -I
instructs
.I flex
to generate an
.I interactive
scanner.  Normally, scanners generated by
.I flex
always look ahead one
character before deciding that a rule has been matched.  At the cost of
some scanning overhead,
.I flex
will generate a scanner which only looks ahead
when needed.  Such scanners are called
.I interactive
because if you want to write a scanner for an interactive system such as a
command shell, you will probably want the user's input to be terminated
with a newline, and without
.B -I
the user will have to type a character in addition to the newline in order
to have the newline recognized.  This leads to dreadful interactive
performance.
.IP
If all this seems too confusing, here's the general rule: if a human will
be typing in input to your scanner, use
.B -I,
otherwise don't; if you don't care about squeezing the utmost performance
from your scanner and you
don't want to make any assumptions about the input to your scanner,
use
.B -I.
.IP
Note,
.B -I
cannot be used in conjunction with
.I full
or
.I fast tables,
i.e., the
.B -f, -F, -Cf,
or
.B -CF
flags.
.TP
.B -L
instructs
.I flex
not to generate
.B #line
directives.
Without this option,
.I flex
peppers the generated scanner
with #line directives so error messages in the actions will be correctly
located with respect to the original
.I flex
input file, and not to
the fairly meaningless line numbers of
.B lex.yy.c.
(Unfortunately
.I flex
does not presently generate the necessary directives
to "retarget" the line numbers for those parts of
.B lex.yy.c
which it generated.  So if there is an error in the generated code,
a meaningless line number is reported.)
.TP
.B -T
makes
.I flex
run in
.I trace
mode.  It will generate a lot of messages to
.I stdout
concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata.  This option is mostly for use in maintaining
.I flex.
.TP
.B -8
instructs
.I flex
to generate an 8-bit scanner, i.e., one which can recognize 8-bit
characters.  On some sites,
.I flex
is installed with this option as the default.  On others, the default
is 7-bit characters.  To see which is the case, check the verbose
.B (-v)
output for "equivalence classes created".  If the denominator of
the number shown is 128, then by default
.I flex
is generating 7-bit scanners.  If it is 256, then the default is
8-bit characters and the
.B -8
flag is not required (but may be a good idea to keep the scanner
specification portable).  Feeding a 7-bit scanner 8-bit characters
will result in infinite loops, bus errors, or other such fireworks,
so when in doubt, use the flag.
Note that if equivalence classes
are used, 8-bit scanners take only slightly more table space than
7-bit scanners (128 bytes, to be exact); if equivalence classes are
not used, however, then the tables may grow up to twice their
7-bit size.
.TP
.B -C[efmF]
controls the degree of table compression.
.IP
.B -Ce
directs
.I flex
to construct
.I equivalence classes,
i.e., sets of characters
which have identical lexical properties (for example, if the only
appearance of digits in the
.I flex
input is in the character class
"[0-9]" then the digits '0', '1', ..., '9' will all be put
in the same equivalence class).  Equivalence classes usually give
dramatic reductions in the final table/object file sizes (typically
a factor of 2-5) and are pretty cheap performance-wise (one array
look-up per character scanned).
.IP
.B -Cf
specifies that the
.I full
scanner tables should be generated -
.I flex
should not compress the
tables by taking advantage of similar transition functions for
different states.
.IP
.B -CF
specifies that the alternate fast scanner representation (described
above under the
.B -F
flag)
should be used.
.IP
.B -Cm
directs
.I flex
to construct
.I meta-equivalence classes,
which are sets of equivalence classes (or characters, if equivalence
classes are not being used) that are commonly used together.  Meta-equivalence
classes are often a big win when using compressed tables, but they
have a moderate performance impact (one or two "if" tests and one
array look-up per character scanned).
.IP
A lone
.B -C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.IP
The options
.B -Cf
or
.B -CF
and
.B -Cm
do not make sense together - there is no opportunity for meta-equivalence
classes if the table is not being compressed.  Otherwise the options
may be freely mixed.
.IP
The default setting is
.B -Cem,
which specifies that
.I flex
should generate equivalence classes
and meta-equivalence classes.  This setting provides the highest
degree of table compression.
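.IP
The equivalence-class idea described under
.B -Ce
can be pictured with a small C sketch.  It is only an illustration and
does not reflect the tables
.I flex
actually emits: it assumes an ASCII-like character set and a toy pattern
set consisting of "[0-9]+" and "[a-z]+", and the names
ec, build_classes, next_state and match_length are invented for this example.
Each input character costs one extra array look-up (through ec) before
the transition table is consulted.
.nf

```c
#include <stddef.h>

enum { NCHARS = 256, NCLASSES = 3, NSTATES = 3, JAM = -1 };

/* Character -> equivalence class: 0 = anything else, 1 = digit, 2 = lower-case letter. */
static unsigned char ec[NCHARS];

/* Fill in the class map; assumes an ASCII-like character set. */
void build_classes(void)
{
    int c;

    for (c = 0; c < NCHARS; c++)
        ec[c] = 0;
    for (c = '0'; c <= '9'; c++)
        ec[c] = 1;
    for (c = 'a'; c <= 'z'; c++)
        ec[c] = 2;
}

/*
 * Hypothetical transition table with one column per class, not per character:
 * state 0 = start, state 1 = inside a run of digits, state 2 = inside a run of letters.
 */
static const signed char next_state[NSTATES][NCLASSES] = {
    /* other  digit  letter */
    {  JAM,   1,     2   },
    {  JAM,   1,     JAM },
    {  JAM,   JAM,   2   },
};

/* Length of the longest prefix of s that is all digits or all letters (0 if none). */
size_t match_length(const char *s)
{
    int state = 0;
    size_t n = 0;

    while (s[n] != 0) {
        /* ec[] is the one extra array look-up per character scanned */
        int next = next_state[state][ec[(unsigned char)s[n]]];
        if (next == JAM)
            break;
        state = next;
        n++;
    }
    return n;
}
```

.fi
After calling build_classes() once, match_length() can be applied to any
string.  In this sketch the transition table holds 3 x 3 entries rather
than 3 x 256, which is the kind of saving equivalence classes aim for.
.IP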
You can trade faster-executing scanners for larger tables,
with the following generally being true:
.nf

    slowest & smallest
          -Cem
          -Cm
          -Ce
          -C
          -C{f,F}e
          -C{f,F}
    fastest & largest

.fi
Note that scanners with the smallest tables are usually generated and
compiled the quickest, so
during development you will usually want to use the default, maximal
compression.
.IP
.B -Cfe
is often a good compromise between speed and size for production
scanners.
.IP
.B -C
options are not cumulative; whenever the flag is encountered, the
previous -C settings are forgotten.
.TP
.B -Sskeleton_file
overrides the default skeleton file from which
.I flex
constructs its scanners.  You'll never need this option unless you are doing
.I flex
maintenance or development.
.SH PERFORMANCE CONSIDERATIONS
The main design goal of
.I flex
is that it generate high-performance scanners.  It has been optimized
for dealing well with large sets of rules.  Aside from the effects
of table compression on scanner speed outlined above,
there are a number of options/actions which degrade performance.  These
are, from most expensive to least:
.nf

    REJECT
    pattern sets that require backtracking
    arbitrary trailing context
    '^' beginning-of-line operator
    yymore()

.fi
with the first three all being quite expensive and the last two
being quite cheap.
.LP
.B REJECT
should be avoided at all costs when performance is important.
It is a particularly expensive option.
.LP
Getting rid of backtracking is messy and often may be an enormous
amount of work for a complicated scanner.  In principle, one begins
by using the
.B -b
flag to generate a
.I lex.backtrack
file.
For example, on the input
.nf

    %%
    foo        return TOK_KEYWORD;
    foobar     return TOK_KEYWORD;

.fi
the file looks like:
.nf

    State #6 is non-accepting -
     associated rule line numbers:
           2       3
     out-transitions: [ o ]
     jam-transitions: EOF [ \\001-n  p-\\177 ]

    State #8 is non-accepting -
     associated rule line numbers:
           3
     out-transitions: [ a ]
     jam-transitions: EOF [ \\001-`  b-\\177 ]

    State #9 is non-accepting -
     associated rule line numbers:
           3
     out-transitions: [ r ]
     jam-transitions: EOF [ \\001-q  s-\\177 ]

    Compressed tables always backtrack.

.fi
The first few lines tell us that there's a scanner state in
which it can make a transition on an 'o' but not on any other
character, and that in that state the currently scanned text does not match
any rule.  The state occurs when trying to match the rules found
at lines 2 and 3 in the input file.
If the scanner is in that state and then reads
something other than an 'o', it will have to backtrack to find
a rule which is matched.  With
a bit of headscratching one can see that this must be the
state it's in when it has seen "fo".  When this has happened,
if anything other than another 'o' is seen, the scanner will
have to back up to simply match the 'f' (by the default rule).
.LP
The comment regarding State #8 indicates there's a problem
when "foob" has been scanned.
Indeed, on any character other
than an 'a', the scanner will have to back up to accept "foo".
Similarly, the comment for State #9 concerns when "fooba" has
been scanned.
.LP
The final comment reminds us that there's no point going to
all the trouble of removing backtracking from the rules unless
we're using
.B -f
or
.B -F,
since there's no performance gain doing so with compressed scanners.
.LP
The way to remove the backtracking is to add "error" rules:
.nf

    %%
    foo         return TOK_KEYWORD;
    foobar      return TOK_KEYWORD;
    fooba       |
    foob        |
    fo          {
                /* false alarm, not really a keyword */
                return TOK_ID;
                }

.fi
.LP
Eliminating backtracking among a list of keywords can also be
done using a "catch-all" rule:
.nf

    %%
    foo         return TOK_KEYWORD;
    foobar      return TOK_KEYWORD;
    [a-z]+      return TOK_ID;

.fi
This is usually the best solution when appropriate.
.LP
Backtracking messages tend to cascade.
With a complicated set of rules it's not uncommon to get hundreds
of messages.  If one can decipher them, though, it often
only takes a dozen or so rules to eliminate the backtracking (though
it's easy to make a mistake and have an error rule accidentally match
a valid token.  A possible future
.I flex
feature will be to automatically add rules to eliminate backtracking).
.LP
.I Variable
trailing context (where both the leading and trailing parts do not have
a fixed length) entails almost the same performance loss as
.I REJECT
(i.e., substantial).
So when possible a rule like:
.nf

    %%
    mouse|rat/(cat|dog)   run();

.fi
is better written:
.nf

    %%
    mouse/cat|dog         run();
    rat/cat|dog           run();

.fi
or as
.nf

    %%
    mouse|rat/cat         run();
    mouse|rat/dog         run();

.fi
Note that here the special '|' action does
.I not
provide any savings, and can even make things worse (see
.B BUGS
in flex(1)).
.LP
Another area where the user can increase a scanner's performance
(and one that's easier to implement) arises from the fact that
the longer the tokens matched, the faster the scanner will run.
This is because with long tokens the processing of most input
characters takes place in the (short) inner scanning loop, and
does not often have to go through the additional work of setting up
the scanning environment (e.g.,
.B yytext)
for the action.  Recall the scanner for C comments:
.nf

    %x comment
    %%
            int line_num = 1;
    "/*"         BEGIN(comment);
    <comment>[^*\\n]*
    <comment>"*"+[^*/\\n]*
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
This could be sped up by writing it as:
.nf

    %x comment
    %%
            int line_num = 1;
    "/*"         BEGIN(comment);
    <comment>[^*\\n]*
    <comment>[^*\\n]*\\n      ++line_num;
    <comment>"*"+[^*/\\n]*
    <comment>"*"+[^*/\\n]*\\n ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
Now instead of each newline requiring the processing of another
action, recognizing the newlines is "distributed" over the other rules
to keep the matched text as long as possible.  Note that
.I adding
rules does
.I not
slow down the scanner!
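.LP
To get a feel for the saving, the two comment rule sets can be modelled
by hand.  The C sketch below is not code generated by
.I flex;
it merely imitates longest-match scanning for the rules shown in this
section and counts how many matches (and hence action dispatches) a
comment body needs.  The names count_matches, plain and NL are
hypothetical and exist only in this example.
.nf

```c
#include <stddef.h>

#define NL 10   /* ASCII line feed, written as a number so the listing needs no backslash */

/* Nonzero if c is neither a star, a newline, nor the terminating NUL. */
static int plain(char c)
{
    return c != 0 && c != '*' && c != NL;
}

/*
 * Count the rule matches needed to consume a comment body s (the text after
 * the opening slash-star, up to and including the closing star-slash).
 * merge_newlines == 0 models the first rule set, nonzero the second one.
 * Returns -1 if the body never reaches a closing star-slash.
 */
int count_matches(const char *s, int merge_newlines)
{
    size_t i = 0;
    int matches = 0;

    while (s[i] != 0) {
        if (s[i] == NL) {
            i++;                                   /* a newline on its own */
        } else if (s[i] != '*') {
            while (plain(s[i]))                    /* run without star or newline */
                i++;
            if (merge_newlines && s[i] == NL)
                i++;                               /* fold the newline into the run */
        } else {
            while (s[i] == '*')                    /* one or more stars */
                i++;
            if (s[i] == '/')
                return matches + 1;                /* closing star-slash */
            while (plain(s[i]) && s[i] != '/')     /* run without star, slash, newline */
                i++;
            if (merge_newlines && s[i] == NL)
                i++;                               /* fold the newline into the run */
        }
        matches++;
    }
    return -1;
}
```

.fi
Calling count_matches(body, 0) and count_matches(body, 1) on the same
comment body shows how many matches the merged rules avoid; roughly one
match is saved for every newline that directly follows other comment text.
.LP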
The speed of the scanner is independent
of the number of rules or (modulo the considerations given at the
beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.
.LP
A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line
and with no other extraneous characters, and recognize all the
keywords.  A natural first approach is:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */
    .|\\n     /* it's not a keyword */

.fi
To eliminate the back-tracking, introduce a catch-all rule:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */
    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
Now, if it's guaranteed that there's exactly one word per line,
then we can reduce the total number of matches by a half by
merging in the recognition of newlines with that of the other
tokens:
.nf

    %%
    asm\\n    |
    auto\\n   |
    break\\n  |
    ... etc ...