.B lex.foo.c.
Here are all of the names affected:
.nf

    yyFlexLexer
    yy_create_buffer
    yy_delete_buffer
    yy_flex_debug
    yy_init_buffer
    yy_load_buffer_state
    yy_switch_to_buffer
    yyin
    yyleng
    yylex
    yyout
    yyrestart
    yytext
    yywrap

.fi
Within your scanner itself, you can still refer to the global variables
and functions using either version of their name; but externally, they
have the modified name.
.IP
This option lets you easily link together multiple
.I lex
programs into the same executable.
Note, though, that using this option also renames
.B yywrap(),
so you now
.I must
provide your own (appropriately-named) version of the routine for your
scanner, as linking with
.B \-ll
no longer provides one for you by default.
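.IP
For example, here is a minimal sketch of a driver linking two such scanners
together.
It assumes the scanners were generated with
.B \-Pfoo
and
.B \-Pbar;
all of the names below simply follow from that renaming:
.nf

    /* main.c - drive two renamed scanners in one executable */
    extern int foolex();    /* yylex() from lex.foo.c */
    extern int barlex();    /* yylex() from lex.bar.c */

    /* \-P renames yywrap() too, so supply both versions */
    int foowrap() { return 1; }
    int barwrap() { return 1; }

    int main()
    {
        foolex();    /* both scanners read stdin by default */
        barlex();
        return 0;
    }

.fi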
.TP
.B \-Sskeleton_file
overrides the default skeleton file from which
.I lex
constructs its scanners.
You'll never need this option unless you are doing
.I lex
maintenance or development.
.SH PERFORMANCE CONSIDERATIONS
The main design goal of
.I lex
is that it generate high-performance scanners.
It has been optimized for dealing well with large sets of rules.
Aside from the effects on scanner speed of the table compression
.B \-C
options outlined above, there are a number of options/actions which
degrade performance.
These are, from most expensive to least:
.nf

    REJECT
    pattern sets that require backing up
    arbitrary trailing context
    yymore()
    '^' beginning-of-line operator

.fi
with the first three all being quite expensive and the last two
being quite cheap.
Note also that
.B unput()
is implemented as a routine call that potentially does quite a bit of
work, while
.B yyless()
is a quite-cheap macro; so if you are just putting back some excess text
you scanned, use
.B yyless().
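.PP
For example, here is a minimal sketch of trimming a match with
.B yyless()
(the token names are illustrative only):
.nf

    %%
    foobar      {
                /* only "foo" was wanted; push "bar" back so
                 * it is rescanned on the next call to yylex()
                 */
                yyless( 3 );
                return TOK_FOO;
                }

.fi
After the call, yytext holds just the first three characters of the match
and the remainder is rescanned.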
.PP
.B REJECT
should be avoided at all costs when performance is important.
It is a particularly expensive option.
.PP
Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner.
In principle, one begins by using the
.B \-b
flag to generate a
.I lex.backup
file.
For example, on the input
.nf

    %%
    foo        return TOK_KEYWORD;
    foobar     return TOK_KEYWORD;

.fi
the file looks like:
.nf

    State #6 is non-accepting -
     associated rule line numbers:
           2       3
     out-transitions: [ o ]
     jam-transitions: EOF [ \\001-n  p-\\177 ]

    State #8 is non-accepting -
     associated rule line numbers:
           3
     out-transitions: [ a ]
     jam-transitions: EOF [ \\001-`  b-\\177 ]

    State #9 is non-accepting -
     associated rule line numbers:
           3
     out-transitions: [ r ]
     jam-transitions: EOF [ \\001-q  s-\\177 ]

    Compressed tables always back up.

.fi
The first few lines tell us that there's a scanner state in
which it can make a transition on an 'o' but not on any other
character, and that in that state the currently scanned text does not match
any rule.
The state occurs when trying to match the rules found
at lines 2 and 3 in the input file.
If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find
a rule which is matched.
With a bit of headscratching one can see that this must be the
state it's in when it has seen "fo".
When this has happened, if anything other than another 'o' is seen, the
scanner will have to back up to simply match the 'f' (by the default rule).
.PP
The comment regarding State #8 indicates there's a problem
when "foob" has been scanned.
Indeed, on any character other than an 'a', the scanner will have to
back up to accept "foo".
Similarly, the comment for State #9 concerns when "fooba" has
been scanned and an 'r' does not follow.
.PP
The final comment reminds us that there's no point going to
all the trouble of removing backing up from the rules unless
we're using
.B \-Cf
or
.B \-CF,
since there's no performance gain doing so with compressed scanners.
.PP
The way to remove the backing up is to add "error" rules:
.nf

    %%
    foo         return TOK_KEYWORD;
    foobar      return TOK_KEYWORD;

    fooba       |
    foob        |
    fo          {
                /* false alarm, not really a keyword */
                return TOK_ID;
                }

.fi
.PP
Eliminating backing up among a list of keywords can also be
done using a "catch-all" rule:
.nf

    %%
    foo         return TOK_KEYWORD;
    foobar      return TOK_KEYWORD;

    [a-z]+      return TOK_ID;

.fi
This is usually the best solution when appropriate.
.PP
Backing up messages tend to cascade.
With a complicated set of rules it's not uncommon to get hundreds
of messages.
If one can decipher them, though, it often
only takes a dozen or so rules to eliminate the backing up (though
it's easy to make a mistake and have an error rule accidentally match
a valid token.
A possible future
.I lex
feature will be to automatically add rules to eliminate backing up).
.PP
.I Variable
trailing context (where both the leading and trailing parts do not have
a fixed length) entails almost the same performance loss as
.B REJECT
(i.e., substantial).
So when possible a rule like:
.nf

    %%
    mouse|rat/(cat|dog)   run();

.fi
is better written:
.nf

    %%
    mouse/cat|dog         run();
    rat/cat|dog           run();

.fi
or as
.nf

    %%
    mouse|rat/cat         run();
    mouse|rat/dog         run();

.fi
Note that here the special '|' action does
.I not
provide any savings, and can even make things worse (see
.B Deficiencies / Bugs
below).
.PP
A final note regarding performance: as mentioned above in the section
How the Input is Matched, dynamically resizing
.B yytext
to accommodate huge tokens is a slow process because it presently requires that
the (huge) token be rescanned from the beginning.
Thus if performance is vital, you should attempt to match "large"
quantities of text but not "huge" quantities, where the cutoff between
the two is at about 8K characters/token.
.PP
Another area where the user can increase a scanner's performance
(and one that's easier to implement) arises from the fact that
the longer the tokens matched, the faster the scanner will run.
This is because with long tokens the processing of most input
characters takes place in the (short) inner scanning loop, and
does not often have to go through the additional work of setting up
the scanning environment (e.g.,
.B yytext)
for the action.
Recall the scanner for C comments:
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"                    BEGIN(comment);

    <comment>[^*\\n]*
    <comment>"*"+[^*/\\n]*
    <comment>\\n             ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
This could be sped up by writing it as:
.nf

    %x comment
    %%
            int line_num = 1;

    "/*"                    BEGIN(comment);

    <comment>[^*\\n]*
    <comment>[^*\\n]*\\n      ++line_num;
    <comment>"*"+[^*/\\n]*
    <comment>"*"+[^*/\\n]*\\n ++line_num;
    <comment>"*"+"/"        BEGIN(INITIAL);

.fi
Now instead of each newline requiring the processing of another
action, recognizing the newlines is "distributed" over the other rules
to keep the matched text as long as possible.
Note that
.I adding
rules does
.I not
slow down the scanner!
The speed of the scanner is independent
of the number of rules or (modulo the considerations given at the
beginning of this section) how complicated the rules are with
regard to operators such as '*' and '|'.
.PP
A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line
and with no other extraneous characters, and recognize all the
keywords.
A natural first approach is:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    .|\\n     /* it's not a keyword */

.fi
To eliminate the back-tracking, introduce a catch-all rule:
.nf

    %%
    asm      |
    auto     |
    break    |
    ... etc ...
    volatile |
    while    /* it's a keyword */

    [a-z]+   |
    .|\\n     /* it's not a keyword */

.fi
Now, if it's guaranteed that there's exactly one word per line,
then we can reduce the total number of matches by a half by
merging in the recognition of newlines with that of the other
tokens:
.nf

    %%
    asm\\n      |
    auto\\n     |
    break\\n    |
    ... etc ...
    volatile\\n |
    while\\n    /* it's a keyword */

    [a-z]+\\n   |
    .|\\n       /* it's not a keyword */

.fi
One has to be careful here, as we have now reintroduced backing up
into the scanner.
In particular, while
.I we
know that there will never be any characters in the input stream
other than letters or newlines,
.I lex
can't figure this out, and it will plan for possibly needing to back up
when it has scanned a token like "auto" and then the next character
is something other than a newline or a letter.
Previously it would then just match the "auto" rule and be done, but now
it has no "auto" rule, only an "auto\\n" rule.
To eliminate the possibility of backing up, we could either duplicate
all rules but without final newlines, or, since we never expect to
encounter such an input and therefore don't care how it's classified,
we can introduce one more catch-all rule, this one which doesn't
include a newline:
.nf

    %%
    asm\\n      |
    auto\\n     |
    break\\n    |
    ... etc ...
    volatile\\n |
    while\\n    /* it's a keyword */

    [a-z]+\\n   |
    [a-z]+     |
    .|\\n       /* it's not a keyword */

.fi
Compiled with
.B \-Cf,
this is about as fast as one can get a
.I lex
scanner to go for this particular problem.
.PP
A final note:
.I lex
is slow when matching NUL's, particularly when a token contains
multiple NUL's.
It's best to write rules which match
.I short
amounts of text if it's anticipated that the text will often include NUL's.
.SH INCOMPATIBILITIES WITH AT&T LEX AND POSIX
BSD
.I lex
is a rewrite of the AT&T Unix
.I lex
tool (the two implementations do not share any code, though),
with some extensions and incompatibilities, both of which
are of concern to those who wish to write scanners acceptable
to either implementation.
The POSIX
.I lex
specification is closer to BSD
.I lex's
behavior than that of the original AT&T
.I lex
implementation, but there also remain some incompatibilities between BSD
.I lex
and POSIX.
The intent is that ultimately BSD
.I lex
will be fully POSIX-conformant.
In this section we discuss all of the known areas of incompatibility.
.PP
BSD
.I lex's
.B \-l
option turns on maximum compatibility with the original AT&T
.I lex
implementation, at the cost of a major loss in the generated scanner's
performance.
We note below which incompatibilities can be overcome using the
.B \-l
option.
.PP
BSD
.I lex
is fully compatible with AT&T
.I lex
with the following exceptions:
.IP -
The undocumented AT&T
.I lex
scanner internal variable
.B yylineno
is not supported unless
.B \-l
is used.
.IP
yylineno is not part of the POSIX specification.
.IP -
The
.B input()
routine is not redefinable, though it may be called to read characters
following whatever has been matched by a rule.
If
.B input()
encounters an end-of-file the normal
.B yywrap()
processing is done.
A ``real'' end-of-file is returned by
.B input()
as
.I EOF.
.IP
Input is instead controlled by defining the
.B YY_INPUT
macro.
.IP
The BSD
.I lex
restriction that
.B input()
cannot be redefined is in accordance with the POSIX specification,
which simply does not specify any way of controlling the
scanner's input other than by making an initial assignment to
.I yyin.
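.IP
For example, the following sketch of a
.B YY_INPUT
definition (placed in the definitions section of your scanner) changes the
input processing to occur one character at a time:
.nf

    %{
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \\
        { \\
        int c = getchar(); \\
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
        }
    %}

.fi
YY_NULL (0 on Unix systems) is the value the macro uses to indicate
end-of-input to the scanner.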
.IP -
BSD
.I lex
scanners are not as reentrant as AT&T
.I lex
scanners.
In particular, if you have an interactive scanner and
an interrupt handler which long-jumps out of the scanner, and
the scanner is subsequently called again, you may get the following
message:
.nf

    fatal lex scanner internal error--end of buffer missed

.fi
To reenter the scanner, first use
.nf

    yyrestart( yyin );

.fi
Note that this call will throw away any buffered input; usually this
isn't a problem with an interactive scanner.
.IP -
.B output()
is not supported.
Output from the
.B ECHO
macro is done to the file-pointer
.I yyout
(default
.I stdout).
.IP
.B output()
is not part of the POSIX specification.
.IP -
AT&T
.I lex
does not support exclusive start conditions (%x), though they
are in the POSIX specification.
.IP -
When definitions are expanded, BSD
.I lex
encloses them in parentheses.
With AT&T lex, the following:
.nf

    NAME    [A-Z][A-Z0-9]*
    %%
    foo{NAME}?      printf( "Found it\\n" );
    %%

.fi
will not match the string "foo" because when the macro
is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
and the precedence is such that the '?' is associated with
"[A-Z0-9]*".
With BSD
.I lex,
the rule will be expanded to
"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
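.IP
If you need a rule that behaves identically under both implementations, one
portable approach (a suggestion, not part of either specification) is to
write the grouping explicitly in the rule itself rather than relying on how
the definition is expanded:
.nf

    %%
    foo([A-Z][A-Z0-9]*)?    printf( "Found it\\n" );
    %%

.fi
Since the parentheses now appear literally in the rule, both implementations
treat the '?' as applying to the whole bracketed expression, and the string
"foo" will match under either one.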