                            {
                            if (*p == 'd' || *p == 'D')
                                 *p += 'e' - 'd';
                            ECHO;
                            }

After the floating point constant is recognized, it is scanned by
the for loop to find the letter d or D.  The program then adds
'e'-'d', which converts it to the next letter of the alphabet.
The modified constant, now single-precision, is written out again.
There follows a series of names which must be respelled to remove
their initial d.  By using the array yytext the same action
suffices for all the names (only a sample of a rather long list is
given here).

                 {d}{s}{i}{n}         |
                 {d}{c}{o}{s}         |
                 {d}{s}{q}{r}{t}      |
                 {d}{a}{t}{a}{n}      |
                 ...
                 {d}{f}{l}{o}{a}{t}   printf("%s",yytext+1);

Another list of names must have initial d changed to initial a:

                  {d}{l}{o}{g}     |
                  {d}{l}{o}{g}10   |
                  {d}{m}{i}{n}1    |
                  {d}{m}{a}{x}1    {
                                   yytext[0] += 'a' - 'd';
                                   ECHO;
                                   }

And one routine must have initial d changed to initial r:

                {d}1{m}{a}{c}{h}   {
                                   yytext[0] += 'r' - 'd';
                                   ECHO;
                                   }

To avoid such names as dsinx being detected as instances of dsin,
some final rules pick up longer words as identifiers and copy some
surviving characters:

                        [A-Za-z][A-Za-z0-9]*   |
                        [0-9]+                 |
                        \n                     |
                        .                      ECHO;

Note that this program is not complete; it does not deal with the
spacing problems in Fortran or with the use of keywords as
identifiers.

10.  Left Context Sensitivity.

     Sometimes it is desirable to have several sets of lexical
rules to be applied at different times in the input.
For example, a compiler preprocessor might distinguish preprocessor
statements and analyze them differently from ordinary statements.
This requires sensitivity to prior context, and there are several
ways of handling such problems.  The ^ operator, for example, is
a prior context operator, recognizing immediately preceding left
context just as $ recognizes immediately following right
context.  Adjacent left context could be extended, to produce a
facility similar to that for adjacent right context, but it is
unlikely to be as useful, since often the relevant left context
appeared some time earlier, such as at the beginning of a line.

     This section describes three means of dealing with different
environments: a simple use of flags, when only a few rules
change from one environment to another, the use of start
conditions on rules, and the possibility of making multiple
lexical analyzers all run together.  In each case, there are rules
which recognize the need to change the environment in which the
following input text is analyzed, and set some parameter to
reflect the change.  This may be a flag explicitly tested by the
user's action code; such a flag is the simplest way of dealing
with the problem, since Lex is not involved at all.  It may be
more convenient, however, to have Lex remember the flags as
initial conditions on the rules.  Any rule may be associated with
a start condition.  It will only be recognized when Lex is in that
start condition.  The current start condition may be changed at
any time.  Finally, if the sets of rules for the different
environments are very dissimilar, clarity may be best achieved by
writing several distinct lexical analyzers, and switching from one
to another as desired.

     Consider the following problem: copy the input to the output,
changing the word magic to first on every line which began with
the letter a, changing magic to second on every line which began
with the letter b, and changing magic to third on every line which
began with the letter c.
All other words and all other lines are left unchanged.

     These rules are so simple that the easiest way to do this job
is with a flag:

                         int flag;
                 %%
                 ^a      {flag = 'a'; ECHO;}
                 ^b      {flag = 'b'; ECHO;}
                 ^c      {flag = 'c'; ECHO;}
                 \n      {flag =  0 ; ECHO;}
                 magic   {
                         switch (flag)
                         {
                         case 'a': printf("first"); break;
                         case 'b': printf("second"); break;
                         case 'c': printf("third"); break;
                         default: ECHO; break;
                         }
                         }

should be adequate.

     To handle the same problem with start conditions, each start
condition must be introduced to Lex in the definitions section
with a line reading

                          %Start   name1 name2 ...

where the conditions may be named in any order.  The word Start
may be abbreviated to s or S.  The conditions may be referenced
at the head of a rule with the <> brackets:

                              <name1>expression

is a rule which is only recognized when Lex is in the start
condition name1.  To enter a start condition, execute the action
statement

                                BEGIN name1;

which changes the start condition to name1.  To resume the normal
state,

                                  BEGIN 0;

resets the initial condition of the Lex automaton interpreter.  A
rule may be active in several start conditions:

                             <name1,name2,name3>

is a legal prefix.  Any rule not beginning with the <> prefix
operator is always active.
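
     For comparison, the behavior asked for in the magic problem
can be written out by hand in plain C.  The following is only a
sketch under stated assumptions: the names rewrite_magic and append
and the string-buffer interface are hypothetical and are not part
of Lex, and a scanner generated by Lex would read and write through
its I/O routines rather than work on strings.  It keeps the same
flag as the flag example above, and an extra variable standing in
for the ^ context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to out at offset n, leaving room for the
   final NUL.  Returns the new offset; excess input is dropped. */
static size_t append(char *out, size_t n, size_t outsize,
                     const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len && n + 1 < outsize; i++)
        out[n++] = s[i];
    return n;
}

/* Hypothetical helper, not part of Lex: copy in to out, replacing
   "magic" according to the first letter of the current line. */
void rewrite_magic(const char *in, char *out, size_t outsize)
{
    int flag = 0;      /* 'a', 'b', 'c', or 0, as in the flag example */
    int at_bol = 1;    /* nonzero at the beginning of a line (^) */
    size_t n = 0;

    assert(outsize > 0);
    while (*in != '\0') {
        const char *word = NULL;

        if (strncmp(in, "magic", 5) == 0) {
            if (flag == 'a')      word = "first";
            else if (flag == 'b') word = "second";
            else if (flag == 'c') word = "third";
        }
        if (word != NULL) {          /* "magic" seen with a flag set */
            n = append(out, n, outsize, word, strlen(word));
            in += 5;
            at_bol = 0;
            continue;
        }
        if (at_bol && (*in == 'a' || *in == 'b' || *in == 'c'))
            flag = *in;              /* rules ^a, ^b, ^c */
        else if (*in == '\n')
            flag = 0;                /* rule \n */
        at_bol = (*in == '\n');
        n = append(out, n, outsize, in, 1);  /* copy one character */
        in++;
    }
    out[n] = '\0';
}
```

Given the input "a magic\nd magic\n", this sketch produces
"a first\nd magic\n".  The point of the comparison is that the
variable flag is the entire environment; start conditions let Lex
keep that piece of state instead of the user's code.
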

     The same example as before can be written:

                     %START AA BB CC
                     %%
                     ^a                {ECHO; BEGIN AA;}
                     ^b                {ECHO; BEGIN BB;}
                     ^c                {ECHO; BEGIN CC;}
                     \n                {ECHO; BEGIN 0;}
                     <AA>magic         printf("first");
                     <BB>magic         printf("second");
                     <CC>magic         printf("third");

where the logic is exactly the same as in the previous method of
handling the problem, but Lex does the work rather than the user's
code.

11.  Character Set.

     The programs generated by Lex handle character I/O only
through the routines input, output, and unput.  Thus the
character representation provided in these routines is accepted by
Lex and employed to return values in yytext.  For internal use a
character is represented as a small integer which, if the standard
library is used, has a value equal to the integer value of the bit
pattern representing the character on the host computer.
Normally, the letter a is represented in the same form as the
character constant 'a'.  If this interpretation is changed, by
providing I/O routines which translate the characters, Lex must be
told about it, by giving a translation table.  This table must be
in the definitions section, and must be bracketed by lines
containing only ``%T''.  The table contains lines of the form

                        {integer} {character string}

which indicate the value associated with each character.  Thus the
next example

                                  %T
                                   1    Aa
                                   2    Bb
                                  ...
                                  26    Zz
                                  27    \n
                                  28    +
                                  29    -
                                  30    0
                                  31    1
                                  ...
                                  39    9
                                  %T

                           Sample character table.

maps the lower and upper case letters together into the integers 1
through 26, newline into 27, + and - into 28 and 29, and the
digits into 30 through 39.  Note the escape for newline.  If a
table is supplied, every character that is to appear either in the
rules or in any valid input must be included in the table.  No
character may be assigned the number 0, and no character may be
assigned a bigger number than the size of the hardware character
set.

12.  Summary of Source Format.

     The general form of a Lex source file is:

                             {definitions}
                             %%
                             {rules}
                             %%
                             {user subroutines}

The definitions section contains a combination of

1)   Definitions, in the form ``name space translation''.

2)   Included code, in the form ``space code''.

3)   Included code, in the form
                                       %{
                                       code
                                       %}

4)   Start conditions, given in the form
                                %S name1 name2 ...

5)   Character set tables, in the form
                          %T
                          number space character-string
                          ...
                          %T

6)   Changes to internal array sizes, in the form
                                     %x  nnn
     where nnn is a decimal integer representing an array size and
     x selects the parameter as follows:

                        Letter          Parameter
                          p      positions
                          n      states
                          e      tree nodes
                          a      transitions
                          k      packed character classes
                          o      output array size

Lines in the rules section have the form ``expression action''
where the action may be continued on succeeding lines by using
braces to delimit it.

     Regular expressions in Lex use the following operators:

               x        the character "x"
               "x"      an "x", even if x is an operator.
               \x       an "x", even if x is an operator.
               [xy]     the character x or y.
               [x-z]    the characters x, y or z.
               [^x]     any character but x.
               .        any character but newline.
               ^x       an x at the beginning of a line.
               <y>x     an x when Lex is in start condition y.
               x$       an x at the end of a line.
               x?       an optional x.
               x*       0,1,2, ... instances of x.
               x+       1,2,3, ... instances of x.
               x|y      an x or a y.
               (x)      an x.
               x/y      an x but only if followed by y.
               {xx}     the translation of xx from the
                        definitions section.
               x{m,n}   m through n occurrences of x

13.  Caveats and Bugs.

     There are pathological expressions which produce exponential
growth of the tables when converted to deterministic machines;
fortunately, they are rare.

     REJECT does not rescan the input; instead it remembers the
results of the previous scan.
This means that if a rule with trailing context is found, and
REJECT executed, the user must not have used unput to change the
characters forthcoming from the input stream.  This is the only
restriction on the user's ability to manipulate the
not-yet-processed input.

14.  Acknowledgments.

     As should be obvious from the above, the outside of Lex is
patterned on Yacc and the inside on Aho's string matching
routines.  Therefore, both S. C. Johnson and A. V. Aho are really
originators of much of Lex, as well as debuggers of it.  Many
thanks are due to both.

     The code of the current version of Lex was designed, written,
and debugged by Eric Schmidt.

15.  References.

1.   B. W. Kernighan and D. M. Ritchie, The C Programming
     Language, Prentice-Hall, N. J. (1978).

2.   B. W. Kernighan, Ratfor: A Preprocessor for a Rational Fortran,
     Software Practice and Experience, 5, pp. 395-406 (1975).

3.   S. C. Johnson, Yacc: Yet Another Compiler Compiler, Computing
     Science Technical Report No. 32, 1975, Bell Laboratories,
     Murray Hill, NJ 07974.

4.   A. V. Aho and M. J. Corasick, Efficient String Matching:
     An Aid to Bibliographic Search, Comm. ACM 18, 333-340 (1975).

5.   B. W. Kernighan, D. M. Ritchie and K. L. Thompson, QED Text
     Editor, Computing Science Technical Report No. 5, 1972,
     Bell Laboratories, Murray Hill, NJ 07974.

6.   D. M. Ritchie, private communication.  See also M. E. Lesk,
     The Portable C Library, Computing Science Technical Report
     No. 31, Bell Laboratories, Murray Hill, NJ 07974.
