
lex-docs.txt

Lex and Yacc compiler tools for the Windows environment
characters which are not both upper case letters, both lower case
letters, or both digits is implementation dependent and will get a
warning message.  (E.g., [0-z] in ASCII is many more characters
than it is in EBCDIC).  If it is desired to include the
character - in a character class, it should be first or last; thus
                    [-+0-9]
matches all the digits and the two signs.

     In character classes, the ^ operator must appear as the first
character after the left bracket; it indicates that the resulting
string is to be complemented with respect to the computer
character set.  Thus
                    [^abc]
matches all characters except a, b, or c, including all special or
control characters; or
                    [^a-zA-Z]
is any character which is not a letter.  The \ character provides
the usual escapes within character class brackets.

     Arbitrary character.  To match almost any character, the
operator character
                    .
is the class of all characters except newline.  Escaping into
octal is possible although non-portable:
                    [\40-\176]
matches all printable characters in the ASCII character set, from
octal 40 (blank) to octal 176 (tilde).

     Optional expressions.  The operator ? indicates an optional
element of an expression.  Thus
                    ab?c
matches either ac or abc.

     Repeated expressions.  Repetitions of classes are indicated
by the operators * and +.
                    a*
is any number of consecutive a characters, including zero; while
                    a+
is one or more instances of a.  For example,
                    [a-z]+
is all strings of lower case letters.  And
                    [A-Za-z][A-Za-z0-9]*
indicates all alphanumeric strings with a leading alphabetic
character.  This is a typical expression for recognizing
identifiers in computer languages.
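
     To make the meaning of the identifier expression concrete, the
following is a minimal hand-coded sketch in C of what
[A-Za-z][A-Za-z0-9]* and the class [-+0-9] accept.  It is not code
that Lex generates; the function names match_identifier and
in_sign_or_digit_class are hypothetical, and the sketch assumes the
default "C" locale so that isalpha() and isalnum() accept only ASCII
letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s that matches
 * [A-Za-z][A-Za-z0-9]*, or 0 if s does not start with a letter.
 * Assumes the "C" locale (hypothetical helper, not part of Lex). */
size_t match_identifier(const char *s)
{
    size_t n;

    assert(s != NULL);
    if (!isalpha((unsigned char)s[0]))
        return 0;                 /* the leading [A-Za-z] is mandatory */
    n = 1;
    while (isalnum((unsigned char)s[n]))
        n++;                      /* [A-Za-z0-9]* : zero or more */
    return n;
}

/* Nonzero if c belongs to the class [-+0-9]: the two signs, which
 * need no range, plus the digit range 0-9. */
int in_sign_or_digit_class(int c)
{
    return c == '-' || c == '+' || (c >= '0' && c <= '9');
}
```

For the input x25+1 the first function reports a match of three
characters (x25), and for 9lives it reports no match, since a digit
cannot begin an identifier.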

     Alternation and Grouping.  The operator | indicates
alternation:
                    (ab|cd)
matches either ab or cd.  Note that parentheses are used for
grouping, although they are not necessary on the outside level;
                    ab|cd
would have sufficed.  Parentheses can be used for more complex
expressions:
                    (ab|cd+)?(ef)*
matches such strings as abefef, efefef, cdef, or cddd; but not
abc, abcd, or abcdef.

     Context sensitivity.  Lex will recognize a small amount of
surrounding context.  The two simplest operators for this are ^
and $.  If the first character of an expression is ^, the
expression will only be matched at the beginning of a line (after
a newline character, or at the beginning of the input stream).
This can never conflict with the other meaning of ^,
complementation of character classes, since that only applies
within the [] operators.  If the very last character is $, the
expression will only be matched at the end of a line (when
immediately followed by newline).  The latter operator is a
special case of the / operator character, which indicates trailing
context.  The expression
                    ab/cd
matches the string ab, but only if followed by cd.  Thus
                    ab$
is the same as
                    ab/\n
Left context is handled in Lex by start conditions as explained in
section 10.  If a rule is only to be executed when the Lex
automaton interpreter is in start condition x, the rule should be
prefixed by
                    <x>
using the angle bracket operator characters.  If we considered
``being at the beginning of a line'' to be start condition ONE,
then the ^ operator would be equivalent to
                    <ONE>
Start conditions are explained more fully later.

     Repetitions and Definitions.
The operators {} specify either repetitions (if they enclose
numbers) or definition expansion (if they enclose a name).  For
example
                    {digit}
looks for a predefined string named digit and inserts it at that
point in the expression.  The definitions are given in the first
part of the Lex input, before the rules.  In contrast,
                    a{1,5}
looks for 1 to 5 occurrences of a.

     Finally, initial % is special, being the separator for Lex
source segments.

4.  Lex Actions.

     When an expression written as above is matched, Lex executes
the corresponding action.  This section describes some features of
Lex which aid in writing actions.  Note that there is a default
action, which consists of copying the input to the output.  This
is performed on all strings not otherwise matched.  Thus the Lex
user who wishes to absorb the entire input, without producing any
output, must provide rules to match everything.  When Lex is being
used with Yacc, this is the normal situation.  One may consider
that actions are what is done instead of copying the input to the
output; thus, in general, a rule which merely copies can be
omitted.  Also, a character combination which is omitted from
the rules and which appears as input is likely to be printed on
the output, thus calling attention to the gap in the rules.

     One of the simplest things that can be done is to ignore the
input.  Specifying a C null statement, ; as an action causes this
result.  A frequent rule is
                    [ \t\n]   ;
which causes the three spacing characters (blank, tab, and
newline) to be ignored.

     Another easy way to avoid writing actions is the action
character |, which indicates that the action for this rule is the
action for the next rule.  The previous example could also have
been written
                    " "       |
                    "\t"      |
                    "\n"      ;
with the same result, although in different style.
The quotes around \n and \t are not required.

     In more complex actions, the user will often want to know the
actual text that matched some expression like [a-z]+.  Lex leaves
this text in an external character array named yytext.  Thus, to
print the name found, a rule like
                    [a-z]+   printf("%s", yytext);
will print the string in yytext.  The C function printf accepts a
format argument and data to be printed; in this case, the format
is ``print string'' (% indicating data conversion, and s
indicating string type), and the data are the characters in
yytext.  So this just places the matched string on the output.
This action is so common that it may be written as ECHO:
                    [a-z]+   ECHO;
is the same as the above.  Since the default action is just to
print the characters found, one might ask why give a rule, like
this one, which merely specifies the default action?  Such rules
are often required to avoid matching some other rule which is
not desired.  For example, if there is a rule which matches read
it will normally match the instances of read contained in bread or
readjust; to avoid this, a rule of the form [a-z]+ is needed.
This is explained further below.

     Sometimes it is more convenient to know the end of what has
been found; hence Lex also provides a count yyleng of the number
of characters matched.  To count both the number of words and the
number of characters in words in the input, the user might write
                    [a-zA-Z]+   {words++; chars += yyleng;}
which accumulates in chars the number of characters in the words
recognized.  The last character in the string matched can be
accessed by
                    yytext[yyleng-1]

     Occasionally, a Lex action may decide that a rule has not
recognized the correct span of characters.  Two routines are
provided to aid with this situation.  First, yymore() can be
called to indicate that the next input expression recognized is to
be tacked on to the end of this input.
Normally, the next input string would overwrite the current entry
in yytext.  Second, yyless(n) may be called to indicate that not
all the characters matched by the currently successful expression
are wanted right now.  The argument n indicates the number of
characters in yytext to be retained.  Further characters
previously matched are returned to the input.  This provides the
same sort of lookahead offered by the / operator, but in a
different form.

     Example: Consider a language which defines a string as a set
of characters between quotation (") marks, and provides that to
include a " in a string it must be preceded by a \.  The regular
expression which matches that is somewhat confusing, so that it
might be preferable to write
          \"[^"]*   {
                    if (yytext[yyleng-1] == '\\')
                         yymore();
                    else
                         ... normal user processing
                    }
which will, when faced with a string such as "abc\"def" first
match the five characters "abc\; then the call to yymore() will
cause the next part of the string, "def, to be tacked on the end.
Note that the final quote terminating the string should be picked
up in the code labeled ``normal processing''.

     The function yyless() might be used to reprocess text in
various circumstances.  Consider the C problem of distinguishing
the ambiguity of ``=-a''.  Suppose it is desired to treat this as
``=- a'' but print a message.  A rule might be
          =-[a-zA-Z]   {
                       printf("Op (=-) ambiguous\n");
                       yyless(yyleng-1);
                       ... action for =- ...
                       }
which prints a message, returns the letter after the operator to
the input stream, and treats the operator as ``=-''.
Alternatively it might be desired to treat this as ``= -a''.
To do this, just return the minus sign as well as the letter to
the input:
          =-[a-zA-Z]   {
                       printf("Op (=-) ambiguous\n");
                       yyless(yyleng-2);
                       ... action for = ...
                       }
will perform the other interpretation.  Note that the expressions
for the two cases might more easily be written
                    =-/[A-Za-z]
in the first case and
                    =/-[A-Za-z]
in the second; no backup would be required in the rule action.  It
is not necessary to recognize the whole identifier to observe the
ambiguity.  The possibility of ``=-3'', however, makes
                    =-/[^ \t\n]
a still better rule.

     In addition to these routines, Lex also permits access to the
I/O routines it uses.  They are:

1)   input() which returns the next input character;
2)   output(c) which writes the character c on the output; and
3)   unput(c) pushes the character c back onto the input stream to
     be read later by input().

By default these routines are provided as macro definitions, but
the user can override them and supply private versions.  These
routines define the relationship between external files and
internal characters, and must all be retained or modified
consistently.  They may be redefined, to cause input or output to
be transmitted to or from strange places, including other programs
or internal memory; but the character set used must be consistent
in all routines; a value of zero returned by input must mean end
of file; and the relationship between unput and input must be
retained or the Lex lookahead will not work.  Lex does not look
ahead at all if it does not have to, but every rule ending in + *
? or $ or containing / implies lookahead.  Lookahead is also
necessary to match an expression that is a prefix of another
expression.  See below for a discussion of the character set used
by Lex.
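
     The contract between input and unput described above can be
sketched in C as an in-memory character source with a pushback
stack.  This is only an illustration of the required behavior, not
the macros Lex supplies: the names char_source, source_init,
source_input and source_unput are hypothetical, and the
PUSHBACK_LIMIT constant is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define PUSHBACK_LIMIT 100   /* assumed cap on pushed-back characters */

/* Hypothetical in-memory character source with a pushback stack. */
struct char_source {
    const char *text;               /* NUL-terminated input text */
    size_t pos;                     /* index of next unread character */
    char pushed[PUSHBACK_LIMIT];    /* characters handed back by unput */
    size_t npushed;                 /* how many are currently stored */
};

void source_init(struct char_source *src, const char *text)
{
    assert(src != NULL && text != NULL);
    src->text = text;
    src->pos = 0;
    src->npushed = 0;
}

/* Counterpart of input(): returns the next character, or 0 at end
 * of input.  Pushed-back characters are delivered first. */
int source_input(struct char_source *src)
{
    if (src->npushed > 0)
        return (unsigned char)src->pushed[--src->npushed];
    if (src->text[src->pos] == '\0')
        return 0;                   /* zero means end of file */
    return (unsigned char)src->text[src->pos++];
}

/* Counterpart of unput(c): the most recently pushed character is
 * the next one read.  Returns 0 on success, -1 if the buffer is full. */
int source_unput(struct char_source *src, int c)
{
    if (src->npushed >= PUSHBACK_LIMIT)
        return -1;
    src->pushed[src->npushed++] = (char)c;
    return 0;
}
```

Because zero doubles as the end-of-file value, a source built this
way cannot deliver a null character as data, which is the same
restriction the text places on input.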
The standard Lex library imposes a 100 character limit on backup.

     Another Lex library routine that the user will sometimes want
to redefine is yywrap() which is called whenever Lex reaches an
end-of-file.  If yywrap returns a 1, Lex continues with the normal
wrapup on end of input.  Sometimes, however, it is convenient to
arrange for more input to arrive from a new source.  In this case,
the user should provide a yywrap which arranges for new input and
returns 0.  This instructs Lex to continue processing.  The
default yywrap always returns 1.

     This routine is also a convenient place to print tables,
summaries, etc. at the end of a program.  Note that it is not
possible to write a normal rule which recognizes end-of-file; the
only access to this condition is through yywrap.  In fact, unless
a private version of input() is supplied a file containing nulls
cannot be handled, since a value of 0 returned by input is taken
to be end-of-file.

5.  Ambiguous Source Rules.

     Lex can handle ambiguous specifications.  When more than one
expression can match the current input, Lex chooses as follows:

1)   The longest match is preferred.
2)   Among rules which matched the same number of characters, the
     rule given first is preferred.

Thus, suppose the rules
          integer   keyword action ...;
          [a-z]+    identifier action ...;
to be given in that order.  If the input is integers, it is taken
as an identifier, because [a-z]+ matches 8 characters while
integer matches only 7.  If the input is integer, both rules match
7 characters, and the keyword rule is selected because it was
given first.  Anything shorter (e.g. int) will not match the
expression integer and so the identifier interpretation is used.

     The principle of preferring the longest match makes rules
containing expressions like .* dangerous.  For example,
                    '.*'
