is always returned; confusion may arise when the beginning of <i>x</i> matches the trailing portion of <i>r</i>. For example, given the regular expression a*b/cc and the input <b>aaabcc</b>, <i>yytext</i> would contain the string <b>aaab</b> on this match. But given the regular expression x*/xy and the input <b>xxxy</b>, the token <b>xxx</b>, not <b>xx</b>, is returned by some implementations because <b>xxx</b> matches x*.
<p>In the rule ab*/bc, the b* at the end of <i>r</i> will extend <i>r</i>'s match into the beginning of the trailing context, so the result is unspecified. If this rule were ab/bc, however, the rule matches the text <b>ab</b> when it is followed by the text <b>bc</b>. In this latter case, the matching of <i>r</i> cannot extend into the beginning of <i>x</i>, so the result is specified.
<h5><a name = "tag_001_014_1092_005"> </a>Actions in lex</h5><xref type="5" name="lexacts"></xref>
The action to be taken when an <i>ERE</i> is matched can be a C program fragment or the special actions described below; the program fragment can contain one or more C statements, and can also include special actions. The empty C statement ";" is a valid action; any string in the <b>lex.yy.c</b> input that matches the pattern portion of such a rule is effectively ignored or skipped. However, the absence of an action is not valid, and the action <i>lex</i> takes in such a condition is undefined.
<p>The specification for an action, including C statements and special actions, can extend across several lines if enclosed in braces:
<pre><code><i>ERE &lt;one or more blanks&gt;</i> { <i>program statement</i>
                           <i>program statement</i> }</code></pre>
<p>The default action when a string in the input to a <b>lex.yy.c</b> program is not matched by any expression is to copy the string to the output. Because the default behaviour of a program generated by <i>lex</i> is to read the input and copy it to the output, a minimal <i>lex</i> source program that has just <b>%%</b> generates a C program that simply copies the input to the output unchanged.
<p>Four special
actions are available:
<pre><code>| ECHO; REJECT; BEGIN</code></pre>
<dl compact>
<dt>|<dd>The action "|" means that the action for the next rule is the action for this rule. Unlike the other three actions, "|" cannot be enclosed in braces or be semicolon-terminated; it must be specified alone, with no other actions.
<dt>ECHO;<dd>Write the contents of the string <i>yytext</i> on the output.
<dt>REJECT;<dd>Usually only a single expression is matched by a given string in the input. <b>REJECT</b> means "continue to the next expression that matches the current input", and causes whatever rule was the second choice after the current rule to be executed for the same input. Thus, multiple rules can be matched and executed for one input string or overlapping input strings. For example, given the regular expressions <b>xyz</b> and <b>xy</b> and the input <b>xyz</b>, usually only the regular expression <b>xyz</b> would match. The next attempted match would start after <b>z</b>. If the last action in the <b>xyz</b> rule is <b>REJECT</b>, both this rule and the <b>xy</b> rule would be executed. The <b>REJECT</b> action may be implemented in such a fashion that flow of control does not continue after it, as if it were equivalent to a <b>goto</b> to another part of <i>yylex()</i>. The use of <b>REJECT</b> may result in somewhat larger and slower scanners.
<dt>BEGIN<dd>The action:
<pre><code>BEGIN <i>newstate</i>;</code></pre>
switches the state (start condition) to <i>newstate</i>. If the string <i>newstate</i> has not been declared previously as a start condition in the <i>Definitions</i> section, the results are unspecified. The initial state is indicated by the digit 0 or the token <b>INITIAL</b>.
</dl>
<p>The functions or macros described below are accessible to user code included in the <i>lex</i> input. It is unspecified whether they appear in the C code output of <i>lex</i>, or are accessible only through the <b>-l l</b> operand to <i><a href="c89.html">c89</a></i> or <i><a href="cc.html">cc</a></i> (the <i>lex</i> library).
<dl compact>
<dt>int
yylex(void)<dd>Performs lexical analysis on the input; this is the primary function generated by the <i>lex</i> utility. The function returns zero when the end of input is reached; otherwise it returns non-zero values (tokens) determined by the actions that are selected.
<dt>int yymore(void)<dd>When called, indicates that when the next input string is recognised, it is to be appended to the current value of <i>yytext</i> rather than replacing it; the value in <i>yyleng</i> is adjusted accordingly.
<dt>int yyless(int <i>n</i>)<dd>Retains <i>n</i> initial characters in <i>yytext</i>, NUL-terminated, and treats the remaining characters as if they had not been read; the value in <i>yyleng</i> is adjusted accordingly.
<dt>int input(void)<dd>Returns the next character from the input, or zero on end-of-file. It obtains input from the stream pointer <i>yyin</i>, although possibly via an intermediate buffer. Thus, once scanning has begun, the effect of altering the value of <i>yyin</i> is undefined. The character read is removed from the input stream of the scanner without any processing by the scanner.
<dt>int unput(int <i>c</i>)<dd>Returns the character <i>c</i> to the input; <i>yytext</i> and <i>yyleng</i> are undefined until the next expression is matched. The result of using <i>unput</i> for more characters than have been input is unspecified.
</dl>
<p>The following functions appear only in the <i>lex</i> library accessible through the <b>-l l</b> operand; they can therefore be redefined by a portable application:
<dl compact>
<dt>int yywrap(void)<dd>Called by <i>yylex()</i> at end-of-file; the default <i>yywrap()</i> always returns 1. If the application requires <i>yylex()</i> to continue processing with another source of input, then the application can include a function <i>yywrap()</i> that associates another file with the external variable <b>FILE</b> *<i>yyin</i> and returns zero.
<dt>int main(int <i>argc</i>, char *<i>argv</i>[])<dd>Calls <i>yylex()</i> to perform lexical analysis, then exits. The user
code can contain <i>main()</i> to perform application-specific operations, calling <i>yylex()</i> as applicable.
</dl>
<p>The reason for breaking these functions into two lists is that only those functions in <b>libl.a</b> can be reliably redefined by a portable application.
<p>Except for <i>input()</i>, <i>unput()</i> and <i>main()</i>, all external and static names generated by <i>lex</i> begin with the prefix <b>yy</b> or <b>YY</b>.
</blockquote><h4><a name = "tag_001_014_1093"> </a>EXIT STATUS</h4><blockquote>The following exit values are returned:
<dl compact>
<dt>0<dd>Successful completion.
<dt>&gt;0<dd>An error occurred.
</dl>
</blockquote><h4><a name = "tag_001_014_1094"> </a>CONSEQUENCES OF ERRORS</h4><blockquote>Default.</blockquote><h4><a name = "tag_001_014_1095"> </a>APPLICATION USAGE</h4><blockquote>Portable applications are warned that in the <i>Rules</i> section, an <i>ERE</i> without an action is not acceptable, but need not be detected as erroneous by <i>lex</i>. This may result in compilation or run-time errors.
<p>The purpose of <i>input()</i> is to take characters off the input stream and discard them as far as the lexical analysis is concerned. A common use is to discard the body of a comment once the beginning of a comment is recognised.
<p>The <i>lex</i> utility is not fully internationalised in its treatment of regular expressions in the <i>lex</i> source code or generated lexical analyser. It would seem desirable to have the lexical analyser interpret the regular expressions given in the <i>lex</i> source according to the environment specified when the lexical analyser is executed, but this is not possible with the current <i>lex</i> technology. Furthermore, the lexical analysers produced by <i>lex</i> are, by their very nature, closely tied to the lexical requirements of the input language being described, which will frequently be locale-specific anyway. (For example, an analyser written for French text will not automatically be useful for processing other languages.)
</blockquote><h4><a
name = "tag_001_014_1096"> </a>EXAMPLES</h4><blockquote>The following is an example of a <i>lex</i> program that implements a rudimentary scanner for a Pascal-like syntax:
<pre><code>%{
/* need this for the calls to atoi() and atof() below */
#include &lt;stdlib.h&gt;
/* need this for printf(), perror(), fopen() and stdin below */
#include &lt;stdio.h&gt;
%}

DIGIT    [0-9]
ID       [a-z][a-z0-9]*

%%

{DIGIT}+    {
            printf("An integer: %s (%d)\n", yytext,
                   atoi(yytext));
            }

{DIGIT}+"."{DIGIT}*    {
            printf("A float: %s (%g)\n", yytext,
                   atof(yytext));
            }

if|then|begin|end|procedure|function    {
            printf("A keyword: %s\n", yytext);
            }

{ID}        printf("An identifier: %s\n", yytext);

"+"|"-"|"*"|"/"    printf("An operator: %s\n", yytext);

"{"[^}\n]*"}"    ;    /* eat up one-line comments */

[ \t\n]+    ;    /* eat up white space */

.           printf("Unrecognised character: %s\n", yytext);

%%

int main(int argc, char *argv[])
{
    ++argv, --argc;  /* skip over program name */
    if (argc &gt; 0) {
        yyin = fopen(argv[0], "r");
        if (yyin == NULL) {      /* report an unreadable input file */
            perror(argv[0]);
            return 1;
        }
    } else
        yyin = stdin;

    yylex();
    return 0;
}
</code></pre></blockquote><h4><a name = "tag_001_014_1097"> </a>FUTURE DIRECTIONS</h4><blockquote>None.</blockquote><h4><a name = "tag_001_014_1098"> </a>SEE ALSO</h4><blockquote><i><a href="c89.html">c89</a></i>, <i><a href="yacc.html">yacc</a></i>.</blockquote><hr size=2 noshade><center><font size=2>UNIX ® is a registered Trademark of The Open Group.<br>Copyright © 1997 The Open Group<br> [ <a href="../index.html">Main Index</a> | <a href="../xshix.html">XSH</a> | <a href="../xcuix.html">XCU</a> | <a href="../xbdix.html">XBD</a> | <a href="../cursesix.html">XCURSES</a> | <a href="../xnsix.html">XNS</a> ]</font></center><hr size=2 noshade></body></html>