📄 lex.html

📁 A C development manual for UNIX, with detailed example programs.
<i>lex</i> substitution string. The format of these lines is:<pre><dl compact><dt> <dd><i>name substitute</i></dl></pre><br>If a <i>name</i> does not meet the requirements for identifiers in the ISO&nbsp;C standard, the result is undefined. The string <i>substitute</i> will replace the string {<i>name</i>} when it is used in a rule. The <i>name</i> string is recognised in this context only when the braces are provided and when it does not appear within a bracket expression or within double-quotes.<p>In the <i>Definitions</i> section, any line beginning with a "%" (percent sign) character and followed by an alphanumeric word beginning with either <b>s</b> or <b>S</b> defines a set of start conditions. Any line beginning with a "%" followed by a word beginning with either <b>x</b> or <b>X</b> defines a set of exclusive start conditions. When the generated scanner is in a <b>%s</b> state, patterns with no state specified will also be active; in a <b>%x</b> state, such patterns will not be active. The rest of the line, after the first word, is considered to be one or more blank-character-separated names of start conditions. Start condition names are constructed in the same way as definition names. Start conditions can be used to restrict the matching of regular expressions to one or more states as described in <xref href=lexre><a href="#tag_001_014_1092_004">Regular Expressions in lex</a></xref>.<p>Implementations accept either of the following two mutually exclusive declarations in the <i>Definitions</i> section:<dl compact><dt><b>%array</b><dd>Declare the type of <i>yytext</i> to be a null-terminated character array.<dt><b>%pointer</b><dd>Declare the type of <i>yytext</i> to be a pointer to a null-terminated character string.</dl><p>The default type of <i>yytext</i> is implementation-dependent. If an application refers to <i>yytext</i> outside of the scanner source file (that is, via an <b>extern</b>), the application will include the appropriate <b>%array</b> or <b>%pointer</b> declaration in the scanner source file.<p>Implementations will accept declarations
in the <i>Definitions</i> section for setting certain internal table sizes. The declarations are shown in the following table.<pre><table  bordercolor=#000000 border=1 align=center><tr valign=top><th align=center><b>Declaration</b><th align=center><b>Description</b><th align=center><b>Minimum Value</b><tr valign=top><td align=left><b>%p </b><i>n</i><td align=left>Number of positions<td align=left>2500<tr valign=top><td align=left><b>%n </b><i>n</i><td align=left>Number of states<td align=left>500<tr valign=top><td align=left><b>%a </b><i>n</i><td align=left>Number of transitions<td align=left>2000<tr valign=top><td align=left><b>%e </b><i>n</i><td align=left>Number of parse tree nodes<td align=left>1000<tr valign=top><td align=left><b>%k </b><i>n</i><td align=left>Number of packed character classes<td align=left>1000<tr valign=top><td align=left><b>%o </b><i>n</i><td align=left>Size of the output array<td align=left>3000</table></pre><h6 align=center><xref table="Table Size Declarations in <I>lex</i>"></xref>Table: Table Size Declarations in <i>lex</i></h6><p>In the table, <i>n</i> represents a positive decimal integer, preceded by one or more blank characters. The exact meaning of these table size numbers is implementation-dependent. The implementation will document how these numbers affect the <i>lex</i> utility and how they are related to any output that may be generated by the implementation should space limitations be encountered during the execution of <i>lex</i>. It is possible to determine from this output which of the table size values needs to be modified to permit <i>lex</i> to successfully generate tables for the input language. The values in the Minimum Value column are the lowest values that conforming implementations will provide.<h5><a name = "tag_001_014_1092_002">&nbsp;</a>Rules in lex</h5>The rules in <i>lex</i> source files are a table in which the left column contains regular expressions and the right column contains actions (C program fragments) to be executed when
the expressions are recognised.<pre><dl compact><dt> <dd><i>ERE action<br>ERE action<br>...</i></dl></pre><p>The extended regular expression (<i>ERE</i>) portion of a row will be separated from <i>action</i> by one or more blank characters. A regular expression containing blank characters is recognised under one of the following conditions:<ul><p><li>The entire expression appears within double-quotes.<p><li>The blank characters appear within double-quotes or square brackets.<p><li>Each blank character is preceded by a backslash character.<p></ul><h5><a name = "tag_001_014_1092_003">&nbsp;</a>User Subroutines in lex</h5>Anything in the user subroutines section will be copied to <b>lex.yy.c</b> following <i>yylex()</i>.<h5><a name = "tag_001_014_1092_004">&nbsp;</a>Regular Expressions in lex</h5><xref type="5" name="lexre"></xref>The <i>lex</i> utility supports the set of extended regular expressions (see the <b>XBD</b> specification, <a href="../xbd/re.html#tag_007_004"><b>Extended Regular Expressions</b>&nbsp;</a>), with the following additions and exceptions to the syntax:<dl compact><dt><b>"..."</b><dd>Any string enclosed in double-quotes will represent the characters within the double-quotes as themselves, except that backslash escapes (which appear in the following table) are recognised. Any backslash-escape sequence is terminated by the closing quote. For example, "\01""1" represents a single string: the octal value 1 followed by the character 1.<dt>&lt;<i>state</i>&gt;<i>r</i><dd><dt>&lt;<i>state1</i><b>,</b><i>state2</i><b>,</b>...&gt;<i>r</i><dd>The regular expression <i>r</i> will be matched only when the program is in one of the start conditions indicated by <i>state</i>, <i>state1</i> and so on; see <xref href=lexacts><a href="#tag_001_014_1092_005">Actions in lex</a></xref>. (As an exception to the typographical conventions of the rest of this specification, in this case &lt;<i>state</i>&gt; does not represent a metavariable, but the literal angle-bracket characters surrounding a
symbol.) The start condition is recognised as such only at the beginning of a regular expression.<dt><i>r</i>/<i>x</i><dd>The regular expression <i>r</i> will be matched only if it is followed by an occurrence of regular expression <i>x</i>. The token returned in <i>yytext</i> will match only <i>r</i>. If the trailing portion of <i>r</i> matches the beginning of <i>x</i>, the result is unspecified. The <i>r</i> expression cannot include further trailing context or the "$" (match-end-of-line) operator; <i>x</i> cannot include the "^" (match-beginning-of-line) operator, nor trailing context, nor the "$" operator. That is, only one occurrence of trailing context is allowed in a <i>lex</i> regular expression, and the "^" operator can be used only at the beginning of such an expression.<dt><b>{</b><i>name</i><b>}</b><dd>When <i>name</i> is one of the substitution symbols from the <i>Definitions</i> section, the string, including the enclosing braces, will be replaced by the <i>substitute</i> value. The <i>substitute</i> value will be treated in the extended regular expression as if it were enclosed in parentheses. No substitution will occur if <b>{</b><i>name</i><b>}</b> occurs within a bracket expression or within double-quotes.</dl><p>Within an ERE, a backslash character is considered to begin an escape sequence as specified in the table in the <b>XBD</b> specification, <a href="../xbd/notation.html"><b>File Format Notation</b>&nbsp;</a> (\\, \a, \b, \f, \n, \r, \t, \v). In addition, the escape sequences in the following table will be recognised.<p>A literal newline character cannot occur within an ERE; the escape sequence \n can be used to represent a newline character. A newline character cannot be matched by a period operator.<pre><table  bordercolor=#000000 border=1 align=center><tr valign=top><th align=center><b>Escape<br>Sequence</b><th align=center><b>Description</b><th align=center><b>Meaning</b><tr valign=top><td align=left>\<i>digits</i><td align=left> A backslash character followed by the longest sequence of
one, two or three octal-digit characters (01234567). If all of the digits are 0 (that is, a representation of the NUL character), the behaviour is undefined. <td align=left> The character whose encoding is represented by the one-, two- or three-digit octal integer. If the size of a byte on the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation-dependent. Multi-byte characters require multiple, concatenated escape sequences of this type, including the leading \ for each byte. <tr valign=top><td align=left>\<b>x</b><i>digits</i><td align=left> A backslash character followed by the longest sequence of hexadecimal-digit characters (0123456789abcdefABCDEF). If all of the digits are 0 (that is, a representation of the NUL character), the behaviour is undefined. <td align=left> The character whose encoding is represented by the hexadecimal integer. <tr valign=top><td align=left>\<i>c</i><td align=left> A backslash character followed by any character not described in this table or in the table in the <b>XBD</b> specification, <a href="../xbd/notation.html"><b>File Format Notation</b>&nbsp;</a> (\\, \a, \b, \f, \n, \r, \t, \v).<td align=left>The character <i>c</i>, unchanged.</table></pre><h6 align=center><xref table="Escape Sequences in <I>lex</i>"></xref>Table: Escape Sequences in <i>lex</i></h6><p>The order of precedence given to extended regular expressions for <i>lex</i> differs from that specified in the <b>XBD</b> specification, <a href="../xbd/re.html#tag_007_004"><b>Extended Regular Expressions</b>&nbsp;</a>. The order of precedence for <i>lex</i> is as shown in the following table, from high to low.<dl><dt><b>Note:</b><dd>The escaped characters entry is not meant to imply that these are operators, but they are included in the table to show their relationships to the true operators. The start condition, trailing context and anchoring notations have been omitted from the table because of the placement
restrictions described in this section; they can only appear at the beginning or end of an ERE.</dl><pre><table  bordercolor=#000000 border=1 align=center><tr valign=top><th align=center><b>Extended Regular Expression</b><th align=center><b>Notation</b><tr valign=top><td align=left>collation-related bracket symbols<td align=left>[= =]  [: :]  [. .]<tr valign=top><td align=left>escaped characters<td align=left>\&lt;<i>special character</i>&gt;<tr valign=top><td align=left>bracket expression<td align=left>[ ]<tr valign=top><td align=left>quoting<td align=left>"..."<tr valign=top><td align=left>grouping<td align=left>( )<tr valign=top><td align=left>definition<td align=left>{<i>name</i>}<tr valign=top><td align=left>single-character RE duplication<td align=left>* + ?<tr valign=top><td align=left>concatenation<td align=left>&nbsp;<tr valign=top><td align=left>interval expression<td align=left>{<i>m</i>,<i>n</i>}<tr valign=top><td align=left>alternation<td align=left>|</table></pre><h6 align=center><xref table="ERE Precedence in <I>lex</i>"></xref>Table: ERE Precedence in <i>lex</i></h6><p>The ERE anchoring operators ("^" and "$") do not appear in the table. With <i>lex</i> regular expressions, these operators are restricted in their use: the "^" operator can only be used at the beginning of an entire regular expression, and the "$" operator only at the end. The operators apply to the entire regular expression. Thus, for example, the pattern (^abc)|(def$) is undefined; it can instead be written as two separate rules, one with the regular expression ^abc and one with def$, which share a common action via the special "|" action (see below). If the pattern were written ^abc|def$, it would match either <b>abc</b> or <b>def</b> on a line by itself.<p>Unlike the general ERE rules, embedded anchoring is not allowed by most historical <i>lex</i> implementations. An example of embedded anchoring would be a pattern such as (^|&nbsp;)foo(&nbsp;|$) to match <b>foo</b> when it exists as a complete word. This
functionality can be obtained using existing <i>lex</i> features:<pre><code>^foo/[ \n]      |
" foo"/[ \n]    /* found foo as a separate word */</code></pre><p>Note also that "$" is a form of trailing context (it is equivalent to /\n) and as such cannot be used with regular expressions containing another instance of the operator (see the preceding discussion of trailing context).<p>The additional regular-expression trailing-context operator "/" can be used as an ordinary character if presented within double-quotes, "/"; preceded by a backslash, \/; or within a bracket expression, [/]. The start-condition "&lt;" and "&gt;" operators are special only in a start condition at the beginning of a regular expression; elsewhere in the regular expression they are treated as ordinary characters.<p>The following examples clarify the differences between <i>lex</i> regular expressions and regular expressions appearing elsewhere in this specification. For regular expressions of the form <i>r</i>/<i>x</i>, the string matching <i>r</i>
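<p>The difference between inclusive (<b>%s</b>) and exclusive (<b>%x</b>) start conditions described above can be summarised as a single predicate. The C sketch below is only a model: the <i>struct rule</i> type, the <i>rule_is_active()</i> function and the encoding of start conditions as small integers are hypothetical and do not correspond to the tables or code that any particular <i>lex</i> generates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the start-condition prefix of one rule.
   nstates == 0 means the rule has no <state> prefix. */
struct rule {
    const int *states;    /* conditions listed in <state1,state2,...> */
    size_t nstates;
};

/* True if rule r may match while the scanner is in start condition
   `current`; current_is_exclusive says whether `current` was declared
   with %x (true) or %s (false).  This model treats the initial state
   as inclusive. */
bool rule_is_active(const struct rule *r, int current,
                    bool current_is_exclusive)
{
    assert(r != NULL);
    if (r->nstates == 0)
        return !current_is_exclusive;   /* unprefixed: %s yes, %x no */
    for (size_t i = 0; i < r->nstates; i++)
        if (r->states[i] == current)
            return true;                /* explicitly listed */
    return false;
}
```

With a hypothetical exclusive condition numbered 1, an unprefixed rule stops matching once the scanner enters condition 1, while a rule prefixed with that condition matches only there.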
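<p>The <b>{</b><i>name</i><b>}</b> substitution rule above (replace the reference with the substitute, treat the result as parenthesised, and leave references inside double-quotes or bracket expressions alone) can be sketched as a string rewrite. This is a minimal illustration under stated assumptions: <i>expand_definition()</i> and <i>append()</i> are hypothetical helpers, only one definition is expanded per call, and backslash escapes are copied through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to out (capacity outsz), keeping out NUL-terminated.
   Returns false, leaving out unchanged, if there is not enough room. */
static bool append(char *out, size_t outsz, size_t *len,
                   const char *s, size_t n)
{
    if (*len + n + 1 > outsz)
        return false;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return true;
}

/* Hypothetical helper: copy pattern to out, replacing each {name} with
   "(" substitute ")" except inside double-quotes or a bracket expression.
   Expands a single definition; returns false if out is too small (out
   then holds a truncated prefix). */
bool expand_definition(const char *pattern, const char *name,
                       const char *substitute, char *out, size_t outsz)
{
    size_t len = 0, nlen = strlen(name);
    bool in_quotes = false, in_bracket = false;
    const char *p = pattern;

    assert(outsz > 0);
    out[0] = '\0';
    while (*p != '\0') {
        size_t n = 1;                          /* bytes to copy verbatim */
        if (*p == '\\' && p[1] != '\0') {
            n = 2;                             /* escape: copy as a unit */
        } else if (in_bracket) {
            if (*p == '[' && (p[1] == ':' || p[1] == '=' || p[1] == '.')) {
                /* [:class:], [=equiv=], [.coll.] may contain ']' */
                char close[3] = { p[1], ']', '\0' };
                const char *end = strstr(p + 2, close);
                n = end ? (size_t)(end - p) + 2 : strlen(p);
            } else if (*p == ']') {
                in_bracket = false;
            }
        } else if (in_quotes) {
            if (*p == '"') in_quotes = false;
        } else if (*p == '"') {
            in_quotes = true;
        } else if (*p == '[') {
            in_bracket = true;
            if (p[n] == '^') n++;              /* "[^" */
            if (p[n] == ']') n++;              /* leading ']' is literal */
        } else if (*p == '{' && strncmp(p + 1, name, nlen) == 0
                   && p[1 + nlen] == '}') {
            if (!append(out, outsz, &len, "(", 1) ||
                !append(out, outsz, &len, substitute, strlen(substitute)) ||
                !append(out, outsz, &len, ")", 1))
                return false;
            p += nlen + 2;
            continue;
        }
        if (!append(out, outsz, &len, p, n))
            return false;
        p += n;
    }
    return true;
}
```

The parentheses matter: with a hypothetical definition D whose substitute is ab, the pattern {D}+ becomes (ab)+, which repeats the whole substitute, whereas plain textual pasting would give ab+. An interval expression such as a{2,3} is left alone because 2,3 is not a defined name.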
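<p>The three conditions under which a rule's ERE may contain blank characters determine where the ERE ends and the action begins. The following C sketch, built around a hypothetical <i>split_rule()</i> function, finds that boundary. It assumes that only space and tab count as blank characters and that backslash escapes are honoured inside bracket expressions as well, as in the <b>[ \n]</b> example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return the length of the ERE at the start of a
   Rules-section line, i.e. the offset of the first space or tab that is
   not inside double-quotes or a bracket expression and not escaped by a
   backslash.  *action is set to the first non-blank character after the
   ERE (the terminating NUL if the line has no action). */
size_t split_rule(const char *line, const char **action)
{
    bool in_quotes = false, in_bracket = false;
    size_t i = 0;

    assert(line != NULL && action != NULL);
    while (line[i] != '\0') {
        char c = line[i];
        if (c == '\\' && line[i + 1] != '\0') {
            i += 2;                         /* escaped char, e.g. "\ " */
            continue;
        }
        if (in_quotes) {
            if (c == '"') in_quotes = false;
        } else if (in_bracket) {
            if (c == '[' && (line[i + 1] == ':' || line[i + 1] == '=' ||
                             line[i + 1] == '.')) {
                /* skip [:class:], [=equiv=], [.coll.], which may hold ']' */
                char d = line[i + 1];
                for (i += 2; line[i] != '\0'; i++)
                    if (line[i] == d && line[i + 1] == ']')
                        break;
                if (line[i] == '\0')
                    break;                  /* unterminated: stop scanning */
                i++;                        /* now on the ']' of ":]" */
            } else if (c == ']') {
                in_bracket = false;
            }
        } else if (c == '"') {
            in_quotes = true;
        } else if (c == '[') {
            in_bracket = true;
            if (line[i + 1] == '^') i++;    /* "[^" */
            if (line[i + 1] == ']') i++;    /* leading ']' is literal */
        } else if (c == ' ' || c == '\t') {
            break;                          /* end of the ERE */
        }
        i++;
    }
    const char *a = line + i;
    while (*a == ' ' || *a == '\t')
        a++;
    *action = a;
    return i;
}
```

For example, in the line "a b"   ECHO; the blank inside the quotes is part of the ERE, so the ERE is the five characters "a b" and the action starts at ECHO;.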
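<p>The effect of trailing context, where <i>r</i>/<i>x</i> requires <i>x</i> to follow but returns only the text matching <i>r</i> in <i>yytext</i>, can be imitated with the POSIX <i>regcomp()</i> and <i>regexec()</i> functions by wrapping <i>r</i> and <i>x</i> in separate subexpressions. This is an illustration only, not how <i>lex</i> implements the operator. The hypothetical <i>match_trailing_context()</i> assumes that <i>r</i> and <i>x</i> are plain EREs without anchors or further trailing context, that the trailing portion of <i>r</i> cannot match the beginning of <i>x</i> (the unspecified case), and that the combined pattern fits in 256 bytes.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical sketch: the combined pattern "^(r)(x)" must match at the
   start of input, but only the length of the r part is reported, as
   yytext would be; the text matched by x stays in the input.
   Returns that length, or -1 if r/x does not match here. */
long match_trailing_context(const char *r, const char *x, const char *input)
{
    char pattern[256];
    regex_t re;
    regmatch_t pm[3];     /* [0] whole match, [1] r, [2] x */
    long len = -1;
    int n = snprintf(pattern, sizeof pattern, "^(%s)(%s)", r, x);

    assert(n > 0 && (size_t)n < sizeof pattern);   /* not truncated */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                                  /* invalid ERE */
    if (regexec(&re, input, 3, pm, 0) == 0)
        len = (long)(pm[1].rm_eo - pm[1].rm_so);   /* yytext = r only */
    regfree(&re);
    return len;
}
```

Because "$" is equivalent to /\n, calling the sketch with <i>x</i> set to a newline mimics a rule such as def$: the newline must be present, but it is not counted in the returned length.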
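<p>The escape-sequence rules in the tables above can be illustrated with a small decoder. In this sketch, <i>decode_escape()</i> is hypothetical. It assumes 8-bit bytes, rejects values that do not fit in a byte along with the undefined all-zero cases, and treats <b>\x</b> with no following hexadecimal digit as the <b>\</b><i>c</i> case; the tables do not settle those last two points.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: decode one escape sequence starting at p
   (p[0] == '\\').  Stores the byte in *out and returns the number of
   input characters consumed, including the backslash, or 0 if the
   sequence is undefined or rejected by this sketch. */
size_t decode_escape(const char *p, unsigned char *out)
{
    unsigned value = 0;
    size_t n;

    assert(p != NULL && p[0] == '\\' && out != NULL);
    if (p[1] >= '0' && p[1] <= '7') {
        /* \digits: the longest run of one, two or three octal digits */
        for (n = 1; n <= 3 && p[n] >= '0' && p[n] <= '7'; n++)
            value = value * 8 + (unsigned)(p[n] - '0');
        if (value == 0 || value > 0xFF)
            return 0;          /* NUL is undefined; > 0377 does not fit */
        *out = (unsigned char)value;
        return n;              /* backslash + digits consumed */
    }
    if (p[1] == 'x' && isxdigit((unsigned char)p[2])) {
        /* \xdigits: the longest run of hexadecimal digits */
        for (n = 2; isxdigit((unsigned char)p[n]); n++) {
            int c = tolower((unsigned char)p[n]);
            if (value <= 0xFF)  /* stop growing once out of range */
                value = value * 16
                        + (unsigned)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        if (value == 0 || value > 0xFF)
            return 0;
        *out = (unsigned char)value;
        return n;
    }
    switch (p[1]) {            /* escapes from XBD File Format Notation */
    case '\\': *out = '\\'; break;
    case 'a':  *out = '\a'; break;
    case 'b':  *out = '\b'; break;
    case 'f':  *out = '\f'; break;
    case 'n':  *out = '\n'; break;
    case 'r':  *out = '\r'; break;
    case 't':  *out = '\t'; break;
    case 'v':  *out = '\v'; break;
    case '\0': return 0;       /* lone trailing backslash */
    default:   *out = (unsigned char)p[1]; break;  /* \c gives c */
    }
    return 2;
}
```

Decoding \01 inside the quoted string "\01""1" stops at the closing quote after two digits, which is why that example yields the octal value 1 followed by the character 1 rather than the single octal value 011.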
💿 File size: 3156 K
👤 Uploaded by: peterzhang1982
📂 Category: communications programming documents