<tr valign="top"><td align="left"><p class="tent">%<b>e</b> <i>n</i></p></td><td align="left"><p class="tent">Number of parse tree nodes</p></td><td align="left"><p class="tent">1000</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">%<b>k</b> <i>n</i></p></td><td align="left"><p class="tent">Number of packed character classes</p></td><td align="left"><p class="tent">1000</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">%<b>o</b> <i>n</i></p></td><td align="left"><p class="tent">Size of the output array</p></td><td align="left"><p class="tent">3000</p></td></tr>
</table></center>
<p>In the table, <i>n</i> represents a positive decimal integer, preceded by one or more &lt;blank&gt;s. The exact meaning of these table size numbers is implementation-defined. The implementation shall document how these numbers affect the <i>lex</i> utility and how they are related to any output that may be generated by the implementation should limitations be encountered during the execution of <i>lex</i>. It shall be possible to determine from this output which of the table size values needs to be modified to permit <i>lex</i> to successfully generate tables for the input language. The values in the Minimum Value column represent the lowest values conforming implementations shall provide.</p>
<h5><a name="tag_04_73_13_02"></a>Rules in lex</h5>
<p>The rules in <i>lex</i> source files are a table in which the left column contains regular expressions and the right column contains actions (C program fragments) to be executed when the expressions are recognized.</p>
<pre><i>ERE action
ERE action</i>
<tt>...</tt>
</pre>
<p>The extended regular expression (ERE) portion of a row shall be separated from <i>action</i> by one or more &lt;blank&gt;s. 
A regular expression containing &lt;blank&gt;s shall be recognized under one of the following conditions:</p>
<ul>
<li><p>The entire expression appears within double-quotes.</p></li>
<li><p>The &lt;blank&gt;s appear within double-quotes or square brackets.</p></li>
<li><p>Each &lt;blank&gt; is preceded by a backslash character.</p></li>
</ul>
<h5><a name="tag_04_73_13_03"></a>User Subroutines in lex</h5>
<p>Anything in the user subroutines section shall be copied to <b>lex.yy.c</b> following <i>yylex</i>().</p>
<h5><a name="tag_04_73_13_04"></a>Regular Expressions in lex</h5>
<p>The <i>lex</i> utility shall support the set of extended regular expressions (see the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular Expressions</a>), with the following additions and exceptions to the syntax:</p>
<dl compact>
<dt><tt>"..."</tt></dt>
<dd>Any string enclosed in double-quotes shall represent the characters within the double-quotes as themselves, except that backslash escapes (which appear in the following table) shall be recognized. Any backslash-escape sequence shall be terminated by the closing quote. For example, <tt>"\01""1"</tt> represents a single string: the octal value 1 followed by the character <tt>'1'</tt>.</dd>
<dt>&lt;<i>state</i>&gt;<i>r</i>, &lt;<i>state1,state2,</i>...&gt;<i>r</i></dt>
<dd><br>The regular expression <i>r</i> shall be matched only when the program is in one of the start conditions indicated by <i>state</i>, <i>state1</i>, and so on; see <a href="#tag_04_73_13_05">Actions in lex</a>. (As an exception to the typographical conventions of the rest of this volume of IEEE Std 1003.1-2001, in this case &lt;<i>state</i>&gt; does not represent a metavariable, but the literal angle-bracket characters surrounding a symbol.) 
The start condition shall be recognized as such only at the beginning of a regular expression.</dd>
<dt><i>r</i>/<i>x</i></dt>
<dd>The regular expression <i>r</i> shall be matched only if it is followed by an occurrence of regular expression <i>x</i> (<i>x</i> is the instance of trailing context, further defined below). The token returned in <i>yytext</i> shall match only <i>r</i>. If the trailing portion of <i>r</i> matches the beginning of <i>x</i>, the result is unspecified. The <i>r</i> expression cannot include further trailing context or the <tt>'$'</tt> (match-end-of-line) operator; <i>x</i> cannot include the <tt>'^'</tt> (match-beginning-of-line) operator, nor trailing context, nor the <tt>'$'</tt> operator. That is, only one occurrence of trailing context is allowed in a <i>lex</i> regular expression, and the <tt>'^'</tt> operator can only be used at the beginning of such an expression.</dd>
<dt>{<i>name</i>}</dt>
<dd>When <i>name</i> is one of the substitution symbols from the <i>Definitions</i> section, the string, including the enclosing braces, shall be replaced by the <i>substitute</i> value. The <i>substitute</i> value shall be treated in the extended regular expression as if it were enclosed in parentheses. No substitution shall occur if {<i>name</i>} occurs within a bracket expression or within double-quotes.</dd>
</dl>
<p>Within an ERE, a backslash character shall be considered to begin an escape sequence as specified in the table in the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a> (<tt>'\\'</tt>, <tt>'\a'</tt>, <tt>'\b'</tt>, <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>). In addition, the escape sequences in the following table shall be recognized.</p>
<p>A literal &lt;newline&gt; cannot occur within an ERE; the escape sequence <tt>'\n'</tt> can be used to represent a &lt;newline&gt;. 
A &lt;newline&gt; shall not be matched by a period operator.<br></p>
<center><b>Table: Escape Sequences in <i>lex</i></b></center>
<center><table border="1" cellpadding="3" align="center">
<tr valign="top"><th align="center"><p class="tent"><b>Escape Sequence</b></p></th><th align="center"><p class="tent"><b>Description</b></p></th><th align="center"><p class="tent"><b>Meaning</b></p></th></tr>
<tr valign="top"><td align="left"><p class="tent">\<i>digits</i></p></td><td align="left"><p class="tent">A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.</p></td><td align="left"><p class="tent">The character whose encoding is represented by the one, two, or three-digit octal integer. If the size of a byte on the system is greater than nine bits, the valid escape sequence used to represent a byte is implementation-defined. Multi-byte characters require multiple, concatenated escape sequences of this type, including the leading <tt>'\'</tt> for each byte.</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">\x<i>digits</i></p></td><td align="left"><p class="tent">A backslash character followed by the longest sequence of hexadecimal-digit characters (0123456789abcdefABCDEF). 
If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.</p></td><td align="left"><p class="tent">The character whose encoding is represented by the hexadecimal integer.</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">\c</p></td><td align="left"><p class="tent">A backslash character followed by any character not described in this table or in the table in the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap05.html">Chapter 5, File Format Notation</a> (<tt>'\\'</tt>, <tt>'\a'</tt>, <tt>'\b'</tt>, <tt>'\f'</tt>, <tt>'\n'</tt>, <tt>'\r'</tt>, <tt>'\t'</tt>, <tt>'\v'</tt>).</p></td><td align="left"><p class="tent">The character <tt>'c'</tt>, unchanged.</p></td></tr>
</table></center>
<basefont size="2"> <dl><dt><b>Note:</b></dt><dd>If a <tt>'\x'</tt> sequence needs to be immediately followed by a hexadecimal digit character, a sequence such as <tt>"\x1""1"</tt> can be used, which represents a character containing the value 1, followed by the character <tt>'1'</tt>.</dd></dl><basefont size="3">
<p>The order of precedence given to extended regular expressions for <i>lex</i> differs from that specified in the Base Definitions volume of IEEE Std 1003.1-2001, <a href="../basedefs/xbd_chap09.html#tag_09_04">Section 9.4, Extended Regular Expressions</a>. The order of precedence for <i>lex</i> shall be as shown in the following table, from high to low. <basefont size="2"></p>
<dl><dt><b>Note:</b></dt><dd>The escaped characters entry is not meant to imply that these are operators, but they are included in the table to show their relationships to the true operators. 
The start condition, trailing context, and anchoring notations have been omitted from the table because of the placement restrictions described in this section; they can only appear at the beginning or end of an ERE.</dd></dl><basefont size="3"><br>
<center><b>Table: ERE Precedence in <i>lex</i></b></center>
<center><table border="1" cellpadding="3" align="center">
<tr valign="top"><th align="center"><p class="tent"><b>Extended Regular Expression</b></p></th><th align="center"><p class="tent"><b>Notation</b></p></th></tr>
<tr valign="top"><td align="left"><p class="tent">collation-related bracket symbols</p></td><td align="left"><p class="tent">[= =] [: :] [. .]</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">escaped characters</p></td><td align="left"><p class="tent">\&lt;<i>special character</i>&gt;</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">bracket expression</p></td><td align="left"><p class="tent">[ ]</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">quoting</p></td><td align="left"><p class="tent">"..."</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">grouping</p></td><td align="left"><p class="tent">( )</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">definition</p></td><td align="left"><p class="tent">{<i>name</i>}</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">single-character RE duplication</p></td><td align="left"><p class="tent">* + ?</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">concatenation</p></td><td align="left"><p class="tent"> </p></td></tr>
<tr valign="top"><td align="left"><p class="tent">interval expression</p></td><td align="left"><p class="tent">{m,n}</p></td></tr>
<tr valign="top"><td align="left"><p class="tent">alternation</p></td><td align="left"><p class="tent">|</p></td></tr>
</table></center>
<p>The ERE anchoring operators <tt>'^'</tt> and <tt>'$'</tt> do not appear in the table. 
With <i>lex</i> regular expressions, these operators are restricted in their use: the <tt>'^'</tt> operator can only be used at the beginning of an entire regular expression, and the <tt>'$'</tt> operator only at the end. The operators apply to the entire regular expression. Thus, for example, the pattern <tt>"(^abc)|(def$)"</tt> is undefined; it can instead be written as two separate rules, one with the regular expression <tt>"^abc"</tt> and one with <tt>"def$"</tt>, which share a common action via the special <tt>'|'</tt> action (see below). If the pattern were written <tt>"^abc|def$"</tt>, it would match either <tt>"abc"</tt> or <tt>"def"</tt> on a line by itself.</p>
<p>Unlike the general ERE rules, embedded anchoring is not allowed by most historical <i>lex</i> implementations. An example of embedded anchoring would be a pattern such as <tt>"(^| )foo( |$)"</tt> used to match <tt>"foo"</tt> when it exists as a complete word. This functionality can be obtained using existing <i>lex</i> features:</p>
<pre><tt>^foo/[ \n]      |
" foo"/[ \n]    /* Found foo as a separate word. */
</tt></pre>
<p>Note also that <tt>'$'</tt> is a form of trailing context (it is equivalent to <tt>"/\n"</tt>) and as such cannot be used with regular expressions containing another instance of the trailing-context operator (see the preceding discussion of trailing context).</p>
<p>The additional regular expression trailing-context operator <tt>'/'</tt> can be used as an ordinary character if presented within double-quotes, <tt>"/"</tt>; preceded by a backslash, <tt>"\/"</tt>; or within a bracket expression, <tt>"[/]"</tt>. The