\fIregular expression\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.RE
.LP
Named definitions are of the form:
.P
.RS
\fIname\fP \fC=\fP \fIregular expression\fP\fC;\fP
.RE
.LP
Configurations look like named definitions whose names start
with "\fBre2c:\fP":
.P
.RS
\fCre2c:\fP\fIname\fP \fC=\fP \fIvalue\fP\fC;\fP
.RE
.RS
\fCre2c:\fP\fIname\fP \fC=\fP \fB"\fP\fIvalue\fP\fB"\fP\fC;\fP
.RE
.SH "SUMMARY OF RE2C REGULAR EXPRESSIONS"
.TP
\fC"foo"\fP
the literal string \fCfoo\fP.
ANSI-C escape sequences can be used.
.TP
\fC'foo'\fP
the literal string \fCfoo\fP (characters [a-zA-Z] are treated case-insensitively).
ANSI-C escape sequences can be used.
.TP
\fC[xyz]\fP
a "character class"; in this case,
the \*(rx matches either an '\fCx\fP', a '\fCy\fP', or a '\fCz\fP'.
.TP
\fC[abj-oZ]\fP
a "character class" with a range in it;
matches an '\fCa\fP', a '\fCb\fP', any letter from '\fCj\fP' through '\fCo\fP',
or a '\fCZ\fP'.
.TP
\fC[^\fIclass\fP\fC]\fP
an inverted "character class".
.TP
\fIr\fP\fC\e\fP\fIs\fP
match any \fIr\fP which isn't an \fIs\fP. \fIr\fP and \fIs\fP must be regular expressions
which can be expressed as character classes.
.TP
\fIr\fP\fC*\fP
zero or more \fIr\fP's, where \fIr\fP is any regular expression
.TP
\fIr\fP\fC+\fP
one or more \fIr\fP's
.TP
\fIr\fP\fC?\fP
zero or one \fIr\fP's (that is, "an optional \fIr\fP")
.TP
name
the expansion of the "named definition" (see above)
.TP
\fC(\fP\fIr\fP\fC)\fP
an \fIr\fP; parentheses are used to override precedence
(see below)
.TP
\fIrs\fP
an \fIr\fP followed by an \fIs\fP ("concatenation")
.TP
\fIr\fP\fC|\fP\fIs\fP
either an \fIr\fP or an \fIs\fP
.TP
\fIr\fP\fC/\fP\fIs\fP
an \fIr\fP but only if it is followed by an \fIs\fP. The \fIs\fP is not part of
the matched text. This type of \*(rx is called "trailing context".
A trailing context can only be the end of a rule and not part of a named definition.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC}\fP
matches \fIr\fP exactly \fIn\fP times.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC,}\fP
matches \fIr\fP at least \fIn\fP times.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC,\fP\fIm\fP\fC}\fP
matches \fIr\fP at least \fIn\fP but not more than \fIm\fP times.
.TP
\fC.\fP
match any character except newline (\\n).
.TP
\fIdef\fP
matches named definition as specified by \fIdef\fP.
.LP
Character classes and string literals may contain octal or hexadecimal
character definitions and the following set of escape sequences
(\fB\\n\fP, \fB\\t\fP, \fB\\v\fP, \fB\\b\fP, \fB\\r\fP, \fB\\f\fP, \fB\\a\fP, \fB\\\\\fP).
An octal character is defined by a backslash followed by its three octal digits,
and a hexadecimal character is defined by a backslash, a lowercase '\fBx\fP'
and its two hexadecimal digits, or a backslash, an uppercase \fBX\fP and its
four hexadecimal digits.
.LP
\*(re furthermore supports the C/C++ Unicode notation. That is a backslash followed
by either a lowercase \fBu\fP and its four hexadecimal digits or an uppercase
\fBU\fP and its eight hexadecimal digits. However, only in \fB-u\fP mode can the
generated code deal with any valid Unicode character up to 0x10FFFF.
.LP
Since characters greater than \fB\\X00FF\fP are not allowed in non-Unicode mode,
the only portable "\fBany\fP" rules are \fB(.|"\\n")\fP and \fB[^]\fP.
.LP
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
.SH "INPLACE CONFIGURATION"
.LP
It is possible to configure code generation inside \*(re blocks. The following
lists the available configurations:
.TP
\fIre2c:indent:top\fP \fB=\fP 0 \fB;\fP
Specifies the minimum amount of indentation to use. Requires a numeric value
greater than or equal to zero.
.TP
\fIre2c:indent:string\fP \fB=\fP "\\t" \fB;\fP
Specifies the string to use for indentation.
Requires a string that should contain only whitespace unless you need this for
external tools. The easiest way to specify spaces is to enclose them in single
or double quotes. If you do not want any indentation at all you can simply set
this to \fB""\fP.
.TP
\fIre2c:yybm:hex\fP \fB=\fP 0 \fB;\fP
If set to zero, a decimal table is used; otherwise a hexadecimal table is
generated.
.TP
\fIre2c:yyfill:enable\fP \fB=\fP 1 \fB;\fP
Set this to zero to suppress generation of YYFILL(n). When using this, be sure
to verify that the generated scanner does not read past the end of the input.
Allowing this behavior might introduce severe security issues into your programs.
.TP
\fIre2c:yyfill:parameter\fP \fB=\fP 1 \fB;\fP
Allows suppressing parameter passing to \fBYYFILL\fP calls. If set to zero then
no parameter is passed to \fBYYFILL\fP. If set to a nonzero value then
\fBYYFILL\fP usage will be followed by the number of requested characters in
parentheses.
.TP
\fIre2c:startlabel\fP \fB=\fP 0 \fB;\fP
If set to a nonzero integer then the start label of the next scanner block
will be generated even if it is not used by the scanner itself. Otherwise the
normal \fByy0\fP-like start label is only generated if needed. If set to a text
value then a label with that text will be generated regardless of whether the
normal start label is used or not. This setting is reset to \fB0\fP
after a start label has been generated.
.TP
\fIre2c:labelprefix\fP \fB=\fP yy \fB;\fP
Allows changing the prefix of numbered labels. The default is \fByy\fP and
it can be set to any string that is a valid label.
.TP
\fIre2c:state:abort\fP \fB=\fP 0 \fB;\fP
When not zero and switch -f is active then the \fCYYGETSTATE\fP block will
contain a default case that aborts, and a -1 case is used for initialization.
.TP
\fIre2c:state:nextlabel\fP \fB=\fP 0 \fB;\fP
Used when -f is active to control whether the \fCYYGETSTATE\fP block is
followed by a \fCyyNext:\fP label line.
Instead of using \fCyyNext\fP you can usually also use configuration
\fIstartlabel\fP to force a specific start label
or default to \fCyy0\fP as start label. Instead of using a dedicated label it
is often better to separate the YYGETSTATE code from the actual scanner code by
placing a "\fC/*!getstate:re2c */\fP" comment.
.TP
\fIre2c:cgoto:threshold\fP \fB=\fP 9 \fB;\fP
When -g is active this value specifies the complexity threshold that triggers
generation of jump tables rather than using nested if's and decision bitfields.
The threshold is compared against a calculated estimate of the number of if-s
needed, where every used bitmap divides the threshold by 2.
.TP
\fIre2c:yych:conversion\fP \fB=\fP 0 \fB;\fP
When the input uses signed characters and the \fB-s\fP or \fB-b\fP switches are
in effect, re2c can automatically convert to the unsigned character type that
is then necessary for its internal single character. When this setting is zero
or an empty string the conversion is disabled. With a nonzero number
the conversion is taken from \fBYYCTYPE\fP. If that is given by an inplace
configuration, that value is used. Otherwise it will be \fB(YYCTYPE)\fP
and changes to that configuration are no longer possible. When this setting is
a string the parentheses must be specified.
Now assuming your input is a \fBchar*\fP
buffer and you are using the above-mentioned switches, you can set \fBYYCTYPE\fP to
\fBunsigned char\fP and this setting to either \fB1\fP or \fB"(unsigned char)"\fP.
.TP
\fIre2c:define:YYCTXMARKER\fP \fB=\fP YYCTXMARKER \fB;\fP
Allows overriding the define YYCTXMARKER, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYCTYPE\fP \fB=\fP YYCTYPE \fB;\fP
Allows overriding the define YYCTYPE, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYCURSOR\fP \fB=\fP YYCURSOR \fB;\fP
Allows overriding the define YYCURSOR, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYDEBUG\fP \fB=\fP YYDEBUG \fB;\fP
Allows overriding the define YYDEBUG, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYFILL\fP \fB=\fP YYFILL \fB;\fP
Allows overriding the define YYFILL, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYGETSTATE\fP \fB=\fP YYGETSTATE \fB;\fP
Allows overriding the define YYGETSTATE, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYLIMIT\fP \fB=\fP YYLIMIT \fB;\fP
Allows overriding the define YYLIMIT, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYMARKER\fP \fB=\fP YYMARKER \fB;\fP
Allows overriding the define YYMARKER, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:define:YYSETSTATE\fP \fB=\fP YYSETSTATE \fB;\fP
Allows overriding the define YYSETSTATE, and thus avoiding it, by setting the
value to the actual code needed.
.TP
\fIre2c:label:yyFillLabel\fP \fB=\fP yyFillLabel \fB;\fP
Allows overriding the name of the label yyFillLabel.
.TP
\fIre2c:label:yyNext\fP \fB=\fP yyNext \fB;\fP
Allows overriding the name of the label yyNext.
.TP
\fIre2c:variable:yyaccept\fP \fB=\fP yyaccept \fB;\fP
Allows overriding the name of the variable
yyaccept.
.TP
\fIre2c:variable:yybm\fP \fB=\fP yybm \fB;\fP
Allows overriding the name of the variable yybm.
.TP
\fIre2c:variable:yych\fP \fB=\fP yych \fB;\fP
Allows overriding the name of the variable yych.
.TP
\fIre2c:variable:yytarget\fP \fB=\fP yytarget \fB;\fP
Allows overriding the name of the variable yytarget.
.SH "UNDERSTANDING RE2C"
.LP
The subdirectory lessons of the \*(re distribution contains a few step-by-step
lessons to get you started with \*(re. All examples in the lessons subdirectory
can be compiled and actually work.
.SH FEATURES
.LP
\*(re does not provide a default action:
the generated code assumes that the input
will consist of a sequence of tokens.
Typically this can be dealt with by adding a rule such as the one for
unexpected characters in the example above.
.LP
The user must arrange for a sentinel token to appear at the end of input
(and provide a rule for matching it):
\*(re does not provide an \fC<<EOF>>\fP expression.
If the source is from a null-byte terminated string, a
rule matching a null character will suffice. If the source is from a
file then you could pad the input with a newline (or some other character that
cannot appear within another token); upon recognizing such a character, check
to see if it is the sentinel and act accordingly.
You can also use YYFILL(n)
to end the scanner when not enough characters are available, which is nothing
other than detection of the end of data or file.
.LP
\*(re does not provide start conditions: use a separate scanner
specification for each start condition (as illustrated in the above example).
.SH BUGS
.LP
Difference only works for character sets.
.LP
The \*(re internal algorithms need documentation.
.SH "SEE ALSO"
.LP
flex(1), lex(1).
.P
More information on \*(re can be found here:
.PD 0
.P
.B http://re2c.org/
.PD 1
.SH AUTHORS
.PD 0
.P
Peter Bumbulis <peter@csg.uwaterloo.ca>
.P
Brian Young <bayoung@acm.org>
.P
Dan Nuffer <nuffer@users.sourceforge.net>
.P
Marcus Boerger <helly@users.sourceforge.net>
.P
Hartmut Kaiser <hkaiser@users.sourceforge.net>
.P
Emmanuel Mogenet <mgix@mgix.com> added storable state
.P
.PD 1
.SH VERSION INFORMATION
This manpage describes \*(re, version 0.12.3.
.fi