re2c.1

.\"
.\" $Id: re2c.1.in 663 2007-04-01 11:22:15Z helly $
.\"
.TH RE2C 1 "22 April 2005" "Version 0.12.3"
.ds re \fBre2c\fP
.ds le \fBlex\fP
.ds rx regular expression
.ds lx \fIl\fP-expression
.SH NAME
re2c \- convert regular expressions to C/C++
.SH SYNOPSIS
\*(re [\fB-bdefghisuvVw1\fP] [\fB-o output\fP] file
.SH DESCRIPTION
\*(re is a preprocessor that generates C-based recognizers from regular
expressions.
The input to \*(re consists of C/C++ source interleaved with
comments of the form \fC/*!re2c\fP ... \fC*/\fP which contain
scanner specifications.
In the output these comments are replaced with code that, when
executed, will find the next input token and then execute
some user-supplied token-specific code.
.P
For example, given the following code
.P
.in +3
.nf
char *scan(char *p)
{
/*!re2c
        re2c:define:YYCTYPE  = "unsigned char";
        re2c:define:YYCURSOR = p;
        re2c:yyfill:enable   = 0;
        re2c:yych:conversion = 1;
        re2c:indent:top      = 1;
        [0-9]+          {return p;}
        [\000-\377]     {return (char*)0;}
*/
}
.fi
.in -3
.P
\*(re \fB-is\fP will generate
.P
.in +3
.nf
/* Generated by re2c on Sat Apr 16 11:40:58 1994 */
char *scan(char *p)
{
    {
        unsigned char yych;

        yych = (unsigned char)*p;
        if(yych <= '/') goto yy4;
        if(yych >= ':') goto yy4;
        ++p;
        yych = (unsigned char)*p;
        goto yy7;
yy3:
        {return p;}
yy4:
        ++p;
        yych = (unsigned char)*p;
        {return (char*)0;}
yy6:
        ++p;
        yych = (unsigned char)*p;
yy7:
        if(yych <= '/') goto yy3;
        if(yych <= '9') goto yy6;
        goto yy3;
    }

}
.fi
.in -3
.P
You can place one \fC/*!max:re2c */\fP comment in the input. It is replaced
by a line "#define \fCYYMAXFILL\fP <n>", where <n> is the maximum number of
characters required to parse the input, that is, the largest value that
\fCYYFILL\fP(n) will receive.
If \fB-1\fP is in effect then \fCYYMAXFILL\fP can only be triggered once,
after the last \fC/*!re2c */\fP block.
.P
You can also use \fC/*!ignore:re2c */\fP blocks to document the
scanner code; they are not part of the output.
.SH OPTIONS
\*(re provides the following options:
.TP
\fB-?\fP, \fB-h\fP
Show a short help message.
.TP
\fB-b\fP
Implies \fB-s\fP.  Use bit vectors as well in the attempt to coax better
code out of the compiler.  Most useful for specifications with more than a
few keywords (e.g. for most programming languages).
.TP
\fB-d\fP
Create a parser that dumps information about the current position and the
state the parser is in while parsing the input. This is useful for debugging
parser issues and states. If you use this switch you need to define a macro
\fIYYDEBUG\fP that is called like a function with two parameters:
\fIvoid YYDEBUG(int state, char current)\fP. The first parameter receives the
state or -1 and the second parameter receives the input at the current cursor.
.TP
\fB-e\fP
Cross-compile from an ASCII platform to an EBCDIC one.
.TP
\fB-f\fP
Generate a scanner with support for storable state.
For details see \fBSCANNER WITH STORABLE STATES\fP below.
.TP
\fB-g\fP
Generate a scanner that utilizes GCC's computed goto feature. That is, \*(re
generates jump tables whenever a decision is of a certain complexity (e.g. a
lot of if conditions would otherwise be necessary). This is only usable with
GCC and produces output that cannot be compiled with any other compiler. Note
that this implies \fB-b\fP and that the complexity threshold can be configured
using the inplace configuration "cgoto:threshold".
.TP
\fB-i\fP
Do not output #line information. This is useful when you want to use a CMS tool
with the \*(re output, which you might want if you do not require your users to
have \*(re themselves when building from your source.
.TP
\fB-o output\fP
Specify the output file.
.TP
\fB-s\fP
Generate nested \fCif\fPs for some \fCswitch\fPes.
Many compilers need this
assistance to generate better code.
.TP
\fB-u\fP
Generate a parser that supports Unicode chars (UTF-32). This means the
generated code can deal with any valid Unicode character up to 0x10FFFF. When
UTF-8 or UTF-16 needs to be supported you need to convert the incoming stream
to UTF-32 upon input yourself.
.TP
\fB-v\fP
Show version information.
.TP
\fB-V\fP
Show the version as a number XXYYZZ.
.TP
\fB-w\fP
Create a parser that supports wide chars (UCS-2). This implies \fB-s\fP and
cannot be used together with the \fB-e\fP switch.
.TP
\fB-1\fP
Force single pass generation. This cannot be combined with \fB-f\fP and
disables \fCYYMAXFILL\fP generation prior to the last \*(re block.
.TP
\fB--no-generation-date\fP
Suppress the date in the generated output so that it only shows the re2c
version.
.SH "INTERFACE CODE"
Unlike other scanner generators, \*(re does not generate complete scanners:
the user must supply some interface code.
In particular, the user must define the following macros or use the
corresponding inplace configurations:
.TP
\fCYYCTYPE\fP
Type used to hold an input symbol.
Usually \fCchar\fP or \fCunsigned char\fP.
.TP
\fCYYCURSOR\fP
\*(lx of type \fCYYCTYPE*\fP that points to the current input symbol.
The generated code advances \fCYYCURSOR\fP as symbols are matched.
On entry, \fCYYCURSOR\fP is assumed to point to the first character of the
current token.  On exit, \fCYYCURSOR\fP will point to the first character of
the following token.
.TP
\fCYYLIMIT\fP
Expression of type \fCYYCTYPE*\fP that marks the end of the buffer
(\fCYYLIMIT[-1]\fP is the last character in the buffer).
The generated code repeatedly compares \fCYYCURSOR\fP to \fCYYLIMIT\fP
to determine when the buffer needs (re)filling.
.TP
\fCYYMARKER\fP
\*(lx of type \fCYYCTYPE*\fP.
The generated code saves backtracking information in \fCYYMARKER\fP.
Some simple
scanners might not use this.
.TP
\fCYYCTXMARKER\fP
\*(lx of type \fCYYCTYPE*\fP.
The generated code saves trailing context backtracking information in
\fCYYCTXMARKER\fP.
The user only needs to define this macro if a scanner specification uses
trailing context in one or more of its regular expressions.
.TP
\fCYYFILL\fP(\fIn\fP)
The generated code "calls" \fCYYFILL\fP(n) when the buffer needs
(re)filling:  at least \fIn\fP additional characters should
be provided.  \fCYYFILL\fP(n) should adjust \fCYYCURSOR\fP, \fCYYLIMIT\fP,
\fCYYMARKER\fP and \fCYYCTXMARKER\fP as needed.  Note that for typical
programming languages \fIn\fP will be the length of the longest keyword plus
one.
The user can place a comment of the form \fC/*!max:re2c */\fP once to insert
a \fCYYMAXFILL\fP definition that is set to the maximum length value. If the
\fB-1\fP switch is used then \fCYYMAXFILL\fP can be triggered only once, after
the last \fC/*!re2c */\fP block.
.TP
\fCYYGETSTATE\fP()
The user only needs to define this macro if the \fB-f\fP flag was specified.
In that case, the generated code "calls" \fCYYGETSTATE\fP() at the very
beginning of the scanner in order to obtain the saved state.
\fCYYGETSTATE\fP() must return a signed integer. The value must be either -1,
indicating that the scanner is entered for the first time, or a value
previously saved by \fCYYSETSTATE\fP(s).  In the second case, the scanner will
resume operations right after the point where the last \fCYYFILL\fP(n) was
called.
.TP
\fCYYSETSTATE(\fP\fIs\fP\fC)\fP
The user only needs to define this macro if the \fB-f\fP flag was specified.
In that case, the generated code "calls" \fCYYSETSTATE\fP just before calling
\fCYYFILL\fP(n).
The parameter to \fCYYSETSTATE\fP is a signed integer that uniquely
identifies the specific instance of \fCYYFILL\fP(n) that is about to be called.
Should the user wish to save the state of the scanner and have \fCYYFILL\fP(n)
return to the caller, all that is needed is to store that unique identifier in
a variable.
Later, when the scanner is called again, it will call \fCYYGETSTATE()\fP and
resume execution right where it left off. The generated code will contain
both \fCYYSETSTATE\fP(s) and \fCYYGETSTATE\fP() even if \fCYYFILL\fP(n) is
disabled.
.TP
\fCYYDEBUG(\fP\fIstate\fP,\fIcurrent\fP\fC)\fP
This is only needed if the \fB-d\fP flag was specified. It makes it easy to
debug the generated parser by calling a user-defined function for every state.
The function should have the following signature:
\fIvoid YYDEBUG(int state, char current)\fP.
The first parameter receives the state or -1 and the second parameter receives
the input at the current cursor.
.TP
\fCYYMAXFILL\fP
This will be automatically defined by \fC/*!max:re2c */\fP blocks as explained
above.
.SH "SCANNER WITH STORABLE STATES"
When the \fB-f\fP flag is specified, \*(re generates a scanner that
can store its current state, return to the caller, and later resume
operations exactly where it left off.
.P
The default operation of \*(re is a "pull" model, where the scanner asks
for extra input whenever it needs it. However, this mode of operation
assumes that the scanner is the "owner" of the parsing loop, and that may
not always be convenient.
.P
Typically, if there is a preprocessor ahead of the scanner in the stream,
or for that matter any other procedural source of data, the scanner cannot
"ask" for more data unless scanner and source live in separate threads.
.P
The \fB-f\fP flag is useful for just this situation: it lets users design
scanners that work in a "push" model, i.e. where data is fed to the scanner
chunk by chunk. When the scanner runs out of data to consume, it just stores
its state and returns to the caller.
When more input data is fed to the scanner,
it resumes operations exactly where it left off.
.P
When using the \fB-f\fP option \*(re does not accept stdin, because it has to
run the full generation process twice and therefore has to read the input
twice. \*(re fails if it cannot open the input twice or if reading the
input for the first time influences the second read attempt.
.P
Changes needed compared to the "pull" model:
.P
1. The user has to supply the macros YYSETSTATE(state) and YYGETSTATE().
.P
2. The \fB-f\fP option inhibits declaration of \fIyych\fP and
\fIyyaccept\fP, so the user has to declare these. The user also has
to save and restore them. In the example \fIexamples/push.re\fP these
are declared as fields of the (C++) class of which the scanner is a
method, so they do not need to be saved/restored explicitly. For C
they could e.g. be made macros that select fields from a structure
passed in as a parameter. Alternatively, they could be declared as local
variables, saved by YYFILL(n) when it decides to return and restored
at entry to the function. Also, it could be more efficient to save the
state from YYFILL(n) because YYSETSTATE(state) is called
unconditionally. YYFILL(n) however does not get \fIstate\fP as a
parameter, so YYSETSTATE(state) would have to store the state in a
local variable.
.P
3. Modify YYFILL(n) to return (from the function calling it) if more
input is needed.
.P
4. Modify the caller to recognise "more input is needed" and respond
appropriately.
.P
5. The generated code will contain a switch block that restores
the last state by jumping behind the corresponding YYFILL(n) call. This code
is automatically generated in the epilog of the first "\fC/*!re2c */\fP"
block. It is possible to trigger generation of the YYGETSTATE() block earlier
by placing a "\fC/*!getstate:re2c */\fP" comment. This is especially useful
when the scanner code should be wrapped inside a loop.
.P
Please see examples/push.re for a push-model scanner.
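.P
The save-and-resume idea can also be sketched by hand in plain C.
The following is not \*(re output and not the contents of examples/push.re;
it is a minimal illustration under stated assumptions. The \fCScanner\fP
structure, the \fCscan\fP and \fCfeed\fP functions and the
\fCNEED_INPUT\fP/\fCEND_OF_INPUT\fP codes are hypothetical names, the
convention that an empty chunk marks the end of input is an assumption of
this sketch, and the explicit \fCreturn\fP stands in for a push-model YYFILL(n).
.in +3
.nf
```c
#include <string.h>

/* Hypothetical scanner context; every name below is illustrative. */
typedef struct {
    const char *cursor;  /* plays the role of YYCURSOR */
    const char *limit;   /* plays the role of YYLIMIT */
    int state;           /* last value passed to YYSETSTATE, -1 at first */
    int in_number;       /* scanner data that must survive a return */
    int numbers;         /* digit runs recognised so far */
} Scanner;

enum { NEED_INPUT, END_OF_INPUT };

#define YYGETSTATE()   (s->state)
#define YYSETSTATE(x)  (s->state = (x))

/* Counts runs of decimal digits; returns whenever the chunk is used up. */
static int scan(Scanner *s)
{
    if (YYGETSTATE() == 0)
        goto resume0;                /* re-entered after a refill */

    for (;;) {
        if (s->cursor == s->limit) {
            YYSETSTATE(0);           /* remember the refill point */
            return NEED_INPUT;       /* push-model YYFILL: hand control back */
        }
resume0:
        if (s->cursor == s->limit)
            return END_OF_INPUT;     /* assumption: empty chunk ends the input */
        {
            char c = *s->cursor++;
            int digit = (c >= '0' && c <= '9');
            if (digit && !s->in_number)
                s->numbers++;        /* a new run of digits starts here */
            s->in_number = digit;
        }
    }
}

/* Caller side: push one chunk into the scanner. */
static int feed(Scanner *s, const char *chunk)
{
    s->cursor = chunk;
    s->limit = chunk + strlen(chunk);
    return scan(s);
}
```
.fi
.in -3
.P
A caller would set \fCstate\fP to -1 and clear the other fields once, then
call \fCfeed\fP for every chunk that arrives. Because \fCin_number\fP lives in
the structure rather than in a local variable, a digit run that is split
across two chunks is still counted once, which is the same reason
\fIyych\fP and \fIyyaccept\fP have to be saved in item 2 above.
.P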
The generated code can be
tweaked using inplace configurations "\fBstate:abort\fP" and "\fBstate:nextlabel\fP".
.SH "SCANNER SPECIFICATIONS"
Each scanner specification consists of a set of \fIrules\fP, \fInamed
definitions\fP and \fIconfigurations\fP.
.LP
\fIRules\fP consist of a regular expression along with a block of C/C++ code that
is to be executed when the associated \fIregular expression\fP is matched.
.P
.RS
