Below are a few example productions, both pattern-matching and
event-matching:

@example
begin

@group
  # The Identifier matching production
  on Identifier "<letter>+"
  <
    return SymIdent;
  >
@end group

@group
  # The String matching production. Strips quotes before returning.
  on String "(\"([^\"\\]|\\.)*\")"
  <
    // remove quotes from matched text
    text = text.substr (1, text.length () - 2);

    return SymString;
  >
@end group

@group
  # The error handling production. Prints an error message and then
  # ignores the error text.
  on error
  <
    cerr << "Oh no, '" << text << "' is not a valid symbol!\n";

    return SymNULL;
  >
@end group
end
@end example

In this example, @samp{Identifier} and @samp{String} are
pattern-matching productions and @samp{error} is an event production. A
pattern production has an optional name followed by a regular expression
and then zero or more code fragments. If no production names are
present, the productions are named `1', `2', `3', etc. It's a good idea
to supply descriptive production names, though, because they make
testing the scanner and reading the code easier.

The code fragment associated with a production ends up being a member of
the class generated to implement the scanner. At a minimum, a code
fragment should return a value to indicate what symbol has been
matched. This value will usually be one of the values declared in the
@samp{symbols} section, or one of the special symbol values
@samp{SymNULL} and @samp{SymERROR}. Returning @samp{SymNULL} indicates
that the scanner should completely ignore the match: the application
using the scanner will never see this symbol. Returning @samp{SymERROR}
indicates that the scanner should treat the text as non-matchable:
i.e. generate an `illegal symbol' error message and perform standard
error recovery.

A code fragment can also post-process the matched text, print messages,
and so on. For example, the @samp{String} production in the example
above removes the quotes from the matched string before returning
@samp{SymString}.
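
The quote-stripping step used by the @samp{String} production can be
tried outside a scanner. The following is a minimal standalone C++
sketch, not code generated by @var{Elex}: the helper name
@samp{strip_quotes} is hypothetical, and the sketch assumes the matched
text is held in a @samp{std::string}, as the calls to @samp{substr} and
@samp{length} in the fragment above suggest.

```cpp
#include <string>

// Hypothetical helper (not part of Elex): drop the first and last
// character of a matched string literal, as the String production does.
std::string strip_quotes (const std::string &text)
{
  // The String pattern always matches at least the two quote characters,
  // but guard anyway so the helper is safe on shorter input.
  if (text.length () < 2)
    return text;

  // Keep everything between the opening and closing quote.
  return text.substr (1, text.length () - 2);
}
```

Only the outer quotes are removed. Escape sequences inside the string,
such as a backslash followed by a quote, are left exactly as they were
written in the input, so a fragment that needs the unescaped value has
to do that conversion itself.
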
The manipulations that can be performed within the
scanner depend on what scanner engine is being used. @xref{Code
Generation}, for more information on the accessible bits of the
individual scanner engines.

@menu
* Production Matching Order::   How Elex handles production overlap.
@end menu

@c ----------------------------------------------------------------------
@node Production Matching Order, , Productions, Productions
@subsection Production Matching Order
@cindex production matching order
@cindex order of production matching
@cindex error-handling productions
@cindex example, error-handling production

Where two or more pattern-matching productions could be triggered for
a piece of text, @var{Elex} picks the production that appears furthest
down in the script. This means that later productions can partially
(or fully) override earlier productions. You can use this feature to
specify symbols by defining a general case and then defining a set of
special cases further on. For example:

@example
@group
on Word "[a-z]+"
<
>

on HelloWorld "helloworld"
<
>
@end group
@end example

The @samp{Word} production matches any string of lowercase characters,
while the @samp{HelloWorld} production matches only the string
`helloworld', which is a special case of @samp{Word}. Because
@samp{HelloWorld} comes later in the script, it gets the match
whenever `helloworld' is seen.

This rule can also be used to handle errors. For example:

@example
@group
on IllegalEscapeCode "\\..?.?"
<
  error ("Escape sequence must contain a three-digit number");

  return SymNULL;
>

on EscapeCode "\\[0-9][0-9][0-9]"
<
>
@end group
@end example

The example above contains an error-handling production
(@samp{IllegalEscapeCode}) that matches any sequence of up to three
characters following a @samp{\}. A second production,
@samp{EscapeCode}, partially overrides @samp{IllegalEscapeCode} by
accepting a @samp{\} followed by three digits. This means that whenever
any illegal @samp{\} sequence is encountered (i.e.
one that fails to match the second
production), the error-handling production will be triggered.

@c ----------------------------------------------------------------------
@node Code Fragments, Testing Regular Expressions, Productions, Scanner Scripts
@section Code Fragments
@cindex code fragments
@cindex language tag, code fragments
@cindex multi-language code fragments
@cindex C++ code fragment
@cindex Ada code fragment
@cindex Java code fragment
@cindex example, code fragments

A code fragment is simply some lines of programming language code. Each
code fragment begins with @samp{<} on a line by itself and ends with
@samp{>}, also on a line by itself (although whitespace may appear
before and after). For example:

@example
@group
<
  // This is a fragment of C++ code.
  int i = 42;

  cout << "i + 1 = " << i + 1 << endl;
>
@end group
@end example

A code fragment can also have a language tag associated with it to tell
@var{Elex} what programming language is contained in the fragment. For
example:

@example
@group
<cpp
  // This is a C++ code fragment.
>

<java
  // This is a Java code fragment.
>

<ada
  -- This is an Ada code fragment.
>
@end group
@end example

The language tag allows you to include code for more than one language
in a script or---more commonly---it allows you to specify that the
script is targeting a language other than the default (@pxref{Default
Target Language}).

@c ----------------------------------------------------------------------
@node Testing Regular Expressions, Custom Scanner Engines, Code Fragments, Scanner Scripts
@section Testing Regular Expressions
@cindex debugging regular expressions
@cindex testing regular expressions
@cindex Elex debugger
@cindex -d switch

@var{Elex} includes a debugger that allows you to test the regular
expressions in a scanner script before you generate code.
To run the
debugger using the @var{Elex} front end, use the `-d' switch (@pxref{The
Debugger}).

When run in debug mode, @var{Elex} compiles the script as usual and, if
there are no errors, invokes the debugger. The debugger scans
its standard input using the regular expressions in the script and
outputs each matched string along with the name of the production that
would be triggered for that string. @xref{Testing the Script}, for an
example of the debugger's output.

@c ----------------------------------------------------------------------
@node Custom Scanner Engines, Multi-language Scanners, Testing Regular Expressions, Scanner Scripts
@section Custom Scanner Engines

Under construction.

@c ----------------------------------------------------------------------
@node Multi-language Scanners, , Custom Scanner Engines, Scanner Scripts
@section Multi-language Scanners

Under construction.

@c ----------------------------------------------------------------------
@node The Elex Tool Set, Code Generation, Scanner Scripts, Top
@chapter The Elex Tool Set

This chapter describes the tools included in the @var{Elex} package. Of
the tools described here, only the front end and the debugger need to be
dealt with directly in normal use: the other pieces are described for
those who are curious or who wish to extend @var{Elex}. The code
generator modules are not described here: they get their own chapter
(@pxref{Code Generation}).

@menu
* The Front End::               The central point of control.
* The Compiler::                The bit that does the work.
* The Debugger::                Use this when something doesn't work.
@end menu

@c ----------------------------------------------------------------------
@node The Front End, The Compiler, The Elex Tool Set, The Elex Tool Set
@section The Front End

The @var{Elex} front end integrates the modules that make up the
@var{Elex} package. It can be used to generate source code from a
scanner script, run the debugger on a script, or compile the script into
an intermediate file.
The front end is similar in purpose to the
@var{gcc} program, which is a convenient front end for the GNU compilers,
assembler, linker, etc.

@menu
* Generating Code::             Making source code from a script.
* Debugging::                   Using the debugger.
* Compile Only::                Create some useless intermediate files.
* The elex.cfg File::           The Elex configuration file.
* Default Target Language::     The default language for code generation.
@end menu

@c ----------------------------------------------------------------------
@node Generating Code, Debugging, The Front End, The Front End
@subsection Generating Code
@cindex generating source code
@cindex -l switch
@cindex target language, specifying

To generate source code for a scanner, simply invoke the @var{Elex}
front end with the name of your scanner script:

@example
elex myscanner.elx
@end example

The front end first invokes the compiler to generate intermediate code
from the script. It then invokes the appropriate code generator
(@pxref{Default Target Language}), which creates source code from the
compiled intermediate form. If you want to explicitly set the language
@var{Elex} targets (i.e. manually select the code generator), you can use
the `-l' option:

@example
elex -l java myscanner.elx
@end example

This tells the front end to skip its usual attempt to guess the target
language and use `java' as the target.

All files generated by the front end are placed in the same directory as
the scanner script. Which files are actually generated varies depending
on the language you target, but their names will generally be based on
the name of the scanner script, with differing extensions.
For example, if you
target C++ using @samp{elex -l cpp subdir/myscanner.elx}, you will get
@file{subdir/myscanner.h} and @file{subdir/myscanner.cpp} as output
files.

@c ----------------------------------------------------------------------
@node Debugging, Compile Only, Generating Code, The Front End
@subsection Debugging
@cindex -d switch
@cindex debugger

Using the `-d' switch tells the front end to compile the script and then
run the debugger. The debugger matches the regular expressions in the
script against lines read from standard input and reports the matches to
standard output. @xref{Testing the Script}, for a sample of the
debugger's output.

@c ----------------------------------------------------------------------
@node Compile Only, The elex.cfg File, Debugging, The Front End
@subsection Compile Only
@cindex -c switch
@cindex .esd files
@cindex .ecf files
@cindex FSM, in a .esd file
@cindex intermediate form
@cindex code fragments, .ecf form
@cindex compile-only option

Using the `-c' (compile-only) option causes the front end to use the
compiler to generate two intermediate files: a `.esd' (@var{Elex}
scanner definition) file and a `.ecf' (@var{Elex} code fragment) file.
These files are not particularly useful except for debugging the
compiler. However, you can generate code from an `.esd'/`.ecf' pair by
passing the `.esd' file to the front end. This will be slightly faster
than generating code from a script, since it bypasses the compile step.

An `.esd' file contains an intermediate form of the script. The
intermediate form is a language-neutral description of the information
needed by a code generator to generate a scanner, and for the most part
consists of an FSM (Finite State Machine).
The `.ecf' file simply
contains the necessary code fragments from the script.

If you're really interested in what either of these two files actually
contains, you can run the front end with `-c' on the
@file{demo/CalcScanner.elx} file supplied as part of the @var{Elex}
installation.

@c ----------------------------------------------------------------------
@node The elex.cfg File, Default Target Language, Compile Only, The Front End
@subsection The elex.cfg File
@cindex elex.cfg
@cindex FOO format

The @file{elex.cfg} file contains configuration information for the
@var{Elex} tool set. Amongst other things, it defines where the
compiler, debugger and code generators are, and any options associated
with these components.

The @file{elex.cfg} file uses the @var{FOO} hierarchical information
format, which can be used for general information and configuration
storage. In fact, the @var{Elex} compiler emits the compiled
intermediate scanner definition in @var{FOO} format. If you're
interested in using this format, support for @var{FOO} file I/O and data
access is provided in the @file{foo} directory for both C++ (see