@file{FooObject.h} and @file{FooParser.h}) and @var{Perl} (see
@file{FooObject.pm} and @file{FooParser.pm}).

@c ----------------------------------------------------------------------
@node       Default Target Language,  , The elex.cfg File, The Front End
@subsection Default Target Language
@cindex default target language
@cindex -l switch, default behavior

If you don't specify a language to the front end using `-l', it attempts
to guess the language by scanning the script for code fragments with a
language tag.  The first tag found is assumed to be the `primary'
language for the scanner and is selected as the target.  If no tag is
found, the front end uses the value of the `codegen/default' section in
@file{elex.cfg} (set to `cpp' by default).  Finally, if this section is
missing, the front end gives up with an error message.

@c ----------------------------------------------------------------------
@node       The Compiler, The Debugger, The Front End, The Elex Tool Set
@section    The Compiler
@cindex elexc
@cindex compiler
@cindex .esd files
@cindex .esc files
@cindex FSM, optimisation

The compiler (@file{elexc}) is the central part of the @var{Elex}
package.  It takes a @file{.elx} script file and generates a scanner
definition that a code generator uses to build a scanner.  As the
compiler parses the script, it builds a single consolidated FSM for all
the regular expressions it finds.  If the parse is successful, the FSM is
then transformed in a number of ways, including a global optimisation
that reduces the FSM to the smallest possible number of states.  The
FSM, along with other information about the scanner, is then written to
standard output.
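
The manual does not say which minimisation algorithm @file{elexc} uses.
As an illustration only, the sketch below applies Moore's
partition-refinement algorithm, a standard way to minimise a DFA.  It
assumes a complete DFA (every state has a transition on every symbol)
with all states reachable, and labels each state with the production it
accepts so that states accepting different productions are never merged.
The names @code{Dfa}, @code{minimiseClasses} and @code{classCount} are
hypothetical.

@verbatim
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical DFA representation (not the Elex compiler's own format).
struct Dfa {
    int alphabetSize;                    // input symbols are 0..alphabetSize-1
    std::vector<std::vector<int>> next;  // next[state][symbol] -> state
    std::vector<int> accept;             // 0 = non-accepting, else production id
};

// Moore's partition refinement: returns a class number for each state.
// States sharing a class are equivalent and can be merged into one.
std::vector<int> minimiseClasses(const Dfa& dfa) {
    const std::size_t n = dfa.next.size();
    assert(dfa.accept.size() == n);

    // Initial partition: group states by what they accept.
    std::vector<int> cls(n);
    std::map<int, int> byAccept;
    for (std::size_t s = 0; s < n; ++s) {
        const int fresh = static_cast<int>(byAccept.size());
        cls[s] = byAccept.emplace(dfa.accept[s], fresh).first->second;
    }
    std::size_t classes = byAccept.size();

    // Split classes until stable: two states stay together only if they are
    // in the same class and every symbol takes them to the same class.
    for (;;) {
        std::map<std::vector<int>, int> bySignature;
        std::vector<int> refined(n);
        for (std::size_t s = 0; s < n; ++s) {
            std::vector<int> signature{cls[s]};
            for (int a = 0; a < dfa.alphabetSize; ++a)
                signature.push_back(cls[dfa.next[s][a]]);
            const int fresh = static_cast<int>(bySignature.size());
            refined[s] = bySignature.emplace(signature, fresh).first->second;
        }
        cls = std::move(refined);
        // Classes only ever split, so an unchanged count means we are done.
        if (bySignature.size() == classes) return cls;
        classes = bySignature.size();
    }
}

// Number of states in the minimised DFA (class numbers are 0..k-1).
int classCount(const std::vector<int>& cls) {
    return cls.empty() ? 0 : *std::max_element(cls.begin(), cls.end()) + 1;
}
@end verbatim

For example, a six-state DFA for @code{ab|cb} (start, after `a', after
`c', after `ab', after `cb', and a dead state) reduces to four classes:
the states after `a' and `c' merge, as do the two accepting states.
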
The compiler also generates a @file{.ecf} file containing the code
fragments from the script.

@c ----------------------------------------------------------------------
@node       The Debugger,  , The Compiler, The Elex Tool Set
@section    The Debugger
@cindex elexd
@cindex debugger

The debugger is a sort of `code generator' for @var{Elex} that does not
actually generate any code, but instead matches its standard input
against the regular expressions in a script and reports what matched.
The debugger is built with exactly the same scanner engine as any C++
scanner, so it behaves in the same way as the scanner will in an
application.

The debugger can be invoked by running @file{elex} with the `-d' switch,
or by running @file{elexd} directly with the @file{.esd} file as a
command line argument.

@c ----------------------------------------------------------------------
@node       Code Generation, Glossary, The Elex Tool Set, Top
@chapter    Code Generation

This chapter discusses aspects of @var{Elex} code generation, including
generating and using C++ scanners, using the Graphviz code generator and
the concept of backtracking scanner engines.

@menu
* C++ Code Generator::          Generates C++ scanners.
* Graphviz Code Generator::     A code generator that creates FSM diagrams.
* Backtracking Scanners::       Scanners that try every possible match.
@end menu

@c ----------------------------------------------------------------------
@node       C++ Code Generator, Graphviz Code Generator, Code Generation, Code Generation
@section    C++ Code Generator
@cindex C++, code generation
@cindex code generation, C++
@cindex ElexScanner

The C++ code generator generates a scanner built on the standard C++
scanner engine.
The generated scanner is derived from the @samp{ElexScanner} class,
which implements a full backtracking scanner engine for C++
applications.

The generator works by creating a header file that declares a scanner
class with member functions corresponding to the productions in the
script.  @xref{Generating the Scanner}, for an example of such a header
file.  To implement the scanner, the generator creates a @file{.cpp} file
containing data structures that describe the FSM to the scanner engine,
with the appropriate code fragments from the script inserted as member
functions.

Because the code fragments for handling matches of a particular
production become member functions of the generated scanner class, they
are able to manipulate the scanner engine's internals to some extent.
This is usually exploited to further process matched text (e.g., to
remove quotes from strings).  @xref{C++ Scanner Engine}, for details on
how to interact with the scanner engine.

@menu
* Using a C++ Scanner::         How to use the scanner in C++ applications.
* C++ Scanner Engine::          Overview of the C++ scanner engine.
* XInputStream::                The scanner's view of an input stream.
@end menu

@c ----------------------------------------------------------------------
@node       Using a C++ Scanner, C++ Scanner Engine, C++ Code Generator, C++ Code Generator
@subsection Using a C++ Scanner
@cindex using a C++ scanner
@cindex C++ scanner, using

Sorry, this section isn't done yet, but @ref{Using the Scanner} should
give enough information on how to use a scanner in a C++ application.
Also see @file{demo/CalcParser.cpp}.

@c ----------------------------------------------------------------------
@node       C++ Scanner Engine, XInputStream, Using a C++ Scanner, C++ Code Generator
@subsection C++ Scanner Engine
@cindex C++ scanner engine
@cindex scanner engine, C++

The C++ scanner engine is an abstract class that all @var{Elex} C++
scanners inherit from.
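
The split between the abstract engine and a generated scanner, described
in @ref{C++ Code Generator}, can be sketched as follows.  This is a
minimal, self-contained illustration, not the real @code{ElexScanner}:
@code{SketchData}, @code{SketchEngine}, @code{onMatch} and
@code{StringScanner} are hypothetical names, the FSM table is written by
hand rather than emitted by @var{Elex}, and the engine reads from a
@code{std::string} and never backtracks.  As the documented interface
allows, though, the per-production member function rewrites the
protected @code{text} member to change what @code{getText ()} reports.

@verbatim
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the generated scanner definition: a DFA table.
struct SketchData {
    std::vector<std::map<char, int>> next;  // next[state][ch] -> state
    std::vector<int> production;            // production accepted, or -1
};

// Hypothetical abstract engine: owns the matching loop and calls a
// per-production hook that the generated subclass implements.
class SketchEngine {
public:
    SketchEngine(const SketchData& data, std::string input)
        : data_(data), input_(std::move(input)) {}
    virtual ~SketchEngine() = default;

    // Symbol value of the next match, 0 at end of input, -1 on error.
    // Greedy and non-backtracking, unlike the real Elex engine.
    int getNext() {
        if (pos_ >= input_.size()) return 0;
        const std::size_t start = pos_;
        int state = 0;
        while (pos_ < input_.size()) {
            auto it = data_.next[state].find(input_[pos_]);
            if (it == data_.next[state].end()) break;
            state = it->second;
            ++pos_;
        }
        if (data_.production[state] < 0) {
            pos_ = start;  // leave the input where the error occurred
            return -1;
        }
        text = input_.substr(start, pos_ - start);
        return onMatch(data_.production[state]);
    }
    const std::string& getText() const { return text; }

protected:
    std::string text;                        // actions may rewrite this
    virtual int onMatch(int production) = 0;

private:
    const SketchData& data_;
    std::string input_;
    std::size_t pos_ = 0;
};

// Roughly what a generated scanner adds for two productions:
//   0: a quoted string of lowercase letters -> STRING (quotes removed)
//   1: one or more spaces                   -> SPACE
class StringScanner : public SketchEngine {
public:
    enum Symbol { STRING = 1, SPACE = 2 };
    explicit StringScanner(std::string input)
        : SketchEngine(table(), std::move(input)) {}

protected:
    int onMatch(int production) override {
        if (production == 0) {  // code fragment for STRING
            assert(text.size() >= 2);
            text = text.substr(1, text.size() - 2);  // strip the quotes
            return STRING;
        }
        return SPACE;
    }

private:
    static const SketchData& table() {
        static const SketchData data = [] {
            SketchData d;
            d.next.resize(4);
            d.production = {-1, -1, 0, 1};
            d.next[0]['"'] = 1;  // opening quote
            for (char c = 'a'; c <= 'z'; ++c) d.next[1][c] = 1;
            d.next[1]['"'] = 2;  // closing quote: STRING accepted
            d.next[0][' '] = 3;  // spaces: SPACE accepted
            d.next[3][' '] = 3;
            return d;
        }();
        return data;
    }
};
@end verbatim

For example, scanning @code{"abc" "de"} yields @code{STRING} with text
@code{abc}, then @code{SPACE}, then @code{STRING} with text @code{de}.
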
The engine implements the generic logic needed to `drive' a scanner;
combined with a scanner definition emitted by the code generator, it
produces a working scanner.  This section describes the public interface
and the relevant protected interfaces of the C++ scanner engine.

@exdent @b{Public Interfaces}

@table @code
@item ElexScanner (ElexScannerData &data, XInputStream &input, int lookahead = 1)
This creates an instance of the scanner, using @code{data} to represent
the scanner definition generated by @var{Elex}.  @code{data} is a
structure emitted by the code generator: the scanner class passes it to
the @code{ElexScanner} constructor for you, so you only need to supply
the @code{input} and @code{lookahead} parameters.

The @code{input} object is used by the scanner as its input.
@xref{XInputStream}, for more information.

@code{lookahead} defines the number of symbols the scanner reads ahead
before reporting a match (default is 1).  By reading ahead, the scanner
can backtrack and retract symbol matches when an error occurs in order
to try other possible matches.  @xref{Backtracking Scanners}, for more
information on backtracking.

@item int getNext ()
Causes the scanner to read the next symbol from its input and return the
symbol value.
This symbol value is also returned by @code{getSymbol ()}.

@item int getSymbol ()
Returns the last symbol value matched by @code{getNext ()}.

@item string &getText ()
Returns the text matched by the last call to @code{getNext ()}.

@item  int getTextLine ()
@itemx int getTextColumn ()
Return the line and column at which the last symbol matched by
@code{getNext ()} began (both line and column numbers start at 0).

@item  void error (const string &message)
@itemx void warning (const string &message)
Generate an error or a warning at the current symbol, using
@code{message} as the error or warning text.

@item const ErrorMessageList &getErrors ()
Returns the current list of error and warning messages.
@end table

@exdent @b{Protected Interfaces}

In general, you should not need to touch the C++ scanner engine's
internals.  The members you might need to use are listed below.

@table @code
@item string text
The text matched for the current symbol.  You may modify this as needed
to change the text reported as matched by @code{getText ()}.

@item   int symbolLine
@itemx  int symbolColumn
The line and column at which the last symbol started.  These are the
values reported by @code{getTextLine ()} and @code{getTextColumn ()}.
@end table

@c ----------------------------------------------------------------------
@node       XInputStream,  , C++ Scanner Engine, C++ Code Generator
@subsection XInputStream
@cindex XInputStream

The @code{XInputStream} class encapsulates an @code{istream}, adding
services required by the C++ scanner engine.
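
The services @code{XInputStream} provides are not documented yet.  The
interfaces above suggest at least two that a backtracking engine needs:
re-reading characters after a backtrack, and tracking the line and
column that @code{getTextLine ()} and @code{getTextColumn ()} report.
The following is a hypothetical sketch of such a wrapper around
@code{std::istream}, assuming it simply buffers every character it
reads; @code{SketchInput}, @code{mark} and @code{reset} are invented
names, not the real @code{XInputStream} interface.

@verbatim
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, convenient for feeding test input
#include <string>

// Hypothetical input wrapper, not the real XInputStream: it keeps every
// character read so a backtracking scanner can rewind, and reports
// 0-based line and column numbers.
class SketchInput {
public:
    explicit SketchInput(std::istream& in) : in_(in) {}

    // Returns the next character, or -1 at end of input.
    int get() {
        if (pos_ == buf_.size()) {
            const int c = in_.get();
            if (c == std::char_traits<char>::eof()) return -1;
            buf_.push_back(static_cast<char>(c));
        }
        return static_cast<unsigned char>(buf_[pos_++]);
    }

    // Remember the current position, or return to a remembered one.
    std::size_t mark() const { return pos_; }
    void reset(std::size_t m) {
        assert(m <= buf_.size());
        pos_ = m;
    }

    // Line and column of the next character to be read. Recomputed by
    // scanning the buffer: simple, but linear in the input read so far.
    int line() const {
        int n = 0;
        for (std::size_t i = 0; i < pos_; ++i)
            if (buf_[i] == '\n') ++n;
        return n;
    }
    int column() const {
        std::size_t i = pos_;
        while (i > 0 && buf_[i - 1] != '\n') --i;
        return static_cast<int>(pos_ - i);
    }

private:
    std::istream& in_;
    std::string buf_;      // everything read from in_ so far
    std::size_t pos_ = 0;  // index of the next character to return
};
@end verbatim

Buffering the whole input keeps the sketch short; a real implementation
would more likely discard characters once no backtrack can reach them.
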
This section gives an overview of the @code{XInputStream} class in case
you're interested; otherwise you can treat @code{XInputStream} as a
`black box', using it in the manner shown in @ref{Using the Scanner}.

@emph{Sorry, the rest of this section is under construction.}

@c ----------------------------------------------------------------------
@node       Graphviz Code Generator, Backtracking Scanners, C++ Code Generator, Code Generation
@section    Graphviz Code Generator
@cindex Graphviz, code generator
@cindex code generator, Graphviz
@cindex .dot files, generating
@cindex .gif files, generating
@cindex GIF files, generating
@cindex graphing the FSM
@cindex visualising the FSM

The Graphviz code generator generates a graph of the FSM that the
@var{Elex} compiler constructs to implement the scanner.  It does this by
generating a Graphviz @file{.dot} file that can be used with the
Graphviz toolset to produce a directed-graph visualisation of the FSM.

To generate a @file{.dot} file for a scanner, run the @var{Elex} front
end like this:

@example
> elex -l dot MyScanner.elx
@end example

This generates a file called @file{MyScanner.dot}, which is a standard
Graphviz graph specification.  You can then render the graph in GIF
format using a command like:

@example
> dot -T gif MyScanner.dot > MyScanner.gif
@end example

You can download the Graphviz 1.0 package from
@url{http://www.research.att.com/sw/tools/graphviz/}.

@c ----------------------------------------------------------------------
@node       Backtracking Scanners,  , Graphviz Code Generator, Code Generation
@section    Backtracking Scanners
@cindex backtracking
@cindex greedy matching
@cindex speculative matching
@cindex lookahead

Scanning for regular expressions is an inherently non-deterministic
process.  Using an FSM representation and performing certain
manipulations can resolve some of this non-determinism, but murky areas
will always remain.
For example, what should a regular expression scanner do when it can
either accept a character as part of the current symbol or terminate the
current symbol and start a new one?  Most regular expression scanners
are `greedy' in these cases: they always try to match the longest
possible symbol.  But this can result in the scanner failing to match
legitimate symbols further on.

Assume we have a simplistic regular expression scanner that matches the
expression @code{for|forest|end}.  The scanner is then presented with
the input `forend'.  After the scanner has read `for', it can either
accept `for' as a symbol, or accept the `e' in the hope that it will be
able to match `forest'.  The greedy rule says that the scanner
continues, and when it then reads `n' instead of `s', it fails to match
either `for' or `forest'.

To solve this problem, most regular expression scanners employ a
backtracking scheme that allows the scanner to mark points where it has
been greedy (and possibly other points where it had more than one
choice) and return to them if an error occurs.  In the example above,
after finding the error, the scanner would simply backtrack to the point
where it had just read `for', accept `for' as a symbol and then continue
to successfully match `end'.

The @var{Elex} scanner engine employs such a backtracking scheme.  Its
design also allows speculative symbol matching, which means that a
symbol match can be retracted by a backtrack operation.  When a certain
number of symbols have been speculatively matched, the oldest symbol is
`frozen' and reported as matched, after which it cannot be retracted.
You can specify the number of symbols that the scanner engine holds as
speculative matches by setting the `lookahead' parameter on the scanner
engine.
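
The `forend' example can be reproduced with a short sketch.  It is not
the @var{Elex} engine: it matches a fixed list of words instead of
walking an FSM, @code{scan} is a hypothetical name, and it only
backtracks within a single symbol (to the last point where a complete
word had been read).  It does not model the speculative retraction of
earlier symbols that @code{lookahead} controls.  With @code{backtrack}
set to false, the scanner must accept exactly what it greedily read,
which reproduces the failure described above.

@verbatim
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical word scanner illustrating greedy matching with and without
// backtracking. Returns the symbols matched, or std::nullopt on failure.
std::optional<std::vector<std::string>> scan(
        const std::string& input, const std::vector<std::string>& words,
        bool backtrack) {
    std::vector<std::string> symbols;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t lastAccept = 0;  // longest complete word read so far
        std::size_t readLen = 0;     // how far the greedy scan got
        for (std::size_t len = 1; pos + len <= input.size(); ++len) {
            bool isPrefix = false;
            for (const std::string& w : words) {
                if (w.size() < len || w.compare(0, len, input, pos, len) != 0)
                    continue;
                isPrefix = true;                        // some word continues
                if (w.size() == len) lastAccept = len;  // a complete word
            }
            if (!isPrefix) break;  // dead end: the greedy scan stops here
            readLen = len;
        }
        // Without backtracking, only what was actually read can be accepted;
        // with it, fall back to the last accepting point.
        const std::size_t matched =
            backtrack ? lastAccept : (lastAccept == readLen ? readLen : 0);
        if (matched == 0) return std::nullopt;  // no symbol matches here
        symbols.push_back(input.substr(pos, matched));
        pos += matched;  // resume just after the accepted symbol
    }
    return symbols;
}
@end verbatim

With the words @code{for}, @code{forest} and @code{end}, scanning
`forend' with backtracking yields `for' then `end'; without backtracking
it fails, because `fore' is not a complete word.
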
The default @code{lookahead} setting of 1 usually gives acceptable
matching behavior with good performance.

@c ----------------------------------------------------------------------
@node       Glossary, Index, Code Generation, Top
@unnumbered Glossary

Note: this glossary is pretty sparse right now.  You'll probably be
better off using the index (@pxref{Index}).

@table @dfn
@item code fragment
A piece of source code in a programming language, embedded in a scanner
script.

@item code generator
A module that plugs into the compiler to generate a scanner for a
particular language.

@item front end
The `elex' program that integrates the compiler, code generators and
debugger.

@item FSM
Finite State Machine: a graph of `states' connected by `edges'.  The
machine is always in one of the states in the graph, and may move to
other states to which the current state is connected by an edge.  In a
scanner, each edge has a character label associated with it, and only
edges whose label matches the current input character are followed.

@item production
A construct that defines a symbol or an event handler in a scanner
script.

@item scanner engine
A module of code that implements a generic @var{Elex} scanner.  A code
generator merges a scanner definition with a scanner engine to produce
the final scanner.
@end table

@c ----------------------------------------------------------------------
@node       Index,  , Glossary, Top
@unnumbered Index

@printindex cp

@contents
@bye
