@end group
@end example

(Note: a better way to compile is to use @var{make}.  See
@file{demo/Makefile} for an example.)

@c ----------------------------------------------------------------------
@node       Scanner Scripts, The Elex Tool Set, An Example, Top
@chapter    Scanner Scripts
@cindex scanner script, recommended extension
@cindex scanner script, defined
@cindex .elx file extension

An @var{Elex} scanner script is a text file with a `.elx' extension that
specifies a scanner to the @var{Elex} compiler.  This chapter describes
the structure of @var{Elex} scanner scripts in detail.

@menu
* Scanner Script Structure::    The parts of a scanner script described.
* Regular Expressions::         Regular expressions in detail.
* Productions::                 Productions described.
* Code Fragments::              Code fragment formatting.
* Testing Regular Expressions::  How to quickly test regular expressions.
* Custom Scanner Engines::      Using non-standard scanner engines.
* Multi-language Scanners::     How to handle more than one language.
@end menu

@c ----------------------------------------------------------------------
@node       Scanner Script Structure, Regular Expressions, Scanner Scripts, Scanner Scripts
@section    Scanner Script Structure
@cindex BNF grammar for a scanner script
@cindex scanner script, BNF grammar
@cindex scanner script, parts of
@cindex base class of a scanner
@cindex scanner, setting base class of

A scanner script has two major parts: the optional declarative section,
where symbols and regular expressions can be declared, and the
productions section, which defines the symbols that the scanner will match.
The extended BNF grammar for an @var{Elex} scanner is:

@example
@group
          <scanner> ::= `scanner_name' <name:ident> [ `:' <base:ident> ]
                          (<symbols_section> | <defines_section>)*
                        `begin'
                          <production>*
                        `end'

  <symbols_section> ::= `symbols' <symbol_name:ident>*

  <defines_section> ::= `define' ( <regexp_name:ident> `=' <regexp> )*

       <production> ::= ( <regexp_production> | <event_production> )
                        <code_fragment>*

<regexp_production> ::= `on' [ <production_name:ident> `:' ] <regexp>

 <event_production> ::= `on' <event:ident>

    <code_fragment> ::= `<' <language:ident>
                           @var{source code}
                        `>'

           <regexp> ::= `"' @var{An Elex regular expression} `"'

            <ident> ::= @var{[a-zA-Z][a-zA-Z0-9_]*}
@end group
@end example

A scanner definition begins with the @samp{scanner} line, which gives the
scanner a name.  This name usually becomes the name of the class generated
to implement the scanner.  The rarely-used @samp{base} clause allows you to
set the base (parent) class from which the generated scanner class is
derived.  This allows you to use a parent class other than the
@var{Elex} default class and might be useful if you create your own
scanner engine (@pxref{Custom Scanner Engines}).

@menu
* The Symbols Section::         Defines symbol names.
* The Defines Section::         Defines symbolic regular expressions.
* The Productions Section::     Defines how the scanner behaves.
@end menu

@c ----------------------------------------------------------------------
@node       The Symbols Section, The Defines Section, Scanner Script Structure, Scanner Script Structure
@subsection The Symbols Section
@cindex symbols section
@cindex declaring symbol names
@cindex symbol names, declaring
@cindex example, symbols section

The @samp{symbols} section is optional, but it's almost always included
because it's a convenient way to declare the symbols the scanner will
match.  Any names listed in this section will be declared as unique
integer constants in whatever way is appropriate for the language you
target.  In C++, for example, these names will be declared using an
@code{enum} inside the scanner class.

Example:

@example
symbols
  SymNumber SymIdentifier SymString SymWhiteSpace
@end example

@c ----------------------------------------------------------------------
@node       The Defines Section, The Productions Section, The Symbols Section, Scanner Script Structure
@subsection The Defines Section
@cindex defines section
@cindex example, defines section

The @samp{defines} section is where regular expressions can be
associated with a name, similar to variable declarations in a
programming language.  These expressions can be included in any
following regular expression by using their name enclosed in angle
brackets (@samp{<>}).

Example:

@example
defines
  letter     = "[a-zA-Z]"
  identifier = "<letter>(<letter>|0-9|_)*"
@end example

@c ----------------------------------------------------------------------
@node       The Productions Section,  , The Defines Section, Scanner Script Structure
@subsection The Productions Section
@cindex productions section
@cindex main part of a scanner script

Following the @samp{symbols} and @samp{defines} sections comes the main
part of the script.  This section contains productions
(@pxref{Productions}) that each specify a symbol and some code to be
executed when that symbol is encountered.

@c ----------------------------------------------------------------------
@node       Regular Expressions, Productions, Scanner Script Structure, Scanner Scripts
@section    Regular Expressions
@cindex regular expressions
@cindex escape characters, regular expressions
@cindex backslash, regular expressions
@cindex \ operator, regular expressions
@cindex ~ operator
@cindex case-independent strings
@cindex - operator
@cindex ranges
@cindex character ranges
@cindex ? operator
@cindex * operator
@cindex + operator
@cindex repetition operators
@cindex wildcard operator
@cindex . operator
@cindex symbolic regular expressions
@cindex | operator
@cindex alternation operator

For those who have used @var{egrep}, @var{Perl} or @var{sed} regular
expressions, @var{Elex}'s expressions will look familiar.  However,
there are a few extensions and a number of missing features
(@pxref{Future Development}, @pxref{Elex Compared to Perl}).

The basic unit of a regular expression matches a single character or
range of characters.  For example, @samp{a} matches the letter `a',
@samp{a-z} matches any letter from `a' to `z' inclusive and @samp{.}
(wildcard) matches any character.  Note that the order of the bounds
for a range is not significant, so @samp{0-9} is the same as
@samp{9-0}.  Characters may be specified by their literal value
(e.g. @samp{a}), their decimal ASCII value (e.g. @samp{\097}), their
hexadecimal ASCII value (e.g. @samp{\x61}) or by using C backslash
constants (e.g. @samp{\n} for newline, @samp{\t} for TAB, etc.)

You can specify a set of characters to be matched using the set
operators (@samp{[ ]}).  For example, @samp{[a-zA-Z0-9\.]} matches
any letter, digit or `.' (note the @samp{\} used to indicate that `.'
is to be treated literally, not as a wildcard).  Starting a character
set with a @samp{^} matches anything @emph{but} what is in the set.
For example, @samp{[^xy]} matches anything but `x' or `y'.

The character matching units described above can be combined to match
strings of characters.  For example, @samp{0-9[abc].} will match
`8a!', @w{`1c '}, etc.  In many regular expression implementations,
matching strings of characters in a case-independent way can be
awkward, resulting in expressions like @samp{[Hh][Ee][Ll][Ll][Oo]}.  In
@var{Elex} the @samp{~} operator allows you to write this expression
as @samp{~hello~}.  Any literal character or @samp{.} can appear in a
case-independent string.

Variable-length expressions can be made using the repetition operators
@samp{?}, @samp{*} and @samp{+}.  The @samp{?} operator makes the
thing on the left optional, i.e. it may appear zero or one times.  For
example, @samp{a-z?0-9} matches a letter followed by a digit or just a
digit.  Similarly, @samp{*} matches zero or more occurrences of the
thing to its left and @samp{+} matches one or more occurrences.  For
example, @samp{0-9*a-z+} will match `123abc', `hello' and `1a' but not
`' or `7'.

You can insert the contents of a symbolic regular expression declared
in the @code{defines} section using the @samp{<@var{name}>} syntax.
For example, if @var{whitespace} is defined to be the expression
@samp{[\ \n\t\r]}, then @samp{x<whitespace>+y} is equivalent to
@samp{x([\ \n\t\r])+y} (note the brackets).

Expressions that match more than one thing can be specified by using
the @samp{|} (alternation) operator.  For example, the expression
@samp{hi|hello} matches either `hi' or `hello'.

The precedence of all the operators described above except @samp{-}
and @samp{\} can be modified by using brackets.  For example,
@samp{(hi|hello) (world|everyone)(?!)*} matches `hi everyone?!?!?!',
`hello world' and `hello everyone?!'.

@menu
* Elex Compared to Perl::       Elex regular expressions against Perl's.
* Operator Precedence::         Operator binding order.
* Regular Expression Grammar::  A grammar for regular expressions.
@end menu

@c ----------------------------------------------------------------------
@node       Elex Compared to Perl, Operator Precedence, Regular Expressions, Regular Expressions
@subsection Elex Compared to Perl
@cindex Perl, regular expression comparisons
@cindex missing features, regular expressions
@cindex regular expressions, differences from Perl

@var{Elex} currently only has support for `vanilla' regular
expressions, which means it's lacking some things @var{Perl} users
might be fond of.  This is going to change, but for now these are the
main things missing:

@itemize @bullet
@item Sub-expressions.  You can use brackets, but you cannot currently
refer to the strings matched for a particular bracket group.

@item @samp{$} and @samp{^} operators.

@item @samp{\b}, @samp{\s}, @samp{\d}, etc. character and meta-match operators.

@item Non-greedy matching operators (@samp{*?}, etc.).

@item @samp{@{@var{x}, @var{y}@}} repetition operator.

@item None of the legibility (comment, whitespace, etc.) features.
@end itemize

The features that @var{Elex} regular expressions @emph{do} support are
detailed in @ref{Regular Expressions}.  Things that differ in @var{Elex}
or that @var{Perl} hasn't got are:

@itemize @bullet
@item The case-independent string operator (@samp{~}).  This aids in matching
strings where case is not important, similar to using
@samp{/@var{regexp}/i} in @var{Perl}.

@item Symbolic expressions using @samp{<@var{name}>}.  Well, actually
@var{Perl} does have these; you use @samp{$@var{name}} instead.

@item Character ranges outside of sets.  @var{Elex} allows you to use
the @samp{@var{x}-@var{y}} form anywhere, not just between @samp{[ ]}'s.

@item The wildcard (@samp{.}) operator matches @emph{any} character,
including newlines.  @var{Elex} regular expressions are not line-oriented
unless you make them so.
@end itemize

@c ----------------------------------------------------------------------
@node       Operator Precedence, Regular Expression Grammar, Elex Compared to Perl, Regular Expressions
@subsection Operator Precedence

This section defines operator precedence in @var{Elex} regular
expressions.  Operators higher in the list bind tighter than those lower
in the list.

@multitable @columnfractions .20 .70
@item @strong{Operator}
@tab @strong{Operator Class}
@item @samp{\}
@tab escape operator
@item @samp{-}
@tab range operator
@item @samp{(} @samp{)}
@tab precedence modifier
@item @samp{?}
@item @samp{*}
@item @samp{+}
@tab repetition
@item @samp{@var{x}}
@item @samp{.}
@item @samp{[} @samp{]}
@item @samp{~}
@item @samp{<@var{variable}>}
@tab character/wildcard/set/case-independent string/variable
@item @samp{|}
@tab alternation
@end multitable

@c ----------------------------------------------------------------------
@node       Regular Expression Grammar,  , Operator Precedence, Regular Expressions
@subsection Regular Expression Grammar

This section contains the extended BNF grammar for @var{Elex} regular
expressions.

@example
@group
<regular_expression> ::= <term> ( `|' <term> )*

              <term> ::= <factor> (`?' | `*' | `+' )

            <factor> ::= <range> | <set> | <variable> |
                         <ci_string> | `(' <regular_expression> `)'

             <range> ::= `.' | <char> | <char> `-' <char>

               <set> ::= `[' [ `^' ] <range>+ `]'

          <variable> ::= `<' @var{[a-zA-Z_][a-zA-Z0-9_]*} `>'

         <ci_string> ::= `~' <ci_char>+ `~'

           <ci_char> ::= `.' | <char>

              <char> ::= @var{literal character} |
                         `\x'<hex_digit><hex_digit> |
                         `\'<dec_digit><dec_digit><dec_digit> |
                         `\n' | `\t' | `\v' | `\a' | `\f' | `\b' | `\r'

         <dec_digit> ::= @var{[0-9]}

         <hex_digit> ::= @var{[0-9a-fA-F]}
@end group
@end example

@c ----------------------------------------------------------------------
@node       Productions, Code Fragments, Regular Expressions, Scanner Scripts
@section    Productions
@cindex productions
@cindex event handling productions
@cindex pattern matching productions
@cindex production naming
@cindex SymNULL
@cindex SymERROR
@cindex code fragments, as part of a production
@cindex matched text, modifying
@cindex scanner script example
@cindex example, scanner script
@cindex modifying matched text example
@cindex example, modifying matched text

Productions come in two flavours: pattern-matching and event-handling.
Pattern-matching productions are triggered when a regular expression
matches some text, while event-handling productions are triggered when
some sort of event (such as an error) occurs.
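
To make the distinction between the two flavours concrete, the following
is a minimal Python sketch of the idea, not the code that @var{Elex}
generates.  Everything in it is an assumption made for illustration: the
names @code{scan}, @code{demo}, @code{on_match} and @code{on_error} are
hypothetical, the patterns use Python regular expression syntax rather
than @var{Elex} syntax, and the longest-match rule is a guess at one
reasonable way to choose between productions.

@example
```python
import re

# Hypothetical stand-ins for symbols a scanner script might declare.
SYM_NUMBER, SYM_IDENTIFIER, SYM_WHITESPACE = range(3)

# Pattern-matching productions as (symbol, compiled regex) pairs.
# These are Python regular expressions, not Elex ones.
PATTERN_PRODUCTIONS = [
    (SYM_NUMBER, re.compile(r"[0-9]+")),
    (SYM_IDENTIFIER, re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")),
    (SYM_WHITESPACE, re.compile(r"[ \n\t\r]+")),
]


def scan(text, on_match, on_error):
    """Call on_match for each matched token, on_error for unmatched input."""
    pos = 0
    while pos < len(text):
        best = None
        for symbol, regex in PATTERN_PRODUCTIONS:
            m = regex.match(text, pos)
            # Assumed policy: longest match wins, earlier entries win ties.
            if m and m.end() > pos and (best is None or m.end() > best[1].end()):
                best = (symbol, m)
        if best is None:
            # Nothing matched: fire the error event and skip one character.
            on_error(pos, text[pos])
            pos += 1
        else:
            symbol, m = best
            on_match(symbol, m.group())
            pos = m.end()


def demo(text):
    """Collect the tokens and error events produced while scanning text."""
    tokens, errors = [], []
    scan(
        text,
        on_match=lambda symbol, lexeme: tokens.append((symbol, lexeme)),
        on_error=lambda pos, char: errors.append((pos, char)),
    )
    return tokens, errors
```
@end example

Calling @code{demo("x1 42 !")} collects four tokens (an identifier, a
number and two runs of whitespace) through the pattern-matching path and
one error event for the `!', which no pattern accepts.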