flexdoc.1

.TH FLEX 1 "26 May 1990" "Version 2.3"
.SH NAME
flex - fast lexical analyzer generator
.SH SYNOPSIS
.B flex
.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
.I [filename ...]
.SH DESCRIPTION
.I flex
is a tool for generating
.I scanners:
programs which recognize lexical patterns in text.
.I flex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate.  The description is in
the form of pairs
of regular expressions and C code, called
.I rules.  flex
generates as output a C source file,
.B lex.yy.c,
which defines a routine
.B yylex().
This file is compiled and linked with the
.B -lfl
library to produce an executable.  When the executable is run,
it analyzes its input for occurrences
of the regular expressions.  Whenever it finds one, it executes
the corresponding C code.
.SH SOME SIMPLE EXAMPLES
.LP
First some simple examples to get the flavor of how one uses
.I flex.
The following
.I flex
input specifies a scanner which whenever it encounters the string
"username" will replace it with the user's login name:
.nf
    %%
    username    printf( "%s", getlogin() );
.fi
By default, any text not matched by a
.I flex
scanner
is copied to the output, so the net effect of this scanner is
to copy its input file to its output with each occurrence
of "username" expanded.
In this input, there is just one rule.  "username" is the
.I pattern
and the "printf" is the
.I action.
The "%%" marks the beginning of the rules.
.LP
Here's another simple example:
.nf
        int num_lines = 0, num_chars = 0;
    %%
    \\n    ++num_lines; ++num_chars;
    .     ++num_chars;
    %%
    main()
        {
        yylex();
        printf( "# of lines = %d, # of chars = %d\\n",
                num_lines, num_chars );
        }
.fi
This scanner counts the number of characters and the number
of lines in its input (it produces no output other than the
final report on the counts).
The first line
declares two globals, "num_lines" and "num_chars", which are accessible
both inside
.B yylex()
and in the
.B main()
routine declared after the second "%%".  There are two rules, one
which matches a newline ("\\n") and increments both the line count and
the character count, and one which matches any character other than
a newline (indicated by the "." regular expression).
.LP
A somewhat more complicated example:
.nf
    /* scanner for a toy Pascal-like language */
    %{
    /* need this for the call to atof() below */
    #include <math.h>
    %}
    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*
    %%
    {DIGIT}+    {
                printf( "An integer: %s (%d)\\n", yytext,
                        atoi( yytext ) );
                }
    {DIGIT}+"."{DIGIT}*        {
                printf( "A float: %s (%g)\\n", yytext,
                        atof( yytext ) );
                }
    if|then|begin|end|procedure|function        {
                printf( "A keyword: %s\\n", yytext );
                }
    {ID}        printf( "An identifier: %s\\n", yytext );
    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
    "{"[^}\\n]*"}"     /* eat up one-line comments */
    [ \\t\\n]+          /* eat up whitespace */
    .           printf( "Unrecognized character: %s\\n", yytext );
    %%
    main( argc, argv )
    int argc;
    char **argv;
        {
        ++argv, --argc;  /* skip over program name */
        if ( argc > 0 )
                yyin = fopen( argv[0], "r" );
        else
                yyin = stdin;

        yylex();
        }
.fi
This is the beginning of a simple scanner for a language like
Pascal.
It identifies different types of
.I tokens
and reports on what it has seen.
.LP
The details of this example will be explained in the following
sections.
.SH FORMAT OF THE INPUT FILE
The
.I flex
input file consists of three sections, separated by a line with just
.B %%
in it:
.nf
    definitions
    %%
    rules
    %%
    user code
.fi
The
.I definitions
section contains declarations of simple
.I name
definitions to simplify the scanner specification, and declarations of
.I start conditions,
which are explained in a later section.
.LP
Name definitions have the form:
.nf
    name definition
.fi
The "name" is a word beginning with a letter or an underscore ('_')
followed by zero or more letters, digits, '_', or '-' (dash).
The definition is taken to begin at the first non-white-space character
following the name and to continue to the end of the line.
The definition can subsequently be referred to using "{name}", which
will expand to "(definition)".  For example,
.nf
    DIGIT    [0-9]
    ID       [a-z][a-z0-9]*
.fi
defines "DIGIT" to be a regular expression which matches a
single digit, and
"ID" to be a regular expression which matches a letter
followed by zero-or-more letters-or-digits.
A subsequent reference to
.nf
    {DIGIT}+"."{DIGIT}*
.fi
is identical to
.nf
    ([0-9])+"."([0-9])*
.fi
and matches one-or-more digits followed by a '.' followed
by zero-or-more digits.
.LP
The
.I rules
section of the
.I flex
input contains a series of rules of the form:
.nf
    pattern   action
.fi
where the pattern must be unindented and the action must begin
on the same line.
.LP
See below for a further description of patterns and actions.
.LP
Finally, the user code section is simply copied to
.B lex.yy.c
verbatim.
It is used for companion routines which call or are called
by the scanner.
The presence of this section is optional;
if it is missing, the second
.B %%
in the input file may be skipped, too.
.LP
In the definitions and rules sections, any
.I indented
text or text enclosed in
.B %{
and
.B %}
is copied verbatim to the output (with the %{}'s removed).
The %{}'s must appear unindented on lines by themselves.
.LP
In the rules section,
any indented or %{} text appearing before the
first rule may be used to declare variables
which are local to the scanning routine and (after the declarations)
code which is to be executed whenever the scanning routine is entered.
Other indented or %{} text in the rule section is still copied to the output,
but its meaning is not well-defined and it may well cause compile-time
errors (this feature is present for
.I POSIX
compliance; see below for other such features).
.LP
In the definitions section, an unindented comment (i.e., a line
beginning with "/*") is also copied verbatim to the output up
to the next "*/".  Also, any line in the definitions section
beginning with '#' is ignored, though this style of comment is
deprecated and may go away in the future.
.SH PATTERNS
The patterns in the input are written using an extended set of regular
expressions.  These are:
.nf
    x          match the character 'x'
    .          any character except newline
    [xyz]      a "character class"; in this case, the pattern
                 matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches
                 an 'a', a 'b', any letter from 'j' through 'o',
                 or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character
                 but those in the class.  In this case, any
                 character EXCEPT an uppercase letter.
    [^A-Z\\n]   any character EXCEPT an uppercase letter or
                 a newline
    r*         zero or more r's, where r is any regular expression
    r+         one or more r's
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    r{4}       exactly 4 r's
    {name}     the expansion of the "name" definition
               (see above)
    "[xyz]\\"foo"
               the literal string: [xyz]"foo
    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                 then the ANSI-C interpretation of \\x.
                 Otherwise, a literal 'X' (used to escape
                 operators such as '*')
    \\123       the character with octal value 123
    \\x2a       the character with hexadecimal value 2a
    (r)        match an r; parentheses are used to override
                 precedence (see below)

    rs         the regular expression r followed by the
                 regular expression s; called "concatenation"

    r|s        either an r or an s

    r/s        an r but only if it is followed by an s.  The
                 s is not part of the matched text.  This type
                 of pattern is called "trailing context".
    ^r         an r, but only at the beginning of a line
    r$         an r, but only at the end of a line.  Equivalent
                 to "r/\\n".

    <s>r       an r, but only in start condition s (see
               below for discussion of start conditions)
    <s1,s2,s3>r
               same, but in any of start conditions s1,
               s2, or s3

    <<EOF>>    an end-of-file
    <s1,s2><<EOF>>
               an end-of-file when in start condition s1 or s2
.fi
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.  For example,
.nf
    foo|bar*
.fi
is the same as
.nf
    (foo)|(ba(r*))
.fi
since the '*' operator has higher precedence than concatenation,
and concatenation higher than alternation ('|').
This pattern
therefore matches
.I either
the string "foo"
.I or
the string "ba" followed by zero-or-more r's.
To match "foo" or zero-or-more "bar"'s, use:
.nf
    foo|(bar)*
.fi
and to match zero-or-more "foo"'s-or-"bar"'s:
.nf
    (foo|bar)*
.fi
.LP
Some notes on patterns:
.IP -
A negated character class such as the example "[^A-Z]"
above
.I will match a newline
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]").  This is unlike how many other regular
expression tools treat negated character classes, but unfortunately
the inconsistency is historically entrenched.
Matching newlines means that a pattern like [^"]* can match an entire
input (overflowing the scanner's input buffer) unless there's another
quote in the input.
.IP -
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern, and, like '/' and '$',
cannot be grouped inside parentheses.  A '^' which does not occur at
the beginning of a rule or a '$' which does not occur at the end of
a rule loses its special properties and is treated as a normal character.
.IP
The following are illegal:
.nf
    foo/bar$
    <sc1>foo<sc2>bar
.fi
Note that the first of these can be written "foo/bar\\n".
.IP
The following will result in '$' or '^' being treated as a normal character:
.nf
    foo|(bar$)
    foo|^bar
.fi
If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
could be used (the special '|' action is explained below):
.nf
    foo      |
    bar$     /* action goes here */
.fi
A similar trick will work for matching a foo or a
bar-at-the-beginning-of-a-line.
.SH HOW THE INPUT IS MATCHED
When the generated scanner is run, it analyzes its input looking
for strings which match any of its patterns.
If it finds more than
one match, it takes the one matching the most text (for trailing
context rules, this includes the length of the trailing part, even
though it will then be returned to the input).  If it finds two
or more matches of the same length, the
rule listed first in the
.I flex
input file is chosen.
.LP
Once the match is determined, the text corresponding to the match
(called the
.I token)
is made available in the global character pointer
.B yytext,
and its length in the global integer
.B yyleng.
The
.I action
corresponding to the matched pattern is then executed (a more
detailed description of actions follows), and then the remaining
input is scanned for another match.
.LP
If no match is found, then the
.I default rule
is executed: the next character in the input is considered matched and
copied to the standard output.  Thus, the simplest legal
.I flex
input is:
.nf
    %%
.fi
which generates a scanner that simply copies its input (one character
at a time) to its output.
.SH ACTIONS
Each pattern in a rule has a corresponding action, which can be any
arbitrary C statement.  The pattern ends at the first non-escaped
whitespace character; the remainder of the line is its action.  If the
action is empty, then when the pattern is matched the input token
is simply discarded.
For example, here is the specification for a program
which deletes all occurrences of "zap me" from its input:
.nf
    %%
    "zap me"
.fi
(It will copy all other characters in the input to the output since
they will be matched by the default rule.)
.LP
Here is a program which compresses multiple blanks and tabs down to
a single blank, and throws away whitespace found at the end of a line:
.nf
    %%
    [ \\t]+        putchar( ' ' );
    [ \\t]+$       /* ignore this token */
.fi
.LP
If the action contains a '{', then the action spans till the balancing '}'
is found, and the action may cross multiple lines.
.I flex
knows about C strings and comments and won't be fooled by braces found
within them, but also allows actions to begin with
.B %{
and will consider the action to be all the text up to the next
.B %}
(regardless of ordinary braces inside the action).
.LP
An action consisting solely of a vertical bar ('|') means "same as
the action for the next rule."  See below for an illustration.
.LP
Actions can include arbitrary C code, including
.B return
statements to return a value to whatever routine called
.B yylex().
Each time
.B yylex()
is called it continues processing tokens from where it last left
off until it either reaches
the end of the file or executes a return.  Once it reaches an end-of-file,
however, then any subsequent call to
.B yylex()
will simply immediately return, unless