📄 grammar

-- This file is input for the preprocessor 'mungegrammar', which uses it
-- to generate scangen.in, fmq.in, tokens.h, and actions.cc.
-- Keep in mind that mungegrammar is a pre-processor, not a compiler: it
-- may accept your input without complaining, yet pass invalid input to
-- scangen and/or fmq.  You may therefore see error messages from these
-- latter tools.  You will probably need to look at scangen.in and fmq.in
-- to really understand what's going on.  During the course of debugging
-- you may need to modify scangen.in or fmq.in, e.g. to specify debugging
-- options.  For information on the input syntax of scangen and fmq, see
-- the documentation in /u/cs254/plzero/doc, or read the appendices to
-- Fischer and LeBlanc's "Crafting a Compiler".  The main purpose of
-- mungegrammar is to automate the creation of files containing redundant
-- information, so you don't have to worry about keeping them consistent.
-- This file consists of three sections, which begin with the reserved
-- words CHARACTERS, TOKENS, and PRODUCTIONS.  Each of these reserved words
-- must appear on a line by itself (modulo comments).  Each section begins
-- with some comments that explain (among other things) any important ways
-- in which the syntax expected by mungegrammar differs from that of the
-- corresponding sections of the input expected by scangen and/or fmq.
-- Comments (as you may have guessed) are as in Ada: they begin with
-- a double hyphen and extend through end-of-line.

CHARACTERS

-- The following definitions should partition the ASCII character set:
-- every ASCII character should appear, and it should appear in exactly
-- one set.
    Letter          = 'A'..'Z', 'a'..'z';
    Digit           = '0'..'9';
    Blank           = ' ', 9, 10;
    NeChar          = '#';
    LparenChar      = '(';
    RparenChar      = ')';
    TimesChar       = '*';
    PlusChar        = '+';
    CommaChar       = ',';
    MinusChar       = '-';
    PeriodChar      = '.';
    SlashChar       = '/';
    ColonChar       = ':';
    SemicolonChar   = ';';
    LtChar          = '<';
    EqChar          = '=';
    GtChar          = '>';
    NonPrint        = 0..8, 11..31, 127;
    Illegal         = '"', '\', '[', ']', '!', '%', '&', '''', '?', '@',
                      '^', '_', '`', '{', '|', '}', '~', '$';

TOKENS

--  Every token has a name.  Names should consist of upper case letters
--      (no underscores).  Mungegrammar uses names to create C++ #defines for
--      major and minor token numbers.  For example, in the list of tokens
--      below, the major and minor token numbers for PLUS will be MAJ_ADDOP
--      and MIN_PLUS.  The major and minor token numbers for COMMA will be
--      MAJ_COMMA and MIN_COMMA.
--  Optionally a token may have an insert image.  This is what the syntax
--      corrector will use in its error message when inserting the token.
--      If no insert image is specified, the token name will be used.
--      Insert images are double-quoted.
--  For compatibility with the syntax of fmq, insert images should not contain
--      the characters '<' or '#'.
--  Some tokens may have several variants.  Variants play the same role in
--      parsing, but must be distinguished during semantic analysis.  They
--      share the same major token number, but have different minor token
--      numbers.
--  Two token names are special:  (1) You must name one token SPACE.
--      This is the only token that the scanner will not pass on to the parser.
--      In most languages, SPACE will include blanks, tabs, carriage
--      returns, comments, and pragmas.
--      You may want to use a different
--      variant for pragmas, so the scanner can identify them by minor
--      token number and treat them specially.  (2) You must name one
--      token IDENT.  This is the only token with "exceptions".
--      It cannot have variants.
--      You will notice that some of the token definitions below end with
--      double-quoted character strings instead of regular expressions.
--      These tokens are IDENT exceptions.  Other than these, no string
--      is permitted to match the definition of more than one token.  It is
--      acceptable, however, for two strings, one of which is a prefix of the
--      other, to match the definitions of different tokens.  For example,
--      "3" would match the definition of INT_LITERAL in Pascal, while
--      "3.14" would match the definition of REAL_LITERAL.
--  Syntax for the lines below is as follows:
--  token_definition    ::= token_name insert_image_opt
--                          insertion_cost "," deletion_cost
--                          token_def_tail ";"
--  insert_image_opt    ::= "\"" character_string "\""
--                      ::=
--  token_def_tail      ::= "=" string_or_reg_exp
--                      ::= ":" token_variants
--  token_variants      ::= token_name "=" string_or_reg_exp more_variants
--  more_variants       ::= "," token_variants
--                      ::=
--  string_or_reg_exp   ::= "\"" character_string "\""
--                      ::= regular_expression
-- This syntax is not entirely "free-format": the semicolon that terminates
-- a token definition must be the last thing (other than a comment or white
-- space) on its line.  Moreover, the comma that separates token variants
-- must be the last thing (other than a comment or white space) on its line,
-- and a comma within a regular expression must NOT be the last thing
-- (other than a comment or white space) on its line.
    SPACE               100,100 = (Blank{TOSS})+;
    IDENT               9,15    = (Letter).(Letter,Digit)*;
    NUMBER              6,15    = Digit+;
    ADDOP       "+"     4,4     :
                PLUS            = PlusChar{TOSS},
                MINUS           = MinusChar{TOSS};
    MULOP       "*"     5,4     :               -- ADDOP is easier to insert
                TIMES           = TimesChar{TOSS},
                SLASH           = SlashChar{TOSS};
    ODD                 4,4     = "ODD";
    EQ          "="     4,5     = EqChar{TOSS};
    RELOP       ">"     5,4     :               -- EQ is easier to insert
                NE              = NeChar{TOSS},
                LT              = LtChar{TOSS},
                LE              = LtChar{TOSS}.EqChar{TOSS},
                GT              = GtChar{TOSS},
                GE              = GtChar{TOSS}.EqChar{TOSS};
    LPAREN      "("     10,20   = LparenChar{TOSS};
    RPAREN      ")"     4,2     = RparenChar{TOSS};
    COMMA       ","     4,8     = CommaChar{TOSS};
    SEMICOLON   ";"     3,8     = SemicolonChar{TOSS};
    PERIOD      "."     7,8     = PeriodChar{TOSS};
    BECOMES     ":="    5,9     = ColonChar{TOSS}.EqChar{TOSS};
    BEGIN               10,20   = "BEGIN";
    END                 8,15    = "END";
    IF                  10,20   = "IF";
    THEN                8,20    = "THEN";
    WHILE               10,20   = "WHILE";
    DO                  8,10    = "DO";
    CALL                12,20   = "CALL";
    CONST               10,20   = "CONST";
    VAR                 10,20   = "VAR";
    PROCEDURE           15,25   = "PROCEDURE";

PRODUCTIONS

-- Unlike fmq, mungegrammar allows you to embed actual C++ code for action
-- routines inside right-hand sides.  Code fragments are delimited with
-- "[[" and "]]".  Code fragments may not contain "]]", even in comments
-- and strings.  Temporary variables for use in action routines are
-- declared in case.pre.
-- Within action routines, "lhs" is a pointer to the
-- semantic record for the symbol on the left-hand-side of the production;
-- "rhs" is a pointer to an array of semantic records for the symbols on the
-- right-hand-side of the production.  The action routines here serve
-- *only* to create a syntax tree; they do *not* do any significant
-- semantic analysis; that is reserved for a later phase of the compiler.
-- Note that comments inside code fragments are C++ comments, *not*
-- Ada-style mungegrammar comments.
-- The productions are not "free-format": the left-hand-side of each
-- production must start in column one, and nothing else (except a
-- comment) may start in column one.  Non-terminal names should consist
-- of lower-case letters and underscores.  Terminals should be specified
-- by the insert images from the TOKENS section above.

program         ::= block .
                    [[
                        ast_root = rhs[0].node;
                        if (ast_root == NULL)
                            issue_warning(rhs[1].token.get_location(),
                                "Empty program!?");
                    ]]

block           ::= const_part_opt var_part_opt proc_part_opt statement
                    [[
                        if (rhs[3].node != NULL) {
                            l = rhs[3].node->get_location();
                        }
                        p = NULL;
                        if (rhs[2].node != NULL) {
                            p = rhs[2].node;
                            l = p->get_location();
                        }
                        if (rhs[1].node != NULL) {
                            st_declaration_t *decl =
                                dynamic_cast<st_declaration_t *>(rhs[1].node);
                            decl->tail()->set_next(p);
                            p = rhs[1].node;
                            l = p->get_location();
                        }
                        if (rhs[0].node != NULL) {
                            st_declaration_t *decl =
                                dynamic_cast<st_declaration_t *>(rhs[0].node);
                            decl->tail()->set_next(p);
                            p = rhs[0].node;
                            l = p->get_location();
                        }
                        if (p == NULL && rhs[3].node == NULL) {
                            lhs->node = NULL;
                        } else {
                            lhs->node = new st_block_t(l, p, rhs[3].node);
                        }
                    ]]

const_part_opt  ::= CONST equate_list ;
                    [[
                        *lhs = rhs[1];
                    ]]
                ::= [[
                        lhs->node = NULL;
                    ]]

equate_list     ::= IDENT = NUMBER more_equates
                    [[
                        lhs->node = new st_constant_t(
                            rhs[0].token.get_location(),
                            new st_identifier_t(rhs[0].token),
                            new st_number_t(rhs[2].token), rhs[3].node);
                        // set tail of list
                        if (rhs[3].node == NULL) {
                            st_declaration_t *decl =
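
The CHARACTERS section requires that its sets partition the ASCII character set, and mungegrammar may accept input that violates this without complaining. A minimal standalone sketch of such a check follows. It is not part of mungegrammar or its tool chain: the function names `count_class_membership` and `is_partition` are hypothetical, and the ranges are transcribed by hand from the listing, so they would need updating whenever the grammar changes.

```cpp
#include <array>
#include <initializer_list>

// For each ASCII code 0..127, count how many character sets claim it.
// The ranges below are copied by hand from the CHARACTERS section.
std::array<int, 128> count_class_membership() {
    std::array<int, 128> hits{};  // value-initialized: all zeros
    auto add = [&hits](int lo, int hi) {
        for (int c = lo; c <= hi; ++c) ++hits[c];
    };

    add('A', 'Z'); add('a', 'z');          // Letter
    add('0', '9');                         // Digit
    add(' ', ' '); add(9, 10);             // Blank
    // NeChar, LparenChar, ..., GtChar: one character per set
    for (char c : {'#', '(', ')', '*', '+', ',', '-', '.', '/',
                   ':', ';', '<', '=', '>'})
        add(c, c);
    add(0, 8); add(11, 31); add(127, 127); // NonPrint
    for (char c : {'"', '\\', '[', ']', '!', '%', '&', '\'', '?', '@',
                   '^', '_', '`', '{', '|', '}', '~', '$'})
        add(c, c);                         // Illegal
    return hits;
}

// A partition means every code is claimed by exactly one set.
bool is_partition(const std::array<int, 128>& hits) {
    for (int n : hits)
        if (n != 1) return false;
    return true;
}
```

Calling `is_partition(count_class_membership())` returns true for the sets as transcribed here. Listing the codes whose count is not 1 shows which character is missing or duplicated.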