-- This file is input for the preprocessor 'mungegrammar', which uses it
-- to generate scangen.in, fmq.in, tokens.h, and actions.cc.
-- Keep in mind that mungegrammar is a pre-processor, not a compiler: it
-- may accept your input without complaining, yet pass invalid input to
-- scangen and/or fmq. You may therefore see error messages from these
-- latter tools. You will probably need to look at scangen.in and fmq.in
-- to really understand what's going on. During the course of debugging
-- you may need to modify scangen.in or fmq.in, e.g. to specify debugging
-- options. For information on the input syntax of scangen and fmq, see
-- the documentation in /u/cs254/plzero/doc, or read the appendices to
-- Fischer and LeBlanc's "Crafting a Compiler". The main purpose of
-- mungegrammar is to automate the creation of files containing redundant
-- information, so you don't have to worry about keeping them consistent.

-- This file consists of three sections, which begin with the reserved
-- words CHARACTERS, TOKENS, and PRODUCTIONS. Each of these reserved words
-- must appear on a line by itself (modulo comments). Each section begins
-- with some comments that explain (among other things) any important ways
-- in which the syntax expected by mungegrammar differs from that of the
-- corresponding sections of the input expected by scangen and/or fmq.

-- Comments (as you may have guessed) are as in Ada: they begin with
-- a double hyphen and extend through end-of-line.

CHARACTERS

-- The following definitions should partition the ASCII character set:
-- every ASCII character should appear, and it should appear in exactly
-- one set.
Letter          = 'A'..'Z', 'a'..'z';
Digit           = '0'..'9';
Blank           = ' ', 9, 10;
NeChar          = '#';
LparenChar      = '(';
RparenChar      = ')';
TimesChar       = '*';
PlusChar        = '+';
CommaChar       = ',';
MinusChar       = '-';
PeriodChar      = '.';
SlashChar       = '/';
ColonChar       = ':';
SemicolonChar   = ';';
LtChar          = '<';
EqChar          = '=';
GtChar          = '>';
NonPrint        = 0..8, 11..31, 127;
Illegal         = '"', '\', '[', ']', '!', '%', '&', '''', '?', '@', '^', '_', '`', '{', '|', '}', '~', '$';

TOKENS

-- Every token has a name. Names should consist of upper case letters
-- (no underscores). Mungegrammar uses names to create C++ #defines for
-- major and minor token numbers. For example, in the list of tokens
-- below, the major and minor token numbers for PLUS will be MAJ_ADDOP
-- and MIN_PLUS. The major and minor token numbers for COMMA will be
-- MAJ_COMMA and MIN_COMMA.

-- Optionally a token may have an insert image. This is what the syntax
-- corrector will use in its error message when inserting the token.
-- If no insert image is specified, the token name will be used.
-- Insert images are double-quoted.
-- For compatibility with the syntax of fmq, insert images should not contain
-- the characters '<' or '#'.

-- Some tokens may have several variants. Variants play the same role in
-- parsing, but must be distinguished during semantic analysis. They
-- share the same major token number, but have different minor token
-- numbers.

-- Two token names are special: (1) You must name one token SPACE.
-- This is the only token that the scanner will not pass on to the parser.
-- In most languages, SPACE will include blanks, tabs, carriage
-- returns, comments, and pragmas. You may want to use a different
-- variant for pragmas, so the scanner can identify them by minor
-- token number and treat them specially. (2) You must name one
-- token IDENT.
-- This is the only token with "exceptions".
-- It cannot have variants.

-- You will notice that some of the token definitions below end with
-- double-quoted character strings instead of regular expressions.
-- These tokens are IDENT exceptions. Other than these, no string
-- is permitted to match the definition of more than one token. It is
-- acceptable, however, for two strings, one of which is a prefix of the
-- other, to match the definitions of different tokens. For example,
-- "3" would match the definition of INT_LITERAL in Pascal, while
-- "3.14" would match the definition of REAL_LITERAL.

-- Syntax for the lines below is as follows:
--
--   token_definition   ::= token_name insert_image_opt
--                          insertion_cost "," deletion_cost
--                          token_def_tail ";"
--   insert_image_opt   ::= "\"" character_string "\""
--                      ::=
--   token_def_tail     ::= "=" string_or_reg_exp
--                      ::= ":" token_variants
--   token_variants     ::= token_name "=" string_or_reg_exp more_variants
--   more_variants      ::= "," token_variants
--                      ::=
--   string_or_reg_exp  ::= "\"" character_string "\""
--                      ::= regular_expression

-- This syntax is not entirely "free-format": the semicolon that terminates
-- a token definition must be the last thing (other than a comment or white
-- space) on its line. Moreover, the comma that separates token variants
-- must be the last thing (other than a comment or white space) on its line,
-- and a comma within a regular expression must NOT be the last thing
-- (other than a comment or white space) on its line.

SPACE                   100,100 = (Blank{TOSS})+;
IDENT                   9,15    = (Letter).(Letter,Digit)*;
NUMBER                  6,15    = Digit+;
ADDOP           "+"     4,4     :
    PLUS        = PlusChar{TOSS},
    MINUS       = MinusChar{TOSS};
MULOP           "*"     5,4     :       -- ADDOP is easier to insert
    TIMES       = TimesChar{TOSS},
    SLASH       = SlashChar{TOSS};
ODD                     4,4     = "ODD";
EQ              "="     4,5     = EqChar{TOSS};
RELOP           ">"     5,4     :       -- EQ is easier to insert
    NE          = NeChar{TOSS},
    LT          = LtChar{TOSS},
    LE          = LtChar{TOSS}.EqChar{TOSS},
    GT          = GtChar{TOSS},
    GE          = GtChar{TOSS}.EqChar{TOSS};
LPAREN          "("     10,20   = LparenChar{TOSS};
RPAREN          ")"     4,2     = RparenChar{TOSS};
COMMA           ","     4,8     = CommaChar{TOSS};
SEMICOLON       ";"     3,8     = SemicolonChar{TOSS};
PERIOD          "."     7,8     = PeriodChar{TOSS};
BECOMES         ":="    5,9     = ColonChar{TOSS}.EqChar{TOSS};
BEGIN                   10,20   = "BEGIN";
END                     8,15    = "END";
IF                      10,20   = "IF";
THEN                    8,20    = "THEN";
WHILE                   10,20   = "WHILE";
DO                      8,10    = "DO";
CALL                    12,20   = "CALL";
CONST                   10,20   = "CONST";
VAR                     10,20   = "VAR";
PROCEDURE               15,25   = "PROCEDURE";

PRODUCTIONS

-- Unlike fmq, mungegrammar allows you to embed actual C++ code for action
-- routines inside right-hand sides. Code fragments are delimited with
-- "[[" and "]]". Code fragments may not contain "]]", even in comments
-- and strings. Temporary variables for use in action routines are
-- declared in case.pre. Within action routines, "lhs" is a pointer to the
-- semantic record for the symbol on the left-hand-side of the production;
-- "rhs" is a pointer to an array of semantic records for the symbols on the
-- right-hand-side of the production. The action routines here serve
-- *only* to create a syntax tree; they do *not* do any significant
-- semantic analysis; that is reserved for a later phase of the compiler.
-- Note that comments inside code fragments are C++ comments, *not*
-- Ada-style mungegrammar comments.

-- The productions are not "free-format": the left-hand-side of each
-- production must start in column one, and nothing else (except a
-- comment) may start in column one. Non-terminal names should consist
-- of lower-case letters and underscores.
-- Terminals should be specified
-- by the insert images from the TOKENS section above.

program
    ::= block .
    [[
        ast_root = rhs[0].node;
        if (ast_root == NULL)
            issue_warning(rhs[1].token.get_location(), "Empty program!?");
    ]]

block
    ::= const_part_opt var_part_opt proc_part_opt statement
    [[
        if (rhs[3].node != NULL) {
            l = rhs[3].node->get_location();
        }
        p = NULL;
        if (rhs[2].node != NULL) {
            p = rhs[2].node;
            l = p->get_location();
        }
        if (rhs[1].node != NULL) {
            st_declaration_t *decl =
                dynamic_cast<st_declaration_t *>(rhs[1].node);
            decl->tail()->set_next(p);
            p = rhs[1].node;
            l = p->get_location();
        }
        if (rhs[0].node != NULL) {
            st_declaration_t *decl =
                dynamic_cast<st_declaration_t *>(rhs[0].node);
            decl->tail()->set_next(p);
            p = rhs[0].node;
            l = p->get_location();
        }
        if (p == NULL && rhs[3].node == NULL) {
            lhs->node = NULL;
        } else {
            lhs->node = new st_block_t(l, p, rhs[3].node);
        }
    ]]

const_part_opt
    ::= CONST equate_list ;
    [[
        *lhs = rhs[1];
    ]]
    ::=
    [[
        lhs->node = NULL;
    ]]

equate_list
    ::= IDENT = NUMBER more_equates
    [[
        lhs->node = new st_constant_t(
            rhs[0].token.get_location(),
            new st_identifier_t(rhs[0].token),
            new st_number_t(rhs[2].token),
            rhs[3].node);
        // set tail of list
        if (rhs[3].node == NULL) {
            st_declaration_t *decl =
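
The CHARACTERS section of the grammar above requires its sets to partition ASCII: every code from 0 to 127 must appear in exactly one set. As the header comment warns, mungegrammar may pass invalid input along without complaining, so a separate sanity check can help after a hand edit. The following is a minimal C++ sketch, assuming the sets are transcribed by hand from the CHARACTERS definitions; `CharClass`, `add_range`, `pl0_char_classes`, and `pl0_charsets_partition_ascii` are hypothetical helpers, not part of mungegrammar, scangen, or fmq.

```cpp
#include <array>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: one character class from the CHARACTERS
// section, given as its name plus the ASCII codes it contains.
struct CharClass {
    std::string name;
    std::vector<int> codes;
};

// Append every code in the inclusive range [lo, hi], mirroring "lo..hi".
static void add_range(CharClass &c, int lo, int hi) {
    for (int ch = lo; ch <= hi; ++ch) c.codes.push_back(ch);
}

// Hand transcription of the CHARACTERS definitions in this grammar.
// It must be updated manually whenever that section changes.
std::vector<CharClass> pl0_char_classes() {
    std::vector<CharClass> classes;

    CharClass letter{"Letter", {}};
    add_range(letter, 'A', 'Z');
    add_range(letter, 'a', 'z');
    classes.push_back(letter);

    CharClass digit{"Digit", {}};
    add_range(digit, '0', '9');
    classes.push_back(digit);

    classes.push_back({"Blank", {' ', 9, 10}});

    // Single-character classes, in the order they are declared.
    const std::vector<std::pair<std::string, char>> singles = {
        {"NeChar", '#'},    {"LparenChar", '('},    {"RparenChar", ')'},
        {"TimesChar", '*'}, {"PlusChar", '+'},      {"CommaChar", ','},
        {"MinusChar", '-'}, {"PeriodChar", '.'},    {"SlashChar", '/'},
        {"ColonChar", ':'}, {"SemicolonChar", ';'}, {"LtChar", '<'},
        {"EqChar", '='},    {"GtChar", '>'}};
    for (const auto &s : singles) classes.push_back({s.first, {s.second}});

    CharClass nonprint{"NonPrint", {}};
    add_range(nonprint, 0, 8);
    add_range(nonprint, 11, 31);
    nonprint.codes.push_back(127);
    classes.push_back(nonprint);

    // Printable characters that PL/0 does not use.
    CharClass illegal{"Illegal", {}};
    for (char ch : std::string("\"\\[]!%&'?@^_`{|}~$"))
        illegal.codes.push_back(ch);
    classes.push_back(illegal);

    return classes;
}

// True if every ASCII code 0..127 belongs to exactly one class.
bool pl0_charsets_partition_ascii() {
    std::array<int, 128> hits{};  // membership count per code, all zero
    for (const CharClass &c : pl0_char_classes()) {
        for (int ch : c.codes) {
            if (ch < 0 || ch > 127) return false;  // not an ASCII code
            ++hits[ch];
        }
    }
    for (int n : hits) {
        if (n != 1) return false;  // 0 = missing, >1 = in two sets
    }
    return true;
}
```

With the definitions as written, the 19 classes hold 52 + 10 + 3 + 14 + 31 + 18 = 128 codes with no overlaps, so `pl0_charsets_partition_ascii()` should return true. Counting memberships per code, rather than only checking coverage, catches both a missing character and one that was accidentally listed in two sets.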