📄 cobol.pars
/* COBOL grammar
   =============
   conforming to:
      ANSI'74 Standard (ANSI X3.23 - 1974)
      ANSI'85 Standard (ANSI X3.23 - 1985)
      IBM OS/VS COBOL
      IBM VS COBOL II
      IBM SAA COBOL/370
      IBM DOSVS COBOL
      X/Open
      Micro Focus COBOL
*/

/* Ich, Doktor Josef Grosch, Informatiker, March 1997 */

/*
Conventions:

The suffix _l stands for list.
The suffix _e stands for list element.
The suffix _o stands for optional.
The suffix _i stands for imperative statement.

For some nonterminals such as name, qualification, subscription, and identifier
various kinds of usage are distinguished:

No suffix stands for read access.
The suffix _w stands for write access.
The suffix _f stands for forward reference.
The suffix _c stands for reference in CORRESPONDING context.
The suffix _n stands for none of the above.

The character - is replaced by _ in nonterminals.

Notation for words:

   keywords       : all uppercase                   : END
   optional words : first uppercase, else lowercase : Is
   terminals      : all lowercase                   : real
   nonterminals   : all lowercase                   : identifier

Terminals with : unsigned_integer, plus_integer, minus_integer, level_number,
attributes are : real, string, name, paragraph_name, pseudo_text,
                 picture_string, illegal_character
*/

/* Discussion of the LR-Conflicts
   ------------------------------

The grammars for COBOL '85 and Micro Focus COBOL in their published forms are
highly ambiguous and they are not LR(k) for any k. The grammars in their original
version contain dozens of LR conflicts.
However, the situation is not as bad as it might seem because:

- The verbose syntax rules of the language specify how to resolve some conflicts.
- Many rules can be rewritten into an LR(1) form.
- A few rules are LR(2); they require a lookahead of 2 tokens.
- Some rules could be written in an LR(1) form, but the natural version that
  reflects the semantic structure is LR(2).

Theoretically, the grammar for COBOL following below is still not LR(k) for any k.
The shift-reduce conflicts that require a lookahead of arbitrary length can be
resolved in favor of the shift action according to the verbose syntax rules.
Therefore, the grammar is practically LR(2). In order to process the grammar with
an LALR(1) tool, a buffer could be inserted between scanner and parser. This buffer
implements a lookahead of 2 tokens by modifying some tokens if these are followed
by certain tokens. With this mechanism the grammar is actually LALR(1).

The parser generator Lark automatically provides the mentioned buffer and even
supports lookahead of an unlimited number of tokens. In the following
I will discuss the interesting conflicts present in COBOL '85 in detail.

1. file_control_entry

      SELECT f ACCESS MODE SEQUENTIAL .
      SELECT f ACCESS MODE SEQUENTIAL RELATIVE KEY IS n .
      SELECT f ACCESS MODE RANDOM .
      SELECT f ACCESS MODE RANDOM RELATIVE KEY IS n .
      SELECT f ACCESS MODE DYNAMIC .
      SELECT f ACCESS MODE DYNAMIC RELATIVE KEY IS n .

   Does RELATIVE start the KEY phrase of the ACCESS MODE clause or does it start
   the ORGANIZATION IS clause? This conflict requires a lookahead of 2.
   It could be handled with a lookahead of 1 by adding the following rules:

      select_clause = ACCESS Mode Is SEQUENTIAL RELATIVE .
      select_clause = ACCESS Mode Is RANDOM RELATIVE .
      select_clause = ACCESS Mode Is DYNAMIC RELATIVE .

   These rules are combinations of ACCESS MODE clauses and ORGANIZATION IS clauses.
   They recognize the given combinations of two clauses with one rule.
   Using Lark, syntactic predicates that trigger trial parsing can be added in
   order to solve the shift-reduce conflicts:

      select_clause = ACCESS Mode Is SEQUENTIAL ? - RELATIVE_Key_Is_name .
      select_clause = ACCESS Mode Is RANDOM     ? - RELATIVE_Key_Is_name .
      select_clause = ACCESS Mode Is DYNAMIC    ? - RELATIVE_Key_Is_name .
      RELATIVE_Key_Is_name = RELATIVE Key Is name .

   The nonterminal RELATIVE_Key_Is_name checks whether RELATIVE starts the KEY phrase.

2. report_group_description_entry

      01 LINE NUMBER 50 .
      01 LINE NUMBER 50 NEXT PAGE .

   Does NEXT start the NEXT PAGE phrase of the LINE NUMBER clause or does it start
   a NEXT GROUP clause? This conflict requires a lookahead of 2.
   It could be handled with a lookahead of 1 by adding the following rules:

      report_group_clause = LINE Number Is integer NEXT GROUP Is integer .
      report_group_clause = LINE Number Is integer NEXT GROUP Is PLUS integer .
      report_group_clause = LINE Number Is integer NEXT GROUP Is NEXT PAGE .

   These rules are combinations of LINE NUMBER clauses and NEXT GROUP clauses.
   They recognize the given combinations of two clauses with one rule.

   Using Lark, the shift-reduce conflict can be solved by considering a lookahead
   of 2 tokens:

      report_group_clause = LINE Number Is integer
         ? { GetLookahead (2) == YYCODE (GROUP) } .

3. Scope delimiters and optional error phrases

      ADD a TO b SIZE ERROR ADD c TO d END-ADD
      ADD a TO b SIZE ERROR ADD c TO d NOT SIZE ERROR STOP RUN

   Are END-ADD or the NOT SIZE ERROR phrase associated with the outer or the
   inner ADD statement? This part of the grammar is ambiguous. The verbose syntax
   rules specify that both phrases are to be associated with the inner ADD
   statement. This corresponds to taking a shift action instead of a reduce action.
   This is also the usual method for parser generators to solve this type of conflict.
   This problem arises with all variants of ADD statements such as ADD TO,
   ADD TO GIVING, and ADD CORRESPONDING as well as for many more imperative
   statements with optional NOT phrases and optional scope delimiters:

      ADD CORRESPONDING, ADD TO, ADD TO GIVING, CALL, COMPUTE, DELETE,
      DIVIDE BY GIVING, DIVIDE BY GIVING REMAINDER, DIVIDE INTO,
      DIVIDE INTO GIVING, DIVIDE INTO GIVING REMAINDER, MULTIPLY BY,
      MULTIPLY BY GIVING, READ, READ KEY, READ NEXT, RECEIVE, REWRITE, START,
      STRING, SUBTRACT CORRESPONDING FROM, SUBTRACT FROM, SUBTRACT FROM GIVING,
      UNSTRING, WRITE, WRITE WITH NO ADVANCING

4. RECEIVE WITH DATA

      RECEIVE n MESSAGE INTO i NO DATA CLOSE f WITH DATA STOP RUN
      RECEIVE n MESSAGE INTO i NO DATA CLOSE f WITH LOCK

   Does WITH start the WITH DATA phrase of the RECEIVE statement or does it start
   the WITH LOCK phrase of the CLOSE statement? This conflict requires a lookahead
   of 2. This problem arises when the RECEIVE statement is combined with any
   statement that has an optional phrase starting with WITH. These are:

      CLOSE    / WITH NO REWIND / WITH LOCK
      DISABLE  / WITH KEY
      DISPLAY  / WITH NO ADVANCING
      ENABLE   / WITH KEY
      OPEN     / WITH NO REWIND
      PERFORM  / WITH TEST BEFORE / WITH TEST AFTER
      STRING   / WITH POINTER
      UNSTRING / WITH POINTER
      WRITE    / WITH NO ADVANCING

   Using Lark, these shift-reduce conflicts can be solved by adding an inspection
   of 2 lookahead tokens to numerous rules such as:

      perform = PERFORM procedure ? { GetLookahead (2) == YYCODE (DATA) } .

5. PERFORM UNTIL NOT

      ADD a TO b SIZE ERROR PERFORM p UNTIL i NOT NUMERIC
                                          " e NOT ZERO
                                          " e NOT EQUAL TO f
      ADD a TO b SIZE ERROR PERFORM p UNTIL i NOT On SIZE ERROR
                                          " i NOT INVALID KEY
                                          " i NOT On OVERFLOW
                                          " i NOT At END
                                          " i NOT At END-OF-PAGE
                                          " i NOT On EXCEPTION

   Does the NOT continue the condition after UNTIL or does it start a NOT ERROR
   or a similar NOT phrase of a containing statement? This conflict requires a
   lookahead of 2. When PERFORM is combined with ADD, it is the SIZE ERROR phrase
   that causes the problem.
   For the other phrases, combinations of PERFORM with READ and WRITE or similar
   statements cause the trouble.

   Using Lark, this conflict can be solved by adding a syntactic predicate:

      Is  = ? not .
      not = <
         = NOT classification .
         = NOT sign_3 .
         = NOT EQUAL .
         = NOT LESS .
         = NOT GREATER .
         = NOT '=' .
         = NOT '<' .
         = NOT '>' .
         = NOT '(' .
      > .

6. INSPECT TALLYING

      INSPECT a TALLYING i FOR ALL u v w
                         j FOR ALL x y z

   Does the identifier j continue the list of identifiers after the first ALL
   or does it start a new FOR phrase? This problem could be formulated with a
   lookahead of 1 but the natural version that reflects the semantic structure
   requires a lookahead of 2.

   LR(2) Version:

      inspect    = INSPECT identifier TALLYING tallying_l .
      tallying_l = <
         = tallying_e .
         = tallying_l tallying_e .
      > .
      tallying_e = identifier 'FOR' for_l .

   LR(1) Version:

      inspect    = INSPECT identifier TALLYING identifier tallying_l .
      tallying_l = <
         = tallying_e .
         = tallying_l tallying_e .
      > .
      tallying_e = <
         = 'FOR' for_l .
         = 'FOR' for_l identifier .
      > .

   Using Lark, this conflict can be solved by adding syntactic predicates:

      for_e = ALL all_leading_l     ? identifier_FOR .
      for_e = LEADING all_leading_l ? identifier_FOR .
      identifier_FOR = identifier 'FOR' .

   The nonterminal identifier_FOR checks whether identifier starts a new FOR phrase.

7. Sections and Paragraphs

      A SECTION.
      a. CONTINUE.
      b. CONTINUE.
      B SECTION.
      c. CONTINUE.

   Does the name B start a new section or a new paragraph? As before this problem
   could be formulated with a lookahead of 1 but the natural version that reflects
   the semantic structure requires a lookahead of 2.

   LR(2) Version:

procedure_division = <
   = PROCEDURE DIVISION using_o '.' declaratives section_l .
   = PROCEDURE DIVISION using_o '.' section_l .
   = PROCEDURE DIVISION using_o '.' paragraph_l .
> .
declaratives = DECLARATIVES '.' d_section_l 'END' DECLARATIVES '.' .
d_section_l = <
   = section_head use '.' paragraph_l
        ? { GetLookahead (2) == YYCODE (SECTION) } .
   = d_section_l section_head use '.' paragraph_l
        ? { GetLookahead (2) == YYCODE (SECTION) } .
> .
section_l = <
   = section_head paragraph_l ? { GetLookahead (2) == YYCODE (SECTION) } .
   = section_head paragraph_l section_l .
> .
section_head = name SECTION segment_number_o '.' .
paragraph_l = <
   = .
   = paragraph_l paragraph_e .
> .
paragraph_e = name '.' sentence_l .

   LR(1) Version:

procedure_division = <
   = PROCEDURE DIVISION using_o '.' declaratives name section_l .
   = PROCEDURE DIVISION using_o '.' name section_l .
   = PROCEDURE DIVISION using_o '.' name paragraph_l .
> .
declaratives = DECLARATIVES '.' d_section_l 'END' DECLARATIVES '.' .
d_section_l = <
   = SECTION segment_number_o '.' use '.' name paragraph_l .
   = SECTION segment_number_o '.' use '.' .
   = SECTION segment_number_o '.' use '.' name paragraph_l d_section_l .
   = SECTION segment_number_o '.' use '.' name d_section_l .
> .
section_l = <
   = SECTION segment_number_o '.' name paragraph_l .
   = SECTION segment_number_o '.' .
   = SECTION segment_number_o '.' name paragraph_l section_l .
   = SECTION segment_number_o '.' name section_l .
> .
paragraph_l = <
   = paragraph_e .
   = paragraph_l paragraph_e .
> .
paragraph_e = <
   = '.' sentence_l name .
   = '.' sentence_l .
> .

   Using Lark, the shift-reduce conflicts in the LR(2) version can be solved by
   adding an inspection of 2 lookahead tokens to some rules as shown above.

8. Identifiers: Subscription or Modification?

      n (i + 1)
      n (i + 1 :)

   Does the character '+' continue the index of a subscription or does it
   continue the expression of a modification? This conflict requires lookahead
   of arbitrary length. It can be solved by allowing a full expression for the
   first index of a subscription. The check for the restricted form of the index
   is delegated to semantic analysis.

      identifier = <
         = qualification .
         = qualification '(' expression index_l ')' .
         = qualification '(' expression ':' ')' .
         ...
      > .

9. LINAGE clause of file_description_entry

   In its natural form the LINAGE clause of the file_description_entry is LR(2).
   It can be rewritten into an LR(1) form as can be seen below.

10. SUM clause of report_group_description

      SUM a SUM b

   The SUM clause in a report_group_description_entry can be repeated several
   times. All the clauses in a report_group_description_entry can be given in
   arbitrary order. Therefore we allow a list of clauses and rely on semantic
   analysis to detect multiple appearances of clauses. Now, if the SUM clause
   were specified as a list, too, it would be ambiguous whether SUM b continues
   the outer list of all clauses or the inner list of SUM clauses. The solution
   omits the inner list, uses the outer list for the repetition of SUM clauses,
   too, and delegates the detailed checks to semantic analysis.
*/

PARSER

GLOBAL {
# include <ctype.h>
# include "Position.h"
# include "StringM.h"
# include "Idents.h"
# include "keywdef.h"
# include "keywords.h"
# include "def.h"
# include "deftab.h"
# include "paf.h"

# define yyInitStackSize  200
# define yyInitBufferSize 32
# define TOKENOP      PrevEPos = CurrentEPos; CurrentEPos = Attribute.name.EPos;
# define BEFORE_TRIAL tPosition SavePEPos, SaveCEPos; SavePEPos = PrevEPos; SaveCEPos = CurrentEPos;
# define AFTER_TRIAL  PrevEPos = SavePEPos; CurrentEPos = SaveCEPos;

extern rbool Copy ARGS ((tIdent ident, tPosition pos));

static tPosition PrevEPos, CurrentEPos;
static tIdent iCURRENT_DATE ;
static tIdent iWHEN_COMPILED ;
}

BEGIN {
   iCURRENT_DATE  = MakeIdent ("CURRENT-DATE"  , 12);
   iWHEN_COMPILED = MakeIdent ("WHEN-COMPILED" , 13);
}

START programs descriptions

PROPERTY INPUT

RULE

programs = program_l program_end_o .
program_end_o = <
   = program .
   = program program_l 'END' PROGRAM name '.' .
> .
program_end = program program_l 'END' PROGRAM name '.' .
program_l = <
   = .
   = program_l program_end .
> .
program = identification_division environment_division_o data_division_o
   procedure_division_o .
sce = { => { Start_Comment_Entry (); }; } .
identification_division = identification_o program_id_o identification_l .
environment_division_o = <
   = environment_division .
/* = . */
> .
data_division_o = <
   = data_division .
/* = . */
> .
procedure_division_o = <
   = procedure_division .
   = .
> .
identification_o = <
   = IDENTIFICATION DIVISION '.'
     { => { Section = cID_DIV; }; } .
   = { => { Section = cID_DIV; }; } .
> .
program_id_o = <
   = 'PROGRAM-ID' period_o sce name program_o
     { => { (void) DeclareLabel (name:Scan, lPROGRAM, PrevEPos); }; } .
   = 'PROGRAM-ID' period_o sce string program_o
     { => { char word [128];
            StGetString (string:Value, word);
            string:Scan.name.Ident = MakeIdent (word, strlen (word));
            (void) DeclareLabel (string:Scan, lPROGRAM, PrevEPos); }; } .
   = .