/*
 * Mesa 3-D graphics library
 * Version:  6.6
 *
 * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
 * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

/**
 * \file grammar.c
 * syntax parsing engine
 * \author Michal Krol
 */

#ifndef GRAMMAR_PORT_BUILD
#error Do not build this file directly, build your grammar_XXX.c instead, which includes this file
#endif

/*
*/

/*
    INTRODUCTION
    ------------

    The task is to check the syntax of an input string. Input string is a stream of ASCII
    characters terminated with a null-character ('\0'). Checking it using C language is
    difficult and hard to implement without bugs. It is hard to maintain and make changes when
    the syntax changes.

    This is because of a high redundancy of the C code. Large blocks of code are duplicated with
    only small changes. Even use of macros does not solve the problem because macros cannot
    erase the complexity of the problem.

    The resolution is to create a new language that will be highly oriented to our task.
    Once we describe a particular syntax, we are done. We can then focus on the code that
    implements the language. Its size and complexity are relatively small compared to code
    that checks the syntax directly.

    First, we must implement our new language. Here, the language is implemented in C, but it
    could also be implemented in any other language. The code is listed below. We must take
    good care that it is bug free. This is simple because the code is simple and clean.

    Next, we must describe the syntax of our new language in itself. Once created and checked
    manually that it is correct, we can use it to check other scripts.

    Note that our new language loading code does not have to check the syntax. It is because we
    assume that the script describing itself is correct, and other scripts can be syntactically
    checked by the former script. The loading code must only do semantic checking which leads us
    to simple resolving references.

    THE LANGUAGE
    ------------

    Here I will describe the syntax of the new language (further called "Synek"). It is mainly a
    sequence of declarations terminated by a semicolon. The declaration consists of a symbol,
    which is an identifier, and its definition. A definition is in turn a sequence of specifiers
    connected with ".and" or ".or" operator. These operators cannot be mixed together in one
    definition. Specifier can be a symbol, string, character, character range or a special
    keyword ".true" or ".false".

    At the very beginning of the script there is a declaration of a root symbol and is in the form:
        .syntax <root_symbol>;
    The <root_symbol> must be one of the symbols in declaration sequence. The syntax is correct if
    the root symbol evaluates to true. A symbol evaluates to true if the definition associated with
    the symbol evaluates to true. Definition evaluation depends on the operator used to connect
    specifiers in the definition. If ".and" operator is used, definition evaluates to true if and
    only if all the specifiers evaluate to true.
    If ".or" operator is used, definition evaluates to true if any of the specifiers
    evaluates to true. If definition contains only one specifier, it is evaluated as if it was
    connected with ".true" keyword by ".and" operator.

    If specifier is a ".true" keyword, it always evaluates to true.

    If specifier is a ".false" keyword, it always evaluates to false. Specifier evaluates to false
    when it does not evaluate to true.

    Character range specifier is in the form:
        '<first_character>' - '<second_character>'
    If specifier is a character range, it evaluates to true if character in the stream is greater
    or equal to <first_character> and less or equal to <second_character>. In that situation
    the stream pointer is advanced to point to next character in the stream. All C-style escape
    sequences are supported although trigraph sequences are not. The comparisons are performed
    on 8-bit unsigned integers.

    Character specifier is in the form:
        '<single_character>'
    It evaluates to true if the following character range specifier evaluates to true:
        '<single_character>' - '<single_character>'

    String specifier is in the form:
        "<string>"
    Let N be the number of characters in <string>. Let <string>[i] designate i-th character in
    <string>. Then the string specifier evaluates to true if and only if for i in the range [0, N)
    the following character specifier evaluates to true:
        '<string>[i]'
    If <string>[i] is a quotation mark, '<string>[i]' is replaced with '\<string>[i]'.

    Symbol specifier can be optionally preceded by a ".loop" keyword in the form:
        .loop <symbol>                  (1)
    where <symbol> is defined as follows:
        <symbol> <definition>;          (2)
    Construction (1) is replaced by the following code:
        <symbol$1>
    and declaration (2) is replaced by the following:
        <symbol$1> <symbol$2> .or .true;
        <symbol$2> <symbol> .and <symbol$1>;
        <symbol> <definition>;

    Synek also supports a register mechanism. The user can, in the SYN file, declare a number of
    registers that can be accessed in the syn body.
    Each reg has its name and a default value. The register is one byte wide. The C code can
    change the default value by calling grammar_set_reg8() with grammar id, register name and
    a new value. As we know, each rule is a sequence of specifiers joined with .and or .or
    operator. And now each specifier can be prefixed with a condition expression in the form
    ".if (<reg_name> <operator> <hex_literal>)" where <operator> can be == or !=. If the condition
    evaluates to false, the specifier evaluates to .false. Otherwise it evaluates to the specifier.

    ESCAPE SEQUENCES
    ----------------

    Synek supports all escape sequences in character specifiers. The mapping table is listed below.
    All occurrences of the characters in the first column are replaced with the corresponding
    character in the second column.

        Escape sequence         Represents
        ------------------------------------------------------------------------------------------------
        \a                      Bell (alert)
        \b                      Backspace
        \f                      Formfeed
        \n                      New line
        \r                      Carriage return
        \t                      Horizontal tab
        \v                      Vertical tab
        \'                      Single quotation mark
        \"                      Double quotation mark
        \\                      Backslash
        \?                      Literal question mark
        \ooo                    ASCII character in octal notation
        \xhhh                   ASCII character in hexadecimal notation
        ------------------------------------------------------------------------------------------------

    RAISING ERRORS
    --------------

    Any specifier can be followed by a special construction that is executed when the specifier
    evaluates to false. The construction is in the form:
        .error <ERROR_TEXT>
    <ERROR_TEXT> is an identifier declared earlier by error text declaration. The declaration is
    in the form:
        .errtext <ERROR_TEXT> "<error_desc>"
    When specifier evaluates to false and this construction is present, parsing is stopped
    immediately and <error_desc> is returned as a result of parsing. The error position is also
    returned and it is meant as an offset from the beginning of the stream to the character that
    was valid so far.
    Example:

        (**** syntax script ****)

        .syntax program;
        .errtext MISSING_SEMICOLON      "missing ';'"
        program         declaration .and .loop space .and ';' .error MISSING_SEMICOLON .and
                        .loop space .and '\0';
        declaration     "declare" .and .loop space .and identifier;
        space           ' ';

        (**** sample code ****)

        declare foo ,

    In the example above checking the sample code will result in error message "missing ';'" and
    error position 12. The sample code is not correct. Note the presence of '\0' specifier to
    assure that there is no code after semicolon - only spaces.
    <error_desc> can optionally contain identifier surrounded by dollar signs $. In such a case,
    the identifier and dollar signs are replaced by a string retrieved by invoking symbol with
    the identifier name. The starting position is the error position. The length of the resulting
    string is the position after invoking the symbol.

    PRODUCTION
    ----------

    Synek not only checks the syntax but it can also produce (emit) bytes associated with specifiers
    that evaluate to true. That is, every specifier and optional error construction can be followed
    by a number of emit constructions that are in the form:
        .emit <parameter>
    <parameter> can be a HEX number, identifier, a star * or a dollar $. HEX number is preceded by
    0x or 0X. If <parameter> is an identifier, it must be earlier declared by emit code declaration
    in the form:
        .emtcode <identifier> <hex_number>

    When given specifier evaluates to true, all emits associated with the specifier are output
    in order they were declared. A star means that last-read character should be output instead
    of constant value. Example:

        (**** syntax script ****)

        .syntax foobar;
        .emtcode WORD_FOO 0x01
        .emtcode WORD_BAR 0x02
        foobar      FOO .emit WORD_FOO .or BAR .emit WORD_BAR .or .true .emit 0x00;
        FOO         "foo" .and SPACE;
        BAR         "bar" .and SPACE;
        SPACE       ' ' .or '\0';

        (**** sample text 1 ****)

        foo

        (**** sample text 2 ****)

        foobar

    For both samples the result will be one-element array.
    For first sample text it will be value 1, for second - 0. Note that every text will be
    accepted because of presence of .true as an alternative.

    Another example:

        (**** syntax script ****)

        .syntax declaration;
        .emtcode VARIABLE 0x01
        declaration     "declare" .and .loop space .and
                        identifier .emit VARIABLE .and          (1)
                        .true .emit 0x00 .and                   (2)
                        .loop space .and ';';
        space           ' ' .or '\t';
        identifier      .loop id_char .emit *;                  (3)
        id_char         'a'-'z' .or 'A'-'Z' .or '_';

        (**** sample code ****)

        declare fubar;

    In specifier (1) symbol <identifier> is followed by .emit VARIABLE. If it evaluates to
    true, VARIABLE constant and then production of the symbol is output. Specifier (2) is used
    to terminate the string with null to signal when the string ends. Specifier (3) outputs
    all characters that make declared identifier. The result of sample code will be the
    following array:
        { 1, 'f', 'u', 'b', 'a', 'r', 0 }

    If .emit is followed by dollar $, it means that current position should be output. Current
    position is a 32-bit unsigned integer distance from the very beginning of the parsed string to
    first character consumed by the specifier associated with the .emit instruction.
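    As an illustration only, here is a minimal sketch of how such a 32-bit position could be
    written into an output buffer lowest byte first. The helper name emit_position_le is
    hypothetical and is not part of this file; the engine's own emit code may be organized
    differently.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of grammar.c): write a 32-bit position
// into out[0..3], lowest byte first.
static void emit_position_le (uint8_t *out, uint32_t pos)
{
    size_t i;

    for (i = 0; i < 4; i++)
        out[i] = (uint8_t) ((pos >> (8 * i)) & 0xFFu);
}
```

    For a position of 0x00000102 this stores the bytes 0x02, 0x01, 0x00, 0x00 in that order.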
    Current position is stored in the output buffer in Little-Endian convention (the lowest byte
    comes first).
*/

#include <stdio.h>

static void mem_free (void **);

/*
    internal error messages
*/
static const byte *OUT_OF_MEMORY =          (byte *) "internal error 1001: out of physical memory";
static const byte *UNRESOLVED_REFERENCE =   (byte *) "internal error 1002: unresolved reference '$'";
static const byte *INVALID_GRAMMAR_ID =     (byte *) "internal error 1003: invalid grammar object";
static const byte *INVALID_REGISTER_NAME =  (byte *) "internal error 1004: invalid register name: '$'";
/*static const byte *DUPLICATE_IDENTIFIER =   (byte *) "internal error 1005: identifier '$' already defined";*/
static const byte *UNREFERENCED_IDENTIFIER =(byte *) "internal error 1006: unreferenced identifier '$'";

static const byte *error_message = NULL;    /* points to one of the error messages above */
static byte *error_param = NULL;        /* this is inserted into error_message in place of $ */
static int error_position = -1;

static byte *unknown = (byte *) "???";

static void clear_last_error (void)
{
    /* reset error message */
    error_message = NULL;

    /* free error parameter - if error_param is a "???" don't free it - it's static */
    if (error_param != unknown)
        mem_free ((void **) (void *) &error_param);
    else
        error_param = NULL;

    /* reset error position */
    error_position = -1;
}

static void set_last_error (const byte *msg, byte *param, int pos)
{
    /* error message can be set only once */
    if (error_message != NULL)
    {
        mem_free ((void **) (void *) &param);
        return;
    }

    error_message = msg;

    /* if param is NULL, set error_param to unknown ("???") */
    /* note: do not try to strdup the "???"
    - it may be that we are here because of */
    /* out of memory error so strdup can fail */
    if (param != NULL)
        error_param = param;
    else
        error_param = unknown;

    error_position = pos;
}

/*
    memory management routines
*/
static void *mem_alloc (size_t size)
{
    void *ptr = grammar_alloc_malloc (size);
    if (ptr == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr;
}

static void *mem_copy (void *dst, const void *src, size_t size)
{
    return grammar_memory_copy (dst, src, size);
}

static void mem_free (void **ptr)
{
    grammar_alloc_free (*ptr);
    *ptr = NULL;
}

static void *mem_realloc (void *ptr, size_t old_size, size_t new_size)
{
    void *ptr2 = grammar_alloc_realloc (ptr, old_size, new_size);
    if (ptr2 == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr2;
}

static byte *str_copy_n (byte *dst, const byte *src, size_t max_len)
{
    return grammar_string_copy_n (dst, src, max_len);
}

static byte *str_duplicate (const byte *str)
{
    byte *new_str = grammar_string_duplicate (str);
    if (new_str == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return new_str;
}

static int str_equal (const byte *str1, const byte *str2)
{
    return grammar_string_compare (str1, str2) == 0;
}

static int str_equal_n (const byte *str1, const byte *str2, unsigned int n)
{
    return grammar_string_compare_n (str1, str2, n) == 0;
}

static int
str_length (const byte *str)
{
    return (int) (grammar_string_length (str));
}

/*
    useful macros
*/
#define GRAMMAR_IMPLEMENT_LIST_APPEND(_Ty)\
    static void _Ty##_append (_Ty **x, _Ty *nx) {\
        while (*x) x = &(**x).next;\
        *x = nx;\
    }

/*
    string to byte map typedef
*/
typedef struct map_byte_
{
    byte *key;
    byte data;
    struct map_byte_ *next;
} map_byte;

static void map_byte_create (map_byte **ma)
{
    *ma = (map_byte *) mem_alloc (sizeof (map_byte));
    if (*ma)
    {
        (**ma).key = NULL;
        (**ma).data = '\0';
        (**ma).next = NULL;
    }
}

static void map_byte_destroy (map_byte **ma)
{
    if (*ma)
    {
        map_byte_destroy (&(**ma).next);
        mem_free ((void **) &(**ma).key);
        mem_free ((void **) ma);
    }
}

GRAMMAR_IMPLEMENT_LIST_APPEND(map_byte)

/*
    searches
    the map for the specified key,
    returns pointer to the element with the specified key if it exists
    returns NULL otherwise
*/
static map_byte *map_byte_locate (map_byte **ma, const byte *key)
{
    while (*ma)
    {
        if (str_equal ((**ma).key, key))
            return *ma;

        ma = &(**ma).next;
    }

    set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
    return NULL;
}

/*
    searches the map for specified key,
    if the key is matched, *data is filled with data associated with the key,
    returns 0 if the key is matched,
    returns 1 otherwise
*/
static int map_byte_find (map_byte **ma, const byte *key, byte *data)
{
    map_byte *found = map_byte_locate (ma, key);
    if (found != NULL)
    {
        *data = found->data;

        return 0;
    }

    return 1;
}

/*
    regbyte context typedef

    Each regbyte consists of its name and a default value. These are static and created at
    grammar script compile-time, for example the following line:
        .regbyte vertex_blend      0x00
    adds a new regbyte named "vertex_blend" to the static list and initializes it to 0.
    When the script is executed, this regbyte can be accessed by name for read and write. When a
    particular regbyte is written, a new regbyte_ctx entry is added to the top of the regbyte_ctx
    stack. The new entry contains information about which regbyte it references and its new value.
    When a given regbyte is accessed for read, the stack is searched top-down to find an
    entry that references the regbyte. The first matching entry is used to return the current
    value it holds.
    If no entry is found, the default value is returned.
*/
typedef struct regbyte_ctx_
{
    map_byte *m_regbyte;
    byte m_current_value;
    struct regbyte_ctx_ *m_prev;
} regbyte_ctx;

static void regbyte_ctx_create (regbyte_ctx **re)
{
    *re = (regbyte_ctx *) mem_alloc (sizeof (regbyte_ctx));
    if (*re)
    {
        (**re).m_regbyte = NULL;
        (**re).m_prev = NULL;
    }
}

static void regbyte_ctx_destroy (regbyte_ctx **re)
{
    if (*re)
    {
        mem_free ((void **) re);
    }
}

static byte regbyte_ctx_extract (regbyte_ctx **re, map_byte *reg)
{
    /* first lookup in the register stack */
    while (*re != NULL)
    {
        if ((**re).m_regbyte == reg)
            return (**re).m_current_value;

        re = &(**re).m_prev;
    }