📄 grammar.c

/*
 * Mesa 3-D graphics library
 * Version:  6.6
 *
 * Copyright (C) 1999-2006  Brian Paul   All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included
 * in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
 * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

/**
 * \file grammar.c
 * syntax parsing engine
 * \author Michal Krol
 */

#ifndef GRAMMAR_PORT_BUILD
#error Do not build this file directly, build your grammar_XXX.c instead, which includes this file
#endif

/*
*/

/*
    INTRODUCTION
    ------------

    The task is to check the syntax of an input string. Input string is a stream of ASCII
    characters terminated with a null-character ('\0'). Checking it using C language is
    difficult and hard to implement without bugs. It is hard to maintain and make changes when
    the syntax changes.

    This is because of a high redundancy of the C code. Large blocks of code are duplicated with
    only small changes. Even use of macros does not solve the problem because macros cannot
    erase the complexity of the problem.

    The resolution is to create a new language that will be highly oriented to our task.
    Once we describe a particular syntax, we are done. We can then focus on the code that
    implements the language. Its size and complexity are relatively small compared to code that
    directly checks the syntax.

    First, we must implement our new language. Here, the language is implemented in C, but it
    could also be implemented in any other language. The code is listed below. We must take
    good care that it is bug free. This is simple because the code is simple and clean.

    Next, we must describe the syntax of our new language in itself. Once created and checked
    manually that it is correct, we can use it to check other scripts.

    Note that our new language loading code does not have to check the syntax. It is because we
    assume that the script describing itself is correct, and other scripts can be syntactically
    checked by the former script. The loading code must only do semantic checking which leads us to
    simple resolving references.

    THE LANGUAGE
    ------------

    Here I will describe the syntax of the new language (further called "Synek"). It is mainly a
    sequence of declarations terminated by a semicolon. The declaration consists of a symbol,
    which is an identifier, and its definition. A definition is in turn a sequence of specifiers
    connected with ".and" or ".or" operator. These operators cannot be mixed together in one
    definition. Specifier can be a symbol, string, character, character range or a special
    keyword ".true" or ".false".

    At the very beginning of the script there is a declaration of a root symbol, in the form:
        .syntax <root_symbol>;
    The <root_symbol> must be one of the symbols in declaration sequence. The syntax is correct if
    the root symbol evaluates to true. A symbol evaluates to true if the definition associated with
    the symbol evaluates to true. Definition evaluation depends on the operator used to connect
    specifiers in the definition.
    If ".and" operator is used, definition evaluates to true if and only if all the specifiers
    evaluate to true. If ".or" operator is used, definition evaluates to true if any of the
    specifiers evaluates to true. If definition contains only one specifier, it is evaluated as
    if it was connected with ".true" keyword by ".and" operator.

    If specifier is a ".true" keyword, it always evaluates to true.

    If specifier is a ".false" keyword, it always evaluates to false. Specifier evaluates to false
    when it does not evaluate to true.

    Character range specifier is in the form:
        '<first_character>' - '<second_character>'
    If specifier is a character range, it evaluates to true if character in the stream is greater
    or equal to <first_character> and less or equal to <second_character>. In that situation
    the stream pointer is advanced to point to next character in the stream. All C-style escape
    sequences are supported although trigraph sequences are not. The comparisons are performed
    on 8-bit unsigned integers.

    Character specifier is in the form:
        '<single_character>'
    It evaluates to true if the following character range specifier evaluates to true:
        '<single_character>' - '<single_character>'

    String specifier is in the form:
        "<string>"
    Let N be the number of characters in <string>. Let <string>[i] designate i-th character in
    <string>. Then the string specifier evaluates to true if and only if for i in the range [0, N)
    the following character specifier evaluates to true:
        '<string>[i]'
    If <string>[i] is a quotation mark, '<string>[i]' is replaced with '\<string>[i]'.
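
    As an illustration only (this sketch is not part of the engine in this file), the character
    range, character and string rules above could be written in C roughly as follows. The names
    match_range, match_char and match_string are hypothetical. Restoring the stream pointer when
    a string fails to match is an assumption, since the text above does not say what happens to
    the pointer on failure.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: match one character range specifier 'lo' - 'hi'.
// Returns 1 and advances the stream pointer on a match, returns 0 otherwise.
static int match_range (const unsigned char **stream, unsigned char lo, unsigned char hi)
{
    unsigned char c = **stream;     // compared as an 8-bit unsigned integer

    if (c >= lo && c <= hi)
    {
        (*stream)++;
        return 1;
    }
    return 0;
}

// Hypothetical helper: a character specifier 'c' is the range 'c' - 'c'.
static int match_char (const unsigned char **stream, unsigned char c)
{
    return match_range (stream, c, c);
}

// Hypothetical helper: a string specifier matches when every character matches in order.
// Assumption: on failure the stream pointer is restored to where it started.
static int match_string (const unsigned char **stream, const char *str)
{
    const unsigned char *start = *stream;
    size_t i;

    assert (str != NULL);
    for (i = 0; str[i] != '\0'; i++)
    {
        if (!match_char (stream, (unsigned char) str[i]))
        {
            *stream = start;
            return 0;
        }
    }
    return 1;
}
```

    A caller passes the address of its own stream pointer, for example
    match_string (&p, "declare"), and checks the return value before continuing.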

    Symbol specifier can be optionally preceded by a ".loop" keyword in the form:
        .loop <symbol>                  (1)
    where <symbol> is defined as follows:
        <symbol> <definition>;          (2)
    Construction (1) is replaced by the following code:
        <symbol$1>
    and declaration (2) is replaced by the following:
        <symbol$1> <symbol$2> .or .true;
        <symbol$2> <symbol> .and <symbol$1>;
        <symbol> <definition>;

    Synek also supports a register mechanism. The user can, in their SYN file, declare a number of
    registers that can be accessed in the syn body. Each reg has its name and a default value.
    The register is one byte wide. The C code can change the default value by calling
    grammar_set_reg8() with grammar id, register name and a new value. As we know, each rule is
    a sequence of specifiers joined with .and or .or operator. And now each specifier can be
    prefixed with a condition expression in a form ".if (<reg_name> <operator> <hex_literal>)"
    where <operator> can be == or !=. If the condition evaluates to false, the specifier
    evaluates to .false. Otherwise it evaluates to the specifier.

    ESCAPE SEQUENCES
    ----------------

    Synek supports all escape sequences in character specifiers. The mapping table is listed below.
    All occurrences of the characters in the first column are replaced with the corresponding
    character in the second column.

        Escape sequence         Represents
    ------------------------------------------------------------------------------------------------
        \a                      Bell (alert)
        \b                      Backspace
        \f                      Formfeed
        \n                      New line
        \r                      Carriage return
        \t                      Horizontal tab
        \v                      Vertical tab
        \'                      Single quotation mark
        \"                      Double quotation mark
        \\                      Backslash
        \?                      Literal question mark
        \ooo                    ASCII character in octal notation
        \xhhh                   ASCII character in hexadecimal notation
    ------------------------------------------------------------------------------------------------

    RAISING ERRORS
    --------------

    Any specifier can be followed by a special construction that is executed when the specifier
    evaluates to false. The construction is in the form:
        .error <ERROR_TEXT>
    <ERROR_TEXT> is an identifier declared earlier by error text declaration. The declaration is
    in the form:
        .errtext <ERROR_TEXT> "<error_desc>"
    When specifier evaluates to false and this construction is present, parsing is stopped
    immediately and <error_desc> is returned as a result of parsing. The error position is also
    returned and it is meant as an offset from the beginning of the stream to the character that
    was valid so far.
    Example:

        (**** syntax script ****)

        .syntax program;
        .errtext MISSING_SEMICOLON      "missing ';'"
        program         declaration .and .loop space .and ';' .error MISSING_SEMICOLON .and
                        .loop space .and '\0';
        declaration     "declare" .and .loop space .and identifier;
        space           ' ';

        (**** sample code ****)

        declare foo ,

    In the example above checking the sample code will result in error message "missing ';'" and
    error position 12. The sample code is not correct. Note the presence of '\0' specifier to
    assure that there is no code after semicolon - only spaces.
    <error_desc> can optionally contain identifier surrounded by dollar signs $. In such a case,
    the identifier and dollar signs are replaced by a string retrieved by invoking symbol with
    the identifier name. The starting position is the error position. The length of the resulting
    string is the position after invoking the symbol.

    PRODUCTION
    ----------

    Synek not only checks the syntax but it can also produce (emit) bytes associated with specifiers
    that evaluate to true. That is, every specifier and optional error construction can be followed
    by a number of emit constructions that are in the form:
        .emit <parameter>
    <parameter> can be a HEX number, identifier, a star * or a dollar $. HEX number is preceded by
    0x or 0X. If <parameter> is an identifier, it must be earlier declared by emit code declaration
    in the form:
        .emtcode <identifier> <hex_number>

    When given specifier evaluates to true, all emits associated with the specifier are output
    in order they were declared. A star means that last-read character should be output instead
    of constant value.
    Example:

        (**** syntax script ****)

        .syntax foobar;
        .emtcode WORD_FOO       0x01
        .emtcode WORD_BAR       0x02
        foobar      FOO .emit WORD_FOO .or BAR .emit WORD_BAR .or .true .emit 0x00;
        FOO         "foo" .and SPACE;
        BAR         "bar" .and SPACE;
        SPACE       ' ' .or '\0';

        (**** sample text 1 ****)

        foo

        (**** sample text 2 ****)

        foobar

    For both samples the result will be one-element array. For first sample text it will be
    value 1, for second - 0. Note that every text will be accepted because of presence of
    .true as an alternative.

    Another example:

        (**** syntax script ****)

        .syntax declaration;
        .emtcode VARIABLE       0x01
        declaration     "declare" .and .loop space .and
                        identifier .emit VARIABLE .and          (1)
                        .true .emit 0x00 .and                   (2)
                        .loop space .and ';';
        space           ' ' .or '\t';
        identifier      .loop id_char .emit *;                  (3)
        id_char         'a'-'z' .or 'A'-'Z' .or '_';

        (**** sample code ****)

        declare    fubar;

    In specifier (1) symbol <identifier> is followed by .emit VARIABLE. If it evaluates to
    true, VARIABLE constant and then production of the symbol is output. Specifier (2) is used
    to terminate the string with null to signal when the string ends. Specifier (3) outputs
    all characters that make declared identifier. The result of sample code will be the
    following array:
        { 1, 'f', 'u', 'b', 'a', 'r', 0 }

    If .emit is followed by dollar $, it means that current position should be output. Current
    position is a 32-bit unsigned integer distance from the very beginning of the parsed string to
    first character consumed by the specifier associated with the .emit instruction.
    Current position is stored in the output buffer in Little-Endian convention (the lowest byte
    comes first).
*/

#include <stdio.h>

static void mem_free (void **);

/*
    internal error messages
*/
static const byte *OUT_OF_MEMORY =          (byte *) "internal error 1001: out of physical memory";
static const byte *UNRESOLVED_REFERENCE =   (byte *) "internal error 1002: unresolved reference '$'";
static const byte *INVALID_GRAMMAR_ID =     (byte *) "internal error 1003: invalid grammar object";
static const byte *INVALID_REGISTER_NAME =  (byte *) "internal error 1004: invalid register name: '$'";
/*static const byte *DUPLICATE_IDENTIFIER =   (byte *) "internal error 1005: identifier '$' already defined";*/
static const byte *UNREFERENCED_IDENTIFIER =(byte *) "internal error 1006: unreferenced identifier '$'";

static const byte *error_message = NULL;    /* points to one of the error messages above */
static byte *error_param = NULL;        /* this is inserted into error_message in place of $ */
static int error_position = -1;

static byte *unknown = (byte *) "???";

static void clear_last_error (void)
{
    /* reset error message */
    error_message = NULL;

    /* free error parameter - if error_param is a "???" don't free it - it's static */
    if (error_param != unknown)
        mem_free ((void **) (void *) &error_param);
    else
        error_param = NULL;

    /* reset error position */
    error_position = -1;
}

static void set_last_error (const byte *msg, byte *param, int pos)
{
    /* error message can be set only once */
    if (error_message != NULL)
    {
        mem_free ((void **) (void *) &param);
        return;
    }

    error_message = msg;

    /* if param is NULL, set error_param to unknown ("???") */
    /* note: do not try to strdup the "???"
       - it may be that we are here because of */
    /* out of memory error so strdup can fail */
    if (param != NULL)
        error_param = param;
    else
        error_param = unknown;

    error_position = pos;
}

/*
    memory management routines
*/
static void *mem_alloc (size_t size)
{
    void *ptr = grammar_alloc_malloc (size);
    if (ptr == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr;
}

static void *mem_copy (void *dst, const void *src, size_t size)
{
    return grammar_memory_copy (dst, src, size);
}

static void mem_free (void **ptr)
{
    grammar_alloc_free (*ptr);
    *ptr = NULL;
}

static void *mem_realloc (void *ptr, size_t old_size, size_t new_size)
{
    void *ptr2 = grammar_alloc_realloc (ptr, old_size, new_size);
    if (ptr2 == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return ptr2;
}

static byte *str_copy_n (byte *dst, const byte *src, size_t max_len)
{
    return grammar_string_copy_n (dst, src, max_len);
}

static byte *str_duplicate (const byte *str)
{
    byte *new_str = grammar_string_duplicate (str);
    if (new_str == NULL)
        set_last_error (OUT_OF_MEMORY, NULL, -1);
    return new_str;
}

static int str_equal (const byte *str1, const byte *str2)
{
    return grammar_string_compare (str1, str2) == 0;
}

static int str_equal_n (const byte *str1, const byte *str2, unsigned int n)
{
    return grammar_string_compare_n (str1, str2, n) == 0;
}

static int
str_length (const byte *str)
{
   return (int) (grammar_string_length (str));
}

/*
    useful macros
*/
#define GRAMMAR_IMPLEMENT_LIST_APPEND(_Ty)\
    static void _Ty##_append (_Ty **x, _Ty *nx) {\
        while (*x) x = &(**x).next;\
        *x = nx;\
    }

/*
    string to byte map typedef
*/
typedef struct map_byte_
{
    byte *key;
    byte data;
    struct map_byte_ *next;
} map_byte;

static void map_byte_create (map_byte **ma)
{
    *ma = (map_byte *) mem_alloc (sizeof (map_byte));
    if (*ma)
    {
        (**ma).key = NULL;
        (**ma).data = '\0';
        (**ma).next = NULL;
    }
}

static void map_byte_destroy (map_byte
    **ma)
{
    if (*ma)
    {
        map_byte_destroy (&(**ma).next);
        mem_free ((void **) &(**ma).key);
        mem_free ((void **) ma);
    }
}

GRAMMAR_IMPLEMENT_LIST_APPEND(map_byte)

/*
    searches the map for the specified key,
    returns pointer to the element with the specified key if it exists
    returns NULL otherwise
*/
static map_byte *map_byte_locate (map_byte **ma, const byte *key)
{
    while (*ma)
    {
        if (str_equal ((**ma).key, key))
            return *ma;

        ma = &(**ma).next;
    }

    set_last_error (UNRESOLVED_REFERENCE, str_duplicate (key), -1);
    return NULL;
}

/*
    searches the map for specified key,
    if the key is matched, *data is filled with data associated with the key,
    returns 0 if the key is matched,
    returns 1 otherwise
*/
static int map_byte_find (map_byte **ma, const byte *key, byte *data)
{
    map_byte *found = map_byte_locate (ma, key);
    if (found != NULL)
    {
        *data = found->data;
        return 0;
    }

    return 1;
}

/*
    regbyte context typedef

    Each regbyte consists of its name and a default value. These are static and created at
    grammar script compile-time, for example the following line:
        .regbyte vertex_blend      0x00
    adds a new regbyte named "vertex_blend" to the static list and initializes it to 0.
    When the script is executed, this regbyte can be accessed by name for read and write. When a
    particular regbyte is written, a new regbyte_ctx entry is added to the top of the regbyte_ctx
    stack. The new entry contains information about which regbyte it references and its new value.
    When a given regbyte is accessed for read, the stack is searched top-down to find an
    entry that references the regbyte. The first matching entry is used to return the current
    value it holds.
    If no entry is found, the default value is returned.
*/
typedef struct regbyte_ctx_
{
    map_byte *m_regbyte;
    byte m_current_value;
    struct regbyte_ctx_ *m_prev;
} regbyte_ctx;

static void regbyte_ctx_create (regbyte_ctx **re)
{
    *re = (regbyte_ctx *) mem_alloc (sizeof (regbyte_ctx));
    if (*re)
    {
        (**re).m_regbyte = NULL;
        (**re).m_prev = NULL;
    }
}

static void regbyte_ctx_destroy (regbyte_ctx **re)
{
    if (*re)
    {
        mem_free ((void **) re);
    }
}

static byte regbyte_ctx_extract (regbyte_ctx **re, map_byte *reg)
{
    /* first lookup in the register stack */
    while (*re != NULL)
    {
        if ((**re).m_regbyte == reg)
            return (**re).m_current_value;

        re = &(**re).m_prev;
    }