📄 symboltable.otx
字号:
look at its primary goal and what the criteria are. Its primary goal is storing
symbol information an provide a fast lookup facility, for easy access to all the
stored symbol information.\\
Since we have only a short period of time to develop our language \emph{inger} and the
compiler, the only criteria we had in choosing a suitable data structure was that
it was easy to use, and implement.\\
\subsection{Data structures compared}
One of the first possible data structures which comes to mind when thinking of
how to store a list of symbols is perhaps an array. Although an array is very
convenient to store symbols, it has quite a few practical limitations. Arrays are
defined with a static size, so chances are you define a symbol table with array size
256, and end up with 257 symbols causing a buffer overflow (an internal compiler
error). Writing beyond an array limit will produce unpredictable results. Needless
to say, this situation is undesirable. Searching an array is not efficient at all due
to its linear searching algorithm. A binary search algorithm could be applied, but
this would require the array to be sorted at all times. For sorted arrays, searching
is a fairly straightforward operation and easy to implement.\\
If the table would be implemented as a stack, finding a symbol is a simple matter of
searching for the desired symbol from the top of the stack. The first variable found
is automatically the last variable added and thus the variable in the nearest scope.
This implementation makes it easy to use multi-level scoping, but is a heavy burden
on performance as with every search the stack has to be deconstructed and stored on a
second stack (until the first occurrence is found) and reconstructed, a very expensive
operation.\\
Another implementation would be a linked list of symbols. If implemented as an unsorted
double linked list it would be possible to use it just like the stack (append to the
back and search from the back) without the disadvantage of a lot of push and pop
operations. A search still takes place in a linear time frame but the operations
themselves are much cheaper than the stack implementation.\\
Binary search trees improve search time massively, but only in sorted form (an unsorted
tree after all, is not a tree at all). This results in the loss of an advantage the
stack and double linked list offered: easy scoping. Now the first symbol found is not
per definition the latest definition of that symbol name. It in fact is probably the
first occurrence. This means that the search is not complete until it is impossible to
find another occurrence (4<3). This also means that we have to include some sort of scope
field with every symbol to separate the symbols: (a,1) and (a,2) are symbols of the same
name, but a is in a higher scope and therefore the correct symbol. Another big disadvantage
is that when a function is processed we need to rid the tree of all symbols in that
function's scope. This requires a complete search and rebalancing of the tree. Since the
tree is sorted by string value every operation (insert, search, etc...) is quite expensive.
These operations could be made more time efficient by using an hash algorithm as explained
in the next paragraph.\\
String comparisons are relatively heavy compared to comparison of simple types such as
integers or bytes so using an hash algorithm to convert symbol names to simple types would
speed all operations on the tree considerably.\\
\subsection{Data structure selection}
We think a combination of an n-ary tree combined with linked lists is a suitable solution
for us. Initially we thought using a linked list was a good idea. Each list node, representing
a scope, would contain the root node of a binary tree. The major advantage of this approach
is that adding and removing a scope was easy and fast. This advantage was based on the idea
that the symbol table would grow and shrink during the first pass of compilation, which means
that the symbol table is not available anymore after the first pass. This is not what we want
since the symbol table must be available at all times after the first pass and therefore favour
a new data structure that is less efficient in removing scopes ( we do not remove scopes anyway
) but faster in looking up symbols in the symbol table.\\
\emph{Design the data structure. Give abstract structures for types (keep
this a black box), and show how symbols are stored. Give code for the
data structure. Include (boring) pointer-pictures.}\\
The symbol table data structure is not enough, it is just a tree and should be decorated
with symbol information like (return) types and modifiers. To store
this information correctly we designed several logical structures for symbols and types.
In this chapter we will only deal with the symbol implementation, the type design is
discussed in the Type chapter.
A symbol is identified by a, in its scope, unique name.
This part of the text involves our implementation of types, and is a the time of writing
not yet finished.
\section{Storing symbols in the symbol table}
\emph{Our symbols come from the AST. Show how the symbol table is filled from
the AST using an algorithm (present the algorithm and include a small example).}\\
We fill the symbol table from the AST. To illustrate how this symbol table filling algorithm
works, we will show how the n-ary tree evolves when constructing the symbol table from the
example code below.
\begin{example} Simple program to test expressions
\begin{verbatim}
module test_module;
start main: void -> void
{
int i;
i = (3 * 5) / 10 + 20 - 5 * (132 + 3);
}
\end{verbatim}
\end{example}
We can distinguish the following steps in parsing the example source code.\\
\begin{enumerate}
\item found a function declaration: start main: void -> void
\item add the function as a symbol to the symbol table in the current scope (in this case
the global scope)
\item Create a new scope, add one to the symbol table's scope counter
\item found a variable declaration: int I'
\item add the variable with all its properties (name: i, type: int, indirection level: 0) as a
symbol to the symbol table in the current scope (in this case scope level 1)
\item Exit the scope, when reaching the end of the processed function
\end{enumerate}
After these steps our symbol table will look like this.\\
PICTURE HERE\\
\newpage
\begin{verbatim}
1. Initialize the symbol table, create root node and allocate memory
2. Start retrieving the symbols from the AST (this is a recursive function)
a. If retrieved symbol is a variable declaration, process the declaration
i. Store name, type and indirection level of the symbol to the newly created
node in the symbol table.
b. If retrieved symbol is a function declaration/definition, process the
declaration/definition
i. Process the function header and store its parameter information just like
we do with variable declarations.
ii. Create a scope for the new function block, we add one to the scope counter.
iii. Process the parameters and store the function parameters as local variables
in the new scope.
iv. Start retrieving symbols from the AST (this is a recursive function)
3. Purge the symbol table.
e) Symbol table implementation
All considerations and decisions made resulted in the following code.
Concepts:
The following two points are the main items
* what information do we store about a symbol
* how do we store the collection of symbols
Criteria
1) Snelheid opzoeken
2) Snelheid verwijderen
3) Geheugengebruik
4) ....
vector, array, linked list, hash table
Opzoeken Verwijderen Geheugen Smaak Totalen
Vector O(1) ++ ++ -- 3
Array O(1) ++ -- + 4
linked list -- + + 5
hash table 6
Conclusion: hash table.
CRAP
Conclusion
Initialy we thought that using a linked list combined with a hashed binary
tree was a proper solution. The linked list would contain a node for every
scope encountered and each of these nodes holds a pointer to the root of a
hashed binary tree.........
--- old ---
We think a combination of a linked list and hashed binary trees is a proper
solution. The linked list contains a node for every scope encountered and
each of these nodes holds a pointer to the root of a hashed binary tree.
This tree contains all symbols in this scope.
We prefer a different structure for every scope as in practice most symbols
will be used in their own scope anyway which makes the use of one enormously
exremely titaneously large tree inefficient. Only when symbols from lower
scopes are accessed the search may take longer as every higher scope will be
searched seperately.
--- old ---
--- Rewrite, according to the new document stucture ---
There are several aspects to bear in mind when handling simple data types.
Secondly we have to think of an algorithm to fill the data structure of our
choice. We will talk about this later on in chapter [chapter number]
--- Rewrite, according to the new document stucture ---
\end{verbatim}
\begin{thebibliography}{99}
\bibitem{regex}H. Spencer: \emph{POSIX 1003.2 regular
expressions}, UNIX man page regex(7), 1994
\bibitem{lex_yacc}J. Levine: \emph{Lex and Yacc},
O'Reilly \& sons, 2000
\end{thebibliography}
\normalsize
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -