📄 symboltable.otx

📁 inger小型c编译器源码
💻 OTX
📖 第 1 页 / 共 2 页
字号:
上一页 12
	look at its primary goal and what the criteria are. Its primary goal is storing 
	symbol information an provide a fast lookup facility, for easy access to all the 
	stored symbol information.\\

	Since we have only a short period of time to develop our language \emph{inger} and the 
	compiler, the only criteria we had in choosing a suitable data structure was that 
	it was easy to use, and implement.\\

	\subsection{Data structures compared}
	One of the first possible data structures which comes to mind when thinking of 
	how to store a list of symbols is perhaps an array. Although an array is very 
	convenient to store symbols, it has quite a few practical limitations. Arrays are 
	defined with a static size, so chances are you define a symbol table with array size 
	256, and end up with 257 symbols causing a buffer overflow (an internal compiler 
	error). Writing beyond an array limit will produce unpredictable results. Needless 
	to say, this situation is undesirable. Searching an array is not efficient at all due 
	to its linear searching algorithm. A binary search algorithm could be applied, but 
	this would require the array to be sorted at all times. For sorted arrays, searching 
	is a fairly straightforward operation and easy to implement.\\

	If the table would be implemented as a stack, finding a symbol is a simple matter of 
	searching for the desired symbol from the top of the stack. The first variable found 
	is automatically the last variable added and thus the variable in the nearest scope. 
	This implementation makes it easy to use multi-level scoping, but is a heavy burden 
	on performance as with every search the stack has to be deconstructed and stored on a 
	second stack (until the first occurrence is found) and reconstructed, a very expensive 
	operation.\\
	 
	Another implementation would be a linked list of symbols. If implemented as an unsorted 
	double linked list it would be possible to use it just like the stack (append to the 
	back and search from the back) without the disadvantage of a lot of push and pop 
	operations. A search still takes place in a linear time frame but the operations 
	themselves are much cheaper than the stack implementation.\\
	
	Binary search trees improve search time massively, but only in sorted form (an unsorted 
	tree after all, is not a tree at all). This results in the loss of an advantage the 
	stack and double linked list offered: easy scoping. Now the first symbol found is not 
	per definition the latest definition of that symbol name. It in fact is probably the 
	first occurrence. This means that the search is not complete until it is impossible to 
	find another occurrence (4<3). This also means that we have to include some sort of scope 
	field with every symbol to separate the symbols: (a,1) and (a,2) are symbols of the same 
	name, but a is in a higher scope and therefore the correct symbol. Another big disadvantage 
	is that when a function is processed we need to rid the tree of all symbols in that 
	function's scope. This requires a complete search and rebalancing of the tree. Since the 
	tree is sorted by string value every operation (insert, search, etc...) is quite expensive. 
	These operations could be made more time efficient by using an hash algorithm as explained 
	in the next paragraph.\\

	String comparisons are relatively heavy compared to comparison of simple types such as 
	integers or bytes so using an hash algorithm to convert symbol names to simple types would 
	speed all operations on the tree considerably.\\

	\subsection{Data structure selection}

	We think a combination of an n-ary tree combined with linked lists is a suitable solution 
	for us. Initially we thought using a linked list was a good idea. Each list node, representing 
	a scope, would contain the root node of a binary tree. The major advantage of this approach 
	is that adding and removing a scope was easy and fast. This advantage was based on the idea 
	that the symbol table would grow and shrink during the first pass of compilation, which means 
	that the symbol table is not available anymore after the first pass. This is not what we want 
	since the symbol table must be available at all times after the first pass and therefore favour 
	a new data structure that is less efficient in removing scopes ( we do not remove scopes anyway 
	) but faster in looking up symbols in the symbol table.\\

	\emph{Design the data structure. Give abstract structures for types (keep 
	this a black box), and show how symbols are stored. Give code for the 
	data structure. Include (boring) pointer-pictures.}\\

	The symbol table data structure is not enough, it is just a tree and should be decorated 
	with symbol information like (return) types and modifiers. To store 
	this information correctly we designed several logical structures for symbols and types.
	In this chapter we will only deal with the symbol implementation, the type design is
	discussed in the Type chapter.

	A symbol is identified by a, in its scope, unique name. 

	This part of the text involves our implementation of types, and is a the time of writing 
	not yet finished.

	\section{Storing symbols in the symbol table}

	\emph{Our symbols come from the AST. Show how the symbol table is filled from 
	the AST using an algorithm (present the algorithm and include a small example).}\\

	We fill the symbol table from the AST. To illustrate how this symbol table filling algorithm 
	works, we will show how the n-ary tree evolves when constructing the symbol table from the 
	example code below.

	\begin{example} Simple program to test expressions
	\begin{verbatim}
	module test_module;
	
	start main: void -> void
	{
	    int i;
	    i = (3 * 5) / 10 + 20 - 5 * (132 + 3);
	}
	\end{verbatim}
	\end{example}
	
	We can distinguish the following steps in parsing the example source code.\\

	\begin{enumerate}
		\item	found a function declaration: start main: void -> void
		\item	add the function as a symbol to the symbol table in the current scope (in this case 
			the global scope)
		\item	Create a new scope, add one to the symbol table's scope counter
		\item	found a variable declaration: int I'
		\item	add the variable with all its properties (name: i, type: int, indirection level: 0) as a 
			symbol to the symbol table in the current scope (in this case scope level 1)
		\item	Exit the scope, when reaching the end of the processed function
	\end{enumerate}

	After these steps our symbol table will look like this.\\

		PICTURE HERE\\

\newpage

\begin{verbatim}
1.	Initialize the symbol table, create root node and allocate memory
2.	Start retrieving the symbols from the AST (this is a recursive function)
a.	If retrieved symbol is a variable declaration, process the declaration
i.	Store name, type and indirection level of the symbol to the newly created 
	node in the symbol table.
b.	If retrieved symbol is a function declaration/definition, process the 
	declaration/definition
i.	Process the function header and store its parameter information just like
	we do with variable declarations.
ii.	Create a scope for the new function block, we add one to the scope counter.
iii.	Process the parameters and store the function parameters as local variables
	in the new scope.
iv.	Start retrieving symbols from the AST (this is a recursive function)
3.	Purge the symbol table.

e) Symbol table implementation

All considerations and decisions made resulted in the following code.

Concepts:

The following two points are the main items 
* what information do we store about a symbol
* how do we store the collection of symbols


Criteria
1) Snelheid opzoeken
2) Snelheid verwijderen
3) Geheugengebruik
4) ....



vector, array, linked list, hash table

		Opzoeken	Verwijderen	Geheugen	Smaak	Totalen
Vector		O(1) ++		++		--			3
Array		O(1) ++		--		+			4
linked list		--	+		+			5
hash table								6

Conclusion: hash table.
CRAP

Conclusion

Initialy we thought that using a linked list combined with a hashed binary 
tree was a proper solution. The linked list would contain a node for every 
scope encountered and each of these nodes holds a pointer to the root of a 
hashed binary tree.........

--- old ---
We think a combination of a linked list and hashed binary trees is a proper 
solution. The linked list contains a node for every scope encountered and 
each of these nodes holds a pointer to the root of a hashed binary tree. 
This tree contains all symbols in this scope. 

We prefer a different structure for every scope as in practice most symbols 
will be used in their own scope anyway which makes the use of one enormously 
exremely titaneously large tree inefficient. Only when symbols from lower 
scopes are accessed the search may take longer as every higher scope will be 
searched seperately.   
--- old ---



--- Rewrite, according to the new document stucture ---

There are several aspects to bear in mind when handling simple data types. 

Secondly we have to think of an algorithm to fill the data structure of our 
choice. We will talk about this later on in chapter [chapter number]

--- Rewrite, according to the new document stucture ---
\end{verbatim}

        \begin{thebibliography}{99}
                \bibitem{regex}H. Spencer: \emph{POSIX 1003.2 regular
                expressions}, UNIX man page regex(7), 1994
                \bibitem{lex_yacc}J. Levine: \emph{Lex and Yacc},
                O'Reilly \& sons, 2000
        \end{thebibliography}

\normalsize
上一页 12
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -