📄 ccsym.c

📁 KCC, a good C compiler, written by Ken Harrenstien
/*	CCSYM.C - Symbol table management (type table too)
**
**	(c) Copyright Ken Harrenstien 1989
**		All changes after v.165, 9-Mar-1988
**	(c) Copyright Ken Harrenstien, SRI International 1985, 1986
**		All changes after v.39, 8-Aug-1985
**
**	Original version (C) 1981  K. Chen
*/

#include "cc.h"
#include "ccchar.h"
#include <stdlib.h>	/* calloc, realloc, free */

/* Internal Data:
 *
 * Mapping function between SC_xxx and SCDB_xxx values
 */

static char scmap[] = {
	0
#define scdef(a,b) ,b
	scdefs
#undef  scdef
};

/* Exported functions - Symbol stuff */
void savesymtab();		/* CC */
void syminit();
SYMBOL *symfind(char *, int);
SYMBOL *symftag(SYMBOL *), *symfmember(SYMBOL *, SYMBOL *),
	*symflabel(SYMBOL *);
SYMBOL *symfidstr(char *), *symfnext(SYMBOL *);
SYMBOL *symqcreat(SYMBOL *);
SYMBOL *creatsym(char *), *symgcreat(char *), *uniqsym(SYMBOL *),
	*shmacsym(SYMBOL *);
void freesym(SYMBOL *), copysym(SYMBOL *, SYMBOL *);
SYMBOL *isdupsym(SYMBOL *);
int hash(char *);		/* Crock for CCEVAL's ecanon() */
SYMBOL *beglsym(void);
void endlsym(SYMBOL *), ridlsym(SYMBOL *);

/* Exported functions - Label stuff */
SYMBOL *newlabel(void);			/* Label functions */
void reflabel(SYMBOL *, int), freelabel(SYMBOL *), cleanlabs(void);

/* Exported functions - Mapping stuff */
int mapextsym(SYMBOL *);
void mapintsym(SYMBOL *);

/* Exported functions - Type stuff */
TYPE *findtype(int, TYPE *), *findctype(int, INT, unsigned INT, TYPE *),
	*findftype(TYPE *, TYPE *), *findutype(TYPE *),
	*findqtype(TYPE *, INT), *findptype(int, TYPE *, TYPE *);
TYPE *tcomposite(TYPE *, TYPE *);
INT sizetype(TYPE *);		/* For CCDECL, CCSTMT, CCGEN* */
INT sizeptobj(TYPE *);	/* For CCGEN2 */
INT sizearray(TYPE *);	/* For CCGEN, CCSTMT */
int elembsize(TYPE *);	/* ditto */
int tischarpointer(TYPE *), tischararray(TYPE *);
int tisbytepointer(TYPE *), tisbytearray(TYPE *);
#if DEBUG_KCC	/* 5/91 KCC size */
void symdump(SYMBOL *, char *);
void typedump(void);
#endif

/* Imported functions */
extern INT sixbit(char *);		/* CCASMB */
extern int codeseg(void);		/* CCOUT  */
extern void outsix(INT);		/* CCOUT  */
extern void savesymtab (SYMBOL *);

/* Local functions */
static void typeinit(void), labinit(void);
static TYPE *tcomproto(TYPE *, TYPE *);
static void inisymlist(SYMBOL **, SYMBOL **);
static SYMBOL *symfflag(SYMBOL *, int);
static void makelsym(SYMBOL *), makegsym(SYMBOL *);
static SYMBOL *getsym(SYMBOL **);
static void retsym(SYMBOL *);
static SYMBOL *mksym(char *, SYMBOL **);
static SYMBOL *symmk(SYMBOL *, int, SYMBOL **);
static int symhash(SYMBOL *), symcmp(SYMBOL *, SYMBOL *);
static int idcpy(SYMBOL *, char *);
static void smapinit(void);
static int smapmatch(INT);
static void aryerr(char *);
static void realfreelabel(SYMBOL *);

extern	char	mainname[];

#if 0	/* For debugging */
#define BUGMSG(a) if(symdeb) printf a;
static int symdeb = 0;
#else
#define BUGMSG(a) ;	/* Null stmt */
#endif

#if 0
		SYMBOL TABLE STRUCTURE

The "symbol table" is implemented as a collection of dynamically allocated
symbol entries.  A symbol is always linked either to the global symbol
list, or the local symbol list.  In addition to this linkage, all symbols
also belong to some hash chain list.

The hash table is used to look up symbols.  The identifier is hashed
to produce an index into the hash table, which contains pointers to all
of the hash chains; a given chain consists of all symbols whose
identifiers produce that specific hash value.  This chain is then
searched sequentially, doing full string comparison on the identifiers,
until the matching symbol (if any) is found.

Symbols on a hash chain are linked MOST-RECENT-FIRST, and the first
matching symbol is considered to hide or shadow all other instances of
that identifier (unless it is flagged as no longer active).  This is how
the scope and visibility of symbols are implemented for symbol lookup.
Some special checking is done for macro symbols; see further comments at
end of this page.
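
The lookup and shadowing rules above can be sketched as a small
stand-alone program.  This is only an illustration, not the KCC code:
the names (struct sym, toyhash, enter, lookup, NBUCKETS) are
hypothetical, the hash function is a toy, and a plain "active" flag
stands in for the flag test that the real lookup routines perform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 8              /* hypothetical size, not KCC's MAXHSH */

struct sym {
    const char *name;
    int active;                 /* 0 once the symbol is out of scope */
    struct sym *nhash;          /* next symbol on the same hash chain */
};

static struct sym *buckets[NBUCKETS];

/* Toy string hash (an assumption; the real hash is not shown here). */
static unsigned toyhash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Link a symbol at the FRONT of its chain (most-recent-first). */
static void enter(struct sym *s)
{
    unsigned h = toyhash(s->name);
    s->active = 1;
    s->nhash = buckets[h];
    buckets[h] = s;
}

/* Return the first active symbol whose name matches, or NULL. */
static struct sym *lookup(const char *name)
{
    struct sym *s;
    for (s = buckets[toyhash(name)]; s != NULL; s = s->nhash)
        if (s->active && strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Walk through the shadowing scenario; returns 1 if all checks hold. */
int shadow_demo(void)
{
    static struct sym outer = { "x", 0, NULL };
    static struct sym inner = { "x", 0, NULL };

    enter(&outer);                      /* outer declaration of x */
    enter(&inner);                      /* inner declaration shadows it */
    assert(lookup("x") == &inner);
    inner.active = 0;                   /* inner scope has ended */
    assert(lookup("x") == &outer);
    return lookup("y") == NULL;         /* unknown names are not found */
}
```

Because new entries go on the front of the chain, the lookup loop needs
no scope bookkeeping of its own: the first active match is by
construction the innermost visible declaration.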

Symbols must also be linked into either the global or local symbol
list.  These lists are doubly linked and new entries are added
MOST-RECENT-LAST (as opposed to the hash chain lists).  All symbols on
the global list have the same scope, and no duplicate identifiers
should exist on that list.  Symbols may initially be linked onto the
global list for a short time before being flushed or re-linked onto
the local list, but in general nothing is deleted from the global
symbol list unless re-initializing to compile a new file.

The local symbol list, however, is more dynamic.  During the parsing,
local symbols are added on the end of this list as they are
encountered;  once their scope has expired (the end of a block was
reached), they are marked inactive but remain on the list for the
benefit of the code generation routines.  Once the entire function
has been generated, all local symbols are flushed and the list
re-initialized.

The tricky part of the local symbol list has to do with how the block
structure of a C function is represented so that local symbols have
only their proper scope.  At any given moment, the pointer "lsymhead"
points to the symbol preceding the first symbol of the innermost active
block; if there is no active block (i.e.  parsing at top level), this
pointer is NULL.  The symbols belonging to this active block consist of
this first symbol and all succeeding symbols on the list, except for
those which are marked inactive (by setting the SF_XLOCAL flag).

Inactive symbols, if they exist, will be those belonging to inner
blocks (inside the current block) that have been exited.

When a block is first entered (via beglsym()) the old value is saved,
and lsymhead is set to the current tail of the local symbol list.  Now,
whenever a local symbol is defined in this block, it will be added to
the end of the list; to see whether a duplicate definition of the
symbol already exists, it suffices to scan the list starting at the
lsymhead pointer (the first symbol is lsymhead->Snext; see isdupsym()).
When the block is finally ended (via endlsym()), all symbols belonging
to this block will be marked inactive, and the old value of lsymhead
restored so that it now points to the next outer block.
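
The block-entry and block-exit bookkeeping described above can be
sketched as follows.  This is a simplified model under stated
assumptions, not the KCC routines: the list is singly linked, the names
(struct lsym, blockhead, begin_block, add_local, is_dup, end_block) are
hypothetical stand-ins for lsymhead, beglsym(), isdupsym() and
endlsym(), and an "inactive" field plays the role of SF_XLOCAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct lsym {
    const char *name;
    int inactive;               /* plays the role of SF_XLOCAL */
    struct lsym *next;
};

static struct lsym dummy;               /* dummy head, never a real symbol */
static struct lsym *tail = &dummy;      /* last entry on the local list */
static struct lsym *blockhead = NULL;   /* stands in for lsymhead */

/* Enter a block: save the old marker, then mark the current tail. */
static struct lsym *begin_block(void)
{
    struct lsym *old = blockhead;
    blockhead = tail;
    return old;
}

/* Append a new local symbol at the end of the list. */
static void add_local(struct lsym *s, const char *name)
{
    s->name = name;
    s->inactive = 0;
    s->next = NULL;
    tail->next = s;
    tail = s;
}

/* Is "name" already defined in the innermost active block? */
static int is_dup(const char *name)
{
    struct lsym *s;
    if (blockhead == NULL)
        return 0;                       /* top level: no local block */
    for (s = blockhead->next; s != NULL; s = s->next)
        if (!s->inactive && strcmp(s->name, name) == 0)
            return 1;
    return 0;
}

/* Leave a block: deactivate its symbols, restore the outer marker. */
static void end_block(struct lsym *old)
{
    struct lsym *s;
    for (s = blockhead->next; s != NULL; s = s->next)
        s->inactive = 1;
    blockhead = old;
}

/* Models "{ int x; { int x; } }"; returns 1 if all checks hold. */
int scope_demo(void)
{
    static struct lsym outer_x, inner_x;
    struct lsym *save1, *save2;

    save1 = begin_block();              /* {            */
    add_local(&outer_x, "x");           /*   int x;     */
    save2 = begin_block();              /*   {          */
    assert(!is_dup("x"));               /* outer x is not in this block */
    add_local(&inner_x, "x");           /*     int x;   */
    assert(is_dup("x"));                /* a second x here would clash */
    end_block(save2);                   /*   }          */
    assert(inner_x.inactive && !outer_x.inactive);
    assert(is_dup("x"));                /* outer x is still in scope */
    end_block(save1);                   /* }            */
    return blockhead == NULL;           /* back at top level */
}
```

Note that the exited inner symbol stays on the list, as the text
requires for code generation; only its flag changes.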

The hash chain lists and the global/local lists are completely independent
of each other.

Note that the global and local lists are doubly linked and each has a "dummy"
initial symbol entry to render checks for NULL unnecessary.  These are the
only two "symbols" not also on a hash chain.  The hash chain lists are singly
linked and end in NULL.  Unused symbol entries are kept around on a freelist
to avoid the overhead of calls to calloc/free; they are never given to free(),
even at the start of a new file compilation, under the assumption that the
efficiency improvement is worth the (very slight) risk that storage will
become excessively fragmented over many compilations.
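
The freelist idea can be sketched like this.  It is a minimal
illustration, not the getsym()/retsym() code: struct node, node_get and
node_put are hypothetical names, and zeroing a recycled entry is an
assumption made here so that reused and fresh entries look the same.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    int value;
    struct node *next;          /* doubles as the freelist link */
};

static struct node *freelist = NULL;

/* Get a zeroed node, reusing a released one when possible. */
static struct node *node_get(void)
{
    struct node *n = freelist;
    if (n != NULL) {
        freelist = n->next;             /* pop from the freelist */
        memset(n, 0, sizeof *n);
    } else {
        n = calloc(1, sizeof *n);       /* may be NULL: caller checks */
    }
    return n;
}

/* Release a node to the freelist; it is never handed to free(). */
static void node_put(struct node *n)
{
    n->next = freelist;
    freelist = n;
}

/* Returns 1 if a released node is recycled by the next request. */
int freelist_demo(void)
{
    struct node *a = node_get();
    struct node *b;

    if (a == NULL)
        return 0;                       /* out of memory */
    a->value = 42;
    node_put(a);
    b = node_get();
    assert(b == a);                     /* same storage came back */
    return b->value == 0;               /* and it was cleared */
}
```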

MACRO SYMBOLS:
Although normally macro symbols are unique, and thus shadowing is
never an issue, ANSI makes it possible for macro self-references to generate
non-macro symbols that are identical to macro names.  In other words, macro
symbols can also be shadowed.  But because this should only happen in
special circumstances, special checking is needed.
There are only two places where a macro symbol is looked for, both in CCPP:

findident() to handle an identifier token, and
findmacsym() to explicitly look for a macro name.

These two places both invoke symfind(), which finds the first hashed
symbol (whether macro or not).  All other symbol lookups can safely
assume that any identifiers they deal with have already been expanded
if necessary, and so they all use other routines like symfidstr() or
symftag(), etc., which ignore any macro symbols they encounter and will
thus find any shadowed symbols.  In order to ensure that symfind() finds
the macro symbol first if one exists, that symbol always has to come
BEFORE the shadowed symbol on the hash chain.  This ordering is ensured
by the shmacsym() function, which findident() invokes whenever a new
symbol is being shadowed by a macro.  uniqsym() also invokes this routine
if it is about to create a duplicate of a macro-shadowed symbol.
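
The ordering requirement can be illustrated with a tiny hash chain.
This sketch only shows why the macro must come first; it does not
reproduce how shmacsym() works, which is not shown on this page.  All
names (struct hsym, find_first, find_nonmacro, link_after) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct hsym {
    const char *name;
    int is_macro;
    struct hsym *nhash;         /* next symbol on the hash chain */
};

/* In the spirit of symfind(): first match, macro or not. */
static struct hsym *find_first(struct hsym *chain, const char *name)
{
    for (; chain != NULL; chain = chain->nhash)
        if (strcmp(chain->name, name) == 0)
            return chain;
    return NULL;
}

/* In the spirit of symfidstr(): macro entries are ignored. */
static struct hsym *find_nonmacro(struct hsym *chain, const char *name)
{
    for (; chain != NULL; chain = chain->nhash)
        if (!chain->is_macro && strcmp(chain->name, name) == 0)
            return chain;
    return NULL;
}

/* Link the shadowed symbol directly AFTER its macro, so the macro
 * remains the first match on the chain. */
static void link_after(struct hsym *mac, struct hsym *shadowed)
{
    shadowed->nhash = mac->nhash;
    mac->nhash = shadowed;
}

/* Returns 1 if both kinds of lookup see the symbol they expect. */
int macro_order_demo(void)
{
    struct hsym mac = { "foo", 1, NULL };
    struct hsym var = { "foo", 0, NULL };
    struct hsym *chain = &mac;

    link_after(&mac, &var);
    assert(find_first(chain, "foo") == &mac);       /* macro wins */
    assert(find_nonmacro(chain, "foo") == &var);    /* shadowed sym */
    return find_first(chain, "bar") == NULL;
}
```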

Handling of SC_XEXTREF:  There is a special category of block-scope
symbols which must actually become global symbols.  Declarations within
a block that have storage class "extern" must not be forgotten when
the block ends, because appropriate linkage commands must be generated
for the assembler, and multiple references to the same symbol must not
generate multiple linkage commands.  When an SC_EXTREF symbol is about
to be flushed by endlsym(), it is instead (1) given the class SC_XEXTREF
to distinguish it from SC_EXTREF, (2) moved from the local list to the
global list so it will stay around for the duration of the file
compilation, and (3) flagged with SF_XLOCAL to put it out-of-scope,
i.e. so it will not be found by any normal symbol-finder routine.  
Another external declaration of the same identifier needs to refer to
the same symbol, which is why symfxext() exists to find it.  External
declarations are handled in two places: CCDECL's funchk() and dodecl().
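
The three steps above can be modelled in a few lines.  This is a sketch
under assumptions, not the endlsym() or symfxext() code: the enum
values, struct esym, retire and find_xext are hypothetical, the global
list is reduced to a singly linked list, and a "hidden" field plays the
role of SF_XLOCAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum cls { CLS_AUTO, CLS_EXTREF, CLS_XEXTREF };  /* stand-in classes */

struct esym {
    const char *name;
    enum cls cls;
    int hidden;                 /* plays the role of SF_XLOCAL */
    struct esym *next;
};

static struct esym *globals = NULL;     /* simplified global list */

/* At block end: hide the symbol; keep extern references alive by
 * reclassifying them and moving them onto the global list. */
static void retire(struct esym *s)
{
    s->hidden = 1;
    if (s->cls == CLS_EXTREF) {
        s->cls = CLS_XEXTREF;
        s->next = globals;
        globals = s;
    }
}

/* Find a retired extern reference so a later declaration reuses it. */
static struct esym *find_xext(const char *name)
{
    struct esym *s;
    for (s = globals; s != NULL; s = s->next)
        if (s->cls == CLS_XEXTREF && strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

/* Returns 1 if only the extern reference survives the block. */
int extref_demo(void)
{
    static struct esym ext = { "counter", CLS_EXTREF, 0, NULL };
    static struct esym loc = { "i", CLS_AUTO, 0, NULL };

    retire(&ext);                       /* the block ends */
    retire(&loc);
    assert(ext.hidden && loc.hidden);   /* both are out of scope */
    assert(find_xext("counter") == &ext);
    return find_xext("i") == NULL;      /* plain locals are not kept */
}
```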

#endif

SYMBOL *lsymhead;	/* NULL at top level, else points to head of
			** current local symbol block.  The first sym
			** on the list is lsymhead->Snext.
			** Only reason this isn't static is cuz CCDECL
			** wants to know if we're in a local block.
			*/
static SYMBOL *symflist = NULL;	/* Symbol entry freelist, for efficiency */

static SYMBOL
/*  *symbol,	*/	/* Global symbol list head (CCDECL, CCOUT) */
    *symtail,		/* ptr to tail of global list */
    *locsymbol,		/* Local symbol list head */
    *loctail;		/* ptr to tail of local list */
#if DEBUG_KCC	/* 8/91 shrink KCC */
static int nsymbols = 0;	/* # of symbols allocated (except dummies) */
#endif

/* Semi-portable char masks for ident strings.
** chmask[n] has mask for N bytes in word, to quickly clear rest of word.
** lastwd has mask for last byte in word, to check for end of string.
*/
static INT chmask[sizeof(INT)+1];	/* Char mask table for ident strings */
static INT lastwd;			/* Mask for last byte in word */

/* SYMINIT - Initialize symbol table stuff.
*/
void
syminit()
{
    register int i, f;
    union
	{
	INT wd;
	char ch[sizeof(INT)];
	}
    mask;
    SYMBOL *s;

    /* Initialize char mask table used by identifier handling stuff */
    chmask[0] = 0;
    for (mask.wd = 0, i = 0; i < sizeof(INT); ++i)
	{
	mask.ch[i] = (char) ~0;				// FW KCC-NT
	chmask[i+1] = mask.wd;
	}
    lastwd = ~chmask[sizeof(INT)-1];

    /* Initialize labels, symbols, and types */
    labinit();				/* Initialize internal label stuff */
    smapinit();				/* Init symbol map stuff */

    inisymlist(&symbol, &symtail);	/* Initialize global symbol list */
    inisymlist(&locsymbol, &loctail);	/* Initialize local symbol list */
    lsymhead = NULL;			/* Currently at top level */

    /* Clear out symbol hash table and set initial reserved-word symbols */
    for (i = 0 ; i < MAXHSH ; i++)	/* Clear hash table */
	htable[i] = NULL;
    for (i = 0; ++i < NTOKDEFS;)	/* Enter all reserved words */
	{
	switch (tok[i].tktype)	/* Check token table for RW's */
	    {
	    default:
		continue;			/* Nope, keep scanning */
	    case TKTY_RWTYPE:
	    case TKTY_RWSC:
	    case TKTY_RWCOMP:
	    case TKTY_RWOP:
		break;			/* Is reserved word, hack it! */
	    }
	if ((f = tok[i].tkprec)&(RWF_ANSI+RWF_KCC))	/* Any flags set? */
	    {
	    if (((f & RWF_ANSI) && clevel >= CLEV_ANSI)	/* If ANSI and OK, */
		|| ((f & RWF_KCC) && clevkcc))		/* or KCC and OK, */
		;
	    else
		continue;	/* then go ahead, else skip sym. */
	    }
	/* Make reserved-word symbol! */
	s = symgcreat(tokstr[i]);	/* Make symbol for the word */
	s->Sclass = SC_RW;		/* Say it's a reserved word */
	s->Stoken = i;			/* Set token number */
	s->Skey = tok[i].tktype;	/* and token's type */
	}
    minsym = symtail;		/* Crock for CCDUMP's symdump, someday flush */

    typeinit();		/* Now initialize tables etc. for C data types */
}

static void
inisymlist(ahead, atail)
SYMBOL **ahead, **atail;
{
    SYMBOL *s, *head;

    if (*ahead == NULL)	/* Initialize for first time only */
	/* KAR-8/91, Changed to calloc() call to ensure memory is zeroed */
	{
	s = (SYMBOL *) calloc(1, sizeof(SYMBOL)); /* Allocate a sym entry */
	if (s == NULL)
	    efatal("No memory for symbols");
	}
    else			/* Symbols already exist, free them. */
	{
	for (head = (*ahead)->Snext; (s = head) != NULL;)	/* For all but 1st */
	    {
	    if (s->Sclass == SC_MACRO && s->Smacptr)
		free(s->Smacptr);	/* If macro body exists, free it */
	    head = s->Snext;
	    retsym(s);			/* Free up the symbol */
	    }
	s = *ahead;			/* Done, re-use 1st sym */
	}

    /* Initialize the 1st sym on list, which is just a dummy that
    ** is never used for anything.  Its existence allows the list routines
    ** to skip some checks for NULL-ness of pointers.
    */
    *ahead = *atail = s;		/* Head and tail point to dummy sym */
    s->Snext = s->Sprev = NULL;		/* Nothing else on list */
    s->Sclass = SC_UNDEF;		/* Just in case... */
}

/* Symbol lookup routines */

/* SYMFIND - Given string, finds or makes symbol for it.
**	This is the main symbol find/create routine.
**	It is used primarily by CCPP and sometimes by CCLEX.
**	Only searches for macros and ordinary identifiers (not tags,
**	labels, members, or out-of-scope local symbols).
**	If "creatf" is true and no symbol was found, makes a global symbol
**	with class SC_UNDEF.
** Subtle point: if the found symbol already has SC_UNDEF, then don't
** use it -- leave it alone because some other part of the parser is
** almost certainly hanging on to it while doing token read-ahead!
** This can happen for "struct foo foo".
**	Issues a warning [Note] if symbol is being returned for
**	an identifier string that was truncated.
*/
SYMBOL *
symfind(str, creatf)
char *str;
int creatf;
{
    register SYMBOL *sym;
    int trunc;
    SYMBOL stmp;

    BUGMSG(("symfind \"%s\" %d\n", str, creatf))
    trunc = idcpy(&stmp, str);		/* Set up, puts hash value in Svalue */
    for (sym = htable[(int) stmp.Svalue]; sym != NULL; sym = sym->Snhash)
	if (((sym->Sflags&(SF_XLOCAL|(SF_OVCLS&~SF_MACRO))) == 0)
	  && symcmp(sym, &stmp)
	  && sym->Sclass != SC_UNDEF)
	    {
	    sym->Srefs++;
	    break;
	    }

    /* Symbol not found, so make it if OK to do so */
    if (!sym)
	{
	if (!creatf)
	    return NULL;
	sym = symmk(&stmp, (int) stmp.Svalue, &symtail);/* Put on global list */
	}
    if (trunc)
	note("Identifier truncated: %S", sym);
    return sym;
}


/* SYMFIDSTR - Given string, finds symbol for an ordinary identifier.
**	Similar to symfind() but never creates a symbol and ignores macros.
**	This is only used for easy lookup of certain literal identifiers:
**		CC: "main"
**		CCDECL: "setjmp"
**		CCOUT: "`$$$CRT" and "`$$$CPU"
*/
SYMBOL *
symfidstr(str)
char *str;
{
    register SYMBOL *sym;

    BUGMSG(("symfidstr \"%s\"\n", str))
    if ((sym = symfind(str, 0)) != NULL)
	{
	if (sym->Sclass == SC_MACRO)	/* If got macro, */
	    sym = symfnext(sym);	/* get non-macro instead if any */
	}
    return sym;
}

/* SYMFNEXT - Finds Next Symbol.
**	Used only by CCPP (and symfidstr() above) to find non-macro
** instance of a symbol when the macro instance is being suppressed.
** Note that the argument is a symbol pointer to the macro instance,
** not an identifier string.
*/
SYMBOL *
symfnext(osym)
SYMBOL *osym;
{
    register SYMBOL *sym = osym;
    BUGMSG(("symfnext \"%s\"\n", sym->Sname))
    while ((sym = sym->Snhash) != NULL)		/* Scan hash list */
	if (((sym->Sflags&(SF_XLOCAL|SF_OVCLS)) == 0)
	 && symcmp(sym, osym))
	    {
	    sym->Srefs++;
	    break;
	    }
    return sym;
}

/* FINDGSYM - Find global (file-scope) non-macro symbol.
**	Starts searching from the specified sym, rather
**	than taking an identifier string.
*/
SYMBOL *
findgsym(osym)
SYMBOL *osym;
{
    register SYMBOL *sym = osym;
    BUGMSG(("findgsym \"%s\"\n", sym->Sname))
    while ((sym = sym->Snhash) != NULL)
	if (((sym->Sflags&(SF_LOCAL|SF_XLOCAL|SF_OVCLS)) == 0)
	 && symcmp(sym, osym))
	    {
	    sym->Srefs++;
	    break;
	    }
    return sym;
}

/*
** SYMFLABEL - Find a label symbol.
** SYMFTAG - Find a struct/union/enum tag symbol.
** SYMFMEMBER - Find a structure member symbol.  Takes tag arg also.
*/
SYMBOL *symflabel(sym)
SYMBOL *sym;
{
    return symfflag(sym, SF_LABEL);
}

SYMBOL *symftag(sym)
SYMBOL *sym;
{
    return symfflag(sym, SF_TAG);
}
