.SH
Section 3: Lexical Analysis
.PP
The user must supply a lexical analyzer which reads the input stream and communicates tokens
(with values, if desired) to the parser.
The lexical analyzer is an integer valued function called yylex, in both C and Ratfor.
The function returns an integer which represents the type of the token.
The value to be associated in the parser with that token is
assigned to the integer variable yylval.
Thus, a lexical analyzer written in C should begin
.DS
yylex ( ) {
	extern int yylval;
	. . .
.DE
while a lexical analyzer written in Ratfor should begin
.DS
integer function yylex(yylval)
	integer yylval
	. . .
.DE
.PP
Clearly, the parser and the lexical analyzer must agree on the type numbers in order for
communication between them to take place.
These numbers may be chosen by Yacc, or chosen by the user.
In either case, the ``define'' mechanisms of C and Ratfor are used to allow the lexical analyzer
to return these numbers symbolically.
For example, suppose that the token name DIGIT has been defined in the declarations section of the
specification.
The relevant portion of the lexical analyzer (in C) might look like:
.DS
yylex( ) {
	extern int yylval;
	int c;
	. . .
	c = getchar( );
	. . .
	if( c >= \'0\' && c <= \'9\' ) {
		yylval = c\-\'0\';
		return(DIGIT);
		}
	. . .
.DE
.PP
The relevant portion of the Ratfor lexical analyzer might look like:
.DS
integer function yylex(yylval)
	integer yylval, digits(10), c
	. . .
	data digits(1) / "0" /;
	data digits(2) / "1" /;
	. . .
	data digits(10) / "9" /;
	. . .
# set c to the next input character
	. . .
	do i = 1, 10 {
		if(c .EQ. digits(i)) {
			yylval = i\-1
			yylex = DIGIT
			return
			}
		}
	. . .
.DE
.PP
In both cases, the intent is to return a token type of DIGIT, and a value equal to the numerical value of the
digit.
Provided that the lexical analyzer code is placed in the programs section of the specification,
the identifier DIGIT will be redefined to be equal to the type number associated
with the token name DIGIT.
.PP
This mechanism leads to clear
and easily modified lexical analyzers; the only pitfall is that it makes it
important to avoid using any names in the grammar which are reserved
or significant in the chosen language; thus, in both C and Ratfor, the use of
token names of ``if'' or ``yylex'' will almost certainly cause severe
difficulties when the lexical analyzer is compiled.
The token name ``error'' is reserved for error handling, and should not be used naively
(see Section 5).
.PP
As mentioned above, the type numbers may be chosen by Yacc or by the user.
In the default situation, the numbers are chosen by Yacc.
The default type number for a literal
character is the numerical value of the character, considered as a 1 byte integer.
Other token names are assigned type numbers
starting at 257.
It is a difficult, machine dependent
operation to determine the numerical value of an input character
in Ratfor (or Fortran).
Thus, the Ratfor user of Yacc will probably wish
to set his own type numbers, or not use any literals in his specification.
.PP
To assign a type number to a token (including literals),
the first appearance of the token name or literal
.I
in the declarations section
.R
can be immediately followed by
a nonnegative integer.
This integer is taken to be the type number of the name or literal.
Names and literals not defined by this mechanism retain their default definition.
It is important that all type numbers be distinct.
.PP
There is one exception to this situation.
For sticky historical reasons, the endmarker must have type
number 0.
Note that this is not unattractive in C, since the nul character is returned upon
end of file; in Ratfor, it makes no sense.
This type number cannot be redefined by the user; thus, all
lexical analyzers should be prepared to return 0 as a type number
upon reaching the end of their input.
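.PP
The conventions above (a symbolic type number for DIGIT, the character's own value
for a literal, and 0 for the endmarker) can be drawn together in one small C sketch.
It is an illustration under stated assumptions, not the analyzer Yacc expects verbatim:
DIGIT is defined by hand as 257, standing in for the definition Yacc would supply;
yylval is defined here rather than declared extern so that the fragment compiles alone;
and the names lex_input and lex_set_input are hypothetical helpers that feed the
analyzer from a string instead of the input stream.

```c
/* Assumption: stands in for the definition Yacc would generate;
   257 is the first default type number for a named token. */
#define DIGIT 257

/* Assumption: defined here so the sketch compiles alone; in a real
   specification the parser owns yylval and yylex declares it extern. */
int yylval;

/* Hypothetical input source: a string rather than the input stream. */
static const char *lex_input = "";

/* Hypothetical helper: point the analyzer at a new input string. */
void lex_set_input(const char *s)
{
    lex_input = s;
}

int yylex(void)
{
    int c;

    /* Skip blanks, tabs and newlines between tokens. */
    while (*lex_input == ' ' || *lex_input == '\t' || *lex_input == '\n')
        lex_input++;

    /* End of input: the endmarker always has type number 0. */
    if (*lex_input == '\0')
        return 0;

    c = (unsigned char)*lex_input++;

    if (c >= '0' && c <= '9') {
        yylval = c - '0';   /* token value: numerical value of the digit */
        return DIGIT;       /* token type: returned symbolically */
    }

    /* Any other character is treated as a literal, whose default
       type number is the numerical value of the character. */
    return c;
}
```

.PP
After a call such as lex_set_input("7 +"), successive calls to yylex in this sketch
yield DIGIT with yylval set to 7, then the value of the character +, then 0;
further calls keep returning 0 once the input is exhausted.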