📄 excel.txt
字号:
We can narrow down the area that we have to search by using BIFF's INDEX and ROW records. If a non-protected BIFF file has an INDEX record, it will be the second record in the file (immediately after the BOF record). If the second record is not an INDEX record, then we must resort to the exhaustive record search described above. Or, alternatively, we could simply fail the search and return some sort of error code. Having located the INDEX record, we fetch the rwMic and rwMac fields, which tell us the range of defined rows on the document. If the row we are searching for is outside of that range, then we know right away that the desired cell doesn't exist, so we can return zero. The next step is to locate the ROW record for the row of the desired cell. To do this, we need to understand how Excel saves ROW records and cell records. When Excel saves a document in BIFF format, it divides the document into blocks of 32 rows, starting at the first defined row on the document. Since rwMic is by definition the first defined row, the first block consists of rows rwMic through rwMic+31; the second, from rwMic+32 through rwMic+63; and in general, the i-th block, assuming that i is zero-based, consists of rows (rwMic+i*32) through (rwMic+i*32+31). Excel writes a block of ROW records to the file, then follows this with all the cell records for cells in those rows. This process is repeated until all ROW and cell records have been written. The INDEX record contains an array of file pointers to the blocks of ROW records. Working backwards from our rule above for ROW blocks, we see that to locate the block for row 'rw', we fetch array element (rw-rwMic)/32. Here, the '/' operator is integer division that truncates. Having found the proper array element, we position the BIFF file at that location. The file pointer that we fetched from the array is an absolute byte offset from the beginning of the file, which is byte 0. For example, if the file pointer were 17,540, then we would position the file at byte 17,540. The file is now positioned at the correct block of ROW records. The next step is to search for the correct ROW record. Since Excel documents have sparse cell tables, blocks of ROW records contain only the defined rows within the block range. This means that if the row we are searching for doesn't exist, then it won't have a ROW record in the BIFF file. We must read at most 32 records at this point. If we do not find a ROW record for the desired row, then we know that the row doesn't exist, so we can return zero. We know that the row doesn't exist as soon as we find a non-ROW record, or a ROW record for a row beyond the one we are searching for. Having found the correct ROW record, we fetch the colMic and colMac fields, which tell us the range of defined columns in the row. If the column we are searching for lies outside of the defined range, then we know that the desired cell doesn't exist, and we can return zero. From the ROW record, we can now determine the position within the file of the cell records for the desired row. The next step is to position the file at that point and search for the cell record for the desired cell. The dbRtcell field contains the offset to cells for the ROW record. This field is limited to 16 bits to save space in BIFF files; thus the largest offset that will fit is 65,535. In a large Excel document, however, it is possible for cells to be located farther than 65,535 bytes from their ROW record. Therefore we encode the offset to get more value from it. The first ROW record in a block contains an offset to cells relative to the second ROW record. This is because after reading the first ROW record, you are positioned at the second ROW record, so finding the cells is just a matter of skipping some number of bytes relative to the current file position. The second and all subsequent ROW records in a block contain offsets to cells relative to the previous ROW record's cells. This iterative approach works like this: after reading the first ROW record, you get its offset and add it to the current file position to get the absolute file position of the first ROW's cell records. When you read the second ROW record, you add the offset contained therein to your computed position of the first ROW's cells, and you get the position of the second ROW's cells. Continuing in this manner, you find that by the time you find the proper ROW record, you have already computed the absolute file position of its cells, so you position there and continue your search. Having computed the file position of our row's cell records, we set the file there and start sequentially searching for the desired cell. If we find the cell, we fetch its value and return it. Our search fails as soon as we encounter a cell record for a cell beyond ours, or we encounter a record which is not a cell record. If a ROW has no defined cells, we will set its dbRtcell offset to zero. If a ROW's cells are more than 64K from the previous ROW's cells (which is rare but possible), we will write out a zero offset for that ROW and ALL subsequent ROW records in the same block. All this means is that we have to search a little harder for the correct cell record: instead of being able to start our search at cells in the desired row, we will have to start searching at cells in some previous row.Excel Formulas-------------- This section describes how Excel stores formulas within BIFF files. Formulas appear in FORMULA, ARRAY, and NAME records. In this section, the term "formula" is a synonym for "parsed expression"; it is the internal tokenized representation of an Excel formula. Formulas are stored in a reverse Polish scheme. A formula consists of a sequence of parse tokens, each of which is either an operand, operator, or a control token. Operand tokens provide values; operator tokens perform arithmetic operations upon the operands; and control tokens assist in formula evaluation by describing properties of the formula. A token consists of two parts: a token type and a token value. Token types are called "ptg's" in Excel; they are one byte long, ranging in value from 1 to 0x7F. Ptg's above 0x7F are reserved for internal Excel use. The ptg specifies only what kind of information is contained in a token. The information itself is stored in the token value, immediately following the ptg in the parsed expression. Some tokens consist only of a ptg, without an accompanying token value; for example, to specify an addition operation, only the token type, ptgAdd, is required. But to specify an integer operand, both the ptg, ptgInt, and the token value, an integer, must be specified. As an illustration, consider the parsed expression for =5+6. This parsed expression consists of three tokens: two integer operands and an operator. ptgInt 0x0005 ptgInt 0x0006 ptgAdd < token 1 > < token 2 > <token 3> Notice that each ptgInt is immediately followed by the integer token value. In many cases, the token value consists of a structure of two or more fields. In describing structures for these cases, offset zero is assumed to be the first byte of the token value, i.e. the first byte immediately following the token type. Unless otherwise noted, all tokens can occur in FORMULA, ARRAY, and NAME records. Some tokens do not appear in one or more of these record types; they are explained as encountered.Expression Evaluation--------------------- The evaluation of Excel formulas is a straightforward process. One LIFO stack, the operand stack, is maintained during evaluation. When an operand is encountered, it is pushed onto the stack. When an operator is encountered, it operates on the topmost operand or operands. Operator precedence is irrelevant at evaluation time; operators are handled as soon as they are encountered. There are three kinds of operators: unary, binary, and function. Unary operators, like the minus sign which negates a number, operate only on the topmost operand. Binary operators, like the addition operator, operate on the top two operands. Function operators, which implement Excel functions, operate on a variable number of operands, depending on how many arguments the function accepts. All operators work by popping the required operands from the stack, performing calculations, and pushing the result back onto the operand stack.Unary Operators--------------- Here are the unary operator tokens. All of these operators pop the top argument from the operand stack, perform a calculation, and push the result back onto the operand stack.ptgUplus - unary plus (ptg = 0x12) This operator has no effect.ptgUminus - unary minus (ptg = 0x13) Negates the top operand.ptgPercent - percent sign (ptg = 0x14) Divides the top operand by 100Binary Operators---------------- Here are the binary operator ptg's. All of these operators pop the top two arguments from the operand stack, perform a calculation, and push the result back onto the operand stack.ptgAdd - addition (ptg = 0x03) Adds the top two operands together.ptgSub - subtraction (ptg = 0x04) Subtracts the top operand from the second-to-top.ptgMul - multiplication (ptg = 0x05) Multiplies the top two operands.ptgDiv - division (ptg = 0x06) Divides the top operand by the second-to-top.ptgPower - exponentiation (ptg = 0x07) Raises the second-to-top operand to the power of the top operand.ptgConcat - concatenation (ptg = 0x08) Appends the top operand to the second-to-top operand.ptgLT - less than (ptg = 0x09) Evaluates to TRUE if the second-to-top operand is less than the top operand; FALSE otherwise.ptgLE - less than or equal (ptg = 0x0A) Evaluates to TRUE if the second-to-top operand is less than or equal to the top operand; FALSE otherwise.ptgEQ - equal (ptg = 0x0B) Evaluates to TRUE if the top two operands are equal; FALSE otherwise.ptgGE - greater than or equal (ptg = 0x0C) Evaluates to TRUE if the second-to-top operand is greater than or equal to the top operand; FALSE otherwise.ptgGT - greater than (ptg = 0x0D) Evaluates to TRUE if the second-to-top operand is greater than the top operand; FALSE otherwise.ptgNE - not equal (ptg = 0x0E) Evaluates to TRUE if the top two operands are not equal; FALSE otherwise.ptgIsect - intersection (ptg = 0x0F) This is the Excel space operator. It computes the intersection of the top two operands.ptgUnion - union (ptg = 0x10) This is the Excel comma operator. It computes the union of the top two operands.ptgRange - range (ptg = 0x11) This is the Excel colon operator. It computes the minimal bounding rectangle of the top two operands.Operand Tokens - Constant------------------------- The following operand tokens push a single constant operand onto the operand stack.ptgMissArg - missing argument (operand, ptg = 0x16) Missing argument to an Excel function. For example, the second argument to DCOUNT(Database,,Criteria) would be stored as a ptgMissArg.ptgStr - string constant (operand, ptg = 0x17) String constant. Followed by the string. Offset Name Size Contents ------ ---- ---- -------- 0 cch 1 length of the string 1 rgch var the string ptgStr requires special handling when parsed expressions are scanned. See the section "Scanning a Parsed Expression" for an explanation.ptgErr - error value (operand, ptg = 0x1C) Error constant. Followed by the error value. See the BOOLERR record for a list of error values. Offset Name Size Contents ------ ---- ---- -------- 0 err 1 Excel error valueptgBool - boolean (operand, ptg = 0x1D) Boolean constant. Followed by a byte value. Offset Name Size Contents ------ ---- ---- -------- 0 f 1 =1 for TRUE =0 for FALSEptgInt - integer (operand, ptg = 0x1E) Integer constant. Followed by a word value. Offset Name Size Contents ------ ---- ---- -------- 0 w 2 unsigned integer valueptgNum - number (operand, ptg = 0x1F) Numeric constant. Followed by an 8-byte IEEE floating point number. Offset Name Size Contents ------ ---- ---- -------- 0 num 8 IEEE floating point numberOperand Tokens - Classes------------------------ As described above, operand tokens push operand values onto the operand stack. These values are divided into three different classes, depending on what type of value the formula expects from the operand. The type of value is determined at parse time by the context of the operand. REFERENCE CLASS. Some operands are required by context to be references. In this case, the term "reference" is a general term meaning the specification of one or more areas on an Excel document, without regard for the underlying cell values in those areas. When the Excel expression evaluator encounters a reference type operand, it pushes only the reference itself onto the operand stack; it does not dereference it to find the underlying cell values. For example, consider the formula CELL("width",B5), which returns the column width of cell B5. Clearly, only the reference to cell B5 is important here; the value stored at cell B5 is irrelevant to the cell' s width. VALUE CLASS. This is the most common type of operand; it pushes a single dereferenced value onto the operand stack. For example, consider the formula A1+1. Here, we are interested in the value stored in cell A1, so we dereference the A1 reference. ARRAY CLASS. This operand pushes an array of values onto the operand stack. The values may be specified either in an array constant or in a reference to cells. For example, consider the formula SUM({1,2,3;4,5,6}). Here, to evaluate the SUM function, the expression evaluator must push an entire array of values onto the operand stack. The three classes of operand tokens are numerically divided as follows: Operand Class Ptg's ------------- ----- Reference 0x20 - 0x3F Value 0x40 - 0x5F Array 0x60 - 0x7F Notice that the numerical difference between ptg classes is 0x20. This is the basis for forming the class variants of ptg's. Class variants of ptg's are formed from the reference class ptg, also known as the "base" ptg. To form the value class ptg from the base ptg, you add 0x20 to the ptg and append "V" (for "value") to the ptg name. To form the array class ptg from the base ptg, you add 0x40 to the ptg and append "A" (for "array") to the ptg name. These rules are summarized below for a hypothetical ptg called ptgFoo: Class Na
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -