<HTML>
<HEAD>
<TITLE>Chapter 31 -- Generating Code</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.0b5aGold (WinNT; I) [Netscape]">
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">
<H1><FONT COLOR=#FF0000>Chapter 31</FONT></H1>
<H1><B><FONT SIZE=5 COLOR=#FF0000>Generating Code</FONT></B>
</H1>
<P>
<HR WIDTH="100%"></P>
<P>
<H3 ALIGN=CENTER><FONT COLOR="#000000"><FONT SIZE=+2>CONTENTS<A NAME="CONTENTS"></A>
</FONT></FONT></H3>
<UL>
<LI><A HREF="#Introduction" >Introduction</A>
<LI><A HREF="#ChoosingtheInputFile" >Choosing the Input File</A>
<LI><A HREF="#ParsingRecords" >Parsing Records</A>
<LI><A HREF="#WritingthecheaderFiles" >Writing the C Header Files</A>
<LI><A HREF="#WritingtheEncoderSourceFile" >Writing the Encoder Source File</A>
<LI><A HREF="#WritingtheDecoder" >Writing the Decoder</A>
<LI><A HREF="#PuttingItTogether" >Putting It Together</A>
<LI><A HREF="#Summary" >Summary</A>
</UL>
<HR>
<P>
This chapter shows how to use Perl to solve a real-world coding
problem. The sample problem is complicated enough to resemble what
you are likely to encounter in your own programming work. Remember
to concentrate on <I>how</I> the problem is solved, not on what
the final solution is used for.
<H2><A NAME="Introduction"><B><FONT SIZE=5 COLOR=#FF0000>Introduction</FONT></B></A>
</H2>
<P>
The main concepts introduced in this chapter have nothing to do
with navigation and seismic fields. However, the problem that
I address in this chapter is related to writing a seismic navigation
record parser. This parser was crucial in getting a delayed project
up and running in the field. Rather than spend days writing a
parser in C, we had one working after a few hours of coding. Better
still, the concepts gained during this experience helped me write parsers
for other data formats for the same project with just as little
effort.
<P>
Basically, I'll be covering ways to use Perl to generate C code
for a parser. Instead of writing another phone book manager or
a database for music or home inventory records, it's probably
better that we deal with a real-world example. Perhaps after reading
this chapter and seeing how this problem was tackled, you can
apply similar solutions to your current problems.
<P>
The sample project involves writing a parser to read FORTRAN-based
records for a seismic survey. Most seismic navigation data, believe
it or not, is stored in an archaic format designed to be read by
FORTRAN programs. The standard is known as UKOOA P2/86, and the format
is sometimes simply referred to as <I>P286</I>. A huge amount of data
still exists in this format. Now that C is widely accepted for most
graphical-interface and numeric-processing code, it's only natural
to look for ways to read these files without requiring the FORTRAN
executable.
<P>
Of course, the decode and encode functions for P286 had to be
done yesterday. Given the options of getting another job or writing
100+ functions and structure declarations, I tried tackling this
problem with the lazy programmer's approach: Let Perl do the grunt
work.
<P>
Here is what we were dealing with. Each record of data in a P286
data file is exactly 80 characters long with a trailing newline
or null character. Fields within a record are based on character
column positions and lengths in the record. Columns are numbered
from 1 up. Because white spaces and commas can be part of the
data, there are no "field separators" as such.
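<P>
Because every field is located by column numbers rather than by
separators, pulling a field out of a record is a single
<TT><FONT FACE="Courier">substr</FONT></TT> call. The following
is a minimal sketch, not code from the project: the helper name
<TT><FONT FACE="Courier">field</FONT></TT> and the sample record
contents are made up for illustration, and the column ranges are
treated as inclusive.

```perl
use strict;
use warnings;

# Return the text in columns $start..$end of $record.
# Columns are 1-based and inclusive, so subtract 1 for substr's 0-based offset.
sub field {
    my ($record, $start, $end) = @_;
    return substr($record, $start - 1, $end - $start + 1);
}

# A made-up 80-column record, for illustration only:
# the record id in columns 1-5 and a text value in columns 29-80.
my $record = sprintf("%-28s%-52s", "H0001", "3D MARINE SURVEY");

print "length: ", length($record), "\n";
print "id:     [", field($record, 1, 5), "]\n";
print "value:  [", field($record, 29, 80), "]\n";
```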
<P>
The first few characters of a record identify the type of data
in the record. For example, records begin as <TT><FONT FACE="Courier">H0001</FONT></TT>,
<TT><FONT FACE="Courier">H0002</FONT></TT>, and so on, with the
rest of the characters as fields within the record. There are
52 such record types in all. Records in a file do not appear in
sequential order. The identifying prefix is between three and
five characters long. The only guarantees are that there will
always be exactly one record per line and that there will be no blank
lines.
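<P>
Since the identifying prefix varies from three to five characters,
one way to classify a line is to test the longest candidate prefix
first. The sketch below is only an assumption about how this could
be done: the table <TT><FONT FACE="Courier">%known</FONT></TT>
holds a hypothetical handful of identifiers rather than the full
set of 52, and the function name
<TT><FONT FACE="Courier">record_type</FONT></TT> is invented here.

```perl
use strict;
use warnings;

# Hypothetical subset of the record identifiers; the real table is larger.
my %known = map { $_ => 1 } qw(H0001 H0010 H011 E3100);

# Try the longest prefix first so that a 5-character identifier
# wins over a shorter one that happens to share its leading characters.
sub record_type {
    my ($line) = @_;
    for my $len (5, 4, 3) {
        next if length($line) < $len;
        my $id = substr($line, 0, $len);
        return $id if $known{$id};
    }
    return;    # no known identifier found
}

print record_type("H0001" . (" " x 75)), "\n";
print record_type("H0112 WGS-84"), "\n";    # made-up H011 line
```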
<P>
By reading the specification I discovered these things that would
make the coding process programmable:
<UL>
<LI><FONT COLOR=#000000>The only variables that could be defined
in all records were either integers, doubles, or character strings.
This implied a closed set of variables to check for.</FONT>
<LI><FONT COLOR=#000000>The specification of all the headers was
explicit about where each data item began in a line and where
it ended. The lack of delimiters between fields was no longer
a problem since a line could be chopped into substrings quite
easily.</FONT>
<LI><FONT COLOR=#000000>These were text files instead of packed
binary files. This made debugging a little easier because no special
routines were required to display the data.</FONT>
</UL>
<P>
Voilà! These two properties (consistency and a closed set of
types) made it possible to generate the encoding and decoding
functions mechanically instead of writing them by hand. By using a
Perl program to do the coding for me, I reduced the possibility of
errors. Any mistake in the generator would be propagated to all the
generated functions, so it would be easy to catch and fix in one place.
<P>
The most obvious question was, <I>Why not use Perl to do all the
decoding and encoding?</I> The encoding and decoding routines
were to be incorporated into a C program running on different
platforms running DOS or another low-end operating system. Embedding
Perl within the C program would have meant installing and maintaining
Perl on these platforms, and most of the platforms the final code
would run on did not support Perl.
<H2><A NAME="ChoosingtheInputFile"><B><FONT SIZE=5 COLOR=#FF0000>Choosing
the Input File</FONT></B></A></H2>
<P>
After reading the specification, I extracted all the header declarations
into one file called <TT><FONT FACE="Courier">P286hdrs</FONT></TT>.
The specification listed the contents of headers in plain text
between two keywords, <TT><FONT FACE="Courier">RECORD</FONT></TT>
and <TT><FONT FACE="Courier">ENDREC</FONT></TT>. The specification
was not consistent enough to be extracted entirely by a program, but
I pulled out most of the lines defining the format with an <TT><FONT FACE="Courier">awk</FONT></TT>
script:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">/RECORD/,/^ENDREC$/ { print ; }</FONT></TT>
</BLOCKQUOTE>
<P>
This script prints every line from a line containing
<TT><FONT FACE="Courier">RECORD</FONT></TT> through the next line consisting solely of <TT><FONT FACE="Courier">ENDREC</FONT></TT>.
I still had to do some editing after extracting all these records
to get the correct input format.
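<P>
The same range selection can be written in Perl with the
<TT><FONT FACE="Courier">..</FONT></TT> operator, which behaves
as a flip-flop when used in a condition. The sketch below is an
equivalent offered for comparison, not the script that was actually
used; the function name
<TT><FONT FACE="Courier">extract_records</FONT></TT> and the sample
text are made up.

```perl
use strict;
use warnings;

# Return the lines from each line containing RECORD through the next
# line that is exactly ENDREC, mirroring the awk range pattern.
sub extract_records {
    my ($text) = @_;
    my @kept;
    for my $line (split /\n/, $text) {
        # In a condition, ".." acts as a flip-flop: it turns true when the
        # left test matches and stays true until the right test matches.
        push @kept, $line if $line =~ /RECORD/ .. $line =~ /^ENDREC$/;
    }
    return @kept;
}

# Made-up excerpt standing in for the specification text.
my $spec = join "\n",
    "Some descriptive prose.",
    "RECORD H0001",
    "SurveyType char 29 80",
    "ENDREC",
    "More prose that should be skipped.";

print "$_\n" for extract_records($spec);
```

<P>
From the command line, the same filter can be run as
<TT><FONT FACE="Courier">perl -ne 'print if /RECORD/ .. /^ENDREC$/'</FONT></TT>
followed by the name of the file to read.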
<P>
The case presented in this example had a text file with the following
format:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">RECORD TypeOfRecord StringToUse<BR>
variableName varType startCol endCol [format]<BR>
variableName varType startCol endCol [format]<BR>
variableName varType startCol endCol [format]<BR>
[REPEAT count st1 st2 ... ]<BR>
variableName varType startCol endCol [format]<BR>
variableName varType startCol endCol [format]<BR>
variableName varType startCol endCol [format]<BR>
ENDREC</FONT></TT>
</BLOCKQUOTE>
<P>
The <TT><FONT FACE="Courier">variableName</FONT></TT> would be
the name of a variable in a structure; the <TT><FONT FACE="Courier">varType</FONT></TT>
would be <TT><FONT FACE="Courier">int</FONT></TT>, <TT><FONT FACE="Courier">double</FONT></TT>,
or <TT><FONT FACE="Courier">char</FONT></TT>. The <TT><FONT FACE="Courier">startCol</FONT></TT>
and <TT><FONT FACE="Courier">endCol</FONT></TT> values defined
the first and last columns of the field in the record.
The first column was still numbered 1 instead of 0. It's easier
to subtract 1 in a program than to change so many declarations.
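<P>
A field declaration in this format splits cleanly on white space.
As a rough sketch (the function name
<TT><FONT FACE="Courier">parse_field</FONT></TT> and the hash keys
are assumptions made for this example), the 1-based inclusive
columns can be converted into an offset and length at the moment
the line is read:

```perl
use strict;
use warnings;

# Split one field declaration into its parts and convert the 1-based,
# inclusive columns into the 0-based offset and length that substr expects.
sub parse_field {
    my ($line) = @_;
    my ($name, $type, $start, $end, $format) = split ' ', $line;
    return {
        name   => $name,
        type   => $type,
        offset => $start - 1,
        length => $end - $start + 1,
        format => $format,    # undef when no [format] is given
    };
}

my $f = parse_field("semimajorAxis double 42 53 12.3");
print "$f->{name}: offset $f->{offset}, length $f->{length}, format $f->{format}\n";
```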
<P>
Some blocks of variables in some records were repeated. These
were defined after the optional <TT><FONT FACE="Courier">REPEAT</FONT></TT>
keyword. The syntax for the <TT><FONT FACE="Courier">REPEAT</FONT></TT>
keyword was this:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">REPEAT count st1 st2 ... stN</FONT></TT>
</BLOCKQUOTE>
<P>
The values <TT><FONT FACE="Courier">st1</FONT></TT> to <TT><FONT FACE="Courier">stN</FONT></TT>
are the starting columns of the successive repetitions of the block of fields that follows the <TT><FONT FACE="Courier">REPEAT</FONT></TT>
keyword. The count specifies the number of times the block is
repeated.
<P>
For example, the following record is interpreted as "Record
H0001, with one variable starting at column 29 up to column 80."
<BLOCKQUOTE>
<TT><FONT FACE="Courier">RECORD H0001<BR>
SurveyType char 29 80<BR>
ENDREC</FONT></TT>
</BLOCKQUOTE>
<P>
Another example of a record using more than one field is shown
here:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">RECORD H011<BR>
datumId int 5 5<BR>
spheroidName char 6 23<BR>
datumName char 24 41<BR>
semimajorAxis double 42 53 12.3<BR>
conversionFactor double 66 77 12.8<BR>
inverseFlattening double 66 77 12.7<BR>
ENDREC</FONT></TT>
</BLOCKQUOTE>
<P>
Note how only the first four characters are relevant in identifying
the record. Also, the integer and character fields don't have a format
string, whereas the fields of type double do
have a floating-point specification of the form <TT><FONT FACE="Courier">length.decimals</FONT></TT>.
The <TT><FONT FACE="Courier">length</FONT></TT> is the
total number of columns in the number, including the decimal point;
the <TT><FONT FACE="Courier">decimals</FONT></TT> portion is the
number of digits to the right of the decimal point. For example,
a format of 12.8 occupies 12 character columns: 3 digits to
the left of the decimal point, the decimal point itself, and 8 digits
to the right. The specification has the same meaning as the width and
precision in a <TT><FONT FACE="Courier">printf()</FONT></TT>
format such as <TT><FONT FACE="Courier">%12.8f</FONT></TT> in C.
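<P>
Perl's <TT><FONT FACE="Courier">sprintf</FONT></TT> accepts the
same width and precision, so the effect of a specification can be
checked directly. This is a small sketch: the function name
<TT><FONT FACE="Courier">format_double</FONT></TT> and the sample
values are illustrative assumptions, not data from the
specification. Keep in mind that the width is a minimum, so a value
too large for its field produces a longer string rather than a
truncated one.

```perl
use strict;
use warnings;

# Turn a length.decimals specification such as "12.8" into a printf
# format and use it to render a value in a fixed-width column.
sub format_double {
    my ($spec, $value) = @_;
    return sprintf("%${spec}f", $value);
}

my $axis   = format_double("12.3", 6378137);    # illustrative value
my $factor = format_double("12.8", 0.3048);     # illustrative value

# Brackets make the padding visible.
print "[$axis] is ", length($axis), " columns wide\n";
print "[$factor] is ", length($factor), " columns wide\n";
```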
<P>
Another example is a record using <TT><FONT FACE="Courier">REPEAT</FONT></TT>
fields, as shown here:
<BLOCKQUOTE>
<TT><FONT FACE="Courier">RECORD E3100<BR>
velprop double 6 12 7.2<BR>
REPEAT 5 13 26 39 52 65<BR>
srcNdx int 13 15<BR>
dstNdx int 16 18<BR>
slant double 19 25 7.2<BR>
ENDREC</FONT></TT>
</BLOCKQUOTE>
<P>
In this record type, the block <TT><FONT FACE="Courier">{srcNdx,dstNdx,slant}</FONT></TT>
is repeated at columns 13, 26, 39, 52, and 65. This implies that
each of these variables can be treated as an array of five
elements.
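<P>
The columns for every array element follow from the start columns
by simple arithmetic. The sketch below assumes that the fields after
<TT><FONT FACE="Courier">REPEAT</FONT></TT> are declared at the
columns of the first repetition, which is what the E3100 example
suggests; the function name
<TT><FONT FACE="Courier">expand_repeat</FONT></TT> and the data
layout are invented for this illustration.

```perl
use strict;
use warnings;

# Given the REPEAT start columns and the fields declared for the first
# repetition, return the 1-based column range of each field in every repetition.
sub expand_repeat {
    my ($starts, $fields) = @_;
    my $base = $starts->[0];    # assumed: fields are declared at the first start column
    my %columns;
    for my $field (@$fields) {
        my ($name, $first, $last) = @$field;
        # Shift the declared range by the distance of each start from the first.
        $columns{$name} = [
            map { [ $_ + $first - $base, $_ + $last - $base ] } @$starts
        ];
    }
    return \%columns;
}

my @starts = (13, 26, 39, 52, 65);
my @fields = (
    [ srcNdx => 13, 15 ],
    [ dstNdx => 16, 18 ],
    [ slant  => 19, 25 ],
);

my $columns = expand_repeat(\@starts, \@fields);
for my $i (0 .. $#starts) {
    my ($from, $to) = @{ $columns->{slant}[$i] };
    print "slant[$i] occupies columns $from-$to\n";
}
```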
<P>
The complete input file describing these records is about 520 lines long.
A shorter sample file is shown in Listing 31.1. Note how comment
lines in that listing are marked with the
<TT><FONT FACE="Courier">#</FONT></TT> character. Actually,
any text could serve as a comment as long as it lies
outside the confines of <TT><FONT FACE="Courier">RECORD</FONT></TT>
and <TT><FONT FACE="Courier">ENDREC</FONT></TT> statements. The
hash is used only to maintain some consistency with Perl.
<HR>
<BLOCKQUOTE>
<B>Listing 31.1. The input file.<BR>
</B>
</BLOCKQUOTE>
<BLOCKQUOTE>
<TT><FONT FACE="Courier">#<BR>
# Comment lines are permitted in the usual Perl style.<BR>
#<BR>
RECORD H0001<BR>
SurveyType char 29 80<BR>
ENDREC<BR>
<BR>
RECORD H0010<BR>
numPatterns int 6 7<BR>
sblInUse int 8 8<BR>
sattInUse int 9 9<BR>
numVessels int 10 10<BR>
numDatum int 11 11<BR>
offsetMode int 12 12<BR>
ENDREC<BR>
<BR>
RECORD H011<BR>
datumId int 5 5<BR>
spheroidName char 6 23<BR>
datumName char