<HTML>



<HEAD>

   <TITLE>Chapter 31 -- Generating Code</TITLE>

   <META NAME="GENERATOR" CONTENT="Mozilla/3.0b5aGold (WinNT; I) [Netscape]">

</HEAD>

<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#CE2910">

<H1><FONT COLOR=#FF0000>Chapter 31</FONT></H1>

<H1><B><FONT SIZE=5 COLOR=#FF0000>Generating Code</FONT></B>

</H1>

<P>

<HR WIDTH="100%"></P>

<P>

<H3 ALIGN=CENTER><FONT COLOR="#000000"><FONT SIZE=+2>CONTENTS<A NAME="CONTENTS"></A>

</FONT></FONT></H3>

<UL>

<LI><A HREF="#Introduction" >Introduction</A>

<LI><A HREF="#ChoosingtheInputFile" >Choosing the Input File</A>

<LI><A HREF="#ParsingRecords" >Parsing Records</A>

<LI><A HREF="#WritingthecheaderFiles" >Writing the C Header Files</A>

<LI><A HREF="#WritingtheEncoderSourceFile" >Writing the Encoder Source File</A>

<LI><A HREF="#WritingtheDecoder" >Writing the Decoder</A>

<LI><A HREF="#PuttingItTogether" >Putting It Together</A>

<LI><A HREF="#Summary" >Summary</A>

</UL>

<HR>

<P>

This chapter shows you how to use Perl to solve a real-world
coding problem. The sample problem is complicated enough that you are
likely to meet something similar in your own programming work. Remember
to concentrate on <I>how</I> the problem is solved, not on what
the final solution is used for.

<H2><A NAME="Introduction"><B><FONT SIZE=5 COLOR=#FF0000>Introduction</FONT></B></A>

</H2>

<P>

The main concepts introduced in this chapter have nothing to do
with navigation or seismic surveys. However, the problem that
I address in this chapter is writing a seismic navigation
record parser. This parser was crucial in getting a delayed project
up and running in the field. Rather than spending days writing a
parser in C, we were up and running after a few hours of coding effort.
In addition, the concepts gained during this experience helped me write
parsers for other data formats for the same project with just as little
effort.

<P>

Basically, I'll be covering ways to use Perl to generate C code

for a parser. Instead of writing another phone book manager or

a database for music or home inventory records, it's probably

better that we deal with a real-world example. Perhaps after reading

this chapter and seeing how this problem was tackled, you can

draw parallel solutions for your current problems.

<P>

The sample project involves writing a parser to read FORTRAN-style
records for a seismic survey. Most seismic navigation data, believe
it or not, follows an archaic standard designed around the
FORTRAN programs that read it. The standard is known as the UKOOA
P2/86 standard. Sometimes this format is simply referred to as
<I>P286</I>. A huge amount of data still exists in this format. Now that
C is widely used for most of the code being developed
for graphical interfaces and numeric processing, it's only natural
to look for ways to read these files without requiring the FORTRAN
executable.

<P>

Of course, the decode and encode functions for P286 had to be

done yesterday. Given the options of getting another job or writing

100+ functions and structure declarations, I tried tackling this

problem with the lazy programmer's approach: Let Perl do the grunt

work.

<P>

Here is what we were dealing with. Each record of data in a P286
data file is exactly 80 characters long, followed by a trailing newline
or null character. Fields within a record are defined by character
column positions and lengths in the record. Columns are numbered
from 1 up. Because whitespace and commas can be part of the
data, there are no &quot;field separators&quot; as such.
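<P>
Column-based extraction of this kind maps directly onto Perl's
<TT><FONT FACE="Courier">substr()</FONT></TT> function. The following is only a
minimal sketch: the helper name <TT><FONT FACE="Courier">field</FONT></TT> and the
sample record contents are made up for illustration and are not part of the P286 standard.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text in 1-based columns $start..$end.
sub field {
    my ($record, $start, $end) = @_;
    # substr() counts from 0, so shift the start column down by one.
    return substr($record, $start - 1, $end - $start + 1);
}

# Build a made-up 80-column record: a type code padded to column 28,
# followed by free text in columns 29 through 80.
my $record = sprintf("%-28s%-52s", "H0001", "3D MARINE SURVEY");

my $type   = field($record, 1, 5);
my $survey = field($record, 29, 80);
$survey =~ s/\s+$//;    # trim the padding on the right

print "$type: $survey\n";
```

<P>
Because the column numbers are inclusive at both ends, the length passed to
<TT><FONT FACE="Courier">substr()</FONT></TT> is the end column minus the start column, plus 1.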

<P>

The first few characters of a record identify the type of data
in the record. For example, records begin as <TT><FONT FACE="Courier">H0001</FONT></TT>,
<TT><FONT FACE="Courier">H0002</FONT></TT>, and so on, with the
rest of the characters as fields within the record. There are
52 such record types in all. Records in a file are not in sequential
order. The identifying prefix is between three and
five characters long. The only guarantees are that there will
always be only one record per line and that there will be no blank
lines.

<P>

By reading the specification I discovered these things that would

make the coding process programmable:

<UL>

<LI><FONT COLOR=#000000>The only variables that could be defined

in all records were either integers, doubles, or character strings.

This implied a closed set of variables to check for.</FONT>

<LI><FONT COLOR=#000000>The specification of all the headers was

explicit about where each data item began in a line and where

it ended. The lack of delimiters between fields was no longer

a problem since a line could be chopped into substrings quite

easily.</FONT>

<LI><FONT COLOR=#000000>These were text files instead of packed

binary files. This made debugging a little easier because no special

routines were required to display the data.</FONT>

</UL>

<P>

Voil&agrave;! These two criteria (a consistent layout and a closed
set of types) made it possible to generate the encoding and decoding
functions with a program instead of writing them manually. By using a
Perl program to do the coding for me, I reduced the possibility of errors.
Any error in the generator would be propagated to all functions and would
therefore be easy to catch and fix.

<P>

The most obvious question was, <I>Why not use Perl to do all the
decoding and encoding?</I> The encoding and decoding routines
were to be incorporated into a C program that had to run on several
platforms, some running DOS or another minimal operating system. Doing
the work in Perl, or embedding Perl within the C program, would have meant
installing and maintaining Perl on these platforms, and most of the
platforms the final code would run on did not support Perl.

<H2><A NAME="ChoosingtheInputFile"><B><FONT SIZE=5 COLOR=#FF0000>Choosing

the Input File</FONT></B></A></H2>

<P>

After reading the specification, I extracted all the header declarations
into one file called <TT><FONT FACE="Courier">P286hdrs</FONT></TT>.
The specification listed the contents of headers in plain text
between two keywords, <TT><FONT FACE="Courier">RECORD</FONT></TT>
and <TT><FONT FACE="Courier">ENDREC</FONT></TT>. The specifications
were not consistent enough to be extracted entirely by a program. However,
I extracted most of the lines defining the format using an <TT><FONT FACE="Courier">awk</FONT></TT>
script:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">/RECORD/,/^ENDREC$/ { print ; }</FONT></TT>

</BLOCKQUOTE>

<P>

This script prints every line from a line containing
<TT><FONT FACE="Courier">RECORD</FONT></TT> through the next line consisting of <TT><FONT FACE="Courier">ENDREC</FONT></TT>,
and discards everything else.
I still had to do some editing after extracting all these records
to get the correct input format.
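<P>
For readers who would rather stay in Perl, the same extraction can be
sketched without <TT><FONT FACE="Courier">awk</FONT></TT>. This is an illustrative
equivalent, not the script used on the project; the helper name
<TT><FONT FACE="Courier">extract_records</FONT></TT> and the sample lines are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines from a line containing RECORD
# through the next line that is exactly ENDREC, like the awk range pattern.
sub extract_records {
    my @kept;
    my $inside = 0;
    for my $line (@_) {
        $inside = 1 if $line =~ /RECORD/;       # a block starts here
        push @kept, $line if $inside;
        $inside = 0 if $line =~ /^ENDREC$/;     # the block ends here
    }
    return @kept;
}

# Made-up input: one record definition surrounded by unrelated text.
my @spec = (
    "Some prose from the specification",
    "RECORD H0001",
    "    SurveyType  char 29 80",
    "ENDREC",
    "More prose",
);

print "$_\n" for extract_records(@spec);
```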

<P>

The case presented in this example had a text file with the following

format:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">RECORD TypeOfRecord StringToUse<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[REPEAT count

st1 st2 ... ]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

&nbsp;&nbsp;&nbsp;&nbsp;variableName&nbsp;&nbsp;&nbsp;&nbsp;varType&nbsp;&nbsp;&nbsp;&nbsp;startCol

endCol [format]<BR>

ENDREC</FONT></TT>

</BLOCKQUOTE>

<P>

The <TT><FONT FACE="Courier">variableName</FONT></TT> would be

the name of a variable in a structure; the <TT><FONT FACE="Courier">varType</FONT></TT>
would be <TT><FONT FACE="Courier">int</FONT></TT>, <TT><FONT FACE="Courier">double</FONT></TT>,
or <TT><FONT FACE="Courier">char</FONT></TT>. The <TT><FONT FACE="Courier">startCol</FONT></TT>
and <TT><FONT FACE="Courier">endCol</FONT></TT> values defined
the locations in the string where the data could be picked up.
The first column was still numbered 1 instead of 0, as in the specification. It's easier
to adjust by 1 in a program than to change so many declarations.
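<P>
A single field definition line in this format can be broken apart with
<TT><FONT FACE="Courier">split</FONT></TT>. The sketch below assumes that fields are
separated by whitespace; the helper name <TT><FONT FACE="Courier">parse_field</FONT></TT>
and the hash keys are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one field definition line into its parts
# and convert the 1-based columns into a 0-based offset and a length.
sub parse_field {
    my ($line) = @_;
    # split ' ' ignores leading whitespace and splits on runs of whitespace.
    my ($name, $type, $start, $end, $format) = split ' ', $line;
    return {
        name   => $name,
        type   => $type,
        offset => $start - 1,          # substr() counts from 0
        length => $end - $start + 1,   # both columns are inclusive
        format => $format,             # undef when no format is given
    };
}

my $f = parse_field("    semimajorAxis    double 42 53     12.3");
print "$f->{name}: offset $f->{offset}, length $f->{length}\n";
```

<P>
Doing the adjustment in one place like this keeps the input file in step
with the column numbers printed in the specification.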

<P>

Some blocks of variables in some records were repeated. These
were defined after the optional <TT><FONT FACE="Courier">REPEAT</FONT></TT>
keyword. The syntax for the <TT><FONT FACE="Courier">REPEAT</FONT></TT>
keyword was this:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">REPEAT count st1 st2 ... stN</FONT></TT>

</BLOCKQUOTE>

<P>

The <TT><FONT FACE="Courier">st1</FONT></TT> to <TT><FONT FACE="Courier">stN</FONT></TT>

are the starting offsets for all the fields that follow the <TT><FONT FACE="Courier">REPEAT</FONT></TT>

word. The count specifies the number of times to repeat these
blocks.

<P>

For example, the following record is interpreted as &quot;Record

H0001, with one variable starting at column 29 up to column 80.&quot;

<BLOCKQUOTE>

<TT><FONT FACE="Courier">RECORD H0001<BR>

&nbsp;&nbsp;&nbsp;&nbsp;SurveyType  char 29 80<BR>

ENDREC</FONT></TT>

</BLOCKQUOTE>

<P>

Another example of a record using more than one field is shown

here:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">RECORD H011<BR>

&nbsp;&nbsp;&nbsp;&nbsp;datumId&nbsp;&nbsp;&nbsp;&nbsp;int 5 5

<BR>

&nbsp;&nbsp;&nbsp;&nbsp;spheroidName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;char

6 23<BR>

&nbsp;&nbsp;&nbsp;&nbsp;datumName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;char

24 41<BR>

&nbsp;&nbsp;&nbsp;&nbsp;semimajorAxis&nbsp;&nbsp;&nbsp;&nbsp;double

42 53&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;12.3<BR>

&nbsp;&nbsp;&nbsp;&nbsp;conversionFactor double 66 77&nbsp;&nbsp;&nbsp;&nbsp;12.8

<BR>

&nbsp;&nbsp;&nbsp;&nbsp;inverseFlattening double 66 77&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;12.7

<BR>

ENDREC</FONT></TT>

</BLOCKQUOTE>

<P>

Note how only the first four characters are relevant in identifying
the record. Also, the integer and character fields don't have a format
string, whereas the fields of type double do have a floating point
specification of the form <TT><FONT FACE="Courier">length.decimals</FONT></TT>.
The <TT><FONT FACE="Courier">length</FONT></TT> is the
total number of columns in the number, including the decimal point;
the <TT><FONT FACE="Courier">decimals</FONT></TT> portion is the
number of digits to the right of the decimal point. For example,
a field with format 12.8 occupies 12 character positions in all: 3 digits to
the left of the decimal point, the decimal point itself, and 8 digits
to the right of it. The format for the floating point number is the
same as the width and precision in a <TT><FONT FACE="Courier">printf()</FONT></TT>
statement in C.
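<P>
Because the <TT><FONT FACE="Courier">length.decimals</FONT></TT> notation matches
the width and precision of a <TT><FONT FACE="Courier">printf()</FONT></TT> conversion,
turning one into the other is a one-line job. The helper name
<TT><FONT FACE="Courier">double_format</FONT></TT> and the sample value below are
assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a "length.decimals" spec such as "12.3"
# into the matching printf conversion, "%12.3f".
sub double_format {
    my ($spec) = @_;
    my ($length, $decimals) = $spec =~ /^(\d+)\.(\d+)$/
        or die "bad format spec: $spec\n";
    return "%${length}.${decimals}f";
}

# Format a made-up value into a 12-column field with 3 decimals.
my $text = sprintf(double_format("12.3"), 6378137);
print "[$text]\n";    # prints [ 6378137.000]
```

<P>
The same conversion string can be written into generated C code, since Perl's
<TT><FONT FACE="Courier">sprintf</FONT></TT> and C's <TT><FONT FACE="Courier">printf()</FONT></TT>
agree on the meaning of <TT><FONT FACE="Courier">%12.3f</FONT></TT>.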

<P>

Another example is a record using <TT><FONT FACE="Courier">REPEAT</FONT></TT>

fields, as shown here:

<BLOCKQUOTE>

<TT><FONT FACE="Courier">RECORD E3100<BR>

&nbsp;&nbsp;&nbsp;&nbsp;velprop&nbsp;&nbsp;double 6 12 7.2<BR>

&nbsp;&nbsp;&nbsp;&nbsp;REPEAT&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;13

26 39 52 65<BR>

&nbsp;&nbsp;&nbsp;&nbsp;srcNdx&nbsp;&nbsp;&nbsp;int 13 15<BR>

&nbsp;&nbsp;&nbsp;&nbsp;dstNdx&nbsp;&nbsp;&nbsp;int 16 18<BR>

&nbsp;&nbsp;&nbsp;&nbsp;slant&nbsp;&nbsp;&nbsp;&nbsp;double 19

25 7.2<BR>

ENDREC</FONT></TT>

</BLOCKQUOTE>

<P>

In this record type, the block <TT><FONT FACE="Courier">{srcNdx,dstNdx,slant}</FONT></TT>

is repeated at columns 13, 26, 39, 52, and 65. This implies that

each of these variables can be interpreted as arrays of five elements

each.
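<P>
The column arithmetic behind <TT><FONT FACE="Courier">REPEAT</FONT></TT> can be
sketched as follows. The sketch assumes that the field columns in the definition
describe the first block, so each later block is shifted by the difference between
its start column and the first start column. The helper name
<TT><FONT FACE="Courier">repeat_columns</FONT></TT> is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given the REPEAT start columns and one field's
# columns in the first block, list the field's columns in every block.
sub repeat_columns {
    my ($starts, $first, $last) = @_;
    my $base = $starts->[0];    # the first block is the reference point
    return map { [ $first + $_ - $base, $last + $_ - $base ] } @$starts;
}

# Start columns from the E3100 record; slant occupies columns 19-25
# in the first block.
my @starts = (13, 26, 39, 52, 65);
for my $range (repeat_columns(\@starts, 19, 25)) {
    printf "slant: columns %d-%d\n", @$range;
}
```

<P>
For the <TT><FONT FACE="Courier">slant</FONT></TT> field this yields columns 19-25,
32-38, 45-51, 58-64, and 71-77, which are the five elements of the array.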

<P>

The entire input file describing these records is about 520 lines long.
A shorter sample file is shown in Listing 31.1. Note how comment
lines in that listing are marked with the
<TT><FONT FACE="Courier">#</FONT></TT> character. Actually,
any text could be used for comments as long as it is
outside the confines of <TT><FONT FACE="Courier">RECORD</FONT></TT>
and <TT><FONT FACE="Courier">ENDREC</FONT></TT> statements. The
reason to use the hash is to maintain some consistency with Perl.

<HR>

<BLOCKQUOTE>

<B>Listing 31.1. The input file.<BR>

</B>

</BLOCKQUOTE>

<BLOCKQUOTE>

<TT><FONT FACE="Courier">&nbsp;1 #<BR>

&nbsp;2 # Comment lines are permitted in the usual Perl style.

<BR>

&nbsp;3 #<BR>

&nbsp;4 RECORD H0001<BR>

&nbsp;5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;SurveyType&nbsp;&nbsp;char

29 80<BR>

&nbsp;6 ENDREC<BR>

&nbsp;7<BR>

&nbsp;8 RECORD H0010<BR>

&nbsp;9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numPatterns int 6 7<BR>

10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sblInUse&nbsp;&nbsp;&nbsp;&nbsp;int

8 8<BR>

11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sattInUse&nbsp;&nbsp;&nbsp;int

9 9<BR>

12&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numVessels&nbsp;&nbsp;int 10 10

<BR>

13&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numDatum&nbsp;&nbsp;&nbsp;&nbsp;int

11 11<BR>

14&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;offsetMode&nbsp;&nbsp;int 12 12

<BR>

15 ENDREC<BR>

16<BR>

17 RECORD H011<BR>

18&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;datumId&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;int

5 5<BR>

19&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;spheroidName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;char

6 23<BR>

20&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;datumName&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;char