byte. There is nothing wrong with coding that (the "test" can be read back
easily enough), but this is probably not what you intended to write.
{pstd}
Also concerning strings, you can output string literals{hline 2}just enclose the
string in quotes{hline 2}or you can output the results of string expressions.
Expressions, as in the case of using {cmd:file} {cmd:write} to output text
files, must be enclosed in parentheses:
{phang2}{cmd:. file write} {it:handle} {cmd:%4s (substr(a,2,6))}
{pstd}
The following program will output a user-specified matrix to a user-specified
file; the syntax of the command being implemented is
{phang2}{cmd:mymatout1} {it:matname} {cmd:using} {it:filename} [{cmd:, replace}]
{pstd}
and the code is,
        {cmd:program define mymatout1}
                {cmd:version {ccl stata_version}}
                {cmd:gettoken mname 0 : 0}
                {cmd:syntax using/ [, replace]}
                {cmd:local r = rowsof(`mname')}
                {cmd:local c = colsof(`mname')}
                {cmd:tempname hdl}
                {cmd:file open `hdl' using `"`using'"', `replace' write binary}
                {cmd:file write `hdl' %2b (`r') %2b (`c')}
                {cmd:forvalues i=1(1)`r' {c -(}}
                    {cmd:forvalues j=1(1)`c' {c -(}}
                        {cmd:file write `hdl' %8z (`mname'[`i',`j'])}
                    {cmd:{c )-}}
                {cmd:{c )-}}
                {cmd:file close `hdl'}
        {cmd:end}
{pstd}
A significant problem with {cmd:mymatout1} is that, if we wrote a matrix
on our Unix computer (an Intel-based computer) and copied the file to an
older, PowerPC-based Macintosh, we would discover that we could not read the
file. Intel computers write multiple-byte numbers with the least-significant
byte first; PowerPC Macintosh computers write the most-significant byte
first. Your own computer could use either ordering. Thus, even though
computers generally agree on how numbers and characters are represented, this
byte-ordering difference is enough to stop binary files in their tracks.
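{pstd}
To make the byte-ordering problem concrete, here is a small sketch written in
Python rather than Stata, using Python's standard {cmd:struct} module. It
assumes that a {cmd:%2b} field is a 2-byte signed integer and packs the same
value once in each byte order:

```python
import struct

# Assumption: a %2b field is a 2-byte signed integer, modeled with struct's "h".
# ">" = big-endian ("hilo", most-significant byte first)
# "<" = little-endian ("lohi", least-significant byte first)
value = 258                      # hex 0102
hilo = struct.pack(">h", value)
lohi = struct.pack("<h", value)

print(hilo.hex())                # 0102
print(lohi.hex())                # 0201

# Reading lohi bytes as though they were hilo silently yields another number.
misread = struct.unpack(">h", lohi)[0]
print(misread)                   # 513, which is hex 0201
```

Both encodings contain the same two bytes in opposite order, so a reader that
assumes the wrong order does not get an error. It gets a wrong number.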
{pstd}
{cmd:file} can handle this problem for you, but you have to insert a little
bit of code. The recommended procedure is this: before writing any numbers
in the file, write a field saying which byte order this computer uses (see the
{helpb functions:byteorder()} function).
Later, when we write the command
to read the file, it will read the ordering we recorded. We will then tell
{cmd:file} which byte ordering the file is using and {cmd:file} itself will
reorder the bytes if that is necessary. There are other ways we could
handle this{hline 2}such as always writing in a known byte order{hline 2}but
the recommended procedure is better because it is, on average, faster. Most
files are read on the same computer that wrote them, and thus the computer
wastes no time rearranging bytes in that case.
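{pstd}
The procedure can be sketched in Python (again, not Stata). The function
names {cmd:write_dims} and {cmd:read_dims} are hypothetical. The marker values
1 (hilo) and 2 (lohi) follow the description of {cmd:byteorder()} in this
section, and Python's {cmd:struct} module does the reordering that {cmd:file}
does in Stata. The sketch shows only the file layout, not the speed advantage:

```python
import io
import struct
import sys

# Marker values as described for byteorder(): 1 = hilo, 2 = lohi.
HILO, LOHI = 1, 2

def write_dims(stream, rows, cols):
    """Write a 1-byte order marker, then two 2-byte integers in native order."""
    marker = HILO if sys.byteorder == "big" else LOHI
    stream.write(struct.pack("B", marker))        # one byte: no ordering issue
    stream.write(struct.pack("=hh", rows, cols))  # "=": native order, no padding

def read_dims(stream):
    """Read the marker first, then decode the integers in the recorded order."""
    marker = struct.unpack("B", stream.read(1))[0]
    if marker not in (HILO, LOHI):
        raise ValueError("unknown byte-order marker: " + str(marker))
    prefix = ">" if marker == HILO else "<"
    return struct.unpack(prefix + "hh", stream.read(4))

# Round trip on this machine.
buf = io.BytesIO()
write_dims(buf, 3, 4)
buf.seek(0)
print(read_dims(buf))  # (3, 4)

# The same header as a machine with the opposite ordering would write it.
other = "<" if sys.byteorder == "big" else ">"
other_marker = LOHI if other == "<" else HILO
foreign = struct.pack("B", other_marker) + struct.pack(other + "hh", 3, 4)
print(read_dims(io.BytesIO(foreign)))  # (3, 4)
```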
{pstd}
The improved version of {cmd:mymatout1} is
        program define {cmd:mymatout2}
                version {ccl stata_version}
                gettoken mname 0 : 0
                syntax using/ [, replace]
                local r = rowsof(`mname')
                local c = colsof(`mname')
                tempname hdl
                file open `hdl' using `"`using'"', `replace' write binary
/* new */       {cmd:file write `hdl' %1b (byteorder())}
                file write `hdl' %2b (`r') %2b (`c')
                forvalues i=1(1)`r' {c -(}
                    forvalues j=1(1)`c' {c -(}
                        file write `hdl' %8z (`mname'[`i',`j'])
                    {c )-}
                {c )-}
                file close `hdl'
        end
{pstd}
{cmd:byteorder()} returns 1 if the machine is hilo and 2 if it is lohi, but all
that matters is that the value is small enough to fit in a byte. The important
thing is that we write this number using {cmd:%1b}, about which there is no
byte-ordering disagreement because a single byte has no order. What we do
with this number we will deal with later.
{pstd}
The second significant problem with our program is that it does not write a
signature. Binary files are difficult to tell apart: they all look like
binary junk. It is important that we include some sort of marker at the top
saying who wrote this file and what format it was written in. That is called
a signature. The signature we will use is
mymatout 1.0.0
{pstd}
We will write that 14-character string first thing in the file so that later,
when we write {cmd:mymatin}, we can read the string and verify that it
contains what we expect. Signature lines should always contain a generic
identity ({cmd:mymatout} in this case) along with a version number, which we
change whenever we modify the output program in a way that changes the output
format. This way, an older input program will not mistakenly read a file
written in a newer format.
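{pstd}
A minimal Python sketch of the check that the input program performs might
look like the following. {cmd:check_signature} is a hypothetical name, and the
sketch assumes only that the signature is stored as the 14 bytes of
"mymatout 1.0.0" at the very start of the file:

```python
import io

SIGNATURE = b"mymatout 1.0.0"  # 14 bytes, written first in the file

def check_signature(stream, expected=SIGNATURE):
    """Read exactly len(expected) bytes; refuse to continue on a mismatch."""
    found = stream.read(len(expected))
    if found != expected:
        raise ValueError("unexpected signature: " + repr(found))

# A file that starts with the expected signature passes silently.
check_signature(io.BytesIO(SIGNATURE + b"\x02"))

# A file written by a newer version of the output program is rejected.
try:
    check_signature(io.BytesIO(b"mymatout 1.0.1" + b"\x02"))
except ValueError as err:
    print(err)
```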
{pstd}
Our improved program is
        program define {cmd:mymatout3}
                version {ccl stata_version}
                gettoken mname 0 : 0
                syntax using/ [, replace]
                local r = rowsof(`mname')
                local c = colsof(`mname')
                tempname hdl
                file open `hdl' using `"`using'"', `replace' write binary
/* new */       {cmd:file write `hdl' %14s "mymatout 1.0.0"}
                file write `hdl' %1b (byteorder())
                file write `hdl' %2b (`r') %2b (`c')
                forvalues i=1(1)`r' {c -(}
                    forvalues j=1(1)`c' {c -(}
                        file write `hdl' %8z (`mname'[`i',`j'])
                    {c )-}
                {c )-}
                file close `hdl'
        end
{pstd}
This program works very well. After we wrote the corresponding
input routine (see {bf:{help file##3.3:3.3 Reading binary files}} below),
however, we noticed that our restored matrices lacked their original row and
column names, which led to a final round of changes:
        program define {cmd:mymatout4}
                version {ccl stata_version}
                gettoken mname 0 : 0
                syntax using/ [, replace]
                local r = rowsof(`mname')
                local c = colsof(`mname')
                tempname hdl
                file open `hdl' using `"`using'"', `replace' write binary
/* changed */   file write `hdl' %14s "mymatout 1.0.{cmd:1}"
                file write `hdl' %1b (byteorder())
                file write `hdl' %2b (`r') %2b (`c')
/* new */       {cmd:local names : rownames `mname'}
/* new */       {cmd:local len : length local names}
/* new */       {cmd:file write `hdl' %4b (`len') %`len's `"`names'"'}
/* new */       {cmd:local names : colnames `mname'}
/* new */       {cmd:local len : length local names}
/* new */       {cmd:file write `hdl' %4b (`len') %`len's `"`names'"'}
                forvalues i=1(1)`r' {c -(}
                    forvalues j=1(1)`c' {c -(}
                        file write `hdl' %8z (`mname'[`i',`j'])
                    {c )-}
                {c )-}
                file close `hdl'
        end
{pstd}
In this version, we added the lines necessary to write the row and column
names into the file. We write the row names by coding,
        {cmd:local names : rownames `mname'}
        {cmd:local len : length local names}
        {cmd:file write `hdl' %4b (`len') %`len's `"`names'"'}
{pstd}
and we similarly write the column names. The interesting thing here is
that we need to write a string into our binary file whose length varies.
One solution would be
        {cmd:file write `hdl' %67783s `"`names'"'}
{pstd}
but that would be inefficient since, in general, the names are a lot shorter
than 67,783 characters. The solution is to obtain the length of the string
to be written and then write the length into the file. In the above
code, macro {cmd:`len'} contains the length, we write {cmd:`len'} as a
4-byte integer, and then we write the string using a {cmd:%`len's} format.
Consider what happens when {cmd:`len'} is, say, 50. We write 50 into the
file and then we write the string using a {cmd:%50s} format. Later, when
we read back the file, we can reverse this process, reading the length,
and then using the appropriate format.
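{pstd}
The length-prefix technique can be sketched in Python as follows. The function
names are hypothetical. The sketch assumes that {cmd:%4b} is a 4-byte signed
integer and that the names are plain ASCII, and for simplicity it fixes the
byte order at lohi. (The Stata program above instead writes in the machine's
own order and records that order in the file.)

```python
import io
import struct

def write_names(stream, names, order="<"):
    """Write a 4-byte length, then exactly that many bytes of text."""
    data = names.encode("ascii")                       # assumption: ASCII names
    stream.write(struct.pack(order + "i", len(data)))  # like %4b (`len')
    stream.write(data)                                 # like %`len's

def read_names(stream, order="<"):
    """Reverse the process: read the length, then read that many bytes."""
    (length,) = struct.unpack(order + "i", stream.read(4))
    return stream.read(length).decode("ascii")

buf = io.BytesIO()
write_names(buf, "r1 r2 r3")   # row names
write_names(buf, "c1 c2")      # column names
print(len(buf.getvalue()))     # 21 = (4 + 8) + (4 + 5)
buf.seek(0)
print(read_names(buf))         # r1 r2 r3
print(read_names(buf))         # c1 c2
```

Each string costs only four bytes of overhead, no matter how long it is.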
{pstd}
Note one other change we made: we changed the signature from "mymatout
1.0.0" to "mymatout 1.0.1" and we did that because the file format changed.
Making that change ensures that an old read program does not attempt to read
a more modern format (and so produce incorrect results).
{hline}
{p 4 4 4}
{it:Technical note:}
You may write strings using {cmd:%}{it:#}{cmd:s} formats that are
narrower than, equal to, or wider than the length of the string being
written. When the format is too narrow, only that many characters of
the string are written. When the format and string are of the same width,
the entire string is written. When the format is wider than the string,
the entire string is written and then the excess positions in the file
are filled with binary zeros.
{p 4 4 4}
Binary zeros are special in strings because a binary zero marks the end of
the string. Thus, when you read back the string, even if it was written in a
field that was too wide, it will appear exactly as it appeared originally.
{p_end}
{hline}
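{pstd}
The padding rule in the technical note can be mimicked in a few lines of
Python. This illustrates the behavior that the note describes; it is not
Stata's implementation. {cmd:write_fixed} and {cmd:read_fixed} are
hypothetical names, and the text is assumed to be ASCII:

```python
def write_fixed(text, width):
    """Mimic a %#s field: truncate if too long, pad with zero bytes if too short."""
    data = text.encode("ascii")[:width]
    return data + b"\x00" * (width - len(data))

def read_fixed(field):
    """Treat the first zero byte as the end of the string."""
    return field.split(b"\x00", 1)[0].decode("ascii")

print(write_fixed("abc", 6))                 # b'abc\x00\x00\x00'
print(read_fixed(write_fixed("abc", 6)))     # abc  (too wide: padded, read back intact)
print(read_fixed(write_fixed("abc", 3)))     # abc  (exact width)
print(read_fixed(write_fixed("abcdef", 4)))  # abcd (too narrow: truncated)
```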
{marker 3.3}{...}
{title:3.3 Reading binary files}
{pstd}
You read binary files just as you wrote them,
{phang2}{cmd:file open} {it:handle} {cmd:using} {it:filename}{cmd:, read binary}{p_end}
{phang2}{cmd:file read} {it:handle} ...{p_end}
{phang2}...{p_end}
{phang2}{cmd:file close} {it:handle}{p_end}
{pstd}
When reading them, you must be careful to specify the same formats as you did
when you wrote the file.
{pstd}
The program that will read the matrices written by {cmd:mymatout1}, presented
below, has syntax
{phang2}{cmd:mymatin1} {it:matname} {cmd:using} {it:filename}
{pstd}
and the code is
        {cmd:program define mymatin1}
                {cmd:version {ccl stata_version}}
                {cmd:gettoken mname 0 : 0}
                {cmd:syntax using/}
                {cmd:tempname hdl}
                {cmd:file open `hdl' using `"`using'"', read binary}
                {cmd:tempname val}
                {cmd:file read `hdl' %2b `val'}
                {cmd:local r = `val'}
                {cmd:file read `hdl' %2b `val'}
                {cmd:local c = `val'}
                {cmd:matrix `mname' = J(`r', `c', 0)}
                {cmd:forvalues i=1(1)`r' {c -(}}
                    {cmd:forvalues j=1(1)`c' {c -(}}
                        {cmd:file read `hdl' %8z `val'}
                        {cmd:matrix `mname'[`i',`j'] = `val'}
                    {cmd:{c )-}}
                {cmd:{c )-}}
                {cmd:file close `hdl'}
        {cmd:end}
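{pstd}
As a cross-check on the layout that {cmd:mymatin1} reads, here is a Python
sketch (not Stata) that writes and then reads the same fields, with the same
widths and in the same order. It assumes that {cmd:%2b} is a 2-byte signed
integer and {cmd:%8z} an 8-byte IEEE double. {cmd:write_layout1} and
{cmd:read_layout1} are hypothetical names:

```python
import io
import struct

def write_layout1(stream, matrix):
    """Write rows and cols as 2-byte ints, then every element as an 8-byte
    double, row by row, in native order (the layout has no byte-order marker)."""
    rows, cols = len(matrix), len(matrix[0])
    stream.write(struct.pack("=hh", rows, cols))
    for row in matrix:
        for x in row:
            stream.write(struct.pack("=d", x))

def read_layout1(stream):
    """Read the fields back with the same widths and in the same order."""
    rows, cols = struct.unpack("=hh", stream.read(4))
    result = []
    for _ in range(rows):
        result.append([struct.unpack("=d", stream.read(8))[0] for _ in range(cols)])
    return result

buf = io.BytesIO()
write_layout1(buf, [[1.0, 2.5], [-3.0, 4.25]])
buf.seek(0)
print(read_layout1(buf))  # [[1.0, 2.5], [-3.0, 4.25]]
```

Like {cmd:mymatout1}, the sketch records no byte order, so it has the same
portability problem.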
{pstd}
When {cmd:file} {cmd:read} reads numeric values, they are always stored into
{help scalar:scalars}, and you specify the name of the scalar directly after
the binary numeric format. In our case, we are using the scalar named
{cmd:`val'}, where {cmd:`val'} is a name we obtained from {helpb tempname}. We
could just as well have used a fixed name, say {cmd:myscalar}, in which case
the first {cmd:file} {cmd:read} would be
        {cmd:file read `hdl' %2b myscalar}
{pstd}
and we would similarly substitute {cmd:myscalar} everywhere {cmd:`val'}
appears, but that would make our program less safe: if the user had
previously stored a value under the name {cmd:myscalar}, our program would
overwrite it.
{pstd}
In the second version of {cmd:mymatout}, we included the byte order. The
correspondingly improved version of {cmd:mymatin} is
        program define {cmd:mymatin2}
                version {ccl stata_version}
                gettoken mname 0 : 0
                syntax using/
                tempname hdl
                file open `hdl' using `"`using'"', read binary
                tempname val
/* new */       {cmd:file read `hdl' %1b `val'}
/* new */       {cmd:local border = `val'}
/* new */       {cmd:file set `hdl' byteorder `border'}
                file read `hdl' %2b `val'
                local r = `val'
                file read `hdl' %2b `val'
                local c = `val'
                matrix `mname' = J(`r', `c', 0)
                forvalues i=1(1)`r' {c -(}
                    forvalues j=1(1)`c' {c -(}