is the second variable in the {cmd:varlist}, the record will appear as

	{cmd:00020000............}  (if {cmd:byteorder}==HILO)
	{cmd:02000000............}  (if {cmd:byteorder}==LOHI)

{pstd}
If the dataset is sorted by {cmd:myvar} and within {cmd:myvar} by {cmd:vbl1},
and if {cmd:vbl1} is the first variable in the dataset, the record will appear
as

	{cmd:000200010000........}  (if {cmd:byteorder}==HILO)
	{cmd:020001000000........}  (if {cmd:byteorder}==LOHI)


{pstd}
If {cmd:srtlist} were read into the C-array {cmd:short} {cmd:int}
{cmd:srtlist[]}, then {cmd:srtlist[0]} would be the number of the first sort
variable or, if the data were not sorted, 0.  If the number is not zero,
{cmd:srtlist[1]} would be the number of the second sort variable or, if there
is not a second sort variable, 0, and so on.
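
{pstd}
Reading {cmd:srtlist} directly into a {cmd:short} array, as above, gives the right numbers only when the file's {cmd:byteorder} matches your machine's.  The following is a minimal sketch, not Stata's own code, that decodes the 2-byte entries explicitly.  {cmd:get_u16()}, {cmd:srtlist_decode()}, and the {cmd:hilo} flag (nonzero for HILO) are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a 2-byte unsigned integer stored per byteorder;
   hilo != 0 means byteorder==HILO (most significant byte first). */
static unsigned get_u16(const unsigned char *b, int hilo)
{
    return hilo ? ((unsigned)b[0] << 8) | b[1]
                : ((unsigned)b[1] << 8) | b[0];
}

/* Copy the sort-variable numbers from a raw srtlist buffer into
   sortvars[] (room for nvar ints) and return how many there are.
   Decoding stops at the first 0 entry, which ends the list. */
int srtlist_decode(const unsigned char *raw, int nvar, int hilo,
                   int *sortvars)
{
    int k;
    assert(raw != NULL && sortvars != NULL && nvar >= 0);
    for (k = 0; k < nvar; k++) {
        unsigned v = get_u16(raw + 2 * k, hilo);
        if (v == 0)
            break;
        sortvars[k] = (int)v;
    }
    return k;
}
```

{pstd}
Applied to the HILO or LOHI record shown above for a dataset sorted by {cmd:myvar} (variable 2) and then {cmd:vbl1} (variable 1), the function returns 2 and fills {cmd:sortvars[]} with 2, 1.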

{pstd}
{cmd:fmtlist} contains the formats of the variables 1, ..., {cmd:nvar}.  Each
format is 12 bytes long and includes a binary zero end-of-string marker.  For
instance,

	{cmd:%9.0f\0......%8.2f\0......%20.0g\0.....%d\0.........%dD_m_Y\0....}

{pstd}
indicates that variable 1 has a {cmd:%9.0f} format, variable 2 a {cmd:%8.2f}
format, variable 3 a {cmd:%20.0g} format, and so on.  Note that these are
Stata formats, not C formats.  In particular, {cmd:%d} is not an integer
format; it is Stata's default date format.  {cmd:%dD_m_Y} is a detailed Stata
date format.  Be aware of the following:

{phang2}
1.  Formats beginning with {cmd:%d}, {cmd:%-d}, {cmd:%t}, and {cmd:%-t} are
    date formats.

{phang2}
2.  Nondate formats ending in {cmd:gc} or {cmd:fc} are similar to C's {cmd:g}
    and {cmd:f} formats, but with commas.  Most translation routines would
    ignore the ending {cmd:c} (change it to {cmd:\0}).

{phang2}
3.  Formats may contain commas rather than periods, such as {cmd:%9,2f},
    indicating European format.

{pstd}
If {cmd:fmtlist} is read into the C-array {cmd:char} {cmd:fmtlist[]}, then
{cmd:&fmtlist[12*(i-1)]} refers to the starting address of the format for the
{cmd:i}th variable.
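
{pstd}
The three points above translate directly into string tests.  The sketch below assumes only the 12-byte layout just described; {cmd:fmt_get()}, {cmd:fmt_is_date()}, {cmd:fmt_is_european()}, and {cmd:fmt_strip_c_suffix()} are hypothetical helpers, and a real translator would still have to map the remaining Stata formats onto its own.

```c
#include <assert.h>
#include <string.h>

#define FMT_LEN 12  /* bytes per fmtlist entry, as described above */

/* Copy the format of variable i (1-based) into out[]; force a
   terminator in case an entry is malformed. */
void fmt_get(const char *fmtlist, int i, char out[FMT_LEN])
{
    assert(i >= 1);
    memcpy(out, &fmtlist[FMT_LEN * (i - 1)], FMT_LEN);
    out[FMT_LEN - 1] = '\0';
}

/* Point 1: formats beginning with %d, %-d, %t, or %-t are date formats. */
int fmt_is_date(const char *fmt)
{
    return strncmp(fmt, "%d", 2) == 0 || strncmp(fmt, "%-d", 3) == 0
        || strncmp(fmt, "%t", 2) == 0 || strncmp(fmt, "%-t", 3) == 0;
}

/* Point 3: a comma in place of the period marks a European format.
   Date formats are excluded because they may contain literal commas. */
int fmt_is_european(const char *fmt)
{
    return !fmt_is_date(fmt) && strchr(fmt, ',') != NULL;
}

/* Point 2: drop the trailing 'c' of a nondate gc or fc format, in
   place.  Returns 1 if a 'c' was removed, 0 otherwise. */
int fmt_strip_c_suffix(char *fmt)
{
    size_t n = strlen(fmt);
    if (fmt_is_date(fmt) || n < 2 || fmt[n - 1] != 'c')
        return 0;
    if (fmt[n - 2] != 'g' && fmt[n - 2] != 'f')
        return 0;
    fmt[n - 1] = '\0';
    return 1;
}
```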

{pstd}
{cmd:lbllist} contains the names of the value labels associated with the
variables 1, ..., {cmd:nvar}.  Each value-label name is 33 bytes long and
includes a binary zero end-of-string marker.  For instance,

	0   33        66   99
	|    |         |    |
	{cmd:\0...yesno\0...\0...yesno\0...}

{pstd}
indicates that variables 1 and 3 have no value label associated with them,
whereas variables 2 and 4 are both associated with the value label named
{cmd:yesno}.  If {cmd:lbllist} is read into the C-array {cmd:char}
{cmd:lbllist[]}, then {cmd:&lbllist[33*(i-1)]} points to the start of the
label name associated with the {cmd:i}th variable.
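
{pstd}
A minimal sketch of that lookup, assuming the 33-byte layout above, returns {cmd:NULL} for a variable with no value label; {cmd:lbl_name()} and {cmd:LBL_LEN} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LBL_LEN 33  /* bytes per lbllist entry, terminator included */

/* Return the value-label name attached to variable i (1-based),
   or NULL when the entry starts with \0 (no value label). */
const char *lbl_name(const char *lbllist, int i)
{
    const char *p;
    assert(lbllist != NULL && i >= 1);
    p = &lbllist[LBL_LEN * (i - 1)];
    return p[0] == '\0' ? NULL : p;
}
```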


{marker variable_labels}{...}
{title:5.3  Variable labels}

{pstd}
The Variable Labels are recorded as

	Contents            Length    Format     Comments
	{hline 54}
	Variable 1's label      81    char       \0 terminated
	Variable 2's label      81    char       \0 terminated
	...
	Variable {cmd:nvar}'s label   81    char       \0 terminated
	{hline 54}
	Total              81*{cmd:nvar}

{pstd}
If a variable has no label, the first character of its label is \0.
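
{pstd}
Because every label occupies exactly 81 bytes, the whole block can be read in one call.  The sketch below assumes the file is already positioned at the first variable label; {cmd:varlbl_read()} and {cmd:VARLBL_LEN} are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define VARLBL_LEN 81   /* bytes per variable label */

/* Read all nvar variable labels into one malloc()ed block of
   81*nvar bytes; returns NULL on allocation failure or a short read.
   The caller frees the block. */
char *varlbl_read(FILE *fp, int nvar)
{
    size_t total;
    char *block;
    assert(fp != NULL && nvar >= 0);
    total = (size_t)VARLBL_LEN * (size_t)nvar;
    block = malloc(total > 0 ? total : 1);  /* malloc(0) may return NULL */
    if (block == NULL)
        return NULL;
    if (fread(block, 1, total, fp) != total) {
        free(block);
        return NULL;
    }
    return block;
}
```

{pstd}
Label {cmd:i} then starts at {cmd:&labels[81*(i-1)]}, and an empty string there means the variable is unlabeled.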


{marker expansion_fields}{...}
{title:5.4  Expansion fields}

{pstd}
The Expansion Fields are recorded as

	Contents            Length    Format     Comments
	{hline 68}
	data type                1    byte       coded, only 0 and 1 defined
	len                      4    int        encoded per {cmd:byteorder}
	contents               len    varies

	data type                1    byte       coded, only 0 and 1 defined
	len                      4    int        encoded per {cmd:byteorder}
	contents               len    varies

	data type                1    byte       code 0 means end
	len                      4    int        0 means end
	{hline 68}

{pstd}
Expansion fields conclude with code 0 and len 0; before the termination
marker, there may be zero or more separate data blocks.
Expansion fields are used to record information that is unique to
Stata and has no equivalent in other data management packages.
Expansion fields are always optional when writing data and, generally,
programs reading Stata datasets will want to ignore the expansion fields.
The format makes this easy.  When writing, write 5 bytes of zeros for this
field.  When reading, read 5 bytes; the last 4 bytes tell you the size of the
next read, whose contents you discard.  Continue like this until you
read 5 bytes of zeros.
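
{pstd}
The reading procedure above amounts to a short loop.  This sketch assumes the file is positioned at the first expansion field and uses {cmd:fseek()} to discard each block's contents; {cmd:skip_expansion_fields()}, {cmd:get_u32()}, and the {cmd:hilo} flag (nonzero for HILO) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Decode a 4-byte unsigned int stored per byteorder (hilo != 0: HILO). */
static unsigned long get_u32(const unsigned char *b, int hilo)
{
    if (hilo)
        return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
             | ((unsigned long)b[2] << 8) | (unsigned long)b[3];
    return ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16)
         | ((unsigned long)b[1] << 8) | (unsigned long)b[0];
}

/* Read 5 bytes (data type + len), skip len bytes, and repeat until
   type 0 with len 0.  Returns the number of blocks skipped, or -1 on
   a read or seek error. */
int skip_expansion_fields(FILE *fp, int hilo)
{
    unsigned char hdr[5];
    int blocks = 0;
    assert(fp != NULL);
    for (;;) {
        unsigned long len;
        if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
            return -1;
        len = get_u32(hdr + 1, hilo);
        if (hdr[0] == 0 && len == 0)
            return blocks;              /* termination marker */
        if (fseek(fp, (long)len, SEEK_CUR) != 0)
            return -1;
        blocks++;
    }
}
```

{pstd}
On the writing side, the minimal expansion-field section is just 5 zero bytes.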

{pstd}
The only expansion fields currently defined are type 1 records for variables'
{help char:characteristics}.  The design, however, allows new types of
expansion fields to be included in subsequent releases of Stata without
changes in the data format since unknown expansion types can simply be
skipped.

{pstd}
For those who care, the format of type 1 records is a binary-zero terminated
variable name in bytes 0-32, a binary-zero terminated characteristic name in
bytes 33-65, and a binary-zero terminated string defining the contents in
bytes 66 through the end of the record.
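
{pstd}
A reader that does want characteristics can split a type 1 record in place.  The sketch below assumes the record's {cmd:len} bytes are already in memory and checks that each field is terminated within its byte range; {cmd:struct char_record} and {cmd:char_record_parse()} are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* The three fields of a type 1 (characteristic) expansion record. */
struct char_record {
    const char *varname;   /* bytes 0-32   */
    const char *charname;  /* bytes 33-65  */
    const char *contents;  /* bytes 66-end */
};

/* Split a type 1 record of len bytes, already read into rec[], into
   its fields without copying.  Returns 0 on success, or -1 if any
   field lacks its binary-zero terminator within its byte range. */
int char_record_parse(const char *rec, unsigned long len,
                      struct char_record *out)
{
    assert(rec != NULL && out != NULL);
    if (len < 67)                      /* 33 + 33 + at least "\0" */
        return -1;
    if (memchr(rec, '\0', 33) == NULL ||
        memchr(rec + 33, '\0', 33) == NULL ||
        memchr(rec + 66, '\0', (size_t)(len - 66)) == NULL)
        return -1;
    out->varname  = rec;
    out->charname = rec + 33;
    out->contents = rec + 66;
    return 0;
}
```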


{marker data}{...}
{title:5.5  Data}

{pstd}
The Data are recorded as

	Contents                  Length         Format
	{hline 47}
	obs 1, var 1         per {cmd:typlist}    per {cmd:typlist}
	obs 1, var 2         per {cmd:typlist}    per {cmd:typlist}
	...
	obs 1, var {cmd:nvar}      per {cmd:typlist}    per {cmd:typlist}

	obs 2, var 1         per {cmd:typlist}    per {cmd:typlist}
	obs 2, var 2         per {cmd:typlist}    per {cmd:typlist}
	...
	obs 2, var {cmd:nvar}      per {cmd:typlist}    per {cmd:typlist}
	.
	.
	obs {cmd:nobs}, var 1      per {cmd:typlist}    per {cmd:typlist}
	obs {cmd:nobs}, var 2      per {cmd:typlist}    per {cmd:typlist}
	...
	obs {cmd:nobs}, var {cmd:nvar}   per {cmd:typlist}    per {cmd:typlist}
	{hline 47}

{pstd}
The data are written as all the variables on the first observation, followed
by all the data on the second observation, and so on.  Each variable is
written in its own internal format, as given in {cmd:typlist}.  All values are
written per {cmd:byteorder}.  Strings are null terminated if they are shorter
than the allowed space, but they are not terminated if they occupy the full
width.
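
{pstd}
Because a full-width string carries no terminator, copying a string value into C needs a buffer one byte wider than the field.  A minimal sketch follows; {cmd:dta_str_copy()} is a hypothetical helper, and {cmd:w} is assumed to be the variable's string width taken from {cmd:typlist}.

```c
#include <assert.h>
#include <string.h>

/* Copy a string value of width w out of an observation into out[],
   which must have room for w+1 bytes.  Shorter strings are null
   terminated in the file; a string that fills all w bytes is not,
   so the terminator is always added here. */
void dta_str_copy(const char *field, size_t w, char *out)
{
    const char *nul;
    size_t n;
    assert(field != NULL && out != NULL);
    nul = memchr(field, '\0', w);
    n = (nul != NULL) ? (size_t)(nul - field) : w;
    memcpy(out, field, n);
    out[n] = '\0';
}
```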

{pstd}
End-of-file may occur at this point.  If it does, there are no value labels to
be read.  End-of-file may similarly occur between value labels.  On end-of-file,
all data have been processed.


{marker value_labels}{...}
{title:5.6  Value labels}

{pstd}
If there are no value labels, end-of-file will have occurred while reading the
data.  If there are value labels, each value label is written as

	Contents               len   format     comment
	{hline 67}
	{cmd:len}                      4   int        length of {cmd:value_label_table}
	{cmd:labname}                 33   char       \0 terminated
	padding                  3
	{cmd:value_label_table}      {cmd:len}              see next table
	{hline 67}

{pstd}
and this is repeated for each value label included in the file.
The format of the {cmd:value_label_table} is

	Contents               len   format     comment
	{hline 58}
	{cmd:n}                        4   int        number of entries
	{cmd:txtlen}                   4   int        length of {cmd:txt[]}
	{cmd:off[]}                  4*{cmd:n}   int array  {cmd:txt[]} offset table
	{cmd:val[]}                  4*{cmd:n}   int array  sorted value table
	{cmd:txt[]}               {cmd:txtlen}   char       text table
	{hline 58}

{pstd}
{cmd:len}, {cmd:n}, {cmd:txtlen}, {cmd:off[]}, and {cmd:val[]} are encoded per
{cmd:byteorder}.  The maximum length of {cmd:txt[]} for a label is 32,000
characters.  Stata is robust to datasets which might contain labels longer
than this; labels which exceed the limit, if any, will be dropped
during a {helpb use}.
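
{pstd}
Reading a value label therefore starts with a 40-byte header (4 + 33 + 3).  The sketch below, with hypothetical names {cmd:vl_read_header()} and {cmd:get_u32()}, treats end-of-file before a header as "no more value labels," which the format permits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode a 4-byte unsigned int stored per byteorder (hilo != 0: HILO). */
static unsigned long get_u32(const unsigned char *b, int hilo)
{
    if (hilo)
        return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
             | ((unsigned long)b[2] << 8) | (unsigned long)b[3];
    return ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16)
         | ((unsigned long)b[1] << 8) | (unsigned long)b[0];
}

/* Read len (4 bytes), labname (33 bytes), and 3 bytes of padding.
   Returns 1 on success, 0 on end-of-file before the header (no more
   value labels), or -1 if the header is cut short. */
int vl_read_header(FILE *fp, int hilo, unsigned long *len, char labname[33])
{
    unsigned char hdr[4 + 33 + 3];
    size_t got;
    assert(fp != NULL && len != NULL && labname != NULL);
    got = fread(hdr, 1, sizeof hdr, fp);
    if (got == 0 && feof(fp))
        return 0;
    if (got != sizeof hdr)
        return -1;
    *len = get_u32(hdr, hilo);
    memcpy(labname, hdr + 4, 33);
    labname[32] = '\0';             /* guard against a malformed name */
    return 1;
}
```

{pstd}
The returned {cmd:len} is the number of bytes of {cmd:value_label_table} to read next.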

{pstd}
For example, the {cmd:value_label_table} for 1<->yes and 2<->no, shown in HILO
format, would be

	byte position:  00 01 02 03   04 05 06 07   08 09 10 11   12 13 14 15
	     contents:  00 00 00 02   00 00 00 07   00 00 00 00   00 00 00 04
	      meaning:        n = 2    txtlen = 7    off[0] = 0    off[1] = 4

	byte position:  16 17 18 19   20 21 22 23   24 25 26 27 28 29 30
	     contents:  00 00 00 01   00 00 00 02    y  e  s 00  n  o 00
	      meaning:   val[0] = 1    val[1] = 2    txt --->

{pstd}
The interpretation is that there are {cmd:n}=2 values being mapped.
The values being mapped are {cmd:val[0]}=1 and {cmd:val[1]}=2.
The corresponding text for {cmd:val[0]} would be at {cmd:off[0]}=0
(and so be "{cmd:yes}") and for {cmd:val[1]} would be at
{cmd:off[1]}=4 (and so be "{cmd:no}").
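
{pstd}
That interpretation can be checked mechanically.  The sketch below decodes the integers per {cmd:byteorder} instead of casting, so it works on any machine; for brevity it scans {cmd:val[]} linearly, although a binary search would also work because {cmd:val[]} is sorted.  {cmd:vl_lookup()} and {cmd:get_i32()} are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Decode a 4-byte signed int stored per byteorder (hilo != 0: HILO). */
static long get_i32(const unsigned char *b, int hilo)
{
    unsigned long u;
    if (hilo)
        u = ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
          | ((unsigned long)b[2] << 8) | (unsigned long)b[3];
    else
        u = ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16)
          | ((unsigned long)b[1] << 8) | (unsigned long)b[0];
    /* two's-complement reinterpretation without signed overflow */
    return (u & 0x80000000UL) ? -(long)(0xFFFFFFFFUL - u) - 1 : (long)u;
}

/* Return the text that value_label_table t (tlen bytes) maps v to,
   or NULL if v is absent or the table is inconsistent. */
const char *vl_lookup(const unsigned char *t, size_t tlen, int hilo, long v)
{
    long n, txtlen, i;
    const unsigned char *off, *val;
    const char *txt;
    assert(t != NULL);
    if (tlen < 8)
        return NULL;
    n = get_i32(t, hilo);
    txtlen = get_i32(t + 4, hilo);
    if (n < 0 || txtlen < 0 ||
        tlen < 8 + 8 * (size_t)n + (size_t)txtlen)
        return NULL;
    off = t + 8;                          /* off[0..n-1] */
    val = off + 4 * n;                    /* val[0..n-1] */
    txt = (const char *)(val + 4 * n);    /* txt[0..txtlen-1] */
    for (i = 0; i < n; i++) {
        if (get_i32(val + 4 * i, hilo) == v) {
            long o = get_i32(off + 4 * i, hilo);
            return (o >= 0 && o < txtlen) ? txt + o : NULL;
        }
    }
    return NULL;
}
```

{pstd}
Run on the 31 bytes above with HILO decoding, {cmd:vl_lookup()} returns a pointer to byte 24 ({cmd:yes}) for 1, to byte 28 ({cmd:no}) for 2, and {cmd:NULL} for any other value.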

{pstd}
Interpreting this table in C is not as daunting as it appears.  Let
{cmd:(char *) p} refer to the memory area into which {cmd:value_label_table}
is read.  Assume your compiler uses 4-byte {cmd:int}s.  The following
manifests make interpreting the table easier:

	{cmd}#define SZInt               4
	#define Off_n               0
	#define Off_nxtoff          SZInt        /* txtlen */
	#define Off_off             (SZInt+SZInt)
	/* in the macros below, n is the entry count, *Ptr_n(p) */
	#define Off_val(n)          (SZInt+SZInt+(n)*SZInt)
	#define Off_txt(n)          (Off_val(n) + (n)*SZInt)
	#define Len_table(n,nxtoff) (Off_txt(n) + (nxtoff))

	#define Ptr_n(p)            ( (int *) ( ((char *) (p)) + Off_n ) )
	#define Ptr_nxtoff(p)       ( (int *) ( ((char *) (p)) + Off_nxtoff ) )
	#define Ptr_off(p)          ( (int *) ( ((char *) (p)) + Off_off ) )
	#define Ptr_val(p,n)        ( (int *) ( ((char *) (p)) + Off_val(n) ) )
	#define Ptr_txt(p,n)        ( (char *) ( ((char *) (p)) + Off_txt(n) ) ){txt}

{pstd}
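It is now the case that, with {cmd:n = *Ptr_n(p)}, {cmd:for(i=0; i < n; i++)}
the value {cmd:Ptr_val(p,n)[i]} is mapped to the character string
{cmd:Ptr_txt(p,n) + Ptr_off(p)[i]}.  (This assumes the table's
{cmd:byteorder} matches your machine's; otherwise, swap the bytes of each
{cmd:int} first.)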

{pstd}
When allocating memory for {cmd:*p}, remember that the table can be big.  The
limits are {cmd:n}=65,536 mapped values with each value's text being up to 81
characters long (including the null terminating byte).  Such a table would be
8 + 65,536*(4+4+81) = 5,832,712 bytes long.  No user is likely to approach
that limit and, in any case, after reading the table's first 8 bytes
({cmd:n} and {cmd:txtlen}), you can calculate the remaining length as
2*4*{cmd:n}+{cmd:txtlen} and {cmd:malloc()} the exact amount.
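
{pstd}
A minimal sketch of that exact-size read follows, assuming the file is positioned at the start of a {cmd:value_label_table}; {cmd:vl_read_table()} and {cmd:get_u32()} are hypothetical names.  The returned block keeps the file's byte order.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode a 4-byte unsigned int stored per byteorder (hilo != 0: HILO). */
static unsigned long get_u32(const unsigned char *b, int hilo)
{
    if (hilo)
        return ((unsigned long)b[0] << 24) | ((unsigned long)b[1] << 16)
             | ((unsigned long)b[2] << 8) | (unsigned long)b[3];
    return ((unsigned long)b[3] << 24) | ((unsigned long)b[2] << 16)
         | ((unsigned long)b[1] << 8) | (unsigned long)b[0];
}

/* Read n and txtlen, then exactly 2*4*n + txtlen more bytes.  Returns a
   malloc()ed copy of the whole table and stores its size in *size, or
   NULL on error.  The caller frees the block. */
unsigned char *vl_read_table(FILE *fp, int hilo, size_t *size)
{
    unsigned char head[8];
    unsigned long n, txtlen;
    size_t rest;
    unsigned char *p;
    assert(fp != NULL && size != NULL);
    if (fread(head, 1, sizeof head, fp) != sizeof head)
        return NULL;
    n = get_u32(head, hilo);
    txtlen = get_u32(head + 4, hilo);
    rest = 2 * 4 * (size_t)n + (size_t)txtlen;
    p = malloc(sizeof head + rest);
    if (p == NULL)
        return NULL;
    memcpy(p, head, sizeof head);
    if (fread(p + sizeof head, 1, rest, fp) != rest) {
        free(p);
        return NULL;
    }
    *size = sizeof head + rest;
    return p;
}
```

{pstd}
The size it reports should equal the {cmd:len} recorded in front of the label, which makes a cheap consistency check.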

{pstd}
Constructing the table is more difficult.  The easiest approach is to set
arbitrary limits equal to or smaller than Stata's as to the maximum number of
entries and total text length you will allow and simply declare the three
pieces {cmd:off[]}, {cmd:val[]}, and {cmd:txt[]} according to
those limits:

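	{cmd}#define MaxValueForN      1000    /* example limit, at most Stata's */
	#define MaxValueForTxtlen 32000   /* example limit, at most Stata's */
	int off[MaxValueForN] ;
	int val[MaxValueForN] ;
	char txt[MaxValueForTxtlen] ;{txt}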
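
{pstd}
Once the three arrays are filled, writing the table is a matter of emitting {cmd:n}, {cmd:txtlen}, {cmd:off[]}, {cmd:val[]}, and {cmd:txt[]} in that order, each integer per {cmd:byteorder}.  A minimal sketch; {cmd:vl_pack()} and {cmd:put_i32()} are hypothetical names, and the caller is assumed to have kept {cmd:val[]} ascending.

```c
#include <assert.h>
#include <stddef.h>

/* Store v as a 4-byte int per byteorder (hilo != 0: HILO). */
static void put_i32(unsigned char *b, long v, int hilo)
{
    unsigned long u = (unsigned long)v & 0xFFFFFFFFUL;
    int k;
    for (k = 0; k < 4; k++) {
        /* k == 0 is the least significant byte */
        b[hilo ? 3 - k : k] = (unsigned char)((u >> (8 * k)) & 0xFFUL);
    }
}

/* Serialize n entries and txtlen bytes of text as a value_label_table
   into out[], which needs room for 8 + 8*n + txtlen bytes.
   Returns the number of bytes written. */
size_t vl_pack(const int *off, const int *val, int n,
               const char *txt, int txtlen, int hilo, unsigned char *out)
{
    unsigned char *p = out;
    int i;
    assert(n >= 0 && txtlen >= 0 && out != NULL);
    for (i = 1; i < n; i++)
        assert(val[i - 1] < val[i]);      /* val[] must ascend */
    put_i32(p, n, hilo);       p += 4;
    put_i32(p, txtlen, hilo);  p += 4;
    for (i = 0; i < n; i++, p += 4)
        put_i32(p, off[i], hilo);
    for (i = 0; i < n; i++, p += 4)
        put_i32(p, val[i], hilo);
    for (i = 0; i < txtlen; i++)
        *p++ = (unsigned char)txt[i];
    return (size_t)(p - out);
}
```

{pstd}
Packing {cmd:off[]} = 0, 4 and {cmd:val[]} = 1, 2 with the 7 bytes {cmd:yes\0no\0} in HILO order reproduces the 31-byte table shown earlier.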

{pstd}
Stata's internal code follows a more complicated strategy of always keeping
the table in compressed form and having a routine that will "add one position"
in the table.  This is slower but keeps memory use no larger than the actual
size of the table.

{pstd}
In any case, when adding new entries to the table, remember that {cmd:val[]}
must be in ascending order:  {cmd:val[0]} < {cmd:val[1]} < ... < {cmd:val[n-1]}.

{pstd}
It is not required that {cmd:off[]} or
{cmd:txt[]} be kept in ascending order.  We previously offered the example of
the table that mapped 1<->yes and 2<->no:

	byte position:  00 01 02 03   04 05 06 07   08 09 10 11   12 13 14 15
	     contents:  00 00 00 02   00 00 00 07   00 00 00 00   00 00 00 04
	      meaning:        n = 2    txtlen = 7    off[0] = 0    off[1] = 4

	byte position:  16 17 18 19   20 21 22 23   24 25 26 27 28 29 30
	     contents:  00 00 00 01   00 00 00 02    y  e  s 00  n  o 00
	      meaning:   val[0] = 1    val[1] = 2    txt --->

{pstd}
This table could just as well be recorded as

	byte position:  00 01 02 03   04 05 06 07   08 09 10 11   12 13 14 15
	     contents:  00 00 00 02   00 00 00 07   00 00 00 03   00 00 00 00
	      meaning:        n = 2    txtlen = 7    off[0] = 3    off[1] = 0

	byte position:  16 17 18 19   20 21 22 23   24 25 26 27 28 29 30
	     contents:  00 00 00 01   00 00 00 02    n  o 00  y  e  s 00
	      meaning:   val[0] = 1    val[1] = 2    txt --->

{pstd}
but it could not be recorded as

	byte position:  00 01 02 03   04 05 06 07   08 09 10 11   12 13 14 15
	     contents:  00 00 00 02   00 00 00 07   00 00 00 04   00 00 00 00
	      meaning:        n = 2    txtlen = 7    off[0] = 4    off[1] = 0

	byte position:  16 17 18 19   20 21 22 23   24 25 26 27 28 29 30
	     contents:  00 00 00 02   00 00 00 01    y  e  s 00  n  o 00
	      meaning:   val[0] = 2    val[1] = 1    txt --->

{pstd}
It is not the out-of-order values of {cmd:off[]} that cause problems; it is
out-of-order values of {cmd:val[]}.  In terms of table construction, we find
it easier to keep the table sorted as it grows.  This way one can use a binary
search routine to find the appropriate position in {cmd:val[]} quickly.

{pstd}
The following routine will find the appropriate slot.  It uses the manifests
we previously defined and thus it assumes the table is in compressed form, but
that is not important.  Changing the definitions of the manifests to point to
separate areas would be easy enough.

	{cmd}/*
	    slot = vlfindval(char *baseptr, int val)

	    Looks for value val in label at baseptr.
		If found:
			returns slot number:  0, 1, 2, ...
		If not found:
			returns k<0 such that val would go in slot -(k+1)
				k== -1        would go in slot 0.
				k== -2        would go in slot 1.
				k== -3        would go in slot 2.
	*/

	int vlfindval(char *baseptr, int myval)
	{c -(}
		int	n ;
		int	lb, ub, try ;
		int	*val ;
		int	curval ;

		n   = *Ptr_n(baseptr) ;
		val =  Ptr_val(baseptr, n) ;

		if (n==0) return(-1) ;  /* not found, insert into 0 */

					/* in what follows, we know the    */
					/* result is in [lb,ub] or it is   */
					/* not in the table                */
		lb = 0 ;
		ub = n - 1 ;
		while (1) {c -(}
			try = (lb + ub) / 2 ;
			curval = val[try] ;
			if (myval == curval) return(try) ;
			if (myval<curval) {c -(}
				ub = try - 1 ;
				if (ub<lb) return(-(try+1)) ;
				/* we want to insert before try, so the slot
				   is try; slot W is encoded as -(W+1). */
			{c )-}
			else /* myval>curval */ {c -(}
				lb = try + 1 ;
				if (ub<lb) return(-(lb+1)) ;
				/* we want to insert after try, so the slot
				   is try+1 == lb; slot W is encoded as -(W+1). */
			{c )-}
		{c )-}
		/*NOTREACHED*/
	{c )-}{txt}
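
{pstd}
As noted above, the same search is easy to apply to separate arrays.  The sketch below keeps {cmd:off[]}, {cmd:val[]}, and {cmd:txt[]} in a fixed-size structure (limits chosen for illustration, no larger than Stata's), inserts each new value at the slot {cmd:-(k+1)} that the search reports, and appends its text to {cmd:txt[]}.  {cmd:struct vltab}, {cmd:tab_findval()}, and {cmd:tab_add()} are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MaxValueForN      1000    /* illustrative limits; keep them */
#define MaxValueForTxtlen 32000   /* no larger than Stata's          */

struct vltab {
    int  n;                        /* entries in use */
    int  txtlen;                   /* bytes of txt[] in use */
    int  off[MaxValueForN];
    int  val[MaxValueForN];        /* kept ascending */
    char txt[MaxValueForTxtlen];
};

/* Same contract as vlfindval(): slot if found, else -(slot+1). */
static int tab_findval(const struct vltab *t, int myval)
{
    int lb = 0, ub = t->n - 1;
    while (lb <= ub) {
        int mid = lb + (ub - lb) / 2;
        if (t->val[mid] == myval)
            return mid;
        if (myval < t->val[mid])
            ub = mid - 1;
        else
            lb = mid + 1;
    }
    return -(lb + 1);              /* not found: belongs in slot lb */
}

/* Map myval to text.  Returns the slot used, or -1 if myval is
   already mapped or a limit would be exceeded. */
int tab_add(struct vltab *t, int myval, const char *text)
{
    int k, slot, len, tail;
    assert(t != NULL && text != NULL);
    k = tab_findval(t, myval);
    if (k >= 0)
        return -1;
    slot = -(k + 1);
    len = (int)strlen(text) + 1;   /* keep the \0 */
    if (t->n >= MaxValueForN || t->txtlen + len > MaxValueForTxtlen)
        return -1;
    tail = t->n - slot;            /* entries that move up one slot */
    memmove(&t->off[slot + 1], &t->off[slot], (size_t)tail * sizeof t->off[0]);
    memmove(&t->val[slot + 1], &t->val[slot], (size_t)tail * sizeof t->val[0]);
    t->val[slot] = myval;
    t->off[slot] = t->txtlen;      /* new text goes at the end of txt[] */
    memcpy(&t->txt[t->txtlen], text, (size_t)len);
    t->txtlen += len;
    t->n++;
    return slot;
}
```

{pstd}
Adding 2<->no and then 1<->yes leaves {cmd:val[]} = 1, 2 with {cmd:off[]} = 3, 0 and {cmd:txt[]} = {cmd:no\0yes\0}, which is exactly the alternative layout shown earlier that the format accepts.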


{title:Also see}

{psee}
Manual:  {bf:[P] file formats .dta}

{psee}
Online:  {helpb save}, {helpb use};
{helpb sysuse}, {helpb webuse};
{help dta_contents}
{p_end}