{smcl}
{* 09dec2004}{...}
{cmd:codebook problems}
{hline}
{title:Potential problems reported by codebook}
{pstd}
{helpb codebook} with the {cmd:problems} option reports the following
potential problems with variables:
{p 8 10 2}
1. Constant variables, including variables that are always missing
{p_end}
{p 8 10 2}
2. Variables with nonexisting value labels
{p_end}
{p 8 10 2}
3. Incompletely labeled variables
{p_end}
{p 8 10 2}
4. String variables that may be compressed
{p_end}
{p 8 10 2}
5. String variables with leading or trailing blanks
{p_end}
{p 8 10 2}
6. String variables with embedded blanks
{p_end}
{p 8 10 2}
7. Noninteger-valued date variables
{p_end}
{pstd}
These possible problems are discussed below.
{title:Discussion}
{pstd}
1. Constant variables, including variables that are always missing
{pin}
Variables that are constant, taking the same value in all observations,
or that are always missing, are in many cases superfluous. Such variables,
however, may also indicate problems. For instance,
variables that are always missing may occur when importing data with an
incorrect input specification. Such variables may also occur if you generate
a new variable for a subset of the data, selected with an expression that
is false for all observations.
{pin}
Advice: Carefully check the origin of constant variables. If you are saving
a constant variable, be sure to {helpb compress} the variable to use minimal
storage.
{pstd}
2. Variables with nonexisting value labels
{pin}
Stata treats value labels as separate objects that can be attached to one or
more variables; see {helpb label} for more information. A problem may
arise if variables are linked to value labels that are not yet defined or if an
incorrect value label name was used.
{pin}
Advice: Attach the correct value label, or {helpb label define} the value
label.
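{pin}
For example, assuming a hypothetical variable {cmd:rep} that refers to an
undefined value label named {cmd:replbl} (both names and the label text are
illustrative only), one possible fix is sketched by
{space 12}{cmd:* define the missing label, then confirm the attachment}
{space 12}{cmd:label define replbl 1 "poor" 2 "good"}
{space 12}{cmd:label values rep replbl}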
{pstd}
3. Incompletely labeled variables
{pin}
A variable is called "incompletely value labeled" if the variable is value
labeled but no mapping is provided for some values of the variable. An
example is a variable with values 0, 1, and 2 and value labels for 1, 2, and
3. This situation usually indicates an error, either in the data or in the
value label.
{pin}
Advice: Change either the data or the value label.
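{pin}
As a sketch for the example above, assuming the value label is named
{cmd:lbl} (a hypothetical name) and that 0 is a valid value that simply lacks
a label, the mapping could be extended and then inspected by
{space 12}{cmd:* add a label for the unlabeled value 0; the text "none" is illustrative}
{space 12}{cmd:label define lbl 0 "none", add}
{space 12}{cmd:label list lbl}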
{pstd}
4. String variables that may be compressed
{pin}
The storage space used by a string variable is determined by its
{help data types:data type}. For instance, the storage type
{cmd:str20} indicates that 20 bytes are used per observation. If the
declared storage type is longer than the longest string stored, memory and
disk space are wasted.
{pin}
Advice: Use {helpb compress} to store the data as compactly as possible.
{pstd}
5. String variables with leading or trailing blanks
{pin}
In most applications, leading and trailing spaces do not affect the
meaning of variables but are likely side effects from importing the
data or from data manipulation. Spurious leading and trailing spaces force
Stata to use more memory than required. Also, manipulating strings with
leading and trailing spaces is harder.
{pin}
Advice: Remove leading and trailing blanks from a string
variable {cmd:s} by typing
{space 12}{cmd:replace s = trim(s)}
{pin}
see {helpb trim()}.
{pstd}
6. String variables with embedded blanks
{pin}
Embedded blanks in string variables are often appropriate; however,
sometimes they indicate problems importing the data.
{pin}
Advice: Verify that blanks are meaningful in the variables.
{pstd}
7. Noninteger-valued date variables
{pin}
Stata's {help dfmt:date formats} were designed for use with integer values
but will work with noninteger values.
{pin}
Advice: Carefully inspect the noninteger values. If they are the result of
round-off error, you may want to round the variable to the nearest integer:
{pin2}
{cmd:replace time = round(time)}
{title:Other potential problems}
{pstd}
The list of potential problems in data is likely endless. Therefore,
{cmd:codebook} cannot do a complete job. A partial list of other
common problems and possible remedies in Stata follows.
{pstd}
a. Numerical data stored as strings
{pin}
After importing data into Stata, you may discover that some
string variables can actually be interpreted as numbers. Stata can
do much more with numerical data than with string data. Moreover, string
representation usually makes less-efficient use of computer resources.
{helpb destring} will convert string variables to numeric.
{pin}
A string variable may contain a "field"
with numeric information. An example is an address variable that contains
the street name followed by the house number. The Stata
{help string functions} can extract the relevant substring.
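{pin}
A minimal sketch of both cases, assuming hypothetical string variables
{cmd:income} (containing only numeric text) and {cmd:address} (with the house
number as its last word):
{space 12}{cmd:* convert a string variable that holds only numbers}
{space 12}{cmd:destring income, replace}
{space 12}{cmd:* take the last word of address and convert it to a number}
{space 12}{cmd:generate houseno = real(word(address, -1))}
{pin}
{cmd:real()} returns missing for observations in which the last word is not
numeric, so inspect the result before relying on it.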
{pstd}
b. Categorical variables stored as strings
{pin}
Most statistical commands do not allow string variables. Moreover,
storing a variable that takes only a limited number of distinct values
as a string is inefficient. Use value-labeled
numeric variables instead. These are easily created with {helpb encode}.
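{pin}
For instance, assuming a hypothetical string variable {cmd:region}, a
value-labeled numeric copy (here named {cmd:region_code}, also an
illustrative name) could be created by
{space 12}{cmd:* region_code gets integer codes; the original strings become its value label}
{space 12}{cmd:encode region, generate(region_code)}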
{pstd}
c. Duplicate observations
{pin}
See {helpb duplicates}.
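{pin}
As a sketch, observations that are duplicated on all variables could be
counted, reviewed, and then removed by
{space 12}{cmd:* summarize how many copies exist before deleting anything}
{space 12}{cmd:duplicates report}
{space 12}{cmd:duplicates list}
{space 12}{cmd:* keep the first observation in each group of exact duplicates}
{space 12}{cmd:duplicates drop}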
{pstd}
d. Observations that are always missing
{pin}
Drop observations that are missing for all variables in {it:varlist} using
the {cmd:robs()} {helpb egen} function:
{space 12}{cmd:egen nobs = robs(}{it:varlist}{cmd:)}
{space 12}{cmd:drop if nobs==0}
{pin}
Specify {cmd:_all} for {it:varlist} to drop only those observations that are
missing for every variable in the dataset.
{title:Saved results}
{pstd}
{cmd:codebook} with the {cmd:problems} option saves in {cmd:r()} the lists of
variables with potential problems:
{p 8 24 2}
{cmd:r(cons)} {space 12} constant or always missing
{p_end}
{p 8 24 2}
{cmd:r(labelnotfound)} {space 3} value label not defined
{p_end}
{p 8 24 2}
{cmd:r(notlabeled)} {space 6} value labeled, but with unlabeled categories
{p_end}
{p 8 24 2}
{cmd:r(str_type)} {space 8} compressible
{p_end}
{p 8 24 2}
{cmd:r(str_leading)} {space 5} leading blanks
{p_end}
{p 8 24 2}
{cmd:r(str_trailing)} {space 4} trailing blanks
{p_end}
{p 8 24 2}
{cmd:r(str_embedded)} {space 4} embedded blanks
{p_end}
{p 8 24 2}
{cmd:r(realdate)} {space 8} noninteger dates
{p_end}
{pstd}
After running {cmd:codebook} with the {cmd:problems} option, you can review
the lists of variables with potential problems by typing
{tab}{cmd:. return list}
{pstd}
To describe the variables with potential problem {it:prob}:
{tab}{cmd:. describe `r(}{it:prob}{cmd:)'}
{pstd}
For example, to describe the variables that are incompletely value labeled,
{tab}{cmd:. describe `r(notlabeled)'}
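{pstd}
The saved lists can also be passed to other commands. As a sketch, the
following compresses only the string variables flagged as compressible. It
assumes that {cmd:r(str_type)} is nonempty (with an empty list,
{cmd:compress} would process all variables) and that no other r-class command
is run in between, because such commands overwrite {cmd:r()}.
{tab}{cmd:. codebook, problems}
{tab}{cmd:. compress `r(str_type)'}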
{title:Also see}
{psee}
Manual: {bf:[D] codebook}
{psee}
Online: {helpb codebook}, {helpb label}, {helpb labelbook},
{help labelbook problems}
{p_end}