⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 sample.ado

📁 是一个经济学管理应用软件 很难找的 但是经济学学生又必须用到
💻 ADO
字号:
*! version 2.1.2  28sep2004
program define sample, byable(onecall)
	version 7, missing

	if _caller()<7 {		/* deal with prior versions 	*/
		local c = string(_caller())
		if _by() {
			di as err /*
*/ "sample may not be combined with by when run under version `c'"
			exit 190
		}
		version `c', missing: Sample6 `0'
		exit
	}

					/* parse			*/
	local 0 `"=`0'"'
	syntax =/exp [if] [in] [, BY(varlist) Count]
	if _by() {
		if "`by'"!="" {
			di as err /*
			*/ "by varlist: and by() option may not be combined"
			exit 190
		}
		local by `_byvars'
	}

	confirm number `exp'
	if "`count'"=="" {
		if (`exp'<0 | `exp'>=100) {
			if `exp'==100 { exit }
			di as err "{p 0 4}"
			di as err "`exp':"
			di as err /*
*/ "# must be between 0 and 100 and is interpreted as a percent unless" _n /*
*/ "option count is also specified, in which case # is interpreted as" _n /*
*/ "the number of observations to be drawn" _n /*
*/ "{p_end}"
			error 198
		}
	}
	else {
		confirm integer number `exp'
		if `exp'<0 | `exp'>=_N {
			if `exp'>=_N { exit }
			di as err "{p 0 4}"
			di as err "`exp':"
			di as err /*
*/"# must be 0 or positive; with the count option, it represents the" _n /*
*/ "# of observations to be drawn" _n /*
*/ "{p_end}"
		}
	}
					/* parsing complete except for
					   in with by()			*/
	tempvar tokeep u1 u2

	qui gen byte `tokeep' = 0 `if' `in'
	quietly count if `tokeep'==0
	if r(N)==0 { exit }

	qui gen float `u1' = uniform()
	qui gen float `u2' = uniform()

					/* does not have by()		*/
	if "`by'"=="" {
		if "`count'"=="" {
			local N = int(r(N)*(`exp')/100+.5)
		}
		else	local N = `exp'
		sort `tokeep' `u1' `u2'
		qui replace `tokeep'=1 if `tokeep'==0 & _n<=`N'
		drop if `tokeep'==0
		exit
	}

					/* does have by()		*/
	if _by()==0 & "`in'"!="" {
		di in red "in may not be combined with by()"
		exit 190
	}

	sort `by' `u1' `u2'
	quietly {
		if "`count'"=="" {
			tempvar N
			by `by': gen long `N'=sum(`tokeep'==0)
			replace `tokeep'=1 if `tokeep'>=.
			by `by': replace `N'=int(`N'[_N]*`exp'/100+.5)
		}
		else	local N = `exp'
		by `by': replace `tokeep'=1 if `tokeep'==0 & _n<=`N'
	}
	drop if `tokeep'==0
end



* Sample 6 is version 1.1.4  03oct2000/22dec1998
program define Sample6
	version 6, missing
	local 0 `"=`0'"'
	syntax =/exp [if] [in] [, BY(varlist)]

	if "`by'"~="" {
		if "`in'"!="" {
			di in red "in may not be combined with by()"
			exit 190
		}
		local sort "sort `by'"
		local byp "by `by':"
	}
	confirm number `exp'
	if `exp'<0 | `exp'>100 { error 198 }
	if `exp'==100 { exit }

	tempvar TOKEEP RANDOM N
	quietly {
		`sort'
		`byp' gen byte `TOKEEP' = 0 `if' `in'
		count if `TOKEEP'==0
		if r(N)==0 { exit }
		`byp' gen long `N' = sum(`TOKEEP'==0)
		replace `TOKEEP' = 1 if `TOKEEP'>=.
		`byp' replace `N' = int(`N'[_N]*`exp'/100+.5)
	}
	if _caller()<4.0 {
		gen float `RANDOM' = uniform0()
	}
	else	gen float `RANDOM' = uniform()
	sort `by' `TOKEEP' `RANDOM'
	quietly `byp' replace `TOKEEP'=1 if `TOKEEP'==0 & _n<=`N'
	drop if `TOKEEP'==0
end
exit

Sat Feb 17 12:19:27 CST 2001
----------------------------

Comments by William Gould, wgould@stata.com:

The version 7 sample is oddly written in that it is struggling to perserve
compatibility with the pre-version 7 versions of -sample-; the same samples
are returned by pre-version 7 and version 7 whenever that is possible.


The problem
-----------

Define "reproducibility" as obtaining the same results from -sample-
if you set the random-number seed before hand.

Concerning the pre version 7 -sample-, there are the following problems:

	1.  -sample, by()- is not reproducible because, early in the
	    code, -sample- sorts by by().

	2.  In all cases, there is another problem with reproducibility
	    when -sample- is used with large datasets.  -sample- stores
	    random numbers as floats.  One must worry that the same random
	    number is drawn for two or more observations.

	    Your first take is that this must be a zero-probability event.
	    Not true.

		                 sims	             (stored as float)
		   obs   sims   w/ dups   med dups
		----------------------------------
		 2,500	  100      12        0
		 5,000    100      42        0
		10,000    100      86        1
		20,000    100      99        7
		40,000    100     100       32
	       100,000    100     100      197

	    Storing uniform() as double does not really solve the problem,
	    because the float rounding is losing only one hex digit:

		                 sims                 (stored as double)
		   obs   sims   w/ dups   med dups
		----------------------------------
		 2,500	  100       0        0
		 5,000    100       0        0
		10,000    100       1        0
		20,000    100       6        0
		40,000    100      22        0
	       100,000    100      64        1

	    What will solve the problem is generating two uniform()s
	    (and in that case, both can be stored as floats).

	    The above tables concerning dups may concern you more than
	    is justified; remember, for the problem to affect -sample-,
	    a duplicate must occur and it must occur at the cutoff
	    point.  This is why the problem has gone undiscovered for
	    so long.


Backwards compatability
-----------------------

The code has been modified with an eye to backwards compatibility.

Using -sample- without -by()-, results will be the same if the above
problem did not previously occur.  In cases where the problem did
previously occur, the samples will be nearly identical; the tie will now
be broken deterministically.   The above applies when you set the
random-number seed immediately before use of -sample-.  Consider

	. set seed ...
	. sample 5 if group==1
	. sample 5 if group==2
	. sample 5 if group==3

The samples you will now obtain for groups 2 and 3 will be different
because each -sample- will leave the random-number generator in a different
state.  All is deterministic, it is just different.

Concerning use of -sample- with the -by()- option, that is now different
due to the bug previously mentioned.  You will, however, obtain the same
results in the case where the data was already in the order of -by()-
(whether the sort markers were set or not).


Version control
---------------

If you set -version 6- or earlier, you get the old -sample-.  Results will
be the same as they can be.  The problem with -sample- has nothing to do
with it not pulling a random sample, but being able to set the seed and
pull the same random sample.

-sample- existed for years before anyone noticed the reproducibility problem.
It is therefore judged that backwards compatibility is more important than
fixing a reproducibility problem.

<end>

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -