rfImpute package:randomForest R Documentation
Missing Value Imputations by randomForest
Description:
Impute missing values in predictor data using proximity from
randomForest.
Usage:
## Default S3 method:
rfImpute(x, y, iter=5, ntree=300, ...)
## S3 method for class 'formula':
rfImpute(x, data, ..., subset)
Arguments:
x: A data frame or matrix of predictors, some containing 'NA's,
or a formula.
y: Response vector ('NA's not allowed).
data: A data frame containing the predictors and response.
iter: Number of iterations to run the imputation.
ntree: Number of trees to grow in each iteration of randomForest.
...: Other arguments to be passed to 'randomForest'.
subset: A logical vector indicating which observations to use.
Details:
The algorithm starts by imputing 'NA's using 'na.roughfix'. Then
'randomForest' is called with the completed data. The proximity
matrix from the randomForest is used to update the imputation of
the 'NA's. For continuous predictors, the imputed value is the
weighted average of the non-missing observations, where the weights
are the proximities. For categorical predictors, the imputed
value is the category with the largest average proximity. This
process is iterated 'iter' times.
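
The proximity-based update described above can be sketched in base R. This is an illustration only, not the package's internal code: the toy proximity matrix and the helper names `impute_numeric` and `impute_factor` are made up for the example, and the sketch ignores the case where all proximities to the non-missing observations are zero.

```r
# Toy proximity matrix for 4 observations (symmetric, ones on the diagonal).
# In real use it would come from randomForest(..., proximity = TRUE)$proximity.
prox <- matrix(c(1.0, 0.8, 0.1, 0.1,
                 0.8, 1.0, 0.2, 0.0,
                 0.1, 0.2, 1.0, 0.6,
                 0.1, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)

x.num <- c(NA, 2, 10, 12)               # continuous predictor, obs 1 missing
x.cat <- factor(c("a", "a", NA, "b"))   # categorical predictor, obs 3 missing

# Continuous case: proximity-weighted average of the non-missing values.
impute_numeric <- function(x, prox) {
  miss <- which(is.na(x))
  for (i in miss) {
    w <- prox[i, -miss]                 # proximities to observed cases
    x[i] <- sum(w * x[-miss]) / sum(w)
  }
  x
}

# Categorical case: the level with the largest average proximity.
impute_factor <- function(x, prox) {
  miss <- which(is.na(x))
  for (i in miss) {
    avg <- tapply(prox[i, -miss], x[-miss], mean)
    x[i] <- names(avg)[which.max(avg)]
  }
  x
}

num.filled <- impute_numeric(x.num, prox)
cat.filled <- impute_factor(x.cat, prox)
print(num.filled)
print(cat.filled)
```

Here observation 1 is pulled toward observation 2 (proximity 0.8), and observation 3 takes the level of observation 4, whose average proximity (0.6) exceeds that of level "a" (0.15).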
Note: Imputation has not (yet) been implemented for the
unsupervised case. Also, Breiman (2003) notes that the OOB
estimate of error from randomForest tends to be optimistic when run
on the data matrix with imputed values.
Value:
A data frame or matrix containing the completed data matrix, where
'NA's are imputed using proximity from randomForest. The first
column contains the response.
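
As a small illustration of the returned object, the default (non-formula) method can be called with the predictors and response passed separately. The object names, the seeds, and the choice of 15 dropped values are arbitrary assumptions for this sketch; it requires the randomForest package to be installed.

```r
library(randomForest)

data(iris)
x <- iris[, 1:4]                          # predictors only
set.seed(1)
x[sample(150, 15), "Sepal.Length"] <- NA  # drop 15 values in one column

set.seed(2)
imputed <- rfImpute(x, iris$Species, iter = 3, ntree = 100)

str(imputed)            # response in column 1, completed predictors after it
print(anyNA(imputed))   # check whether any NAs remain
```

Because the response is prepended, the result has one more column than `x`, so code that indexes predictors by position needs to account for the shift.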
Author(s):
Andy Liaw
References:
Leo Breiman (2003). Manual for Setting Up, Using, and
Understanding Random Forest V4.0. <URL:
http://oz.berkeley.edu/users/breiman/Using_random_forests_v4.0.pdf>
See Also:
'na.roughfix'.
Examples:
library(randomForest)
data(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values (1 to 20 per column).
for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA
set.seed(222)
iris.imputed <- rfImpute(Species ~ ., iris.na)
set.seed(333)
iris.rf <- randomForest(Species ~ ., iris.imputed)
print(iris.rf)