
📄 cdset.m

wekaUT is a Weka-based semi-supervised learning classifier developed at the University of Texas at Austin.
function cdset(dsname, N, dim, k)
% CDSET         Create a dataset of moVMF data and store it as CCS
%
% Description
%
% CDSET(DSNAME, N, DIM, K) takes as input:
%
%  DSNAME : Filename prefix for dataset
%  N      : Number of points to create in the set
%  DIM    : Dimensionality of each point
%  K      : Number of clusters to create
%
%  It produces the following output files:
%  DSNAME_DIM, DSNAME_WORDS, DSNAME_COL_CCS, DSNAME_ROW_CCS,
%  DSNAME_TFN_NZ, DSNAME_LABEL, DSNAME_SOFT_LABEL, DSNAME_PRIORS,
%  DSNAME_CENTERS, DSNAME_KAPPAS
%
%  The above files are in CCS format and can be used by the spkmeans
%  family of programs. The DSNAME_LABEL file has the true labels.
%
%  Note: the final steps convert the CCS files to ARFF with CCS2arff.pl,
%  rename the result to DIM-K-N.arff, and delete all DSNAME_* files.
%
%  Also see:
%  EMSAMP, MIXINIT, VSAMP

mix = mixinit(dim, k);                  % Initialize data struct

% Initialize the mean vectors with random unit vectors.
% Also initialize the kappas with some values (chosen heuristically
% from the dimension).
for i=1:k
  mix.centers(i,:) = unitrand(dim);
  mix.kappas(i) = dim*dim;
  if (dim > 10)
    mix.kappas(i) = mix.kappas(i)*rand;
  end
  if (dim > 20)
    mix.kappas(i) = max(dim*rand,200);
  end
end

% Write out the cluster centers, one center per line
dfile = strcat(dsname, '_centers');
fid = fopen(dfile, 'w');
for i=1:k
  fprintf(fid, '%f ', mix.centers(i,:));
  fprintf(fid, '\n');
end
fclose(fid);

% Write out the kappas, one per line
dfile = strcat(dsname, '_kappas');
fid = fopen(dfile, 'w');
for i=1:k
  fprintf(fid, '%f\n', mix.kappas(i));
end
fclose(fid);

[A,l] = emsamp(mix, N);                 % Sample data

% Now A is the matrix of the sampled data.
% A is N x dim. We actually want to write out the CCS for A',
% because that corresponds with our notion of documents and words.
% l is the vector of true labels.
% From l we can also gain information about the true priors.
%
%%% NOTE: File I/O error checking is not done right now

% Write out true label file
lfile = strcat(dsname, '_label');
fid = fopen(lfile, 'w');
l = l-1;                                % true label adjust (0-based)
fprintf(fid, '%d\n', l);                % Write out labels
fclose(fid);

% ========================================================
% ADDED On 9/17/02
% Computation of soft labels for the dataset
priors = ones(1,k);                     % allocate space for priors
for i=0:k-1
  tmp = find(l==i);
  % numel works whether l is a row or a column vector;
  % size(tmp,1) is only correct for a column vector.
  priors(i+1) = numel(tmp)/N;
end

dfile = strcat(dsname, '_priors');
fid = fopen(dfile, 'w');
for i=1:k
  fprintf(fid, '%f\n', priors(i));
end
fclose(fid);

% Now we compute the soft label for each point:
% p(h|x) -> prior(h) * vmf(x, mix.centers(h), mix.kappas(h))
% Then normalize all of these to sum to 1 to obtain the soft labels.
probs = zeros(k,1);
dfile = strcat(dsname, '_soft_label');
fid = fopen(dfile, 'w');
for i=1:size(A,1)
  total = 0;                            % renamed from 'sum' to avoid shadowing the builtin
  for h=1:k
    probs(h) = priors(h)*vmf(A(i,:), mix.centers(h,:), mix.kappas(h));
    total = total + probs(h);
  end
  for h=1:k
    probs(h) = probs(h)/total;
  end
  for j=1:k
    fprintf(fid, '%d %f ', j-1, probs(j));
  end
  fprintf(fid, '\n');
end
fclose(fid);
%
% ========================================================

% Now we begin outputting the CCS format
% First write out the dim file: rows, columns, nonzeros of A'
dfile = strcat(dsname, '_dim');
fid = fopen(dfile, 'w');
fprintf(fid, '%d %d %d', dim, N, N*dim);
fclose(fid);

% Write the _words file
wordfile = strcat(dsname, '_words');
fid = fopen(wordfile, 'w');
fprintf(fid, '%d\n', dim);
for i=1:dim
  fprintf(fid, 'feature_%d\n', i);
end
fclose(fid);

% Now we write the _col_ccs file (column pointers)
dfile = strcat(dsname, '_col_ccs');
fid = fopen(dfile, 'w');
for i=1:N
  fprintf(fid, '%d \n', (i-1)*dim);
end
% print an extra final line marking the end of the last column
fprintf(fid, '%d \n', N*dim);
fclose(fid);

% Now we write the _row_ccs file (row indices, 0-based)
dfile = strcat(dsname, '_row_ccs');
fid = fopen(dfile, 'w');
for j=1:N
  for i=0:dim-1
    fprintf(fid, '%d\n', i);
  end
end
fclose(fid);

% Now write out the _tfn_nz file (values)
dfile = strcat(dsname, '_tfn_nz');
fid = fopen(dfile, 'w');
% We write out rows of A because they are the columns of A'
for j=1:N
  for i=1:dim
    fprintf(fid, '%f\n', A(j,i));
  end
end
fclose(fid);

% Create arff
CCScommand = ['CCS2arff.pl -i ' dsname];
system(CCScommand);

% Rename the arff file
MVcommand = ['mv ' dsname '_fromCCS.arff ' num2str(dim) '-' num2str(k) '-' num2str(N) '.arff'];
system(MVcommand);

% Remove CCS files
RMcommand = ['rm -rf ' dsname '_*'];
system(RMcommand);

% Now we have created the dataset
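
The dense CCS layout that cdset writes to the _col_ccs, _row_ccs, and _tfn_nz files can be hard to picture from the loops alone. The following is a minimal sketch, not part of the original file, that builds the same three sequences in memory for a tiny matrix. The matrix values and the variable names colptr, rowidx, and vals are made up for illustration.

```matlab
% Illustrative only: 2 points in 3 dimensions (values are invented).
A = [0.6 0.8 0.0;
     0.0 0.6 0.8];
[N, dim] = size(A);

% Column pointers (_col_ccs): offset of each point's first value,
% followed by one final entry equal to the total number of values.
colptr = [(0:N-1) * dim, N * dim];

% Row indices (_row_ccs): A' is stored densely, so every point lists
% all feature indices 0..dim-1.
rowidx = repmat(0:dim-1, 1, N);

% Values (_tfn_nz): the rows of A one after another, i.e. the columns of A'.
vals = reshape(A', 1, []);

disp(colptr);   % 0 3 6
disp(rowidx);   % 0 1 2 0 1 2
disp(vals);     % 0.6 0.8 0 0 0.6 0.8
```

Calling the function itself, for example cdset('demo', 100, 5, 3), would need mixinit, unitrand, emsamp, and vmf on the MATLAB path and CCS2arff.pl on the system path. Going by the rename command in the code, such a call should leave a file named 5-3-100.arff.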
