	vectors with copies of data).

	* svm_base.c (svm_options[]): Alphabetized to improve readability.
	(svm_parse_opt): Re-ordered to mostly alphabetical to improve
	readability.
	(get_top_n): fixed a bug that popped up in obscure places & switched
	to a more intelligent algorithm (don't know why it was dumb in the
	first place).
	(svm_remove_bound_examples): changed the removal code around (again)
	as part of the new svm model.  The fn now removes either bound or
	misclassified documents & is called by solve_svm (the innermost
	svm fn. that calls a solver).
	(svm_trans_or_chunk): removed chunk_svm for this.  Calls either
	transduce_svm or solve_svm depending on the parameters/data.
	(svm_tlf): Top-Level-Fn.  Permutes data & outputs a hyperplane in
	bow_wv if possible.  This fn also chooses/sets up the proper fn
	(al, trans, removal, etc) to call.

	* svm_loqo.c: Updated to work with cvect instead of svm_C.  Now
	all upper bounds come from the cvect parameter, which MUST be
	properly initialized.  (This is necessary for transduction &
	possibly other things.)

	* svm_smo.c (opt_pair): fixed a blatant bug in the solver (the
	examples were added to I0 set in cases where they shouldn't have
	been [see Keerthi et al. for exactly where the examples should be
	added if they weren't already present]).
	Now the upper bounds come from the cvect instead of svm_C.  The
	algorithm is almost identical.  The only difference is a little
	more attention to the exact upper bounds on each of the boxes.

	* svm_trans.c: stable version.  Has new interface with the svm
	model.  No known bugs.  The code does have some gross
	inefficiencies (always zero-ing out temporary values & weights,
	causing the solvers to restart each time), but all of the output
	examined has been correct.

	* bow/svm.h: Updated for a new svm interface.
	The relationship between the different solvers is much cleaner now
	that redundant code has been mostly eliminated.
	Note - the prototypes for most functions have changed, as the
	structure of most of the higher-level svm code has changed.

1999-12-01  Kamal Nigam  <knigam@zeno.jprc.com>

	* .cvsignore: added rainbow-be to ignore list.

	* .cvsignore: added rainbow-rank to ignore list.

	* em.c (bow_cdoc_is_train_or_unlabeled): moved to split.c
	(bow_em_new_vpc_with_weights): removed usage of halt_using_perplexity.
	This option is broken, and its code was hurting performance.

	* bow/libbow.h (bow_cdoc_is_train_or_unlabeled): New prototype.

	* split.c (bow_files_source_type): added code for the
	bow_files_source_fraction_train and bow_files_source_number_train.
	This is indicated by a trailing 't', which converts some number of
	training documents.  For example, --unlabeled-set=500t takes 500
	training docs and converts them to unlabeled docs.
	(bow_split_options): Likewise.
	(bow_split_parse_opt): Likewise.
	(bow_set_doc_types_randomly_by_fraction_remaining): Likewise.
	(bow_set_doc_types): Likewise.
	(bow_set_doc_types_randomly_by_count_per_class): Added argument
	source_tag.  To get previous behavior, call this with source_tag
	equal to bow_doc_untagged.  Used for the new options
	bow_files_source_number_train and bow_files_source_fraction_trai.
	(bow_set_doc_types_randomly_by_count): Likewise.
	(bow_set_doc_types_randomly_by_fraction): Likewise.
	(bow_cdoc_is_train_or_unlabeled): New function.

	* maxent.c (maxent_options): Added code for new options
	--maxent-iteration-docs and --maxent-constraint-docs.
	(maxent_parse_opt): Likewise.
	(bow_maxent_new_vpc_with_weights_doc_then_word): Likewise.
	(bow_maxent_new_vpc_with_weights): Likewise.

1999-11-10  Andrew McCallum  <mccallum@justresearch.com>

	* svm_base.c (sqrtf): New macro, necessary on some non-Linux
	machines.
	Bug reported by Chuck Rosenberg.

1999-11-08  Andrew McCallum  <mccallum@justresearch.com>

	* readme.texi: Add simple usage examples for arrow.

	* arrow.c (arrow_serve2): Implement the 'query' command.  Change
	XML labels from "archer" to "arrow".
	(main): Change default number of hits on a query from 1 to 10.

	* libbow-desc.texi: Update descriptions.

	* svm_base.c: Make many printf's conditional on the
	bow_verbosity_level, so that by default rainbow-stats will still
	work.

	* array.c (cdocs_iterator_count_for_doc): Replace NAN macro with
	arithmetic equivalent.

	* barrel.c (barrel_iterator_count_for_doc): Likewise.

	* wv.c (bow_wv_weight_sum): New function.

	* bow/libbow.h: Declare new function.

	* train_dirichlet.c (moment_match_mccallum): Separate
	implementation of moment matching that determines the variance by
	averaging the variance of all dimensions.
	(train_dirichlet_mom_sparse): New function.

	* bow/train_dirichlet.h: Declare new function.

	* tfidf.c (TFIDF_METHOD): Use
	bow_wv_set_weights_to_count_times_idf() instead of
	bow_wv_set_weights_to_count(), as is correct for TFIDF.  This was
	previously corrected in the scoring function.
	(bow_tfidf_params_tfidf): Change parameter settings for "tfidf"
	method.  Previously it was identical to the "tfidf_log_words"
	method, now it is identical to the "tfidf_log_occur" method.  In
	other words, previously it calculated IDF using the number of
	times the word occurred in the training data; now it uses the
	number of training documents in which the word occurs.

	* split.c (bow_split_options): Remove documentation for 'r'
	suffix.  It's confusing and shouldn't be used unawares.
	(bow_split_parse_opt): Add a 'pcr' suffix, but it's not implemented
	yet.
	(bow_set_doc_types_randomly_by_count_per_class): Count the number of
	untagged documents in each class, and if this function is trying
	to tag more than are available, simply have this function tag
	fewer.

	* rainbow.c (bow_print_log_odds_ratio): Handle words that are not
	in the vocabulary.

	* ddf.c: Implement ddfmm classification method.  This method fits
	the Dirichlet by moment matching only.

	* arrow.c (arrow_serve2): New function.  Now call this instead of
	arrow_serve.  It provides output in XML, like archer does.  Only
	the rank command is implemented.

1999-11-02  Andrew McCallum  <mccallum@justresearch.com>

	* int4str.c (bow_int2str): Assert that INDEX argument is
	non-negative.

1999-10-28  Gregory C Schohn  <gcs@cmu.edu>

	* svm_base.c (svm_vpc_merge): fixed bug for svml-basename - all
	the docs still need to be output, so that the other data (like
	word weights) can be properly extracted.

1999-10-28  Andrew McCallum  <mccallum@justresearch.com>

	* cdmem.c (cdmem_options): New command-line option
	"cdmem-dist-data".
	(cdmem_parse_opt): Handle it.
	(bow_cdmem_new_vpc_with_weights): Let the command-line option
	determine what documents are used to learn the distance metric.

	* README-SVM (Outputing data): Added new section describing how to
	produce files ready for input into SVM^light.

1999-10-27  Gregory C Schohn  <gcs@cmu.edu>

	* svm_base.c (svm_vpc_merge): fixed svml bugs.

	* svm_base.c: fixed outdated documentation for parse info.

	* svm_smo.c (smo): fixed a parse error.

1999-10-26  Gregory C Schohn  <gcs@cmu.edu>

	* rainbow.c (rainbow_test): added a line for svms.  When svmlight
	output is being generated, rainbow_test prints the label (only
	works for binary barrels) so that svm_score can append the data
	for that example.

	* svm_base.c (svm_options[]): removed some of the single character
	switches.  Added arguments for tsvms & added svml-basename arg.
	(svm_permute_data, svm_unpermute_data): added.
	(infogain): should have made infogain compatible with sets with
	unlabeled data (it ignores those docs with y = 0).
	(svm_vpc_merge): added support for using unlabeled docs for
	transduction.  Also added code to spit out svmlight friendly
	files.
	(svm_score): added code to write svmlight files.

	* svm_trans.c: initial version - pretty much empty now.

	* bow/svm.h: added svm_*permute_data declarations & the
	transduce_svm declaration.

	* svm_al.c (al_svm_test_wrapper): replaced permutation code with
	calls to svm_permute_data & svm_unpermute_data.

	* svm_smo.c (smo): removed srandom(1) - was only there for
	debugging.

	* README-SVM (Bugs): removed section about smo being broken (was
	fixed).

	* Makefile.in: added svm_trans.c (transductive svms) to the
	svm_files.

1999-10-25  Andrew McCallum  <mccallum@justresearch.com>

	* .cvsignore: Add automatically-generated archer files, and a few
	others.

1999-10-21  Andrew McCallum  <mccallum@justresearch.com>

	* barrel.c (bow_barrel_keep_top_words_by_infogain): Don't set the
	NUM_WORDS_TO_KEEP to be the WI2IG_SIZE (which is the total number
	of words).  Set it to the MIN of this and the original
	NUM_WORDS_TO_KEEP.  Before this fix, no words were ever getting
	removed.  What a bug!  I wonder how long this has been in there?
	Reported by Carsten Lanquillon <lanqui@cs.cmu.edu>.

1999-10-20  Andrew McCallum  <mccallum@justresearch.com>

	* ddf.c (bow_ddf_dirichlet_from_doc_word_counts): Only print the
	diagnostics for 10 sampled words, not 50.

	* bpe.c (bow_bpe_set_cdoc_word_count_from_wi2dvf_weights): Print
	the alphas for only 10 sampled words instead of 20.

1999-10-19  Andrew McCallum  <mccallum@justresearch.com>

	* svm_base.c: Check verbosity level before printing to stdout.
	Only print if above bow_progress.

1999-10-19  Gregory C Schohn  <gcs@cmu.edu>

	* svm_base.c (svm_score): removed cnt variable (useless) & fixed a
	typo-bug (sub_model[i] -> barrel).

	* svm_smo.c (smo): changed the printf for information of where
	opt_pair failed to an fprintf.

1999-10-19  Gregory C Schohn  <gcs@justresearch.com>

	* Makefile.local (DIST_ALL_FILES): added -DGCSJPRC (turn on local
	pedantic debugging) to DEFS.

	* Makefile.in (ALL_CPPFLAGS): added -Ibow (so that pr_loqo.h is
	found by pr_loqo.c even though they aren't in the same directory
	[since we can't change pr_loqo.*]).
	(DEFS): Changed from _DEFS & now using += instead of the temporary.

	* svm_base.c: the epsilon_crit is now /2 for SMO (since the actual
	eps is 2x the variable).  fixed some printfs.

	* svm_loqo.c (build_svm_guts): added code to remember previous KKT
	epsilon (even though nobody sets the initial value to anything
	different than the macro).
	(build_svm_guts): added local define (GCSJPRC) for debugging stuff
	which includes stopping the proc & sending mail.

	* svm_smo.c: commented #DEBUG.  added kcache_ages to appropriate
	spots across the file.  removed some print statements that weren't
	too useful anymore.
	(opt_pair): changed an optimality check - used to use (a2+ao2)*eps
	to determine if something moved far enough, now just using eps_a
	(may not be right, but it's more correct than before) - we need it
	to prevent inf. looping.
	(opt_pair): Removed some unreachable code in if statements.
	(opt_pair): Fixed calculations of bup & blow - they were backwards.
	(smo): the threshold, b, is now (bup+blow)/2 instead of blow (which
	is at most epsilon_crit different).

1999-10-16  Gregory C Schohn  <gcs@justresearch.com>

	* svm_base.c: Added #ifdef HAVE_LOQO around calls to build_svm_guts.

	* svm_al.c: Added #ifdef HAVE_LOQO around calls to build_svm_guts.

	* Makefile.in: Re-enabled svm code.  Made the pr_loqo checks look
	for ./bow/pr_loqo.h.

1999-10-16  Andrew McCallum  <mccallum@justresearch.com>

	* README-SVM (Obtaining sources): File renamed from README_SVM.
	Clarify directions for where to put pr_loqo.h.

1999-10-15  Andrew McCallum  <mccallum@justresearch.com>

	* Version (BOW_MINOR_VERSION): Changed from 9 to 95.

	* bow/libbow.h (BOW_MINOR_VERSION): Changed from 9 to 95.
	Bug fixes for distribution.

	* .cvsignore: Added rainbow-rank and rainbow-ts.

	* Makefile.in: Temporarily disable SVM from rainbow.
	(ARCHER_GENERATED_C_FILES): New variable.  Remove these files from
	those distributed, because they should be generated.
	(ARCHER_DIST_FILES): Added archer.c and archer_query.c.

	* Makefile.in (DEMO_EXECUTABLES): New variable.
	(ARCHER_DIST_FILES): Added dirichlet.c.
	(DIST_FILES): Added archer.el.

	* multiclass.c: Comment out unused variables.
	Odd assortment of clean-ups.

	* bow/libbow.h (bow_random_reset_seed): Declare function.

	* train_dirichlet.c (MOMENT_MATCH_ONLY): New macro.
	(SPARSE): Change macro value from 0 to 1.  This only affects running
	train_dirichlet's main() directly.
	(main): comment out the printing of the gammaln() tests.  New local
	variable COUNTS_SIZE, increased from 100 to 10000.  Print more
	diagnostics at the end.

	* readme.texi: Update for new front-ends and fix command-line
	options so they work.

	* rainbow.c (rainbow_options): Clean up wording in several places.
	(rainbow_query): Change behavior of repeated queries.
	(bow_print_log_odds_ratio): Add a new FILE* argument.  All callers
	changed.

	* nbshrinkage.c: Allow different lambda hierarchical mixture
	weights for different classes.

	* mix.c (mix_options): New command-line option for setting the
	number of EM iterations.
	(mix_new_vpc): Don't allow initial random class_probs to be zero.

	* libbow-desc.texi: Update for new front-ends and MSWin.

	* lex-gram.c (bow_lexer_gram_open_text_fp): Properly save the
	return value of bow_realloc().  This fixes a nasty crash.

	* emsimple.c (bow_emsimple_new_vpc_with_weights): Print
	diagnostics using odds_ratio.

	* dirichlet.c (main): New command-line argument -I.  Handle it.

	* dice.c (print_usage): Expand help statement.

	* ddf.c (ddf_force_large_alphas): New variable.
	(bow_ddf_dirichlet_from_doc_word_counts): Handle it.
	(ddfla): New method.

	* cdmm.c (CDMM_PRINT_ALPHAS_KEY): Change value to not conflict
	with the cdm method.

	* bpe.c (bpe_prior_alpha): Change default prior "ghost count" from
	1 to 0.
	(bow_bpe_set_cdoc_word_count_from_wi2dvf_weights): Make the verbosity
	work even when the vocabulary size is less than 20.
	(bow_bpe_score): Print more information when BOW_PRINT_WORD_SCORES.
	Print more digits of precision of BOW_PRINT_WORD_SCORES for
	individual words.

	* Makefile.local (RAINBOW_METHOD_C_FILES): Move some of these to
	the Makefile.in.
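
Note on the lex-gram.c entry above (saving the return value of
bow_realloc()): the crash it describes is typical of a common C pitfall,
where a block is resized but the caller keeps using the old pointer.  The
sketch below illustrates the general pattern only.  It uses the standard
realloc() because the signature of bow_realloc() is not shown in this log,
and grow_buffer() with its parameters is a hypothetical helper, not libbow
code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from libbow): grow *buf from old_size to
   new_size bytes.  realloc() may move the block, so the pointer it
   returns must be stored back; continuing to use the old pointer
   after a move is undefined behavior.  */
static int
grow_buffer (char **buf, size_t old_size, size_t new_size)
{
  char *tmp;

  assert (buf != NULL && new_size > 0 && new_size >= old_size);

  /* Use a temporary so the original block is not lost on failure.  */
  tmp = realloc (*buf, new_size);
  if (tmp == NULL)
    return -1;			/* *buf is unchanged and still valid */

  /* Zero the newly added tail so it holds defined values.  */
  memset (tmp + old_size, 0, new_size - old_size);

  *buf = tmp;			/* save the (possibly moved) block */
  return 0;
}
```

A caller would pass the address of its own pointer, for example
grow_buffer (&line, 8, 4096), and keep using line afterwards; on a -1
return the original buffer is untouched and must still be freed.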