
📄 changelog

Text classification with Bayesian learning. A general-purpose text classification algorithm based on the naive Bayes classifier.
	warning.
	--test-files-loo should now work.

	* prind.c: Convert scoring function to take LOO_CLASS argument.
	* kl.c: Likewise.
	* naivebayes.c: Likewise.
	* tfidf.c: Likewise.
	* evi.c: Likewise.
	* rainbow.c: Call scoring function with LOO_CLASS argument.
	* bow/libbow.h (bow_barrel_score): Add extra LOO_CLASS argument.
	(bow_method): Likewise to (*score) member.

	* rainbow-ac.pl: Make sure last confidence number gets printed
	properly.  Before it was always just zero.

	* rainbow.c (rainbow_options): Added "test-files-loo" for
	Leave-One-Out testing.  Not implemented yet, however.
	(struct rainbow_arg_state): New member LOO_CV.
	(rainbow_query): Do proper checks before using lisp score truncation.
	(rainbow_test): Likewise.  Also, add (commented out) code to print
	more stats.
	(main): Call _register_method_evi() to make sure it gets linked in.

	* Makefile.in (LIBBOW_C_FILES): Added evi.c.

	* evi.c: New file.

	* naivebayes.c (bow_naivebayes_set_weights): Add checks that make
	sure that Sum_w Pr(w|c) is 1 for all classes.

	* kl.c (bow_kl_score_loo): Implement normalized KL scores, with
	Witten-Bell discounting.  (NOTE: NaiveBayes does not yet have
	Witten-Bell implemented.  Thus the accuracy of Witten-Bell can be
	easily compared with Laplace by comparing "kl" with "naivebayes".)

	* rainbow-stats.pl (confusion): Initialize $MAX_CLASSNAME_LENGTH
	to the length of "classname", so that we still get proper
	formatting with very short classnames.

	* istext.c (bow_fp_is_text): Temporarily comment out the code that
	tries to avoid files with uuencoded blocks, because the current
	scheme also seems to avoid many HTML files.  (Reported by Sean
	Slattery.)  Warning, trying to index the 20_newsgroups data in
	this state will give bad results.

Mon Jun 23 11:59:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* prind.c (bow_prind_score): Comment fixes.  Describe the
	smoothing situation accurately.

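The naivebayes.c entry above adds checks that Sum_w Pr(w|c) is 1 for every class.  A minimal sketch of such a check in C, assuming plain Laplace (add-one) smoothing over the word counts of one class.  The names laplace_word_probs and probs_sum_to_one are hypothetical and are not libbow's API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not libbow code: fill PR[wi] with the
   Laplace-smoothed estimate (count + 1) / (total + vocabulary size)
   for a single class.  */
void
laplace_word_probs (const int *counts, size_t vocab_size, double *pr)
{
  double total = 0.0;
  size_t wi;

  assert (vocab_size > 0);
  for (wi = 0; wi < vocab_size; wi++)
    total += counts[wi];
  for (wi = 0; wi < vocab_size; wi++)
    pr[wi] = (counts[wi] + 1.0) / (total + (double) vocab_size);
}

/* Return 1 if the probabilities sum to 1 within a small tolerance,
   0 otherwise.  The tolerance 1e-9 is an arbitrary choice.  */
int
probs_sum_to_one (const double *pr, size_t vocab_size)
{
  double diff = -1.0;
  size_t wi;

  for (wi = 0; wi < vocab_size; wi++)
    diff += pr[wi];
  return diff < 1e-9 && diff > -1e-9;
}
```

With counts {3, 1, 0} the estimates are 4/7, 2/7 and 1/7, which pass the check; a vector such as {0.5, 0.2, 0.2} does not.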
	* int4word.c (bow_words_keep_top_by_infogain): Don't try to "keep"
	more words than are available in the BARREL!  (Bug reported by
	Daniel A Dipasquo <greenface+@CMU.EDU>.)  If NUM_WORDS_TO_KEEP is
	greater than or equal to the number of words in the BARREL, put
	all these words in the new vocabulary.

Wed Jun 11 16:40:14 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test): Don't do CommonLisp score truncation
	if the score is negative.  (This change should be made to other
	score-printing functions too.)
	(main): Gratuitously call _register_method_kl(), so that kl.c gets
	linked in with the rainbow executable.

	* kl.c (_register_method_kl): Make sure we can't register the
	method twice, even if this function is called twice.

	* naivebayes.c (bow_naivebayes_score_loo): When using uniform
	class priors, set SCORES[CI] based on log of uniform distribution
	of classes, not to 1.  When setting log_pr_tf, instead of using
	pow() before taking the log(), just multiply after using log().

	* Makefile.in (LIBBOW_C_FILES): Added kl.c.

Fri Jun  6 09:48:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (NO_LISP_SCORE_TRUNCATION_KEY): New macro.
	(rainbow_options): New option "no-lisp-score-truncation".
	(rainbow_parse_opt): Handle it.
	(struct rainbow_arg_state): New member USE_LISP_SCORE_TRUNCATION.
	(rainbow_query): Obey it.
	(rainbow_test): Likewise.
	(main): Make its default value 1.

Tue Jun  3 10:31:10 1997  Andrew McCallum  <mccallum@jprc.com>

	* readme.texi: Use BOWVERSION, not BOW_VERSION to match
	version.texi.

Thu May 29 15:25:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* Version (BOW_MINOR_VERSION): Version 0.8.

	* bow/libbow.h (BOW_MINOR_VERSION): Version 0.8.

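The Jun 11 kl.c entry above makes _register_method_kl() safe to call more than once.  One common way to get that behavior is a static guard flag.  The sketch below assumes a toy registry: register_method, registered_methods, num_registered_methods and MAX_METHODS are hypothetical stand-ins, not the libbow registration code:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_METHODS 8		/* hypothetical registry capacity */

/* Toy registry standing in for the real method table.  */
const char *registered_methods[MAX_METHODS];
size_t num_registered_methods = 0;

/* Append NAME to the toy registry.  */
void
register_method (const char *name)
{
  assert (num_registered_methods < MAX_METHODS);
  registered_methods[num_registered_methods++] = name;
}

/* Register the "kl" method at most once.  Returns 1 if this call
   performed the registration, 0 if it had already been done.  */
int
register_method_kl_sketch (void)
{
  static int done = 0;

  if (done)
    return 0;
  done = 1;
  register_method ("kl");
  return 1;
}
```

Calling register_method_kl_sketch() twice leaves exactly one "kl" entry in the registry, so a gratuitous call from main() (made only to force linking) is harmless.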
	* docnames.c (bow_map_filenames_from_dir): Remove local variables
	no longer used.

Mon May 26 12:59:50 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (main): New commented-out code for computing the
	number of word co-occurrences.

Fri May 23 11:34:05 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (USE_VOCAB_IN_FILE_KEY): New macro.
	(rainbow_options): New option "use-vocab-in-file".
	(rainbow_parse_opt): Handle it.
	(struct rainbow_arg_state): New member VOCAB_MAP.
	(rainbow_query): Use it to remove words from the vocabulary.
	(rainbow_test): Likewise.
	(main): Likewise.

	* rainbow-stats.pl (prune_from_classname): New global variable.  A
	regular expression to be removed from the end of classnames before
	gathering stats on them.  This allows us to gather stats on
	performance in the middle of class hierarchies.
	(read_trial): Use it.

	* int4str.c (bow_int4str_new_from_text_file): Return MAP instead
	of NULL!

	* barrel.c (bow_barrel_prune_words_not_in_map): Define MAX_WI and
	use it, so we don't ask for word indices larger than
	bow_num_words().
	(bow_barrel_print_word_count): Also print word probability according
	to counts.

	* rainbow-h.c (main) [printing_word_counts]: Print word that is
	being counted.

Wed May 21 15:01:51 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_prune_words_not_in_map): Remove the words
	instead of hiding them, so that future
	bow_keep_top_words_by_infogain() calls won't unhide them.
	This version got 46% on hier/yahoo-science (dataset with a 10
	document-per-class threshold).

	* rainbow-h.c (rainbowh_options): Added --use-vocab-in-file
	command-line option.
	(rainbowh_arg_state): Added PARENT and CI_IN_PARENT.  Added HIER_LEAF.
	Removed printing of leaf- and intermediate-results.
	(hier_barrel_prob_wi_in_ci): New function.
	(check_prob_wi_in_ci): New function.
	(_hier_barrel_local_score): New function.
	(_hier_barrel_set_node_scores): Use it.
	(hier_barrel_print_infogain): Print FULL_NAME with interspersed
	spaces, so it won't get lexed by bow_int4str_new_from_text_file().
	(main): Change defaults.  Before populate_by_scoring=1 and
	hier_structure=hier_niece.  Populate branches first thing, and
	check prob_wi_ci consistency.

	* naivebayes.c (bow_naivebayes_score_loo): Comment change.

	* int4str.c (bow_int4str_new_from_text_file): New function.

	* bow/libbow.h: Declare new functions.

Tue May 20 16:02:24 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_prune_words_not_in_map): New function.

Mon May 19 09:52:09 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-stats.pl (confusion): Calculate longest classname and
	use it to fix indentation.

	* wi2dvf.c (bow_wi2dvf_add_di_wv): Set SEEK_START to special
	flag 2.
	(bow_wi2dvf_add_wi_di_count_weight): Likewise.
	(bow_wi2dvf_hide_wi): Decrement WI2DVF->NUM_WORDS in the right place.
	(bow_wi2dvf_unhide_all_wi): Increment WI2DVF->NUM_WORDS.
	(bow_wi2dvf_write): Unhide all words first.
	(bow_wi2dvf_dv): Change assertion to deal with special flag 2.

	* rainbow.c (main): Pass new argument to
	bow_infogain_per_wi_print().

	* rainbow-h.c: Misc changes.  Print infogain during run.
	(hier_barrel_set_local_class_model): Add IS_ROOT argument.  Unhide
	vocabulary after pruning by infogain, so lower levels get all
	words.

	* naivebayes.c (M_EST_M): New macro.
	(M_EST_P): New macro.
	(bow_naivebayes_score_loo): Use them to implement M-estimates, instead
	of old Laplace smoothing.

	* info_gain.c (bow_infogain_per_wi_print): Add FP argument.

	* bow/libbow.h: Add argument to infogain function.

	* barrel.c: Fix the math for assigning CDOC->PRIOR, and add
	assertion checks.

Fri May 16 10:19:19 1997  Andrew McCallum  <mccallum@jprc.com>

	This was state of code on Thursday night.

	* rainbow-h.c: Add options for changing population scheme and tree
	structure.  Add ability to output intermediate and leaf results.

	* naivebayes.c (WORD_PRIOR_COUNT): New macro.  Current value 1.0.
	(bow_naivebayes_score_loo): Use it.

Thu May 15 16:22:27 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test): Assert that the ACTUAL_NUM_HITS
	returned by bow_barrel_score() is the same as the
	NUM_HITS_TO_RETRIEVE requested.

	* split.c (bow_test_split): Use rand() properly so that the number
	of test documents in each class is not so biased.  Add special
	code that *ensures* that the test documents are evenly distributed
	across classes.

	* rainbow.c (rainbow_print_weight_vector): Don't use
	CDOC->NORMALIZER if the method is "naivebayes", because NaiveBayes
	doesn't use it.  Previously the printed values were bogus.

Wed May 14 11:02:44 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: -q RAINBOWH_QUERYING now seems to work.

	* naivebayes.c (bow_naivebayes_score_loo): Add assertion that
	CDOC->PRIOR is greater than zero.  This restriction should be
	relaxed!

	* array.c (bow_array_free): Decrement length after testing for
	non-zero-ness, not before.  Without this change, empty arrays
	would call free() on un-malloc'ed() memory.

Tue May 13 18:16:31 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Add code for doing selective population of lower
	branches.  This population seems to be working.  Querying/scoring
	does not yet work.

	* wi2dvf.c (bow_wi2dvf_hide_wi): Change assertion to "if" so that
	we won't crash if we try to hide words that are already hidden.

	* split.c (bow_tmp_word_struct2): New type.
	(bow_model_next_wv): New function.
	(bow_nontest_next_wv): New function.

	* rainbow.c (rainbow_options): Fix documentation for test-files.
	(rainbow_test): Choose vocabulary by info gain *after* the test/train
	split.  Add temporary code to test bow_naivebayes_score_loo().
	Remove this later!

	* naivebayes.c (bow_naivebayes_score_loo): New function, copy of
	bow_naivebayes_score, with extra code to do leave-one-out
	testing if argument LOO is non-negative.
	(bow_naivebayes_score): Call above function with -1 for LOO.
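The leave-one-out idea in the May 13 naivebayes.c entry above can be sketched as follows, assuming Laplace-smoothed estimates and a flat array of per-class word counts.  The function word_prob_loo and all of its parameters are hypothetical; only the convention that a negative LOO value disables leave-one-out is taken from the entry:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not libbow code.  CLASS_COUNTS is a flat
   array with CLASS_COUNTS[ci * vocab_size + wi] holding the count of
   word WI in class CI.  DOC_COUNTS holds the query document's own
   word counts.  If LOO_CLASS is non-negative and equals CI, the
   document's counts are subtracted from that class before the
   Laplace-smoothed Pr(wi|ci) is computed, so the document does not
   help score itself.  Pass -1 for ordinary scoring.  */
double
word_prob_loo (const int *class_counts, const int *doc_counts,
	       size_t vocab_size, size_t ci, size_t wi, int loo_class)
{
  double count = class_counts[ci * vocab_size + wi];
  double total = 0.0;
  size_t w;

  assert (vocab_size > 0);
  for (w = 0; w < vocab_size; w++)
    total += class_counts[ci * vocab_size + w];

  if (loo_class >= 0 && (size_t) loo_class == ci)
    {
      count -= doc_counts[wi];
      for (w = 0; w < vocab_size; w++)
	total -= doc_counts[w];
      /* The held-out document must have been counted in this class.  */
      assert (count >= 0.0 && total >= 0.0);
    }

  return (count + 1.0) / (total + (double) vocab_size);
}
```

For a class with counts {3, 1, 0} and a held-out document with counts {2, 0, 0}, the estimate for word 0 is 4/7 with -1 and 2/5 when that class is left out.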
	(bow_method_naivebayes): Change NORMALIZE_WEIGHTS from
	bow_barrel_normalize_weights_by_summing() to NULL.  The
	normalizing function was not taking account of the Laplace
	smoothing numbers, and was giving incorrect weights.
	(bow_method_crossentropy): Likewise.

	* istext.c (bow_fp_is_text): Increase NUM_LINE_LENGTHS to
	NUM_TEST_CHARS to avoid potential crash.

	* docnames.c (bow_map_filenames_from_dir): For directory names and
	filenames, make it use names of soft links, not the directories
	that the links point to.

	* barrel.c (bow_barrel_add_document): New function.

	* bow/libbow.h: Declare new function.

	* docnames.c (bow_map_filenames_from_dir): Change commented-out
	code so that, if uncommented, this function will work if you pass
	it a filename instead of a directory name.

Tue May  6 15:30:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.local (rainbow-h): Make it depend on libbow.a.

	* rainbow-h.c: May 5 changes from Andrew Ng.
	(rainbowh_unarchive): Switch order of unarchiving for vocabulary
	and hier_barrel.
	(hier_barrel_new_from_file): Use bow_barrel_new_from_data_file()
	instead of bow_barrel_new_from_fp(), so we close FILE*'s instead
	of keeping them open.  Otherwise we run out of UNIX's available
	open file descriptors.

	* wi2dvf.c (FREE_WHEN_HIDING_WI): New macro.
	(bow_wi2dvf_hide_wi): Heed it.
	(bow_wi2dvf_dv): Don't check to make sure that WI is less than
	bow_num_words().  Check SEEK_START before returning a non-NULL DV,
	because if SEEK_START is less than -1, the DV should be considered
	`hidden'.

	* opts.c (bow_exclude_filename): New global variable.
	(bow_options): New option "exclude-filename".
	(parse_bow_opt): Handle it.

	* docnames.c (bow_map_filenames_from_dir): Make sure
	BOW_EXCLUDE_FILENAME is non-NULL before passing it to strcmp().

	* bow/libbow.h (bow_exclude_filename): Declare new global
	variable.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Use
	bow_malloc() instead of alloca(), so that bow_realloc() will work.
	free() it at the end.
	(bow_barrel_new_from_data_file): New function.

Mon May  5 21:08:34 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Changes by Andrew Ng, before Andrew McCallum's
	changes to close barrel FP's.

Fri May  2 09:53:12 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Additions by Andrew Ng to implement cousin scheme.

Wed Apr 30 10:48:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in: Include Makefile.local, avoiding error if it isn't
	present.

	* barrel.c (bow_barrel_keep_top_words_by_infogain): Unhide and
	hide the DVF's instead of removing them, so that we can call this
	function multiple times with increasing NUM_WORDS_TO_KEEP.

	* wi2dvf.c (bow_wi2dvf_hide_wi): New function.
	(bow_wi2dvf_unhide_all_wi): New function.
	(bow_wi2dvf_dv): Handle new negative values of SEEK_START set by
	BOW_WI2DVF_HIDE_WI().

	* bow/libbow.h: Declare new functions.
	(bow_doc_type): Add ignored_model, for rainbow-h.c.

Thu Apr 24 09:03:10 1997  Andrew McCallum  <mccallum@jprc.com>
