changelog

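Text classification with Bayesian learning algorithms: a general-purpose algorithm for text classification based on a naive Bayes classifier.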
	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Fix crash that
	occurs if limited vocabulary causes all files in a class to be
	empty.

	* stoplist.c (bow_stoplist_add_word): New function.

	* rainbow-stats.pl (confusion): Print percentage correct for each
	category.

	* istext.c (bow_fp_is_text): Also return 0 for files that have
	more than 30% of their lines of the same length.  This way we
	avoid files containing uuencoded blocks.

	* bow/libbow.h: Declare new function.

Tue Apr 22 11:19:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* deflexer.c (bow_default_lexer): Add cast to initialization to
	avoid warning.

	Add a uniform, global way of keeping track of binary file format
	versions.
	* io.c (bow_file_format_version): New global variable.
	(bow_write_format_version_to_file): New function.
	(bow_read_format_version_from_file): New function.
	* bow/libbow.h (bow_file_format_version): Declare new global
	variable.
	(BOW_DEFAULT_FILE_FORMAT_VERSION): New macro.
	(bow_write_format_version_to_file): New function declaration.
	(bow_read_format_version_from_file): New function declaration.
	* rainbow.c (FORMAT_VERSION_FILENAME): New macro.
	(rainbow_archive): Write format version to disk.
	(rainbow_unarchive): Read it from disk if the file exists, otherwise
	set it to 3, which is the format version number of data before
	BOW_FILE_FORMAT_VERSION was added to the library.

	* rainbow.c (rainbow_options): New option "print-word-counts",
	alias for "print-counts-for-words".  Hide the latter option from
	the --help text.

	* rainbow-stats.pl (confusion): Print confusion matrix in a more
	readable format.

	Add new command-line option to rainbow for using only 0 or 1 word
	counts.
	* opts.c (bow_binary_word_counts): New global variable.
	(bow_options): New option "binary-word-counts".
	(parse_bow_opt): Handle it.
	* bow/libbow.h: Declare new global variable.
	* dv.c (bow_dv_add_di_count_weight): When BOW_BINARY_WORD_COUNTS
	is true, insist on keeping DV's entry count below 2, i.e. 0 or 1.

Fri Apr 18 16:09:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* configure.in: Add -Wno-implicit to default CFLAGS.

	* rainbow.c (rainbow_lisp_query): Return if QUERY_WV is empty.
	(Previously would have crashed.)

	* tfidf.c (TFIDF_METHOD): Fix typo that defined
	_register_method_tfidf_.. functions without the last underscore.
	(Reported by Kamal Nigam.)

	* split.c (bow_test_split): When selecting documents for the test
	set and randomly picking a document that is already in the test
	set, don't just scan sequentially for the next non-test document;
	pick a new random number.  This will avoid long contiguous
	stretches of test documents.

	* naivebayes.c (bow_naivebayes_score): Move the handling of
	SCORE_WITH_LOG_PROBABILITIES.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Assert
	that CDOC->PRIOR must be greater or equal, not just greater.

Thu Apr 10 14:54:08 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Fix the `compile-command'.
	(PRINT_TREE_SCORES): New macro.
	(hier_set_method): New function.
	(main): Call it if BOW_ARGP_METHOD is non-NULL.

	* deflexer.c (bow_default_lexer): Initialize it to -1, so that
	deflexer.o will get linked in under SunOS.  Ug.  See comment.

	* bow/libbow.h (bow_methods): Declare extern!

Wed Apr  9 11:14:13 1997  Andrew McCallum  <mccallum@jprc.com>

	* lex-html.c (bow_lexer_html_get_raw_word): Return last word in
	document, even if it is not followed by a non-word character!
	* lex-simple.c (bow_lexer_simple_get_raw_word): Likewise.

	* rainbow.c (rainbow_lisp_setup): Call all
	__attribute__((constructor)) functions here since this will be
	dynamically loaded and the constructor functions won't be called
	then.

	* opts.c (parse_bow_opt): Remove call to
	_bow_default_lexer_init(); moved to rainbow.c.

	Fix a bug whereby --skip-html was a no-op.
	* deflexer.c (bow_default_lexer_simple,
	bow_default_lexer_indirect, bow_default_lexer_gram,
	bow_default_lexer_html, bow_default_lexer_email): Change global
	variables from structs to pointers to structs.
	(_bow_default_lexer_simple, _bow_default_lexer_gram,
	_bow_default_lexer_html, _bow_default_lexer_email): New static
	variables.
	(_bow_default_lexer_init): Set BOW_DEFAULT_LEXER_INDIRECT to point
	inside of BOW_DEFAULT_LEXER_GRAM, which is the BOW_DEFAULT_LEXER.
	* opts.c: Now use all default lexers as pointers to structs
	instead of structs.
	* bow/libbow.h (bow_default_lexer_simple,
	bow_default_lexer_indirect, bow_default_lexer_gram,
	bow_default_lexer_html, bow_default_lexer_email): Change global
	variables from structs to pointers to structs.

	* vpc.c (bow_barrel_new_vpc_merge_then_weight): Assert the method
	name.

	* Makefile.in (dist-cmu, bow-$(BOW_VERSION).tar.gz): New targets.

Tue Apr  8 08:00:00 1997  Andrew McCallum  <mccallum@jprc.com>

	* Version (BOW_MINOR_VERSION): Version 0.7.
	* bow/libbow.h (BOW_MINOR_VERSION): Likewise.
	* rainbow.c (RAINBOW_MINOR_VERSION): Version 0.2.
	* arrow.c (ARROW_MINOR_VERSION): Version 0.2.
	* NEWS: Update for new version of library and rainbow.
	* readme.texi: Likewise.

	* Makefile.in (DIST_FILES): Add NEWS.

	* Makefile.in (dist): Fix invocation of `tr' for cvs rtag.

	* split.c (bow_test_next_wv): Initialize CURRENT_DI to avoid
	warning.
	* split.c (bow_test_split): Initialize DOC to avoid warning.
	* int4word.c (bow_words_keep_top_by_infogain): Initialize
	MAX_IG_WI to avoid warning.

	* dv.c (bow_dv_add_di_count_weight): Only give "overflowed short"
	message at BOW_VERBOSE level, not BOW_PROGRESS level.

	* crossbow.c (main): Initialize NORMALIZER to zero.

	* Makefile.in (dist): Create ./bow directory.  Fix invocation of
	argp.
	(snapshot): Likewise.

	* configure.in: Add -O to the default CFLAGS.

	* rainbow.c (rainbow_options): Improve some option help text.
	(rainbow_parse_opt) [INFOGAIN_PAIR_VECTOR_KEY]: Handle it.
	* opts.c (bow_options): Improve some option help text.

	* Makefile.in (version.texi): Define BOWVERSION instead of
	BOW_VERSION, so makeinfo can get the value.
	(%.dvi, %.info): Fix typo.

	* libbow.texi: Fix typos and begin preliminary documentation.

	* rainbow.c (rainbow_options): New option "repeat"/'r'.
	(rainbow_parse_opt): Handle it.
	(rainbow_arg_state): New member REPEAT_QUERY.
	(rainbow_query): Attend to REPEAT_QUERY.

	* naivebayes.c (bow_naivebayes_set_weights): Fix assertion so it
	works for both naivebayes and crossentropy.

Mon Apr  7 11:00:06 1997  Andrew McCallum  <mccallum@jprc.com>

	* sarray.c (bow_sarray_entry_at_keystr): If there is no index for
	that KEYSTR, print an error message.  This way, if the user
	mistypes a method name to rainbow's -m option, they get a message
	that makes some sense.

	* opts.c (_help_filter): New function to add the names of the
	available methods to the help text.
	(bow_argp): Put it in.

	Use strings to identify methods instead of integers.  Separate
	method declarations into separate .h files.
	* bow/tfidf.h, bow/naivebayes.h, bow/prind.h: New files.
	* Makefile.in (LIBBOW_H_FILES): Add files bow/naivebayes.h,
	bow/tfidf.h, bow/prind.h.
	* naivebayes.c (bow_method_naivebayes, bow_method_crossentropy):
	Use string method identifier instead of integer.
	* prind.c (bow_method_prind): Likewise.
	* tfidf.c (TFIDF_METHOD): Likewise.
	* rainbow.c (rainbow_parse_opt) [G]: Step through methods
	according to new BOW_METHODS bow_sarray, instead of old static
	array.
	* methods.c (bow_methods): Static array removed.
	(bow_methods): Renamed from _bow_str4method, and made non-static.
	* barrel.c (bow_method_id, _old_bow_methods): Put copies of what
	used to be in libbow.h here, so we can unarchive old-format
	barrels.
	(BOW_DEFAULT_BARREL_VERSION): Changed from 2 to 3.
	(bow_barrel_new_from_data_fp): If VERSION_TAG is less than 3, read the
	method id integer and use _OLD_BOW_METHOD, otherwise, read a
	string and use new BOW_METHOD_AT_NAME().
	(bow_barrel_write): Write the method as a string instead of as an
	integer.
	* Makefile.in (ALL_CPPFLAGS): -I$(srcdir) instead of
	-I$(srcdir)/bow.
	* All files: Include <bow/libbow.h> instead of "libbow.h".
	* bow/libbow.h: Include <bow/tfidf.h>, <bow/naivebayes.h>,
	<bow/prind.h>.
	(bow_method_register_with_name, bow_method_at_name): Declare
	functions.
	(bow_method_id): Typedef removed.
	(bow_str_to_method_id): Macro removed.
	(bow_methods): Global variable removed.
	(bow_method_tfidf_words, bow_method_tfidf_log_words,
	bow_method_tfidf_log_occur, bow_params_tfidf): Removed.
	(bow_method_prind, bow_params_prind): Removed.
	(bow_method_naivebayes, bow_params_naivebayes): Removed.
	* methods.c (bow_method_at_name): Comment function.
	(bow_method_register_with_name): Likewise.
	* opts.c (parse_bow_opt) [m]: Use bow_method_at_name().
	* naivebayes.c: Use bow_method_register_with_name().  Add new
	method "crossentropy".
	(bow_naivebayes_score): Pay attention to SCORE_WITH_LOG_PROBABILITIES
	when setting class priors.  When it is true, use inverse of
	cross-entropy instead of negative!
	* prind.c: Use bow_method_register_with_name().
	* tfidf.c: Use bow_method_register_with_name().

	* rainbow.c (main): Strip any trailing `/'s from classnames, so
	FILENAME_TO_CLASSNAME() will find the classnames.  (Reported by
	Jason Rennie <jr6b@syrinx.res.cmu.edu>.)

	* rainbow-h.c (PRINT_COUNTS_FOR_WORD_KEY): New macro.
	(rainbowh_options): New option "print-counts-for-words".
	(rainbowh_parse_opt): Handle it.
	(struct rainbowh_arg_state): New member PRINTING_WORD.
	(hier_barrel_print_word_counts): New function.
	(main): Handle new option.  Do the right thing for `-O' if
	BOW_PRUNE_VOCAB_BY_OCCUR_COUNT_N.

	* info_gain.c (LEAVE_OUT_LAST_CLASS): Macro defined once at top.
	Changed from 0 to 1.

	* install.texi: Explain the results of --prefix.
	Remove old references to Objective C installation.

Thu Apr  3 12:50:23 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_test_files): Use macros for setting QUERY_WV
	weights, so we can handle case in which the wv normalizer is NULL!
	(main): Replace code for implementing word-count-printing with
	call to new function.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform):
	Initialize ci2dc entries to zero!
	(bow_barrel_print_word_count): New function.

	* opts.c (bow_options): Add new option
	"naivebayes-score-with-log-probs".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_score): Begin adding code to
	support SCORE_WITH_LOG_PROBABILITIES parameter; not yet finished.
	(bow_naivebayes_params): Add initializer for
	SCORE_WITH_LOG_PROBABILITIES, initialize it to BOW_NO.

	* bow/libbow.h: Declare new function.
	(bow_params_naivebayes): New entry SCORE_WITH_LOG_PROBABILITIES.

Wed Apr  2 10:07:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* configure.in: Add a check to see if __attribute__((constructor))
	works.  If it does not, define CONSTRUCTOR_FAILS.

	* rainbow.c (rainbow_lisp_setup): Fix typo.

	* Makefile.in ($(PERL_RUNNABLE_FILES)): Use % in pattern and $< in
	rule so that we get the .pl file from the $(srcdir).

	* rainbow-h.c (rainbowh_options): New option
	"print-infogain-vector", 'I'.
	(struct rainbowh_arg_state): Add state for it.
	(rainbowh_parse_opt): Handle it.
	(hier_barrel_write_to_file): Close the FP after writing a barrel.
	(hier_barrel_set_vpc_with_weights): Construct and pass a CLASSNAMES
	array.
	(hier_barrel_set_cdoc_priors_to_class_uniform): New function.
	(_hier_barrel_set_node_scores): Print a little header/separator if
	BOW_PRINT_WORD_SCORES.
	(hier_barrel_test): Initialize the QUERY_WV to NULL, so
	BOW_TEST_NEXT_WV doesn't try to free unallocated memory.
	(hier_barrel_print_infogain): New function.
	(rainbowh_archive): New function.
	(rainbowh_unarchive): New function.
	(main): Use above two functions.  Deal with printing infogain.
	* rainbow.c: Re-written for using libargp.  This should make it
	work with the WebKB lisp crawler again.

	* prind.c (bow_prind_score): Make sure CDOC->FILENAME is non-NULL
	before trying to print it when BOW_PRINT_WORD_SCORES is true.

	* opts.c (parse_bow_opt) [ARGP_KEY_INIT]: Call
	_bow_default_lexer_init().
	* deflexer.c (_bow_default_lexer_init): Don't make it static.  Use
	static local variable to make sure we don't run through it twice.
	This is because we will call it explicitly in
	opts.c:parse_bow_opt(), because __attribute__ ((constructor))
	doesn't seem to work on SunOS.

	* Makefile.in (PERL_FILES): Added rainbow-ac.pl and rainbow-pr.pl.
	* rainbow-ac.pl, rainbow-pr.pl: New files from
	Dayne Freitag <dayne@cs.cmu.edu>.

Tue Apr  1 10:11:03 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c (rainbowh_parse_opt): Implement option 'M' for
	use_maximum_likelihood_path.
	(hier_default_method): Renamed from METHOD; all uses changed.
	(hier_barrel): New member NUM_NON_REST_CDOCS, to keep track of
	DOC_BARREL->CDOCS->LENGTH *before* the `rest' documents start
	getting added, so that we can implement
	HIER_PARENT_DI_TO_CHILD_INDEX_AND_DI properly.
	(hier_barrel_new): Initialize it to -1.
	(hier_barrel_add_child): Set it.
	(hier_barrel_new_from_text_dir_leaf): Set it.
	(hier_barrel_write_to_file): Write it.
	(hier_barrel_new_from_file): Read it.
	(hier_parent_di_to_child_index_and_di): Use it.
	(hier_barrel_print): Print it instead of DOC_BARREL->CDOCS->LENGTH.
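
Note on the deflexer.c entry of Wed Apr 2 above: it describes an initializer guarded by a static local variable, so that it does its work only once whether it is reached through __attribute__((constructor)) or through an explicit call. The following is a minimal C sketch of that pattern, not the library's code. The names example_lexer_init, example_auto_init and example_init_run_count are hypothetical; only the CONSTRUCTOR_FAILS macro comes from the configure.in entry, and the sketch assumes a GCC-compatible compiler when that macro is not defined.

```c
#include <assert.h>

/* Hypothetical stand-in for the real setup work: counts how many times
   the body of the initializer has actually executed. */
static int init_run_count = 0;

/* Idempotent initializer.  The static local flag turns every call after
   the first into a no-op, so calling it both from a constructor and
   explicitly (e.g. from option parsing) is harmless. */
void example_lexer_init(void)
{
  static int done = 0;

  if (done)
    return;
  done = 1;
  init_run_count++;
}

#ifndef CONSTRUCTOR_FAILS
/* Where the compiler and platform support it, run the initializer
   automatically before main().  (Assumes GCC-style attributes.) */
static void __attribute__ ((constructor)) example_auto_init(void)
{
  example_lexer_init();
}
#endif

/* Hypothetical accessor, used only to observe the effect. */
int example_init_run_count(void)
{
  return init_run_count;
}
```

On a platform where the constructor never fires, the first explicit call to example_lexer_init() performs the setup; where it does fire, the explicit call does nothing.  Either way the setup runs exactly once.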
