
📄 changelog

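Text classification with Bayesian learning algorithms: a general-purpose text classification algorithm based on the naive Bayes classifier.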
	(hier_barrel_add_stats): New function split out from
	HIER_BARREL_ADD_CHILD.
	(hier_barrel_add_child): Use it.
	(hier_barrel_add_rest): New function.
	(hier_barrel_new_from_text_dir): Call it to add `rest' documents.
	(hier_barrel_test): Allocate space for 3 times as many SCORES, to
	make room for the `rest' classes.
	(main): Set HIER_DEFAULT_METHOD from BOW_ARGP_METHOD, if non-NULL.

	* scale.c (bow_barrel_scale_weights_by_given_infogain): Only
	verbosify every 100 words.
	(bow_barrel_scale_weights_by_given_foilgain): Likewise.

	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Fix indentation.

	* rainbow-h.c: Converted to do command-line argument processing
	with libargp.

	* opts.c (bow_options): Remove "version" 'V' option.  libargp can
	handle that automatically.
	(_print_version): New function to print both program version and
	library version.
	(argp_program_version_hook): Set it to _PRINT_VERSION().

	* rainbow.c (rainbow_print_usage): Function removed.  Libargp does
	that now.

Mon Mar 31 11:07:30 1997  Andrew McCallum  <mccallum@jprc.com>

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): Use
	ALLOCA() instead of BOW_MALLOC() to avoid memory leak.

	* Makefile.in (configure, config.status): Sprinkle with $(srcdir).

	* configure.in: Move the setting of CFLAGS above AC_PROG_CC, so
	that it will have an effect.

	* install.texi: Mention how to set CPPFLAGS in the ./configure
	line.

	* vpc.c (bow_barrel_set_vpc_priors_by_counting): Properly set the
	CDOC->PRIOR's.

	* rainbow.c (INFOGAIN_PAIR_VECTOR_KEY): New macro.
	(rainbow_options): New option "infogain-pair-vector".
	(rainbow_parse_opt): Handle it.
	(main): Likewise.  When RAINBOW_WORD_COUNT_PRINTING, also print the
	total number of words in each class.

	* prind.c (bow_prind_set_weights): Get MAX_WI from MIN of
	WI2DVF->SIZE and BOW_NUM_WORDS(), not just BOW_NUM_WORDS().

	* opts.c (bow_uniform_class_priors): New global variable.
	(bow_options): New option "uniform-class-priors".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_set_weights): Get MAX_WI from MIN
	of WI2DVF->SIZE and BOW_NUM_WORDS(), not just BOW_NUM_WORDS().
	(bow_naivebayes_score): Pay attention to BOW_UNIFORM_CLASS_PRIORS.
	Don't sum in the score of words that don't have a DV entry!
	Previously we were allowing words that `aren't in the vocabulary'
	of the BARREL to contribute!  This was wrong.  They were
	contributing according to the Laplace Estimators, and classes with
	larger numbers of words were getting penalized.

	* info_gain.c (bow_infogain_per_wi_new): Sum floating point
	CDOC->PRIOR's instead of incrementing an integer count of
	documents, so that infogain can be calculated from documents with
	different `weights'.
	(bow_infogain_per_wi_new_using_pairs): New function.  For now it
	prints its results instead of returning them.

	* barrel.c (bow_barrel_set_cdoc_priors_to_class_uniform): New
	function.

	* bow/libbow.h: Declare new functions.

Mon Mar 31 11:56:48 1997  Andrew McCallum  <mccallum@cs.cmu.edu>

	* Makefile.in (CFLAGS, CPPFLAGS): Get values from configure.

	* configure.in: Do AC_SUBST() for CPPFLAGS and CFLAGS.

Fri Mar 28 10:28:26 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-h.c: Fix spelling: "heir" -> "hier".  How embarrassing!

	* dv.c (bow_dv_new_from_data_fp): Fix typo in feof() assertion.
	(Reported by Doreen Cheng <dcheng@PRPA.Philips.COM>.)

	* rainbow.c (PRINT_COUNTS_FOR_WORD_KEY): New macro.
	(rainbow_options): New option "print-counts-for-word".
	(rainbow_parse_opt): Handle it.
	(main): Implement it.

	* bow/libbow.h (bow_wi2dvf): Add new element to structure:
	`num_words'.
	(bow_barrel): Put `is_vpc' at end of structure instead of the
	beginning.

	* wi2dvf.c (bow_wi2dvf_new): Initialize NUM_WORDS.
	(bow_wi2dvf_add_di_wv): Increment it.
	(bow_wi2dvf_add_wi_di_count_weight): Likewise.
	(bow_wi2dvf_new_from_data_fp): Likewise.
	(bow_wi2dvf_remove_wi): Decrement it.
	(bow_wi2dvf_print_stats): Print it.

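
The bow_naivebayes_score fix recorded above (skip query words that have no entry in the model, instead of letting them contribute through the Laplace estimate) can be illustrated with a minimal sketch. This is not libbow's code: the names `nb_score`, `NB_NUM_CLASSES` and `NB_VOCAB_SIZE` and the fixed-size count table are assumptions made for illustration, and plain probabilities are multiplied here for brevity where a real scorer would more likely sum log probabilities.

```c
#include <stddef.h>

/* Hypothetical toy model sizes; not taken from libbow. */
#define NB_NUM_CLASSES 2
#define NB_VOCAB_SIZE 3

/* Score one class for a query given as an array of word indices.
   counts[c][w] is how often word w occurred in class c.
   Word indices outside the model vocabulary are skipped, so they
   cannot penalize classes that happen to contain more words. */
double nb_score(const int counts[NB_NUM_CLASSES][NB_VOCAB_SIZE],
                const double priors[NB_NUM_CLASSES],
                int class_index,
                const int *query_words, size_t query_len)
{
  int total = 0;
  for (int w = 0; w < NB_VOCAB_SIZE; w++)
    total += counts[class_index][w];

  double score = priors[class_index];
  for (size_t i = 0; i < query_len; i++) {
    int w = query_words[i];
    if (w < 0 || w >= NB_VOCAB_SIZE)
      continue;                 /* word unknown to the model: no contribution */
    /* Laplace (add-one) estimate of P(w | class). */
    score *= (counts[class_index][w] + 1.0) / (total + NB_VOCAB_SIZE);
  }
  return score;
}
```

With this sketch, a query containing an unknown word index scores exactly the same as the query with that word removed, which is the behavior the entry describes.
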
	* prind.c (bow_prind_set_weights): Use BARREL->WI2DVF->SIZE and
	BARREL->WI2DVF->NUM_WORDS instead of BOW_NUM_WORDS().  In
	particular, this will allow us to set the Laplace estimators using
	the correct number of words in the barrel, not the arbitrary
	libbow-wide vocabulary size.  Properly use CDOC->WORD_COUNT
	instead of overloading CDOC->NORMALIZER.
	(bow_prind_score): Likewise use BARREL->WI2DVF->SIZE and
	BARREL->WI2DVF->NUM_WORDS instead of BOW_NUM_WORDS().
	(bow_print_word_scores): Moved to opts.c.

	* opts.c (bow_print_word_scores): Global variable moved here from
	prind.c.
	(bow_options): New option "print-word-scores".
	(parse_bow_opt): Handle it.

	* naivebayes.c (bow_naivebayes_set_weights): Use
	BARREL->WI2DVF->SIZE and BARREL->WI2DVF->NUM_WORDS instead of
	BOW_NUM_WORDS().  In particular, this will allow us to set the
	Laplace estimators using the correct number of words in the
	barrel, not the arbitrary libbow-wide vocabulary size.
	(bow_naivebayes_score): Likewise, and add code to print the score
	contributions of each word when BOW_PRINT_WORD_SCORES is non-NULL.
	(SCORE_WITH_LOG_PROBABILITIES): New macro.

	* barrel.c (bow_barrel_printf): Comment out the code that would
	skip over documents that are not of type `model'.

Thu Mar 27 11:29:34 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow-stats.pl: Make output labels more descriptive.  Say
	`average percentage accuracy'.

	* split.c (bow_test_split): Use the micro-seconds field from
	gettimeofday() instead of time() to set the random number
	generator seed.  Otherwise, if we re-call this function too
	quickly we'll get exactly the same seed!  ...because time()
	returns a number of seconds.

	* demos/script: New shell script file that will demo rainbow,
	with running commentary.

	* demos/data: New directory containing 20 articles 2 newsgroups.
	This is for use with demos/script.

	* install.texi: Remove mention of `checks' and `examples'
	directory; they don't exist.
	(Reported by Doreen Cheng <dcheng@PRPA.Philips.COM>.)

Mon Mar 24 12:07:53 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (rainbow-lisp.o): Use $(ALL_CPPFLAGS) and
	$(ALL_CFLAGS) instead of non-ALL versions.

	* rainbow.c (rainbow_lisp_setup): Rewrite for use with libargp.

	* methods.c (bow_method_at_name): Fix typo.
	(bow_method_at_index): Likewise.

	* opts.c (parse_bow_opt): Use 'g' instead of 'N' for setting gram
	size.

	* rainbow.c (rainbow_lisp_query): Free the QUERY_WV before
	returning!

	* methods.c (bow_method_register_with_name): New function.
	(bow_method_at_name): New function.

	* arrow.c (PRINT_IDF_KEY): New macro.
	(arrow_options): Add new option "print-idf".
	(struct arrow_arg_state): New enum ARROW_PRINTING_IDF.
	(arrow_index): Prune the vocabulary if
	BOW_PRUNE_VOCAB_BY_OCCUR_COUNT_N is non-zero.
	(main): Add code to print idf values.

	* lex-simple.c (bow_alpha_lexer, bow_alpha_only_lexer,
	bow_white_lexer): Initialize STEM_FUNC to 0 instead of
	BOW_STEM_PORTER.

	* tfidf.c (bow_tfidf_set_weights): Comment out code that sets
	total_word_count.  Do the DF_TRANSFORM on DF, not on IDF!
	Otherwise we get negative IDF's.

	* rainbow-h.c (use_maximum_likelihood_path): New global variable.
	(_heir_barrel_set_node_scores): Use it.
	(main): Set it when -M passed on command line.
	(num_top_words): Moved from main-local variable to global.
	(heir_barrel_test): Reduce vocab by infogain.

Fri Mar 21 14:02:39 1997  Andrew McCallum  <mccallum@jprc.com>

	* bow/libbow.h (bow_lexer_simple): Add entry
	TOSS_WORDS_LONGER_THAN.
	(bow_wv_set_weights_to_count_times_idf): Declare new function.

	* wv.c (bow_wv_set_weights_to_count_times_idf): New function.

	* tfidf.c (bow_tfidf_set_weights): Comment out code saying that
	TFIDF is broken.  Rewrite the way IDF is calculated.
	(bow_tfidf_score): Set and normalize the QUERY_WV weights here (even
	though it is redundant) so that we can properly use the IDF from
	the BARREL when normalizing weights.
	Normalize the QUERY_WV weight when incrementing CURRENT_SCORE.

	* prind.c (bow_prind_set_weights): Skip a document if it is not
	of type model, both when setting NORMALIZER and TOTAL_TERM_COUNT,
	and when setting weights.
	(bow_prind_score): Skip a document if it is not of type model.

	* lex-simple.c (bow_lexer_simple_postprocess_word): Add code to
	toss words longer than SELF->TOSS_WORDS_LONGER_THAN.  Set WORDLEN
	at beginning.  It appeared that it was getting used uninitialized
	before!
	(bow_alpha_lexer, bow_alpha_only_lexer, bow_white_lexer): Add value
	for new field TOSS_WORDS_LONGER_THAN.

	* opts.c (APPEND_STOPLIST_FILE_KEY): New macro.
	(bow_options): Added "append-stoplist-file"
	(parse_bow_opt): Handle new option.

	* int4str.c (_str2id): Return the absolute value of the old return
	value.  Sometimes with really long strings, the return value was
	going negative.
	(_str_hash_lookup): Assert that ID is non-negative.

Thu Mar 20 11:47:49 1997  Andrew McCallum  <mccallum@jprc.com>

	These changes by Karl Kleinpaste <karl@jprc.com>

	* int4word.c (bow_words_reread_from_file): Use fopen() instead of
	bow_fopen(), so we are sure not to call abort().

	* wv.c (bow_wv_sprintf): Fix function to account for length
	troubles properly.
	(bow_wv_sprintf_words): New function, prints the words themselves,
	rather than the word indices.

	* bow/libbow.h: Declare new function.

	* naivebayes.c (bow_naivebayes_set_weights): Add commented-out
	code that forces all counts to either 0 or 1.  This was used on
	some experiments with Shumeet.

	* lex-html.c (bow_lexer_html_get_raw_word): Add a ! to the
	FALSE_TO_END condition test, so we don't end the tokenization too
	early.

Tue Mar 18 14:47:35 1997  Andrew McCallum  <mccallum@jprc.com>

	* rainbow.c (rainbow_parse_opt) [ARGP_KEY_END]: Print a useful
	error when only one classname is given.
	(main): Check for rainbow_infogain_printing properly.

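
The int4str.c entry of Fri Mar 21 describes a string hash whose result went negative on very long strings and was patched by taking the absolute value. The sketch below shows one way to guarantee a non-negative id. It is a swapped-in technique and not libbow's `_str2id`: the function name `str_hash_nonneg` and the multiplier 31 are assumptions, and the sign bit is masked off on an unsigned accumulator rather than calling `abs()`, because `abs(INT_MIN)` is undefined in C.

```c
#include <limits.h>

/* Hypothetical string hash that never returns a negative value.
   Unsigned arithmetic wraps around on overflow in a well-defined way,
   and masking with INT_MAX clears the sign bit before the conversion
   back to int. */
int str_hash_nonneg(const char *s)
{
  unsigned int h = 0;
  while (*s != '\0')
    h = h * 31u + (unsigned char) *s++;
  return (int) (h & (unsigned int) INT_MAX);
}
```

A lookup routine can then assert that the id is non-negative, as the `_str_hash_lookup` change in the same entry does.
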
	* opts.c (parse_bow_opt) [ARGP_KEY_END]: Check for the existence
	of BOW_DATA_DIRNAME in a way that works even when the directory is
	owned by someone else.

	* bow/libbow.h (bow_fread_string): Assert that the string length
	is non-negative.

	* barrel.c (_bow_barrel_version): New variable.
	(BOW_DEFAULT_BARREL_VERSION): New macro.
	(bow_barrel_new_from_data_fp): Read the version number instead of a
	null_tag.
	(bow_barrel_write): Likewise, for writing.

	* arrow.c (main): Remove redundant code that is now in opts.c.

Mon Mar 17 12:09:32 1997  Andrew McCallum  <mccallum@jprc.com>

	* Makefile.in (%.o:%.c): Fix the order on this pattern rule.
	($(DEMO_EXECUTABLES):%:%.o): Put $(DEMO_EXECUTABLES) at the beginning
	of this pattern, so it matches only those files.

	* arrow.c: Don't include getopt.h; we're using argp.h instead.
	(arrow_index): Fix typo.

	* configure.in: Don't look for getopt.h anymore.  We don't need it
	now that we are using libargp.

	* configure.in: AC_INIT looking for int4str.c instead of libbow.h.

	* Makefile.in (%): Use this pattern to make DEMO_EXECUTABLES
	instead of listing them all.  This avoids making all the .o's for
	one of the DEMO_EXECUTABLES.

	* rainbow.c: Converted to use argp command-line argument
	processing.

	* opts.c (bow_argp_method): Renamed from bow_default_method.
	(parse_bow_opt) [ARGP_KEY_INIT]: Add words to stoplist.

	* deflexer.c (_bow_default_lexer_init): Initialize
	bow_default_lexer to BOW_DEFAULT_LEXER_GRAM, not BOW_LEXER_GRAM!

	* bow/libbow.h (bow_argp_method): Renamed from bow_default_method.

	* arrow.c (arrow_parse_opt) [q]: Set query.filename.
	(arrow_index): BOW_DEFAULT_METHOD renamed to BOW_ARGP_METHOD.

	* arrow.c (arrow_index): Set the method according to
	BOW_DEFAULT_METHOD.

	* opts.c: Fleshed out into first working version.

	* error.c: Comment fix.  Include libbow.h and stdio.h.

	* deflexer.c (_bow_default_lexer_init): New constructor function.
	(bow_default_lexer_simple, bow_default_lexer_indirect,
	bow_default_lexer_gram, bow_default_lexer_html,
	bow_default_lexer_email): New variables, default instantiations of
	lexers.

	* bow/libbow.h: Add argp declarations.
	(bow_argp_children): New variable.
	(bow_prune_vocab_by_infogain_n): New variable.
	(bow_prune_vocab_by_occur_count_n): New variable.
	(bow_default_method): New variable.
	(bow_data_dirname): New variable.

	* arrow.c: Convert to using argp for command-line processing.

	* Makefile.in: Change all instances of `libbow.h' to `bow/libbow'.
	(includedir): Add `/bow' to end.
	(LIBBOW_C_FILES): Add opts.c.
	(ALL_CPPFLAGS): add -I$(srcdir)/bow and -I$(srcdir)/argp.
	(rainbow-lisp.o): Use $< instead of rainbow.c, so VPATH will find it
	when compiling in a different directory than the source.

	* bow/libbow.h (STRINGIFY): New macro.
	(bow_default_lexer_simple, bow_default_lexer_indirect,
	bow_default_lexer_gram, bow_default_lexer_html,
	bow_default_lexer_email): Declare default instantiations of
	lexers.
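
The split.c entry dated Thu Mar 27 1997 seeds the random number generator from the microseconds field of gettimeofday() so that two calls within the same second do not get the same seed. A minimal sketch of that idea follows. The function name `seed_from_microseconds` and the use of `srand()` are assumptions for illustration, since the changelog does not say which generator libbow seeds, and gettimeofday() is a POSIX call that is not available on every platform.

```c
#include <stdlib.h>
#include <sys/time.h>

/* Seed the C library generator from the sub-second part of the clock.
   time() only has one-second resolution, so calls made in quick
   succession would all receive the same seed; tv_usec changes much
   more often.  Returns the seed that was used. */
unsigned int seed_from_microseconds(void)
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  unsigned int seed = (unsigned int) tv.tv_usec;
  srand(seed);
  return seed;
}
```

Two calls landing in the same microsecond would still collide, so this reduces the chance of repeated seeds without eliminating it.
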