📄 changelog
1999-07-13  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: includes smo, which works fine, but is slower than
	thorsten's algorithm... for now.

1999-07-13  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (newton_step): Don't try to change an alpha
	which is zero, because its gradient will always be negative.

1999-07-12  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (gammaln,digamma,trigamma): Changed to a
	higher precision algorithm.  No effect on Dirichlet fitting,
	however.

1999-07-12  Jason Reed  <jcreed@cyclone.jprc.com>

	* lex-suffixing.c (bow_lexer_suffixing_postprocess_word): Fixed
	off-by-one bug, I think.
	* archer.c: Does incremental writes in query server.  (Does *not*
	do pure incremental writes if we are just doing an --index)
	Removed label name and document name fields from archer_labels
	and archer_docs entries, since they made entries variable length.
	* archer-server.c: Added indexing capability.  (on second socket)
	Always use simple lexer for query lexing, independent of data
	lexing.  Do 'xxx' suffixing iff appropriate.
	* wi2pv.c: Made incremental.
	* int4str.c: Added incremental functions.
	* int4word.c: likewise.
	* sarray.c: likewise.
	* array.c: likewise.
	* bow/archer.h: likewise.
	* bow/libbow.h: likewise.

1999-07-09  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (train_dirichlet_sparse, train_dirichlet_nr,
	logProb, moment_match): Added extra_count parameter.

1999-07-08  William Morgan  <wmorgan@jprc.com>

	* archer-server.c (archer_query_socket_init): added SIGPIPE
	handling; archer server mode now no longer crashes as easily

1999-07-08  Kamal Nigam  <knigam@server5.jprc.com>

	* Makefile.local (RAINBOW_METHOD_C_FILES): Added emsimple.c.

1999-07-08  Andrew McCallum  <mccallum@justresearch.com>

	* Makefile.local (RAINBOW_METHOD_C_FILES): Added nbsimple.c
	* naivebayes.c (bow_naivebayes_score): Initialize NUM_SCORES to
	get rid of GCC warning.
	* crossbow.c (crossbow_options): New option --use-vocab-in-file.
	(struct crossbow_arg_state): New element VOCAB_MAP.
	(crossbow_doc_read): Read the DOC->CIS.
	(crossbow_index_multiclass_list): Implement
	BOW_PRUNE_VOCAB_BY_OCCUR_COUNT_N.
	(crossbow_index): Handle the VOCAB_MAP.
	* hem.c (crossbow_hem_em_one_iteration): Print the perplexity
	before returning.

1999-07-02  Andrew McCallum  <mccallum@justresearch.com>

	* ddf.c: Random bug fixes.  Added option --ddf-prior-alpha.

1999-07-08  Gregory C Schohn  <gcs@cmu.edu>

	* tfidf.c (tfidf): changed cdocs->length to ndocs so that only
	those documents which could change the df value are considered.
	* svm.c: added support for tfidf scoring for each submodel (could
	easily be extended to any type of scoring...)

1999-07-02  Gregory C Schohn  <gcs@cmu.edu>

	* Makefile.in: added rules for svm.c & pr_loqo.c which are filled
	in by configure, if they exist...
	* configure.in: added a check for pr_loqo.h & pr_loqo.c, if they
	are there, the makefile will build libbow with its complete svm
	package, otherwise, svm.c is ignored.
	* rainbow.c (main): added support for build-and-save &
	test-from-saved (allowing the user to build a model, then reuse
	it on successive runs).  Also added support for svm.c.
	* wi2dvf.c (bow_wi2dvf_add_di_wv): bow_dv_add_di_count_weight now
	adds the weight value to the dv.
	* naivebayes.c (bow_naivebayes_score): added
	naivebayes_score_returns_doc_pr flag so that P(X|C) is returned
	instead of P(C|X).  Added naivebayes_score_unsorted so that the
	array is returned in unsorted (ie. each ith index is for the ith
	class).
	* bow/naivebayes.h: added naivebayes_score_returns_doc_pr &
	naivebayes_score_unsorted globals so that naivebayes.c can be
	extended for the fisher kernel in svm.
	* svm.c: fixed a couple of typos & changed some outdated code.
	* bow/svm.h: Initial check-in.

1999-07-02  William Morgan  <wmorgan@jprc.com>

	* Makefile.in (ARCHER_C_FILES): added required files for archer
	compilation that had been lost previously; added lex -> c rule
	* Makefile.local: removed unnecessary lexing rule (now in
	Makefile.in)
	* annotation.c: added GPL header
	* labels.c: ditto
	* server.c: moved to archer-server.c
	* tagged.lex: moved to tagged_lex.lex

1999-07-01  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c: Changed several functions to use the new
	sparse iterator scheme.
	(moment_match): Uses n_group_by_key instead of n_group.

1999-07-01  Andrew McCallum  <mccallum@justresearch.com>

	* ddf.c: Fix argument types for train_dirichlet_sparse, and call
	with correct types.
	* bow/libbow.h (bow_iterator_double): New type.
	* barrel.c: Added an iterator for the columns of a barrel that
	match a class.
	(bow_barrel_iterator_for_ci_new): New function.
	* array.c: Added (commented-out) code for an iterator over a
	cdoc.
	* train_dirichlet.c (train_dirichlet_nr): Initialize old_logProb
	to 0 to get rid of gcc warning.
	* ddf.c: Use new iterator.
	* Makefile.in: Drastically rearranged to make different sections
	for different libbow front-ends.
	* Makefile.local: Updated to handle new Makefile.in organization.
	Removed rules for Pete Su's old archer query parser.
	* Makefile.preamble: Emptied.  I think this file is no longer
	necessary.

1999-07-01  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (train_dirichlet): Changed to call
	train_dirichlet_nr for the work.
	(train_dirichlet_sparse): Same as train_dirichlet but for sparse
	counts.
	(train_dirichlet_nr): General Newton-Raphson with option for
	sparse counts.
	(moment_match): Changed to save memory.  Added option for sparse
	counts.
	(logProb): Added option for sparse counts.

1999-06-30  William Morgan  <wmorgan@jprc.com>

	* annotation.c: new file
	* bow/archer.h: added annotation function interfaces and structs
	* server.c (archer_query_socket_init): added annotation handling
	code
	(archer_server_query): ditto
	* opts.c (bow_options): added ANNOTATION_KEY option
	* Makefile.local (ARCHER_C_FILES): added annotation.c

1999-06-29  William Morgan  <wmorgan@jprc.com>

	* bow/libbow.h: added USE_TAGGED_FLEXER
	* bow/archer.h: changed lexer interfaces slightly, and moved a
	few things from archer.c
	* tagged.lex: created
	* flex_mail.lex (flex_mail_get_word_extended): added
	* opts.c (bow_options): added FLEX_TAGGED_KEY
	* archer.c (archer_index_filename_flex): added tagged flexer
	option
	(archer_query_hits_matching_wi): fixed small bug
	(archer_query_hits_matching_sequence): fixed another small bug
	(archer_query_socket_init): removed (to server.c)
	(archer_query_server_process_commands): ditto
	(archer_query_serve_one_query): ditto
	(archer_query_serve): ditto
	* Makefile.local (ARCHER_C_FILES): added flex_mail.c,
	tagged_lex.c and server.c
	* server.c: created.  moved all server code from archer.c to
	here.

1999-06-28  Andrew McCallum  <mccallum@justresearch.com>

	* ddf.c (bow_ddf_dirichlet_from_doc_word_counts): Make it use the
	train_dirichlet_sparse().
	* ddf.c: New file.

1999-06-15  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (train_dirichlet): Changed to always be
	conservative.

1999-06-11  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (train_dirichlet): Handles the all zero case
	properly.
	* train_dirichlet.c (main): Reads count data from stdin.

1999-06-11  Kamal Nigam  <knigam@zeno.jprc.com>

	* cdm.c (bow_cdm_ct_set_alphas): change asserts to allow 0 alphas
	* cdmem.c (bow_cdmem_new_vpc_with_weights): Fix word count = 0
	case

1999-06-11  Kamal Nigam  <knigam@server6.jprc.com>

	* train_dirichlet.c (train_dirichlet): changes from Tom.
	* naivebayes.c (bow_naivebayes_score): Fixed memory trashing bug.
	* em.c (bow_em_score): cosmetic fixes only.
	* cdmem.c (bow_cdmem_new_vpc_with_weights): Fixed invocation of
	bow_barrel_score.
	* cdm.c (bow_cdm_word_probs_using_ct_alphas): Removed some code
	at Andrew's request.  Changed assert to allow some roundoff
	error.
	(bow_cdm_score): Fixed memory-trashing bug.

1999-06-10  Kamal Nigam  <knigam@zeno.jprc.com>

	* bow/libbow.h (bow_barrel_set_weights): checked for null
	function
	* Makefile.preamble (EXTRA_METHOD_C_FILES): added cdmem.c
	* cdm.c (bow_cdm_word_probs_using_ct_alphas): added assert to
	check for NaN
	(bow_cdm_print_word_probs): Removed superfluous exit.
	(bow_cdm_new_vpc): Removed diagnostic.

1999-06-10  Andrew McCallum  <mccallum@justresearch.com>

	* train_dirichlet.c: Use new improved method with iteration.  It
	also now works for Dirichlet densities of arbitrary size, not
	just Betas.  (From Tom Minka.)
	* cdm.c (bow_cdm_word_probs_using_ct_alphas): Fix the setting of
	the bottom-most word in the cascade tree.
	* cdm.c: Several bug fixes.  Now runs.
	* Makefile.preamble (EXTRA_METHOD_C_FILES): Added bpe.c, cdm.c
	and train_dirichlet.c.
	* train_dirichlet.c (train_dirichlet): For now, don't do newton
	iterations.
	* rainbow.c (rainbow_unarchive): Handle the case in which the
	OUTPUTNAME_FILENAME doesn't exist in the model directory.
	* opts.c (parse_bow_opt): Add "dw" as an alias for
	"document-then-word".
	(MAX_NUM_CHILDREN): Upped from 10 to 100.
	(bow_argp_add_child): Fix assertion to complain if we overrun
	again.
	* int4word.c (bow_words_add_occurrences_from_file): New function.
	(bow_words_add_occurrences_from_text_dir): Use it.
	* info_gain.c (bow_infogain_wa): New function.
	* barrel.c (bow_barrel_add_document): Add comment questioning
	assert().
	* bow/libbow.h: Declare new functions.
	* multiclass.c: Changed total_num_mixtures_possible calculation.
	Changed palpha from 1.0 to 0.01.  Changed malpha from 0 to 1.
	Changed pruning class set size from 4 to 3.
	Print a warning if the correct class vector was never evaluated.

1999-06-10  Andrew McCallum  <mccallum@justresearch.com>

	* cdm.c: Implemented but not tested.

1999-06-10  Kamal Nigam  <knigam@zeno.jprc.com>

	* bow/em.h (bow_em_set_priors_using_class_probs): New prototype.
	* cdm.c (bow_cdm_score): Implemented.

1999-06-09  Andrew McCallum  <mccallum@justresearch.com>

	* lex-suffixing.c (bow_lexer_suffixing_postprocess_word): Before
	rewinding to the beginning of the file (in order to lex without
	adding suffixes), not only check for two newlines in a row, but
	also check for the end of the file, so that files without \n\n
	and without a trailing \n get processed both with and without
	suffixes added.

1999-05-15  Andrew McCallum  <mccallum@justresearch.com>

	* multiclass.c: Backoff the per-class-set mixture distribution in
	a way loosely based on shrinkage.  More bug fixes.

1999-05-14  Andrew McCallum  <mccallum@justresearch.com>

	* multiclass.c: Many bug fixes and enhancements to class set
	search.
	* lex-japanese.c (bow_lexer_japanese_get_word): Minor bug fixes.
	* multiclass.c: Overhauled version with better search in class
	vector space.
	* Makefile.preamble: Instead of conditioning on "ifdef unix",
	condition on "ifndef WIN32", since "unix" wasn't defined on UNIX.
	Fix this properly later.
	(EXTRA_LIBBOW_H_FILES): Add more.
	* Makefile.local (DIST_ALL_FILES): Add extra Makefiles.

1999-06-01  Jason Reed  <jcreed@cyclone.jprc.com>

	* archer.c (archer_index_filename_old_lex)