1999-10-15  Gregory C Schohn  <gcs@cmu.edu>

	* svm_base.c: fixed some preprocessing bugs - also mildly cleaned
	up the code...
	* svm_smo.c: started fixing bugs (the definition of the error vector
	changed - the modified code did not also change in some spots) -
	there is still 1 left.
	* README_SVM: up to date - ready for release.

1999-10-13  Andrew McCallum  <mccallum@justresearch.com>

	* Makefile.in: Fix errors from last check-in.

1999-10-13  Gregory C Schohn  <gcs@cmu.edu>

	* Makefile.in (SVM_FILES): Added a check for pr_loqo.[ch] - if they
	are there, the svm parts of bow are built with it; otherwise, the
	parts of the code that use pr_loqo are turned off. The conditional
	sets the necessary files & defines.
	(STANDARD_LIBBOW_H_FILES): added bow/svm.h
	(STANDARD_LIBBOW_C_FILES): added $(SVM_FILES) (4-5 files, replacing
	svm.c)
	* configure.in: removed checks for pr_loqo.* - that's now done in
	the makefile. Added a check for the fpsetmask macro (which is
	necessary on at least FreeBSD boxes to turn IEEE math on).

1999-10-07  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c (al_svm_test_wrapper): added printing of documents added &
	the # of bound support vectors.

1999-10-06  Dayne Freitag  <dayne@tweed.jprc.com>

	* tagged.lex: HTML entities now recognized by lex, rather than by
	function is_entity. Parser returns when label is lexed; return value
	indicates word, begin label, or end label.
	* opts.c (parse_bow_opt): Conditional removal of code creating the
	bow data directory, which is not appropriate for DART and FDART.
	* labels.c (bow_last_label): New function.
	* archer_query_index.c: Major re-write of previous code.
	* archer_query_array.c: Major re-write of previous code, much of
	which was buggy.
	* archer_query.y: Deleted the "term >N term" syntax as superfluous.
	Added the "term < term" syntax. Added the "word" type and "WORD"
	terminal to distinguish from NUMBER.
	* archer_query.c: Code to free allocated structures.
	* archer-server.c (archer_query_socket_init): Now releases socket on
	failure. Unix socket support.
	Streamlined code by changing code under
	archer_query_serve_one_query. New functions:
	archer_query_serve_one_admin_command,
	archer_query_serve_admin_commands,
	archer_query_server_command_loop,
	archer_query_serve_regular_query,
	archer_query_server_process_commands. Added security features. New
	functions: archer_remote_host_matches_spec, archer_query_password_ok.
	(archer_server_index): Call archer_archive after indexing.
	(archer_server_index_with_markup): New function.
	(archer_server_query_new): Fixed output. Decomposed, adding new
	functions archer_server_print_hitlist and archer_server_print_hit.
	Added ndump command and new functions archer_server_dump_new,
	archer_server_dump_preamble, and archer_server_do_dump. Added fields
	command and new function archer_server_fields.
	* archer_query_execute.c: Fixed many memory leaks, and rewrote large
	sections.
	* annotation.c (annotation_sarray_reread): New function.
	* archer.c: Allow mark-up spoofing of the indexer, batch incremental
	indexing, passwords and IP-based client restriction. New executables
	(conditionally compiled) DART, FDART, and IDART.
	(archer_get_fp_from_filename): New function.
	(flush_labels): New function.
	(archer_index_term): New function.
	(archer_index_label): New function.
	(archer_index_filename_flex): Some code moved to above functions.
	(archer_index): Changed to prevent re-opening/re-construction of
	already opened files and existing data structures (needed for batch
	incremental indexing).

1999-10-06  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c (svm_vpc_merge): fixed preprocessing bug that caused big
	problems when no weighting was being used in conjunction with
	pairwise voting.
	* also changed all of the options * to svm-*.

1999-10-05  Andrew McCallum  <mccallum@justresearch.com>

	* TODO: Remove a few items that were done.
	* NEWS: Describe some new features.
	* HACKING: Update to remove no-longer-available CVS server
	description.
	* barrel.c (bow_barrel_keep_top_words_by_infogain): Fix verbosity
	message for vocabulary sizes under 5.

1999-10-05  Andrew McCallum  <mccallum@justresearch.com>

	* docnames.c (bow_map_filenames_from_dir): Fix printf argument.

1999-09-29  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: a lot of minor changes (like printing stuff), no bug fixes.
	If test-in-train is used with active learning, make sure to suppress
	the score matrix (which is hundreds of MB large) if you don't want
	it!

1999-09-24  Andrew McCallum  <mccallum@justresearch.com>

	* dirichlet.c: Added ability to do simple classification. For
	example:
	(echo 2 ; cat ~/research/projects/dicefactory/synth1/bar.counts ) | ./dirichlet -c 2 18.4738 26.1034 2.49099 2.04999

1999-09-22  Andrew McCallum  <mccallum@justresearch.com>

	* docnames.c (bow_map_filenames_from_dir): When a directory can't be
	opened, simply skip it instead of trying to open it as a file.
	(This works around a Linux bug whereby directories seem to
	disappear.)

1999-09-20  Kamal Nigam  <knigam@zeno.jprc.com>

	* maxent.c: New code for options maxent-vary-prior-by-count,
	maxent-gaussian-prior-no-zero-constraints,
	maxent-prune-features-by-count, and
	maxent-vary-prior-by-count-linearly.

1999-09-19  Gregory C Schohn  <gcs@cmu.edu>

	* configure.in: added check for srandom - which was & still is
	necessary for libbow.h
	* bow/libbow.h: changed the defines for srandom & random so that
	both get redefined if one is missing.

1999-09-09  Andrew McCallum  <mccallum@justresearch.com>

	* train_dirichlet.c: Allow the main() test driver to be compiled in
	by simply defining TD_MAIN on the gcc command-line.
	* random.c (bow_random_reset_seed): New function.
	(bow_random_set_seed): Make it work with the above function.
	* multiclass.c (multiclass_iterated_mixture_given_doc_and_cis): New
	function.
	(multiclass_mixture_given_doc): Allow this to be called with a test
	document too.
	(multiclass_log_prob_of_classes_given_doc): Add commented-out code
	to implement BIC.
	(multiclass_explore_cis_greedy1): Bug fixes.
	* cdmem.c (cdmem_parse_opt): Allow printing of the accuracy on the
	unlabeled set.
	(bow_cdmem_class_wi2dvf): New function.
	(bow_cdmem_new_vpc_with_weights): Save original document types and
	classes. Using new macro SET_CASCADE_TREE_WITH_ALL_DATA, allow three
	different options for training the distance function. Allow
	multiple rounds of CDM distance metric learning.
	* cdm.c: Include <bow/train_dirichlet.h> instead of declaring
	train_dirichlet() extern here.
	(bow_cdm_initialize_ct): Make it safe to call this function more
	than once.
	(bow_cdm_ct_set_alphas): Define COUNTS as double* instead of
	unsigned*. Print the bottom-most word in the cascade tree. Don't
	assert DV.
	* barrel.c (bow_barrel_add_from_text_dir): Instead of crashing when
	failing to open a file, simply print a warning.

1999-09-03  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: checkpoint - some bitrotting code may not work...
	* svm.c: updated SMO to work with Keerthi et al.'s modifications -
	the heuristic is much better, but the running time on 20 Newsgroups
	is still slower than Thorsten's methods.
	* svm.c: added a lot of active learning logging.

1999-08-18  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c, bow/train_dirichlet.h: Added train_sum_alpha
	global variable.

1999-08-18  Andrew McCallum  <mccallum@justresearch.com>

	* bow/libbow.h: Declare new functions.
	* bow/cdm.h: Update function prototypes to match.
	* wv.c (bow_wv_copy): New function.
	* wi2dvf.c (bow_wi2dvf_set_idf_to_count): New function.
	(bow_wi2dvf_dv_hidden): New function.
	* wa.c (bow_wa_remove): New function.
	* vpc.c (bow_wi2dvf_sum): New function.
	(bow_barrel_new_vpc): Move the updating of the CDOC->WORD_COUNT to
	earlier in the function.
	* rainbow.c (rainbow_test): In the test documents, include words
	that were previously removed from the training set by, for example,
	feature selection.
	* heap.c (bow_make_dv_heap_from_wi2dvf_hidden): New function.
	* naivebayes.c (bow_naivebayes_pr_wi_ci): Allow m_est_m to be zero,
	if set as such explicitly on the command-line.
	(bow_naivebayes_total_word_count_for_ci): New function.
	* ddf.c: Add ADDITIONAL_COUNT.
	* dice.c: Add many command-line options.
	* ctdf.c: Add handling for zerotons and unknown words.
	* bpe.c: Include bpe_prior_alpha, and various other bug fixes.
	* barrel.c: Clean up some verbosity messages.
	(bow_barrel_set_idf_to_count_in_train): New function.
	* train_dirichlet.c: Change name from gammaln_fast to gammaln, so
	this function can be used, depending on the #define.
	* cdm.c: Many completions and bug fixes.
	* cdmem.c (SET_CASCADE_TREE_WITH_ALL_DATA): New macro.
	(bow_cdmem_new_vpc_with_weights): Depending on above macro, use all
	labels to set the distance metric.

1999-08-13  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: added active learning stuff - has a pretty bad selection
	heuristic (subject to pathological cases); had to change
	(modularize) different sections of the code...

1999-08-09  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: the removal of inconsistent examples (for the
	Thorsten-like algorithm) is working; it still needs to be extended
	for SMO.

1999-08-01  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: 3 bug fixes - one in the Lagrange multiplier check, one in
	the loop to call pr_loqo (the maximum number of iterations wasn't
	increasing when pr_loqo could not converge), & a bug that caused the
	equality constraint to fall apart when the working set size was
	less than the maximum working set size (actually, rewrote that
	block to be way more efficient).
	* svm.c: Also played with the kernel cache in a lot of different
	ways; the simple (but not too simple) solution that is committed
	yields the best times (when the kernel cache is not grossly smaller
	than the number of support vectors squared).

1999-07-28  William Morgan  <wmorgan@jprc.com>

	* archer_query_array.c: added
	* archer_query_array.h: added
	* archer_query_execute.c: added
	* archer_query_execute.h: added
	* archer_query_index.c: added
	* archer_query_index.h: added
	* pv.c (bow_pv_read_next_di_li_pi): removed useless assert()
	* archer_query.c: reformatted for better GNU style
	* archer_query.h: ditto
	* archer.c (mem_error): added, as well as other stuff surrounded by
	ARCHER_USE_MCHECK defines for optional memory checking
	* archer-server.c (archer_server_query_new): made nquery command
	call new query engine. Right now this dumps core.
	* Makefile.in (ARCHER_C_FILES): added archer_query_ files
	(ARCHER_H_FILES): ditto

1999-07-19  Thomas P. Minka  <minka@jprc.com>

	* train_dirichlet.c (train_dirichlet_nr): Added the option to not
	train the sum of alphas, only their ratios.

1999-07-16  Andrew McCallum  <mccallum@justresearch.com>

	* ddf.c: Debug and add smoothing. It smooths in the same proportion
	that adding a pseudo-count of 1 would do for naivebayes.

1999-07-16  William Morgan  <wmorgan@jprc.com>

	* Makefile.in: fixed small bug that occasionally caused make to
	overwrite archer_query.c

1999-07-16  Gregory C Schohn  <gcs@cmu.edu>

	* svm.c: new version - about 3 times as fast thanks to re-using
	error-cache values for the KKT conditions instead of re-calculating
	them each time. A valid cache bitmap was also added...
	* svm.c: made semi-small bug fixes, like precision checks, the
	removal of a bogus heuristic check & some small coding bugs...

1999-07-16  William Morgan  <wmorgan@jprc.com>

	* bow/archer.h: removed ARCHER_MAX_LABEL_PARAMS (cruft)
	* flex_mail.lex: modified function naming scheme to work with the
	new way archer lex files are handled
	* configure.in: added AC_PROG_LEX to configure lexer generator
	* archer_query.y: added
	* archer_query.lex: added
	* archer_query.h: added
	* archer_query.c: added
	* archer.c (archer_query_hits_matching_sequence): fixed bug that
	caused archer to hang
	* Makefile.in: added rules for .lex and .y files for archer
	* archer-server.c (archer_server_query_new): added
	(archer_server_query_hits_matching_sequence): fixed bug that caused
	archer to hang

1999-07-15  Jason Reed  <jcreed@cyclone.jprc.com>

	* archer.c (archer_index_filename_flex): Don't write redundant
	wi2pv information.

1999-07-13  Jason Reed  <jcreed@cyclone.jprc.com>

	* archer-server.c (archer_server_query): Include terms matched in
	query results.
	* archer-server.c: Added 'hits' command to select a range of hits to
	show. (No need to send N tens of thousands of hits over the socket
	when someone searches for 'artificial intelligence' or something
	equally general)