Bag-Of-Words Library ToDo's
===========================

* Write bow_barrel_new_from_file(), so we don't get so confused about
  NOT closing the FP.
* Make new versions of structure-file-saving code that take a filename
  and a directory name.  It will be easier to use them.
* Rename `bow_cdoc->length' to `bow_cdoc->norm'
* Rename `bow_cdoc->filename' to `bow_cdoc->name'
* Make `bow_cdoc->class' be a vector of floats.
* Rename `bow_wi2dvf_dv()' to `bow_wi2dvf_dv_at_wi()'
* Standardize on use of either `entry' or `entries'.
* Rename all `2' to `_to_'.
* Rename all bow_dv_heap* to bow_dvheap*.
* Rename bow_dv_heap_update() to bow_dvheap_next().
* Change bow_cdoc->word_count from int to float (or double)

Remove rainbow_classnames
Are all filename_to_classname() calls still necessary?
Examine vpc() and fix to take advantage of barrel->classnames.
In rainbow_print_weight_vector() find the class index more efficiently.
Likewise for rainbow_print_foilgain()
Rename bow_free_barrel() to bow_barrel_free...something.
Free heaps in the places where they are currently not freed!
Rename bow_prune_words_by_doc_count_n to bow_prune_vocab_by_doc_count_n
Change all occurrences of "prune" to "hide".

Take a look at (lex-suffixing.c) bow_lexer_suffixing_get_word - might want
  to change bow_lexer_html_get_raw_word to bow_default_lexer->get_word


Replied: Mon, 02 Feb 1998 13:55:04 -0500
Replied: "L. Douglas Baker" <ldbapp@cs.cmu.edu>
Return-Path: ldbapp@cs.cmu.edu
Received: from tera.jprc.com (TERA.JPRC.COM [207.86.147.221])
	by sandbox.jprc.com (8.8.5/8.8.5) with SMTP id NAA04318
	for <mccallum@sandbox.jprc.com>; Mon, 2 Feb 1998 13:52:04 -0500
Received: from LDBAPP.JPRC.COM (LDBAPP.JPRC.COM [207.86.147.208])
	by tera.jprc.com (NTMail 3.03.0014/1.agyw) with ESMTP id ta116551
	for <mccallum@sandbox.jprc.com>; Mon, 2 Feb 1998 13:52:31 -0500
Message-Id: <3.0.32.19980202135303.009a64c0@mail.jprc.com>
X-Sender: ldbapp@mail.jprc.com
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Mon, 02 Feb 1998 13:53:05 -0500
To: Andrew McCallum <mccallum@jprc.com>
From: "L. Douglas Baker" <ldbapp@cs.cmu.edu>
Subject: bow comments
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Andrew,

Here are some comments I wrote down when I was learning my way around bow.
You said you'd like to see these someday.  They are things that I think
might need to be explained in any bow documentation that might get written
in the future.

-Doug

--------------------------------------------------------------------------------
These are "gotchas" that should be addressed in any documentation that is
written about the bag of words library.
--------------------------------------------------------------------------------

The document vectors in a barrel are not all loaded at the beginning, but are
loaded only on demand.  Thus, to access one you should use bow_wi2dvf_dv().

The documents in a bow_dv are not in the array in any particular order.  To
access one you should use _bow_dv_index_for_di().  However, if you try to
access a di that does not exist, this function will automatically make space
for it.  Maybe there should be a similar function that returns NULL if the
requested di does not exist.

There is a function bow_wi2dvf_dv(bow_wi2dvf *, int) which returns a dv* from
a wi2dvf.  This would make you think that you should access the dv's this way:

        dv1 = bow_wi2dvf_dv(wi2dvf, wi);

But then there is a function
bow_dv_add_di_count_weight(bow_dv**, int, int, float) that modifies the
entries in the dv.  You'd think that if you accessed a dv as above, you could
then add to it like this:

        bow_dv_add_di_count_weight(&dv1, di, count, weight);

But this won't work because the original dv that you really should be
accessing is wi2dvf->entry[wi].dv.  Changing dv1 only changes a (presumably)
local variable.

--------------------------------------------------------------------------------
Other Questions
--------------------------------------------------------------------------------

What is the protocol regarding "hidden" words?

Are the wi's guaranteed to span the range 0..n with no holes?

What is the difference between size and num_words, or length in all
the structures?  Are the differences consistent throughout?  It seems
like size is the number of items for which memory has been allocated
and num_words or length is the number of items that are actually being
used.
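
--------------------------------------------------------------------------------
Illustration of the dv gotchas (sketch only)
--------------------------------------------------------------------------------

The two dv gotchas from the notes, the update that is lost when made through
a local copy of the pointer and the missing lookup that returns NULL for an
absent di, can be sketched in a small self-contained C fragment.  Everything
named toy_* below is a hypothetical stand-in written for this illustration; it
is not the real bow structure layout or function signatures.  The sketch also
assumes, as the last question guesses, that size counts allocated entries and
length counts entries in use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; NOT the real bow types. */
typedef struct {
  int di;     /* document index */
  int count;  /* occurrences of the word in that document */
} toy_entry;

typedef struct {
  int length;         /* entries in use (assumed meaning) */
  int size;           /* entries allocated (assumed meaning) */
  toy_entry entry[];  /* flexible array member */
} toy_dv;

typedef struct {
  int num_words;
  toy_dv **entry;     /* entry[wi] is the vector for word wi, or NULL */
} toy_wi2dvf;

/* Append (di, count) to *dvp, allocating or growing the block as needed.
   The block may be created or moved, so the caller's pointer is updated
   through dvp.  Returns 0 on success, -1 if allocation fails. */
static int toy_dv_add(toy_dv **dvp, int di, int count)
{
  assert(dvp != NULL);
  toy_dv *dv = *dvp;
  if (dv == NULL || dv->length == dv->size) {
    int new_size = dv ? dv->size * 2 : 2;
    toy_dv *grown = realloc(dv, sizeof *grown
                                + (size_t)new_size * sizeof grown->entry[0]);
    if (grown == NULL)
      return -1;
    if (dv == NULL)
      grown->length = 0;   /* freshly allocated block starts empty */
    grown->size = new_size;
    dv = grown;
    *dvp = dv;             /* only the pointer the caller passed is updated */
  }
  dv->entry[dv->length].di = di;
  dv->entry[dv->length].count = count;
  dv->length++;
  return 0;
}

/* Look up di WITHOUT making space for it; returns NULL when it is absent. */
static const toy_entry *toy_dv_find(const toy_dv *dv, int di)
{
  if (dv == NULL)
    return NULL;
  for (int i = 0; i < dv->length; i++)
    if (dv->entry[i].di == di)
      return &dv->entry[i];
  return NULL;
}

/* Returns 1 when adding through a local copy leaves the table unchanged. */
static int toy_local_copy_loses_update(void)
{
  toy_dv *slots[1] = { NULL };
  toy_wi2dvf map = { 1, slots };
  toy_dv *dv1 = map.entry[0];            /* local copy of the pointer */
  if (toy_dv_add(&dv1, 7, 3) != 0)
    return -1;
  int lost = (map.entry[0] == NULL && dv1 != NULL);
  free(dv1);
  return lost;
}

/* Returns 1 when adding through the table slot itself is visible later. */
static int toy_in_place_keeps_update(void)
{
  toy_dv *slots[1] = { NULL };
  toy_wi2dvf map = { 1, slots };
  if (toy_dv_add(&map.entry[0], 7, 3) != 0)
    return -1;
  const toy_entry *e = toy_dv_find(map.entry[0], 7);
  int kept = (e != NULL && e->count == 3
              && toy_dv_find(map.entry[0], 8) == NULL);
  free(map.entry[0]);
  return kept;
}
```

Both helper functions return 1 on success: the first shows that the table
slot is still NULL after adding through dv1, the second shows that passing
the address of the slot itself keeps the update.  Whether the real library
behaves the same way in every case should be checked against its source.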
