Table creation notes (建表说明.txt)
Database: YellowPageProject_TermSelector
Tables
-------------------
Category(ID, name)
DocInforLib(docID, cateID, docPath, isDone)
WordLib(word, cateID, docID, termFrequence)  (row count is limited per category)
TfidfLib(word, cateID, weight, termFrequence, docFrequence)
TermLib_temp(cateID, word, weight, docFrequence)
TermLib(cateID, word, weight, docFrequence)
-------------------
Class Category
public string GetCateIDByCateName(string name)
-------------------
Class DocLib
private Doc doc;
bool addDoc(doc);
-------------------
Class Doc
private string docID;
private string cateID;
private string path;
private bool isDone;
private string body;
public string GetBody();
-------------------
Class TermSelector
void createTfidfLib();
void createTermLib_Temp();
void createTermLIb();
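
The notes do not say how the weight column of TfidfLib is computed. The following is a minimal sketch only, assuming the common tf * ln(N / df) weighting. The class TfidfSketch, the method Weight and the totalDocs parameter are hypothetical and not part of the original design; termFrequence and docFrequence mirror the TfidfLib columns.

```csharp
using System;

public static class TfidfSketch
{
    // Assumed formula (not confirmed by the notes):
    //   weight = termFrequence * ln(totalDocs / docFrequence)
    public static double Weight(int termFrequence, int docFrequence, int totalDocs)
    {
        if (docFrequence <= 0 || totalDocs <= 0)
        {
            throw new ArgumentException("Document counts must be positive.");
        }
        return termFrequence * Math.Log((double)totalDocs / docFrequence);
    }
}
```

Under this assumption a word that occurs in every document gets weight 0, which is the usual reason for combining term frequency with document frequency.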
-------------------
Class WordLib
private string word;
private string cateID;
private string docID;
private int termFrequence;
private void CountTF(string[] words, Common.Doc doc)
public void addWord(Common.Doc doc);
public int DeleteSigleWord()
public int DeleteRareWord()
public int DeleteWordByTF(int termFrequence)
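
The WordLib outline above can be illustrated with an in-memory sketch. It assumes that CountTF tallies occurrences per word and that the int returned by DeleteWordByTF is the number of entries removed; neither is stated in the notes. The class WordLibSketch and the dictionary standing in for the WordLib table are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordLibSketch
{
    // Counts how often each non-blank word occurs in one document's token list.
    public static Dictionary<string, int> CountTF(string[] words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var w in words)
        {
            if (string.IsNullOrWhiteSpace(w))
            {
                continue;
            }
            counts[w] = counts.TryGetValue(w, out var n) ? n + 1 : 1;
        }
        return counts;
    }

    // Removes every entry whose count is below minTermFrequence and returns
    // how many entries were removed (assumed meaning of the int result).
    public static int DeleteWordByTF(Dictionary<string, int> counts, int minTermFrequence)
    {
        var rare = counts.Where(kv => kv.Value < minTermFrequence)
                         .Select(kv => kv.Key)
                         .ToList();
        foreach (var w in rare)
        {
            counts.Remove(w);
        }
        return rare.Count;
    }
}
```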
-----------------------------------
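Taking 5 documents as an example: after removing single-character words, noise words, and low-frequency words (term frequency < 5), 255 terms remain; after further denoising by document frequency (df < 3), only 1/5 of the original remains.
Without any denoising, there are 2604 terms.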
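
The denoising steps described above could be combined roughly as follows. This is a sketch under assumptions: term frequency is summed over the whole sample, the noise-word list is supplied by the caller, and the names DenoiseSketch and SelectTerms are hypothetical. The thresholds 5 and 3 are the ones from the note.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DenoiseSketch
{
    // Returns the terms that survive all four filters, sorted for stable output.
    public static List<string> SelectTerms(
        IEnumerable<string[]> docs,   // one token array per document
        ISet<string> noiseWords,      // hypothetical noise-word list
        int minTermFrequence = 5,
        int minDocFrequence = 3)
    {
        var tf = new Dictionary<string, int>(); // total occurrences of each word
        var df = new Dictionary<string, int>(); // number of documents containing it

        foreach (var words in docs)
        {
            foreach (var w in words)
            {
                tf[w] = tf.TryGetValue(w, out var n) ? n + 1 : 1;
            }
            foreach (var w in words.Distinct())
            {
                df[w] = df.TryGetValue(w, out var m) ? m + 1 : 1;
            }
        }

        return tf.Keys
            .Where(w => w.Length > 1)               // drop single-character words
            .Where(w => !noiseWords.Contains(w))    // drop noise words
            .Where(w => tf[w] >= minTermFrequence)  // drop term frequency < 5
            .Where(w => df[w] >= minDocFrequence)   // drop document frequency < 3
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToList();
    }
}
```

Applying the document-frequency filter last matches the order given in the note, where it is the final step after the other three filters.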