Table creation notes (建表说明.txt)
From the project 「实现文本的特征提取」 (implementing text feature extraction)
Database: YellowPageProject_TermSelector
Tables
-------------------
Category (ID, name)
DocInforLib
(docID, cateID, docPath, isDone)
WordLib (count is limited per category)
(word, cateID, docID, termFrequence)
TfidfLib
(word, cateID, weight, termFrequence, docFrequence)
TermLib_temp
(cateID, word, weight, docFrequence)
TermLib
(cateID, word, weight, docFrequence)
-------------------
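The table list above could be set up roughly as follows. This is a minimal sketch in Python with the standard sqlite3 module, not the project's own code. The notes give only column names, so every column type and the two primary keys are assumptions, and the helper names create_schema and table_columns are hypothetical.

```python
import sqlite3

# Column names come from the notes; types and primary keys are assumptions.
SCHEMA = """
CREATE TABLE Category     (ID TEXT PRIMARY KEY, name TEXT);
CREATE TABLE DocInforLib  (docID TEXT PRIMARY KEY, cateID TEXT, docPath TEXT, isDone INTEGER);
CREATE TABLE WordLib      (word TEXT, cateID TEXT, docID TEXT, termFrequence INTEGER);
CREATE TABLE TfidfLib     (word TEXT, cateID TEXT, weight REAL, termFrequence INTEGER, docFrequence INTEGER);
CREATE TABLE TermLib_temp (cateID TEXT, word TEXT, weight REAL, docFrequence INTEGER);
CREATE TABLE TermLib      (cateID TEXT, word TEXT, weight REAL, docFrequence INTEGER);
"""


def create_schema(conn):
    """Create all six tables in the given connection."""
    conn.executescript(SCHEMA)


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

An in-memory database is used here only so the sketch runs without touching disk.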
Class Category
public string GetCateIDByCateName(string name);
-------------------
Class DocLib
private Doc doc;
bool addDoc(Doc doc);
-------------------
Class Doc
private string docID;
private string cateID;
private string path;
private bool isDone;
private string body;
public string GetBody();
-------------------
Class TermSelector
void createTfidfLib();
void createTermLib_Temp();
void createTermLib();
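
One possible reading of createTfidfLib is that it aggregates WordLib rows per (word, cateID) and attaches a weight. The sketch below is in Python rather than the project's language. The notes do not say how weight is computed, so the common tf * log(N / df) form is an assumption, as are the function name build_tfidf and the total_docs parameter.

```python
import math
from collections import defaultdict


def build_tfidf(word_rows, total_docs):
    """Aggregate WordLib-style rows into TfidfLib-style rows.

    word_rows: iterable of (word, cateID, docID, termFrequence).
    Returns a list of (word, cateID, weight, termFrequence, docFrequence).
    """
    term_freq = defaultdict(int)   # summed term frequency per (word, cateID)
    doc_ids = defaultdict(set)     # distinct documents per (word, cateID)
    for word, cate_id, doc_id, tf in word_rows:
        term_freq[(word, cate_id)] += tf
        doc_ids[(word, cate_id)].add(doc_id)

    rows = []
    for (word, cate_id), tf in term_freq.items():
        df = len(doc_ids[(word, cate_id)])
        # Assumed weighting: tf * log(N / df); the notes do not specify one.
        weight = tf * math.log(total_docs / df)
        rows.append((word, cate_id, weight, tf, df))
    return rows
```

With this weighting, a word that appears in every document gets weight 0, which is why a separate document-frequency threshold is still useful for pruning rare terms.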
-------------------
Class WordLib
private string word;
private string cateID;
private string docID;
private int termFrequence;
private void CountTF(string[] words, Common.Doc doc);
public void addWord(Common.Doc doc);
public int DeleteSigleWord();
public int DeleteRareWord();
public int DeleteWordByTF(int termFrequence);
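
The WordLib operations might behave along these lines. This is a Python sketch under stated assumptions, not the project's implementation: the declared methods return int, while these hypothetical helpers (count_tf, delete_single_char_words, delete_words_by_tf) return filtered row lists to keep the example self-contained.

```python
from collections import Counter


def count_tf(words, doc_id, cate_id):
    """Count term frequency for one tokenized document.

    Returns WordLib-style rows: (word, cateID, docID, termFrequence).
    """
    counts = Counter(words)
    return [(word, cate_id, doc_id, n) for word, n in counts.items()]


def delete_single_char_words(rows):
    """Drop rows whose word is a single character."""
    return [row for row in rows if len(row[0]) > 1]


def delete_words_by_tf(rows, min_tf):
    """Drop rows whose term frequency is below min_tf."""
    return [row for row in rows if row[3] >= min_tf]
```

For example, count_tf(["文本", "的", "文本", "特征"], "d1", "c1") yields one row per distinct word, and delete_single_char_words then removes the row for "的".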
-----------------------------------
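Taking 5 documents as an example: after removing single-character words, noise words, and low-frequency words (term frequency < 5), 255 terms remain; after further denoising by document frequency (df < 3), only 1/5 of the original remains.
Without any denoising, by contrast, there are 2604 terms.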
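
The document-frequency step in the note above can be sketched as a simple filter over TfidfLib-style rows. This is a Python illustration with hypothetical names (filter_by_doc_frequency, kept_fraction). The threshold of 3 follows the note, while the row layout assumes docFrequence is the fifth field, as in the TfidfLib column list.

```python
def filter_by_doc_frequency(tfidf_rows, min_df=3):
    """Keep rows whose document frequency is at least min_df.

    tfidf_rows: list of (word, cateID, weight, termFrequence, docFrequence).
    """
    return [row for row in tfidf_rows if row[4] >= min_df]


def kept_fraction(before, after):
    """Fraction of rows that survived a filtering step."""
    return len(after) / len(before) if before else 0.0
```

Comparing kept_fraction before and after each step is one way to track how much each filter shrinks the term list.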