建表说明.txt (table design notes)

Project: Implementing text feature extraction
Database: YellowPageProject_TermSelector

Tables
-------------------
Category (ID, name)

DocInforLib
(docID, cateID, docPath, isDone)

WordLib (count capped per category)
(word, cateID, docID, termFrequence)

TfidfLib
(word, cateID, weight, termFrequence, docFrequence)

TermLib_temp
(cateID, word, weight, docFrequence)

TermLib
(cateID, word, weight, docFrequence)
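The table list above gives column names only. As a rough illustration, here is a minimal Python sketch that creates the same tables in SQLite; the column types, the primary keys, and the helper name `create_database` are assumptions, not part of the notes.

```python
import sqlite3

# Hypothetical SQLite rendering of the tables above. Column types and the
# primary keys are assumptions: the notes list column names only.
SCHEMA = """
CREATE TABLE Category     (ID TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE DocInforLib  (docID TEXT PRIMARY KEY, cateID TEXT, docPath TEXT,
                           isDone INTEGER NOT NULL DEFAULT 0);
CREATE TABLE WordLib      (word TEXT, cateID TEXT, docID TEXT, termFrequence INTEGER);
CREATE TABLE TfidfLib     (word TEXT, cateID TEXT, weight REAL,
                           termFrequence INTEGER, docFrequence INTEGER);
CREATE TABLE TermLib_temp (cateID TEXT, word TEXT, weight REAL, docFrequence INTEGER);
CREATE TABLE TermLib      (cateID TEXT, word TEXT, weight REAL, docFrequence INTEGER);
"""


def create_database(path=":memory:"):
    """Create every table in SCHEMA and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
print([row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")])
```

The column names are kept exactly as in the notes (including `termFrequence` and `docFrequence`) so that queries written against the original design still match.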

-------------------
Class Category
public string GetCateIDByCateName(string name);
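`GetCateIDByCateName` presumably maps a category name to its `ID` in the `Category` table. A minimal Python/SQLite sketch of that lookup, assuming a missing name should yield `None` (the notes do not say what the original returns in that case); the sample rows are made up.

```python
import sqlite3
from typing import Optional


def get_cate_id_by_cate_name(conn: sqlite3.Connection, name: str) -> Optional[str]:
    """Return the ID of the category called `name`, or None if there is none."""
    row = conn.execute("SELECT ID FROM Category WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


# Usage with a throwaway in-memory Category table (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Category (ID TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO Category VALUES (?, ?)", [("C01", "sports"), ("C02", "finance")])
print(get_cate_id_by_cate_name(conn, "finance"))  # C02
```

The parameterized `?` placeholder keeps category names containing quotes from breaking the query.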

-------------------
Class DocLib
private Doc doc;
bool addDoc(Doc doc);
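`addDoc` returns a `bool`, which suggests it reports whether the document was stored. Below is a minimal Python sketch, assuming that `false` means "this `docID` is already in `DocInforLib`"; the `Doc` dataclass and the `add_doc` name are hypothetical stand-ins for the classes in these notes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Doc:
    # Mirrors the private fields of Class Doc in the notes (body omitted here).
    doc_id: str
    cate_id: str
    path: str
    is_done: bool = False


def add_doc(conn: sqlite3.Connection, doc: Doc) -> bool:
    """Insert doc into DocInforLib; return False if its docID already exists."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            conn.execute(
                "INSERT INTO DocInforLib (docID, cateID, docPath, isDone) VALUES (?, ?, ?, ?)",
                (doc.doc_id, doc.cate_id, doc.path, int(doc.is_done)),
            )
        return True
    except sqlite3.IntegrityError:  # PRIMARY KEY violation on docID
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DocInforLib (docID TEXT PRIMARY KEY, cateID TEXT, docPath TEXT, isDone INTEGER)"
)
print(add_doc(conn, Doc("D1", "C01", "docs/d1.txt")))  # True
print(add_doc(conn, Doc("D1", "C01", "docs/d1.txt")))  # False (duplicate docID)
```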

-------------------
Class Doc
private string docID;
private string cateID;
private string path;
private bool isDone;
private string body;

public string GetBody();
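`Doc` stores a `path` and a `body`, so `GetBody` most likely returns the file's text. A minimal Python sketch that loads the text lazily on first call and caches it; reading the whole file and the UTF-8 default are assumptions (Chinese corpora of this kind are often GBK-encoded, so `encoding` is left as a parameter).

```python
import os
import tempfile


class Doc:
    """Python sketch of Class Doc; field names follow the notes."""

    def __init__(self, doc_id, cate_id, path, is_done=False):
        self.doc_id = doc_id
        self.cate_id = cate_id
        self.path = path
        self.is_done = is_done
        self._body = None  # loaded lazily on the first get_body() call

    def get_body(self, encoding="utf-8"):
        # Assumption: the body is the full text of the file at self.path.
        if self._body is None:
            with open(self.path, encoding=encoding) as f:
                self._body = f.read()
        return self._body


# Usage: write a small temporary file and read it back through a Doc.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("黄页 分类 文本")
doc = Doc("D1", "C01", tmp.name)
print(doc.get_body())
os.remove(tmp.name)  # the cached body stays available after the file is gone
```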

-------------------
Class TermSelector
void createTfidfLib();
void createTermLib_Temp();
void createTermLib();
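The three `TermSelector` steps read as a pipeline: compute TF-IDF weights (`TfidfLib`), drop low-document-frequency words (`TermLib_temp`), then keep the best words per category (`TermLib`). The notes do not give the weighting formula, so this Python sketch assumes plain TF-IDF: tf is a word's frequency summed over a category's documents, df is the number of distinct documents (in any category) containing it, and weight = tf · log(N / df). The top-k cut for `TermLib` is also an assumption; all function names are hypothetical.

```python
import math
from collections import defaultdict


def create_tfidf_lib(word_lib, total_docs):
    """Build TfidfLib rows {(word, cateID): (weight, tf, df)} from WordLib rows.

    word_lib: iterable of (word, cateID, docID, termFrequence) tuples.
    """
    tf = defaultdict(int)
    docs_with_word = defaultdict(set)
    for word, cate_id, doc_id, freq in word_lib:
        tf[(word, cate_id)] += freq
        docs_with_word[word].add(doc_id)
    lib = {}
    for (word, cate_id), count in tf.items():
        df = len(docs_with_word[word])
        lib[(word, cate_id)] = (count * math.log(total_docs / df), count, df)
    return lib


def create_term_lib_temp(tfidf_lib, min_df=3):
    """TermLib_temp: keep only words whose document frequency is >= min_df."""
    return {key: row for key, row in tfidf_lib.items() if row[2] >= min_df}


def create_term_lib(term_lib_temp, top_k):
    """TermLib: the top_k highest-weighted words of each category."""
    by_cate = defaultdict(list)
    for (word, cate_id), (weight, _tf, df) in term_lib_temp.items():
        by_cate[cate_id].append((weight, word, df))
    result = {}
    for cate_id, rows in by_cate.items():
        rows.sort(key=lambda r: (-r[0], r[1]))  # weight descending, word breaks ties
        result[cate_id] = [(word, weight, df) for weight, word, df in rows[:top_k]]
    return result


# Hypothetical WordLib rows for 4 documents in 2 categories.
word_lib = [
    ("stock", "finance", "d1", 4),
    ("stock", "finance", "d2", 2),
    ("fund", "finance", "d1", 1),
    ("match", "sports", "d3", 5),
    ("match", "sports", "d4", 3),
    ("stock", "sports", "d4", 1),
]
tfidf = create_tfidf_lib(word_lib, total_docs=4)
temp = create_term_lib_temp(tfidf, min_df=2)  # the notes apply df < 3 on their own data
print(create_term_lib(temp, top_k=1))
```

Under this formula a word that occurs in every document gets weight 0, which is the usual TF-IDF behavior; whether the original project weighted terms this way is not stated.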

-------------------
Class WordLib
private string word;
private string cateID;
private string docID;
private int termFrequence;

private void CountTF(string[] words, Common.Doc doc);
public void addWord(Common.Doc doc);
public int DeleteSingleWord();
public int DeleteRareWord();
public int DeleteWordByTF(int termFrequence);
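A minimal Python/SQLite sketch of the `WordLib` operations: counting term frequencies for one document, then the filters that return how many rows they removed (as the `int` return types suggest). Two assumptions: "single words" means one-character words (the 单字词 mentioned in the results below), and `DeleteWordByTF` compares each word's frequency summed over the whole table against the threshold. `DeleteRareWord` is left out because the notes do not say how it differs from `DeleteWordByTF`. All function names are hypothetical.

```python
import sqlite3
from collections import Counter


def count_tf(words, doc_id, cate_id):
    """CountTF sketch: per-word frequency within one document, as WordLib rows."""
    return [(w, cate_id, doc_id, n) for w, n in Counter(words).items()]


def add_words(conn, rows):
    with conn:  # commit the batch insert
        conn.executemany("INSERT INTO WordLib VALUES (?, ?, ?, ?)", rows)


def delete_single_word(conn):
    """Delete one-character words; returns the number of rows removed."""
    with conn:
        # SQLite's length() counts characters, so one Chinese character is 1.
        return conn.execute("DELETE FROM WordLib WHERE length(word) = 1").rowcount


def delete_word_by_tf(conn, min_tf):
    """Delete words whose total termFrequence is below min_tf; returns rows removed."""
    with conn:
        return conn.execute(
            "DELETE FROM WordLib WHERE word IN "
            "(SELECT word FROM WordLib GROUP BY word HAVING SUM(termFrequence) < ?)",
            (min_tf,),
        ).rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WordLib (word TEXT, cateID TEXT, docID TEXT, termFrequence INTEGER)")
tokens = ["黄页", "的", "黄页", "企业", "的", "黄页"]  # hypothetical segmenter output
add_words(conn, count_tf(tokens, "D1", "C01"))
print(delete_single_word(conn))    # 1  (removes "的")
print(delete_word_by_tf(conn, 2))  # 1  (removes "企业", total frequency 1)
```

The threshold of 2 only suits this tiny example; the results below use a low-frequency cutoff of 5.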
-----------------------------------

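Taking 5 documents as an example: after removing single-character words, noise words, and low-frequency words (frequency < 5), 255 terms remain; after denoising by document frequency (removing terms with df < 3), 1/5 of the original remains. Without any noise removal, there are 2604 terms.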
