Posted by: GzLi (笑梨), Board: DataMining
Subject: [Digest] A question about indexing text
Origin: Nanjing University Lily BBS (Sat Sep 21 14:05:46 2002), local post
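singhoo (tony) wrote on Thu Sep 19 09:18:37 2002:
I need to do web text classification, and the first step is to build an index over the dataset.
How is such an index usually built? Any pointers from those who know would be appreciated. Thanks!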
sinokdd (KDD in China) wrote on Thu Sep 19 12:06:47 2002:
Index for what? For the words or the pages?
I know rainbow is a system that can do text classification;
maybe you can find something interesting in it.
fervvac (高远) wrote on Thu Sep 19 12:39:06 2002:
What kind of index do you want?
Or what kind of queries are you considering?
singhoo (tony) wrote on Thu Sep 19 21:19:58 2002:
The aim of indexing is to get some statistical data, such as the word vocabulary,
the word vector of each document in the text dataset, and so on.
As you know, indexing the text dataset is the first step in text
classification.
The source code of the rainbow system is too long to read through clearly, and it can't
index Chinese text.
Can somebody give me some advice on how to build the index? Thanks a lot!
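
For readers following along, here is a minimal sketch of the two statistics named above, a vocabulary and a per-document word vector. It is not taken from rainbow. The function names (tokenize, build_vocabulary, to_vector) are hypothetical, and the handling of Chinese is an assumption: overlapping character bigrams stand in for a real word segmenter, which a serious system would probably want.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into tokens (hypothetical helper).

    ASCII runs become lowercased words. Runs of CJK characters become
    overlapping character bigrams, a crude stand-in for word segmentation.
    """
    tokens = []
    for run in re.findall(r"[A-Za-z0-9]+|[\u4e00-\u9fff]+", text):
        if run[0] < "\u4e00":          # ASCII word
            tokens.append(run.lower())
        elif len(run) == 1:            # a lone CJK character
            tokens.append(run)
        else:                          # CJK run: emit character bigrams
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens

def build_vocabulary(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for text in docs:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_vector(text, vocab):
    """Sparse term-frequency vector: {token id: count} for tokens in vocab."""
    counts = Counter(tokenize(text))
    return {vocab[t]: n for t, n in counts.items() if t in vocab}

docs = ["text classification of web text", "文本分类"]
vocab = build_vocabulary(docs)
vectors = [to_vector(d, vocab) for d in docs]
```

The vectors are kept sparse (a dict per document) because most documents use only a small fraction of the vocabulary.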
fervvac (高远) wrote on Thu Sep 19 21:39:35 2002:
That's just preprocessing, right?
If your algorithm will have to access *all* such summary information later,
you do not need to index them. So it highly depends on what kind of access
you will have.
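
To illustrate the access-pattern point, here is a rough sketch of the case where an index does pay off: looking up only the documents that contain one term. It assumes documents are already tokenized; build_inverted_index and docs_containing are hypothetical names, and the sample data is made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(doc_tokens):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, tokens in enumerate(doc_tokens):
        for tok in tokens:
            index[tok].add(doc_id)
    return index

def docs_containing(index, term):
    """Selective access: answered from the index without rescanning documents."""
    return sorted(index.get(term, ()))

# Made-up, already tokenized documents.
doc_tokens = [
    ["web", "text", "classification"],
    ["text", "index"],
    ["web", "pages"],
]
index = build_inverted_index(doc_tokens)
web_docs = docs_containing(index, "web")
missing_docs = docs_containing(index, "rainbow")
```

If the classifier instead reads every document's statistics once per training pass, a plain sequential scan over the preprocessed data does the same work, and the index adds nothing.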