📄 talks.info
A simple text classification problem: classify postings to /usr/msgs
as talk announcements or "other". There are 818 messages, going back
to Aug 93. I used messages numbered 500 and up as test cases.
Average words/message is around 160. I used a simple lex program to
tokenize the data. Class labels were obtained by manual inspection.

Doug McIlroy suggested this dataset, and also suggested that
egrep -i 'talk|abstract' would be hard to beat. (Which is correct.)

                  recall  prec.  #errors  %err  fp  fn
doug's egrep                          22  6.77   8  14
rocchio -m acc      75.9   96.9       22  6.77   2  20
rocchio -m F        95.2   86.8       16  4.92  12   4
ripper -L0.06       83.1   87.3       24  7.38  10  14
ripper -L0.125      89.2   89.2       18  5.54   9   9
ripper -L0.25       95.2   91.9       11  3.38   7   4
ripper -L0.5        94.0   91.8       12  3.69   7   5
ripper -L1          89.2   92.5       15  4.62   6   9
ripper -L1.5        79.5   95.7       20  6.15   3  17
ripper -L2          77.1  100.0       19  5.85   0  19
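
The table columns are related by simple arithmetic, and the egrep baseline is easy to imitate. The following is a minimal Python sketch, not the code used to produce the numbers above. The names looks_like_talk and metrics are hypothetical. The totals N_TEST = 325 and N_POS = 83 are assumptions inferred from the table (22 errors at 6.77% error, 20 false negatives at 75.9 recall); they are not stated in the text.

```python
import re

# Assumed totals, inferred from the table rather than stated in the text:
# 22 errors at 6.77% error implies about 325 test messages, and
# 20 false negatives at 75.9% recall implies about 83 talk announcements.
N_TEST = 325
N_POS = 83

# Rough stand-in for: egrep -i 'talk|abstract'
TALK_RE = re.compile(r"talk|abstract", re.IGNORECASE)


def looks_like_talk(message: str) -> bool:
    """Return True if the message text contains 'talk' or 'abstract'."""
    return TALK_RE.search(message) is not None


def metrics(fp: int, fn: int, n_pos: int = N_POS, n_test: int = N_TEST):
    """Derive (recall, precision, #errors, %err) from fp and fn counts."""
    tp = n_pos - fn                      # true positives
    recall = 100.0 * tp / n_pos          # share of real talks that were found
    precision = 100.0 * tp / (tp + fp)   # share of predicted talks that are real
    errors = fp + fn
    pct_err = 100.0 * errors / n_test
    return round(recall, 1), round(precision, 1), errors, round(pct_err, 2)


if __name__ == "__main__":
    print(looks_like_talk("Abstract: a seminar on rule learning"))
    print(metrics(fp=2, fn=20))   # the "rocchio -m acc" row
    print(metrics(fp=7, fn=4))    # the "ripper -L0.25" row
```

Under the same assumed totals, metrics(8, 14) gives about 83.1 recall and 89.6 precision for the egrep row, which the table leaves blank.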