# disc1.py
# Description: Entropy based discretization compared to discretization with equal-frequency
# of instances in intervals
# Category: preprocessing
# Uses: wdbc.tab
# Classes: Preprocessor_discretize, EntropyDiscretization
# Referenced: o_categorization.htm
import orange
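
def show_values(data, heading):
    # Print each attribute's name, its number of discretized values, and the
    # values themselves. The heading argument is accepted but not used.
    for a in data.domain.attributes:
        print "%s/%d: %s" % (a.name, len(a.values), ', '.join(i for i in a.values))
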
data = orange.ExampleTable("../datasets/wdbc")
print '%d features in original data set, discretized:' % len(data.domain.attributes)
data_ent = orange.Preprocessor_discretize(data, method=orange.EntropyDiscretization())
show_values(data_ent, "Entropy based discretization")
print '\nFeatures with sole value after discretization:'
for a in data_ent.domain.attributes:
    if len(a.values) == 1:
        print a.name

import orngDisc
reload(orngDisc)
data_ent2 = orngDisc.entropyDiscretization(data)
print '%d features after removing features discretized to a constant value' % len(data_ent2.domain.attributes)