Decision Trees, Weka, and Information Gain

xiaoxiao  2026-05-17

First, consider the classic play-tennis example.

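| Day | Outlook  | Temperature | Humidity | Wind   | Play Tennis |
|-----|----------|-------------|----------|--------|-------------|
| 1   | sunny    | hot         | high     | weak   | no          |
| 2   | sunny    | hot         | high     | strong | no          |
| 3   | overcast | hot         | high     | weak   | yes         |
| 4   | rain     | mild        | high     | weak   | yes         |
| 5   | rain     | cool        | normal   | weak   | yes         |
| 6   | rain     | cool        | normal   | strong | no          |
| 7   | overcast | cool        | normal   | strong | yes         |
| 8   | sunny    | mild        | high     | weak   | no          |
| 9   | sunny    | cool        | normal   | weak   | yes         |
| 10  | rain     | mild        | normal   | weak   | yes         |
| 11  | sunny    | mild        | normal   | strong | yes         |
| 12  | overcast | mild        | high     | strong | yes         |
| 13  | overcast | hot         | normal   | weak   | yes         |
| 14  | rain     | mild        | high     | strong | no          |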

The dataset contains 14 samples: 9 positive (yes) and 5 negative (no). The expected information (i.e., the entropy) of these tuples is therefore:

Info(D) = - 9/14 * log₂(9/14) - 5/14 * log₂(5/14) = 0.940
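A minimal Python sketch of the entropy formula above, computed directly from class counts. The helper name `entropy` is illustrative and not taken from Weka:

```python
import math


def entropy(counts):
    """Info(D) = -sum(p_i * log2(p_i)) over the class counts; empty classes are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


info_d = entropy([9, 5])  # 9 "yes" samples and 5 "no" samples
print(f"Info(D) = {info_d:.3f}")  # Info(D) = 0.940
```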

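Now consider the expected information requirement for each attribute. For Outlook: sunny has 2 positive and 3 negative samples; overcast has 4 positive and 0 negative; rain has 3 positive and 2 negative.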

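The expected information needed after partitioning the samples by Outlook is the size-weighted entropy of the three partitions:

Info_Outlook(D) = 5/14 * (-2/5 * log₂(2/5) - 3/5 * log₂(3/5)) + 4/14 * (-4/4 * log₂(4/4)) + 5/14 * (-3/5 * log₂(3/5) - 2/5 * log₂(2/5)) = 0.694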
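The same computation as a sketch: each partition is given as a `(yes, no)` count tuple copied from the Outlook counts listed above, and the expected information is the size-weighted sum of the partition entropies. The function names are illustrative assumptions.

```python
import math


def entropy(counts):
    """Entropy of a class distribution given as counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def expected_info(partitions):
    """Info_A(D) = sum(|D_j| / |D| * Info(D_j)) over the partitions D_j induced by attribute A."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)


# (yes, no) counts for Outlook = sunny, overcast, rain
outlook = [(2, 3), (4, 0), (3, 2)]
info_outlook = expected_info(outlook)
gain_outlook = entropy([9, 5]) - info_outlook
print(f"Info_Outlook(D) = {info_outlook:.3f}")  # 0.694
print(f"Gain(Outlook)   = {gain_outlook:.3f}")  # 0.247
```

Without intermediate rounding the gain is about 0.2467, i.e. 0.247 to three decimals; subtracting the already-rounded 0.940 and 0.694 gives the 0.246 quoted in the text.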

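The information gain of Outlook is therefore:

Gain(Outlook) = 0.940 - 0.694 = 0.246

Computing the other attributes the same way gives:

Gain(Temperature) = 0.029

Gain(Humidity) = 0.151

Gain(Wind) = 0.048

Outlook has the largest information gain, so it is chosen as the splitting attribute at the root.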
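To check all four gains at once, the sketch below encodes the 14 rows of the table as tuples and computes Gain(A) = Info(D) - Info_A(D) for every attribute. The tuple encoding and helper names are assumptions made for illustration.

```python
import math
from collections import Counter

# (Outlook, Temperature, Humidity, Wind, PlayTennis) for days 1-14
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]


def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Entropy of the labels minus the size-weighted entropy after splitting on column `col`."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


gains = {a: info_gain(DATA, i) for i, a in enumerate(ATTRS)}
for a, g in gains.items():
    print(f"Gain({a}) = {g:.3f}")
print("Best split:", max(gains, key=gains.get))
```

Computed without intermediate rounding, the values agree with the list above to within 0.001, and Outlook comes out as the best split.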

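Repeating the information-gain computation on each branch (the sunny and rain subsets are still mixed, while overcast is already pure) yields the final decision tree:

    Outlook = sunny
    |   Humidity = high: no
    |   Humidity = normal: yes
    Outlook = overcast: yes
    Outlook = rain
    |   Wind = strong: no
    |   Wind = weak: yes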
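The recursive step can be sketched as a plain ID3-style procedure: choose the attribute with the highest information gain, split the rows on it, and recurse until a node is pure. This is a simplified illustration, not Weka's code; Weka's J48 classifier implements C4.5, which ranks splits by gain ratio and prunes the tree. The tree representation (a label string for a leaf, an `(attribute, {value: subtree})` pair otherwise) is an assumption of this sketch.

```python
import math
from collections import Counter

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def id3(rows, cols):
    """Return a leaf label or an (attribute_name, {value: subtree}) pair."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:  # pure node: make a leaf
        return labels[0]
    if not cols:  # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: info_gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {v: id3([r for r in rows if r[best] == v], rest)
                for v in sorted({r[best] for r in rows})}
    return (ATTRS[best], branches)


def show(tree, indent=""):
    """Print the tree as indented text, one branch per line."""
    if isinstance(tree, str):
        print(indent + tree)
        return
    name, branches = tree
    for value, sub in branches.items():
        if isinstance(sub, str):
            print(f"{indent}{name} = {value}: {sub}")
        else:
            print(f"{indent}{name} = {value}")
            show(sub, indent + "|   ")


tree = id3(DATA, list(range(len(ATTRS))))
show(tree)
```

Branches are printed in sorted value order, so overcast appears before rain and sunny; the structure is Outlook at the root, Humidity under sunny, and Wind under rain.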

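Taking sunny, mild, normal, FALSE as a test instance (FALSE is the value of the windy attribute, i.e. weak wind), the decision tree follows the Outlook = sunny branch and then Humidity = normal, and classifies it as yes.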
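Classifying a new sample is a walk from the root to a leaf. The sketch hard-codes the tree that the information-gain procedure yields on this data (Outlook at the root, Humidity under sunny, Wind under rain) as nested tuples and dicts; that representation is an illustrative choice. It also assumes the FALSE in the test instance means windy = FALSE, i.e. Wind = weak.

```python
# Leaf = label string; internal node = (attribute, {value: subtree})
TREE = ("Outlook", {
    "sunny": ("Humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rain": ("Wind", {"strong": "no", "weak": "yes"}),
})


def classify(tree, sample):
    """Follow the branch matching the sample's value for each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[sample[attribute]]
    return tree


# Test instance: sunny, mild, normal, FALSE (assumed to mean Wind = weak)
sample = {"Outlook": "sunny", "Temperature": "mild", "Humidity": "normal", "Wind": "weak"}
print(classify(TREE, sample))  # yes
```

Temperature is never consulted, because none of the tree's paths test it.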


When reposting, please cite the original URL: https://www.6miu.com/read-5048917.html
