[Reading Notes] Machine Learning in Action, Chapter 7, Section 7.7: Imbalanced Classification Problems

xiaoxiao, 2021-02-27

Machine Learning in Action

Section 7.7: Imbalanced Classification Problems

Classification performance metrics:

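The error rate is the fraction of samples that are misclassified. This measure hides how the samples were misclassified. A more generally applicable tool is the confusion matrix: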

                  Actual (+1)            Actual (−1)
Predicted (+1)    True positive (TP)     False positive (FP)
Predicted (−1)    False negative (FN)    True negative (TN)
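To make the four cells concrete, here is a minimal sketch that tallies them from true and predicted labels. The function name `confusion_counts`, the `positive` parameter, and the label lists are made up for illustration and do not come from the book.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted +1, actually +1
            else:
                fp += 1  # predicted +1, actually -1
        else:
            if actual == positive:
                fn += 1  # predicted -1, actually +1
            else:
                tn += 1  # predicted -1, actually -1
    return tp, fp, fn, tn


# Made-up labels, for illustration only
y_true = [1, 1, 1, -1, -1, -1, -1, 1]
y_pred = [1, -1, 1, 1, -1, -1, -1, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 3 1 1 3
```

The error rate here is (FP + FN) / 8 = 0.25, but the counts also show that the two mistakes are of different kinds: one missed positive and one false alarm.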

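In classification, when one class matters more than the others, the confusion matrix can be used to define metrics that are more informative than the error rate: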

Precision (the fraction of samples predicted positive that are actually positive)

$$P = \frac{TP}{TP + FP}$$

Recall (the fraction of actual positives that are predicted positive)

$$R = \frac{TP}{TP + FN}$$
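A small worked example may help show why these two metrics say more than the error rate on imbalanced data. The counts below are hypothetical (10 positives among 1000 samples), and returning 0.0 when a denominator is zero is a convention assumed for this sketch, not something the book specifies.

```python
def precision(tp, fp):
    """P = TP / (TP + FP); 0.0 if nothing was predicted positive (assumed convention)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """R = TP / (TP + FN); 0.0 if there are no actual positives (assumed convention)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 10 actual positives, 990 actual negatives
tp, fp, fn, tn = 6, 14, 4, 976

error_rate = (fp + fn) / (tp + fp + fn + tn)
p = precision(tp, fp)
r = recall(tp, fn)

print(f"error rate = {error_rate:.3f}")  # error rate = 0.018
print(f"precision  = {p:.3f}")           # precision  = 0.300
print(f"recall     = {r:.3f}")           # recall     = 0.600
```

An error rate of 1.8% looks excellent, yet only 30% of the positive predictions are right and 40% of the positives are missed.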

P-R curve (precision-recall curve)
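One common way to obtain the points of a P-R curve is to rank the samples by classifier score and compute precision and recall after each cut. The sketch below assumes that approach; the name `pr_curve`, the scores, and the labels are illustrative, and it assumes at least one positive sample is present.

```python
def pr_curve(scores, labels, positive=1):
    """Return (recall, precision) points, one per distinct score threshold."""
    total_pos = sum(1 for y in labels if y == positive)
    # Rank samples from the highest score to the lowest
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores
        last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_tie:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Made-up scores and labels, for illustration only
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, -1, 1, -1, -1]
points = pr_curve(scores, labels)
for r, p in points:
    print(f"recall = {r:.2f}, precision = {p:.2f}")
```

As the cut moves down the ranking, recall can only grow, while precision typically falls; the curve shows that trade-off.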

True positive rate (TPR)

$$TPR = \frac{TP}{TP + FN}$$

False positive rate (FPR)

$$FPR = \frac{FP}{TN + FP}$$

ROC curve (TPR plotted against FPR)
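Each threshold on the classifier score yields one (FPR, TPR) pair, and the ROC curve is the trace of those pairs as the threshold varies. The following sketch computes the pair for a given threshold; the name `tpr_fpr`, the "score >= threshold means positive" rule, and the data are assumptions for illustration, and both classes are assumed to be present.

```python
def tpr_fpr(scores, labels, threshold, positive=1):
    """Return (TPR, FPR) when samples with score >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == positive:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (tn + fp)


# Made-up scores and labels, for illustration only
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, -1, 1, -1, -1]

for threshold in (1.0, 0.65, 0.0):
    tpr, fpr = tpr_fpr(scores, labels, threshold)
    print(f"threshold = {threshold:.2f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

A threshold above every score gives the point (0, 0), a threshold below every score gives (1, 1), and intermediate thresholds fill in the curve between them.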

Code from Machine Learning in Action, Section 7.7:

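# Plot the ROC curve
import numpy as np
import matplotlib.pyplot as plt


def plotROC(predStrengths, classLabels):
    # predStrengths: prediction strength (score) of each sample
    # classLabels: true labels, 1.0 for the positive class
    # Accept either a 1 x m matrix or a 1-D array of scores
    scores = np.asarray(predStrengths).ravel()
    labels = np.asarray(classLabels).ravel()
    cur = (1.0, 1.0)  # cursor, starts at the top-right corner
    ySum = 0.0  # sum of rectangle heights, used to calculate the AUC
    numPosClas = np.sum(labels == 1.0)
    yStep = 1 / float(numPosClas)
    xStep = 1 / float(len(labels) - numPosClas)
    # argsort is ascending, so the curve is drawn from (1, 1) back to (0, 0)
    sortedIndicies = scores.argsort()
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    # loop through all the values, drawing a line segment at each point
    for index in sortedIndicies:
        if labels[index] == 1.0:
            # positive sample: step down along the y axis
            delX = 0
            delY = yStep
        else:
            # negative sample: step left along the x axis, add one rectangle
            delX = xStep
            delY = 0
            ySum += cur[1]
        # draw line from cur to (cur[0] - delX, cur[1] - delY)
        ax.plot([cur[0], cur[0] - delX], [cur[1], cur[1] - delY], c='b')
        cur = (cur[0] - delX, cur[1] - delY)
    ax.plot([0, 1], [0, 1], 'b--')  # diagonal: random guessing
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0, 1, 0, 1])
    plt.show()
    print("the Area Under the Curve is: ", ySum * xStep)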
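The AUC bookkeeping in the loop above can be followed more easily without the plotting calls. This is a plain-Python sketch of the same cursor walk, under the assumption that both classes are present; the name `auc_from_scores` and the data are illustrative, and tied scores get no special treatment.

```python
def auc_from_scores(scores, labels, positive=1.0):
    """Area under the ROC curve via the 'walk from (1, 1) to (0, 0)' summation."""
    num_pos = sum(1 for y in labels if y == positive)
    num_neg = len(labels) - num_pos
    y_step = 1.0 / num_pos
    x_step = 1.0 / num_neg
    # Visit samples from the lowest score to the highest
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    height = 1.0  # current TPR; the walk starts at the top-right corner
    y_sum = 0.0
    for i in order:
        if labels[i] == positive:
            height -= y_step  # a positive sample moves the cursor down
        else:
            y_sum += height   # a negative sample moves it left: one rectangle
    # Every rectangle has the same width, so multiply once at the end
    return y_sum * x_step


# Made-up scores and labels, for illustration only
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, -1, 1, -1, -1]
auc = auc_from_scores(scores, labels)
print(f"AUC = {auc:.4f}")  # AUC = 0.8889
```

The result, 8/9, matches a direct count: of the 9 (positive, negative) pairs in this toy data, 8 have the positive sample scored higher.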

