【Machine Learning 3】The AdaBoost Boosting Algorithm Based on Decision Trees


Definition: The AdaBoost algorithm is an iterative procedure that combines many weak classifiers to approximate the Bayes classifier C∗(x). Starting with the unweighted training sample, AdaBoost builds a classifier, for example a classification tree (Breiman, Friedman, Olshen & Stone 1984), that produces class labels. If a training data point is misclassified, the weight of that training data point is increased (boosted). A second classifier is built using the new weights, which are no longer equal. Again, misclassified training data have their weights boosted, and the procedure is repeated. Typically, one may build 500 or 1000 classifiers this way. A score is assigned to each classifier, and the final classifier is defined as the linear combination of the classifiers from each stage.
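The reweighting procedure described above can be sketched from scratch. Below is a minimal two-class version (labels in {-1, +1}) using depth-1 decision stumps as the weak classifier; the interval dataset and all parameter values are illustrative, not from the original post:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=20):
    """Two-class AdaBoost with decision stumps; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with equal (unweighted) sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()             # weighted training error
        if err >= 0.5:                       # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # score for this classifier
        w *= np.exp(-alpha * y * pred)       # boost weights of misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Final classifier: sign of the score-weighted linear combination of stages."""
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)

# Toy problem: +1 inside the interval (0.3, 0.7), -1 outside. No single stump
# can represent an interval, but a weighted sum of stumps can.
X = np.linspace(0, 1, 200).reshape(-1, 1)
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)
stumps, alphas = adaboost_fit(X, y)
print("boosted training accuracy:", (adaboost_predict(stumps, alphas, X) == y).mean())
```

On this toy data a single stump tops out around 70% accuracy, while a few boosted rounds classify the interval correctly, which is exactly the "linear combination of the classifiers from each stage" in the definition.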

Single classifiers versus ensembles: traditional classifiers include decision trees, rule-based classifiers, nearest-neighbor classifiers, neural networks, support vector machines, and so on. All of these techniques use a single classifier learned from the training data to predict the class label of an unseen sample. Aggregating the predictions of multiple classifiers can improve accuracy; this approach is called an ensemble method. The AdaBoost algorithm is a commonly used ensemble method.

Weak learning vs. strong learning. Strong learning: a concept is strongly learnable if there exists a polynomial-time algorithm that learns it with high accuracy. Weak learning: a concept is weakly learnable if there exists a polynomial-time algorithm that learns it with accuracy only slightly better than random guessing (above 50%). Note: strong learnability and weak learnability are provably equivalent, and in theory a collection of weak classifiers can be boosted into a strong classifier through a weighted linear combination.
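This equivalence can be seen empirically: boosting depth-1 decision stumps, each only slightly better than chance, yields an ensemble far more accurate than any single stump. A short sketch using scikit-learn on synthetic data (the dataset and parameter values are illustrative, not from the original post):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class problem (illustrative only)
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single depth-1 stump: a weak learner
stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)

# 200 boosted stumps: a weighted linear combination of weak learners
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200).fit(X_tr, y_tr)

print("single stump accuracy:", stump.score(X_te, y_te))
print("boosted stumps accuracy:", boosted.score(X_te, y_te))
```

The boosted ensemble should clearly outperform the lone stump on the held-out data, which is the weak-to-strong promotion in action.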

The intuitive idea is to alter the distribution over the domain X in a way that increases the probability of the “harder” parts of the space, thus forcing the weak learner to generate new hypotheses that make fewer mistakes on these parts.

Three classes of boosting algorithms: the main boosting models are AdaBoost, GBDT, and XGBoost.

I. The AdaBoost algorithm applied to multi-factor stock selection

Origin: the name is short for Adaptive Boosting. Note: AdaBoost is a boosting algorithm, not itself a classifier; to apply it, one first needs a weak-classifier algorithm. Common classifiers such as decision trees, nearest-neighbor classifiers, neural networks, and support vector machines can all serve as AdaBoost's weak classifier (this post uses a decision tree as the single classifier and AdaBoost as the boosting ensemble algorithm). This example is two-class AdaBoost, i.e., the response variable y has only two classes.

Note: the training data for the code below come from the previous post, Machine Learning 2.

```python
### AdaBoost classifier
## Import packages
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score  # missing from the original snippet
import matplotlib.pyplot as plt

## Fit AdaBoost
# Weak classifier: decision tree; build 600 weak classifiers.
# x_train, y_train, x_test, y_test come from the previous post (Machine Learning 2).
bdt_real = AdaBoostClassifier(DecisionTreeClassifier(max_depth=5),
                              n_estimators=600, learning_rate=1)
bdt_real.fit(x_train, y_train)
n_tree = len(bdt_real)  # number of trees: 600

## Mean cross-validated accuracy of the classifier
scores = cross_val_score(bdt_real, x_test, y_test)
print("Mean cross-validated accuracy:", scores.mean())

## Plot test error against number of trees
real_test_errors = []
for y_predict in bdt_real.staged_predict(x_test):
    # staged_predict returns a generator yielding the ensemble's prediction
    # after each of the 600 boosting iterations
    real_test_errors.append(1. - accuracy_score(y_predict, y_test))
plt.plot(range(1, n_tree + 1), real_test_errors,
         c='black', linestyle='dashed', label='SAMME.R')
plt.legend()
plt.ylim(0.18, 0.62)
plt.ylabel('Test Error')
plt.xlabel('Number of Trees')
plt.show()
```

References

1. Freund and Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, 1997. Available at: http://www.face-rec.org/algorithms/Boosting-Ensemble/decision-theoretic_generalization.pdf

2. Zhu et al., Multi-class AdaBoost, 2006.

3. Official scikit-learn documentation for the two-class AdaBoost example: http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_twoclass.html

4. https://www.cnblogs.com/hlongch/p/5734293.html

5. Li Hang, Statistical Learning Methods (统计学习方法), Tsinghua University Press, 2012, p. 137.

6. Guosen Securities, Factor-Based Stock Selection with the AdaBoost Algorithm, 2013.

7. AdaBoost with decision trees: http://blog.csdn.net/sinat_17196995/article/details/62444928
