Installing XGBoost for Java and Python (macOS as the example)

xiaoxiao, 2021-02-28

I have used the Python version of xgboost before; now I want to get the Java version working on my own machine (macOS). I hit quite a few pitfalls along the way, so I am recording them here.

1. Download the xgboost repository

```shell
git clone --recursive https://github.com/dmlc/xgboost
```

2. Build xgboost

First check whether gcc and g++ are available on your machine (look under /usr/lib). If not, install them:

```shell
brew install gcc --without-multilib
```

This takes a long time; on my machine it ran for about two hours. Afterwards, `ls /usr/local/bin/*` should show the newly installed compilers.

2.1 The official docs describe two ways to build xgboost on a Mac: one supports multithreading and the other does not.

The single-threaded build (`make/minimum.mk` is the minimal config, which disables OpenMP):

```shell
cd xgboost
cp make/minimum.mk ./config.mk
make -j4
```

The multithreaded (OpenMP) build:

```shell
cd xgboost
cp make/config.mk ./config.mk
make -j4
```

If gcc installed successfully, the build should go through. If it fails with `clang: error: unsupported option '-fopenmp'`, point the build at Homebrew's GCC instead of Apple's clang, either by setting CC/CXX in config.mk or by exporting them before running make (note the second variable is CXX, not CC):

```shell
export CC=/usr/local/bin/gcc-7
export CXX=/usr/local/bin/g++-7
```

My gcc is already at version 7.

3. To install the Python xgboost package:

```shell
cd python-package
sudo python setup.py install
```

This worked on my machine. Test code:

```python
import numpy as np
import xgboost as xgb

data = np.loadtxt('train.csv', delimiter=',',
                  converters={14: lambda x: int(x == '?'),
                              15: lambda x: int(x)})
sz = data.shape
np.random.shuffle(data)  # shuffle the data; otherwise the test split is all one class
train = data[:int(sz[0] * 0.7), :]
test = data[int(sz[0] * 0.7):, :]
train_X = train[:, 0:14]
train_Y = train[:, 15]
print(type(train_Y))
test_X = test[:, 0:14]
test_Y = test[:, 15]

xg_train = xgb.DMatrix(train_X, label=train_Y)
xg_test = xgb.DMatrix(test_X, label=test_Y)

params = {
    'booster': 'gbtree',
    'objective': 'binary:logistic',
    'early_stopping_rounds': 100,
    'scale_pos_weight': 1,
    'eval_metric': 'auc',
    'gamma': 0.1,
    'max_depth': 8,
    'lambda': 550,
    'subsample': 0.7,
    'colsample_bytree': 0.4,
    'min_child_weight': 3,
    'eta': 0.02,
    'seed': 27,
    'nthread': 7,
}
watchlist = [(xg_train, 'train'), (xg_test, 'test')]
xgboost_model = xgb.train(params, xg_train, num_boost_round=3000, evals=watchlist)
xgboost_model.save_model('xgb.model')
pred = xgboost_model.predict(xg_test)
print(pred)
```
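The shuffle step in the script above matters: if the CSV's rows are ordered by label, splitting without shuffling puts only one class in the test set. A minimal sketch of the same 70/30 split on synthetic stand-in data (the array shapes and seed here are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for the CSV: 100 rows, last column is the label,
# with all positives at the end (mimicking a label-ordered file).
rng = np.random.default_rng(0)
data = np.column_stack([rng.normal(size=(100, 2)), np.repeat([0.0, 1.0], 50)])

# Without shuffling, the last 30% of rows (the test split) is all one class.
cut = int(data.shape[0] * 0.7)
assert set(data[cut:, -1]) == {1.0}

# Shuffling first mixes both classes into the test split.
rng.shuffle(data)
train, test = data[:cut], data[cut:]
print(sorted(set(test[:, -1])))  # both classes present
```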

4. Install the Java version of xgboost

```shell
cd jvm-packages
mvn package
```

If you see an error like the following:

```
Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (checkstyle) on project xgboost-jvm: Execution checkstyle of goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check failed: A required class was missing while executing org.scalastyle:scalastyle-maven-plugin:0.8.0:check: scala/xml/Node
```

then comment out the style-check plugin in the pom.xml under the jvm-packages directory:

```xml
<!--
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <version>2.17</version>
  <configuration>
    <configLocation>checkstyle.xml</configLocation>
    <failOnViolation>true</failOnViolation>
  </configuration>
  <executions>
    <execution>
      <id>checkstyle</id>
      <phase>validate</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
-->
```

5. Change the Scala version for the build

Change it here, in the properties block of the pom.xml:

```xml
<properties>
  <spark.version>2.0.1</spark.version>
  <flink.suffix>_2.11</flink.suffix>
  <scala.version>2.10.6</scala.version>
  <scala.binary.version>2.10</scala.binary.version>
</properties>
```

Then package again with `mvn clean install`. If all goes well, two jars are generated under xgboost4j: a plain xgboost jar, and one with dependencies bundled. You will also need to add the two dependency jars listed in the next section.

6. The two jars xgboost depends on

```xml
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.4</version>
</dependency>
<dependency>
  <groupId>commons-logging</groupId>
  <artifactId>commons-logging</artifactId>
  <version>1.2</version>
</dependency>
```

7. Install the built jar into the local Maven repository

```shell
mvn install:install-file -Dfile=xgboost4j-0.7-jar-with-dependencies.jar -DgroupId=ml.dmlc -DartifactId=xgboost4j -Dversion=0.7 -Dpackaging=jar
```

8. Add the xgboost4j dependency to your own Maven project and test it

```xml
<dependency>
  <groupId>ml.dmlc</groupId>
  <artifactId>xgboost4j</artifactId>
  <version>0.7</version>
</dependency>
```

```java
package com.meituan.model.xgboost;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

public class PredictFirstNtree {

    private static String path = "/Users/shuubiasahi/Documents/workspace/xgboost/demo/data/";
    private static String trainString = "agaricus.txt.train";
    private static String testString = "agaricus.txt.test";

    public static void main(String[] args) throws XGBoostError {
        DMatrix trainMat = new DMatrix(path + trainString);
        DMatrix testMat = new DMatrix(path + testString);

        // specify parameters
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("eta", 1.0);
        params.put("max_depth", 2);
        params.put("silent", 1);
        params.put("objective", "binary:logistic");

        // specify watchList
        HashMap<String, DMatrix> watches = new HashMap<String, DMatrix>();
        watches.put("train", trainMat);
        watches.put("test", testMat);

        // train a booster
        int round = 3;
        Booster booster = XGBoost.train(trainMat, params, round, watches, null, null);

        // predict leaf indices using only the first 2 trees
        float[][] leafindex = booster.predictLeaf(testMat, 2);
        for (float[] leafs : leafindex) {
            System.out.println(Arrays.toString(leafs));
        }

        // predict leaf indices using all trees
        leafindex = booster.predictLeaf(testMat, 0);
        for (float[] leafs : leafindex) {
            System.out.println(Arrays.toString(leafs));
        }
    }
}
```

Leaf-index output (excerpt):

```
[5.0, 4.0, 5.0]
[3.0, 3.0, 3.0]
[5.0, 4.0, 5.0]
[3.0, 3.0, 3.0]
```

Training log:

```
[0] test-error:0.042831 train-error:0.046522
[1] test-error:0.021726 train-error:0.022263
[2] test-error:0.006207 train-error:0.007063
```
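In each leaf-index row, each column is the id of the leaf that the instance lands in for the corresponding tree. One common use of these indices is as categorical features for a downstream linear model. A minimal sketch of one-hot encoding them with numpy (the leaf values below are taken from the sample output above; the encoding scheme itself is just one illustrative choice, not something from the original post):

```python
import numpy as np

# Leaf indices: one row per instance, one column per tree.
leaf_index = np.array([[5.0, 4.0, 5.0],
                       [3.0, 3.0, 3.0]])

# One-hot encode each tree's leaf id independently, then concatenate.
encoded = []
for tree in leaf_index.T:                       # iterate over trees (columns)
    leaves = np.unique(tree)                    # distinct leaf ids in this tree
    onehot = (tree[:, None] == leaves).astype(int)
    encoded.append(onehot)
features = np.hstack(encoded)
print(features)  # 2 instances x (2 + 2 + 2) one-hot columns
```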

If it still doesn't work after reading this, feel free to message me.
