Spark: running the MLlib example on a Spark cluster deployed with Cloudera Manager

xiaoxiao · 2025-04-20

1. Verify that Spark runs correctly on the CDH cluster

[root@cdh01 ~]# spark-submit --master local --class org.apache.spark.examples.SparkPi /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar 10
18/10/29 14:39:08 INFO spark.SparkContext: Running Spark version 1.6.0
18/10/29 14:39:09 INFO spark.SecurityManager: Changing view acls to: root
18/10/29 14:39:09 INFO spark.SecurityManager: Changing modify acls to: root
18/10/29 14:39:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/10/29 14:39:09 INFO util.Utils: Successfully started service 'sparkDriver' on port 55692.
18/10/29 14:39:09 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/10/29 14:39:09 INFO Remoting: Starting remoting
18/10/29 14:39:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.50.202:43516]
18/10/29 14:39:10 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.50.202:43516]
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 43516.
18/10/29 14:39:10 INFO spark.SparkEnv: Registering MapOutputTracker
18/10/29 14:39:10 INFO spark.SparkEnv: Registering BlockManagerMaster
18/10/29 14:39:10 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-2bf97eb7-1a7e-4df7-b221-4e603dc3a55f
18/10/29 14:39:10 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
18/10/29 14:39:10 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/10/29 14:39:10 INFO ui.SparkUI: Started SparkUI at http://192.168.50.202:4040
18/10/29 14:39:10 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar at spark://192.168.50.202:55692/jars/spark-examples.jar with timestamp 1540795150401
18/10/29 14:39:10 INFO executor.Executor: Starting executor ID driver on host localhost
18/10/29 14:39:10 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53969.
18/10/29 14:39:10 INFO netty.NettyBlockTransferService: Server created on 53969
18/10/29 14:39:10 INFO storage.BlockManager: external shuffle service port = 7337
18/10/29 14:39:10 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/10/29 14:39:10 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:53969 with 530.0 MB RAM, BlockManagerId(driver, localhost, 53969)
18/10/29 14:39:10 INFO storage.BlockManagerMaster: Registered BlockManager
18/10/29 14:39:11 INFO scheduler.EventLoggingListener: Logging events to hdfs://cdh01:8020/user/spark/applicationHistory/local-1540795150435
18/10/29 14:39:11 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
18/10/29 14:39:11 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 10 output partitions
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:36)
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Missing parents: List()
18/10/29 14:39:11 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
18/10/29 14:39:12 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 530.0 MB)
18/10/29 14:39:12 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 530.0 MB)
18/10/29 14:39:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:53969 (size: 1202.0 B, free: 530.0 MB)
18/10/29 14:39:12 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1004
18/10/29 14:39:12 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
18/10/29 14:39:12 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 2036 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
18/10/29 14:39:12 INFO executor.Executor: Fetching spark://192.168.50.202:55692/jars/spark-examples.jar with timestamp 1540795150401
18/10/29 14:39:12 INFO spark.ExecutorAllocationManager: New executor driver has registered (new total is 1)
18/10/29 14:39:12 INFO util.Utils: Fetching spark://192.168.50.202:55692/jars/spark-examples.jar to /tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3/userFiles-89bc4061-62e5-41b0-b1c2-cecbc4d3af73/fetchFileTemp4804387182541284155.tmp
18/10/29 14:39:12 INFO executor.Executor: Adding file:/tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3/userFiles-89bc4061-62e5-41b0-b1c2-cecbc4d3af73/spark-examples.jar to class loader
18/10/29 14:39:12 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 342 ms on localhost (executor driver) (1/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 51 ms on localhost (executor driver) (2/10)
18/10/29 14:39:12 INFO executor.Executor: Running task 2.0 in stage 0.0 (TID 2)
18/10/29 14:39:12 INFO executor.Executor: Finished task 2.0 in stage 0.0 (TID 2). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 39 ms on localhost (executor driver) (3/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 3.0 in stage 0.0 (TID 3). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 4.0 in stage 0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 4.0 in stage 0.0 (TID 4)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 3.0 in stage 0.0 (TID 3) in 42 ms on localhost (executor driver) (4/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 4.0 in stage 0.0 (TID 4). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 5.0 in stage 0.0 (TID 5)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 4.0 in stage 0.0 (TID 4) in 37 ms on localhost (executor driver) (5/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 5.0 in stage 0.0 (TID 5). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 6.0 in stage 0.0 (TID 6, localhost, executor driver, partition 6, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 71 ms on localhost (executor driver) (6/10)
18/10/29 14:39:12 INFO executor.Executor: Running task 6.0 in stage 0.0 (TID 6)
18/10/29 14:39:12 INFO executor.Executor: Finished task 6.0 in stage 0.0 (TID 6). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 7.0 in stage 0.0 (TID 7, localhost, executor driver, partition 7, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 7.0 in stage 0.0 (TID 7)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 6.0 in stage 0.0 (TID 6) in 32 ms on localhost (executor driver) (7/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 7.0 in stage 0.0 (TID 7). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8, localhost, executor driver, partition 8, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 8.0 in stage 0.0 (TID 8)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 28 ms on localhost (executor driver) (8/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 8.0 in stage 0.0 (TID 8). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9, localhost, executor driver, partition 9, PROCESS_LOCAL, 2038 bytes)
18/10/29 14:39:12 INFO executor.Executor: Running task 9.0 in stage 0.0 (TID 9)
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 8.0 in stage 0.0 (TID 8) in 27 ms on localhost (executor driver) (9/10)
18/10/29 14:39:12 INFO executor.Executor: Finished task 9.0 in stage 0.0 (TID 9). 877 bytes result sent to driver
18/10/29 14:39:12 INFO scheduler.TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 24 ms on localhost (executor driver) (10/10)
18/10/29 14:39:12 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 0.628 s
18/10/29 14:39:12 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 1.046294 s
18/10/29 14:39:12 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
Pi is roughly 3.141903141903142
18/10/29 14:39:13 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.50.202:4040
18/10/29 14:39:13 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/10/29 14:39:13 INFO storage.MemoryStore: MemoryStore cleared
18/10/29 14:39:13 INFO storage.BlockManager: BlockManager stopped
18/10/29 14:39:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/10/29 14:39:13 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
18/10/29 14:39:13 INFO spark.SparkContext: Successfully stopped SparkContext
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
18/10/29 14:39:13 INFO util.ShutdownHookManager: Shutdown hook called
18/10/29 14:39:13 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-e7873ccb-d141-4347-abcd-1b263d364be3
18/10/29 14:39:13 INFO Remoting: Remoting shut down
18/10/29 14:39:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

The job completed successfully; note the line "Pi is roughly 3.141903141903142" in the output.
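What SparkPi computes is a Monte Carlo estimate of pi: sample random points in the square [-1, 1] x [-1, 1] and count the fraction that fall inside the unit circle. The sketch below reproduces that logic in plain Python (it is a local illustration of the idea, not the distributed Spark code itself):

```python
import random


def estimate_pi(num_samples: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi, mirroring the idea behind SparkPi:
    the area ratio of the unit circle to the enclosing square is pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


print(estimate_pi(100_000))
```

In Spark, the sampling loop is split across the 10 partitions seen in the log and the per-partition counts are combined with `reduce`, which is why the printed value ("Pi is roughly 3.14...") varies slightly from run to run.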

2. Download the Spark MLlib test data

wget --no-check-certificate https://raw.githubusercontent.com/apache/spark/branch-1.5/data/mllib/sample_movielens_data.txt

This may fail with an error:

[root@cdh01 ~]# wget --no-check-certificate \
> https://raw.githubusercontent.com/apache/spark/branch-1.5/data/mllib/sample_movielens_data.txt
-bash: wget: command not found

Workarounds:

1. `yum -y install wget` — because the yum repositories on my CDH cluster had been customized, yum could not download the package, so this step failed.
2. Download the wget RPM package and install it manually. Search http://rpmfind.net/linux/rpm2html/search.php?query=wget(x86-64) for the version matching your system, upload it to the virtual machine, and install it with rpm:

[root@cdh01 ~]# rpm -ivh /opt/lixiang/wget-1.14-15.el7_4.1.x86_64.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:wget-1.14-15.el7_4.1             ################################# [100%]

Run the wget command for the MLlib sample data again; this time the download completes. Then upload the data to HDFS:

[root@cdh01 ~]# hdfs dfs -copyFromLocal sample_movielens_data.txt /user/hdfs
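For reference, each line of sample_movielens_data.txt is a `::`-separated rating record, `userId::movieId::rating` (this matches the standard MovieLens sample data shipped with Spark; the example line below is illustrative, not copied from the file). A minimal parser:

```python
def parse_rating(line: str) -> tuple[int, int, float]:
    """Parse one '::'-separated line of sample_movielens_data.txt
    into (user_id, movie_id, rating)."""
    user, movie, rating = line.strip().split("::")
    return int(user), int(movie), float(rating)


# Illustrative record in the file's format
print(parse_rating("0::2::3"))
```

The "Got 1501 ratings from 30 users on 100 movies" line in the job output below is the result of MovieLensALS reading exactly this kind of record.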

3. Run the Spark MLlib MovieLens example application, which computes recommendations from movie ratings (`--rank` is the number of latent factors, `--numIterations` the number of ALS iterations, and `--lambda` the regularization parameter):

[root@cdh01 ~]# spark-submit --master local --class org.apache.spark.examples.mllib.MovieLensALS /opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar --rank 5 --numIterations 5 --lambda 1.0 --kryo /user/hdfs/sample_movielens_data.txt
18/10/29 14:16:54 INFO spark.SparkContext: Running Spark version 1.6.0
18/10/29 14:16:54 INFO spark.SecurityManager: Changing view acls to: root
18/10/29 14:16:54 INFO spark.SecurityManager: Changing modify acls to: root
18/10/29 14:16:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
18/10/29 14:16:54 INFO util.Utils: Successfully started service 'sparkDriver' on port 48962.
18/10/29 14:16:55 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/10/29 14:16:55 INFO Remoting: Starting remoting
18/10/29 14:16:55 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.50.202:60843]
18/10/29 14:16:55 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@192.168.50.202:60843]
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 60843.
18/10/29 14:16:55 INFO spark.SparkEnv: Registering MapOutputTracker
18/10/29 14:16:55 INFO spark.SparkEnv: Registering BlockManagerMaster
18/10/29 14:16:55 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-b631979d-d1d3-4b0e-a52c-79f23ae27859
18/10/29 14:16:55 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB
18/10/29 14:16:55 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/10/29 14:16:55 INFO ui.SparkUI: Started SparkUI at http://192.168.50.202:4040
18/10/29 14:16:55 INFO spark.SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/spark/lib/spark-examples.jar at spark://192.168.50.202:48962/jars/spark-examples.jar with timestamp 1540793815778
18/10/29 14:16:55 INFO executor.Executor: Starting executor ID driver on host localhost
18/10/29 14:16:55 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42182.
18/10/29 14:16:55 INFO netty.NettyBlockTransferService: Server created on 42182
18/10/29 14:16:55 INFO storage.BlockManager: external shuffle service port = 7337
18/10/29 14:16:55 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/10/29 14:16:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:42182 with 530.0 MB RAM, BlockManagerId(driver, localhost, 42182)
18/10/29 14:16:55 INFO storage.BlockManagerMaster: Registered BlockManager
18/10/29 14:16:57 INFO scheduler.EventLoggingListener: Logging events to hdfs://cdh01:8020/user/spark/applicationHistory/local-1540793815861
18/10/29 14:16:57 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
Got 1501 ratings from 30 users on 100 movies.
Training: 1184, test: 317.
18/10/29 14:17:00 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
18/10/29 14:17:00 WARN netlib.BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
18/10/29 14:17:00 WARN netlib.LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
18/10/29 14:17:00 WARN netlib.LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
Test RMSE = 1.424178449372927.
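The netlib WARN lines only mean that no native BLAS/LAPACK library is installed, so MLlib falls back to its pure-Java implementation; the job still runs correctly. The final "Test RMSE" line is the root-mean-square error of the ALS model's predicted ratings against the actual ratings in the held-out test set. The metric itself is simple; here is a minimal sketch (my own helper function, not code from the example):

```python
import math


def rmse(predictions: list[float], actuals: list[float]) -> float:
    """Root-mean-square error between predicted and actual ratings,
    the metric MovieLensALS reports as 'Test RMSE'."""
    if len(predictions) != len(actuals):
        raise ValueError("prediction/actual lists must have equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared_error / len(predictions))


print(rmse([3.5, 2.0, 4.0], [3.0, 2.0, 5.0]))
```

With ratings on roughly a 1-5 scale, an RMSE of 1.42 is unimpressive but plausible for this tiny dataset with heavy regularization (`--lambda 1.0`); the point of the exercise is that the pipeline runs end to end, not model quality.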


When reprinting, please credit the original address: https://www.6miu.com/read-5028682.html
