一、安装jdk
jdk版本最好是1.7以上,设置好环境变量,安装过程,略。
二、安装Maven
我选择的Maven版本是3.3.3,安装过程,略。
编辑Maven安装目录conf/settings.xml文件,
? 1 2 <!-- 修改Maven 库存放目录--> <localRepository>D:\maven-repository\repository</localRepository>三、安装Idea
安装过程,略。
四、创建Spark项目
1、新建一个Spark项目,
2、选择Maven,从模板创建项目,
3、填写项目GroupId等,
4、选择本地安装的Maven和Maven配置文件。
5、next
6、创建完毕,查看新项目结构:
7、自动更新Maven pom文件
8、编译项目
如果出现这种错误,这个错误是由于Junit版本造成的,可以删掉Test,和pom.xml文件中Junit的相关依赖,
即删掉这两个Scala类:
和pom.xml文件中的Junit依赖:
? 1 2 3 4 5 <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version> 4.12 </version> </dependency>9、刷新Maven依赖
10、引入Jdk和Scala开发库
11、在pom.xml加入相关的依赖包,包括Hadoop、Spark等
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 <dependency> <groupId>commons-logging</groupId> <artifactId>commons-logging</artifactId> <version> 1.1 . 1 </version> <type>jar</type> </dependency> <dependency> <groupId>org.apache.commons</groupId> <artifactId>commons-lang3</artifactId> <version> 3.1 </version> </dependency> <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version> 1.2 . 9 </version> </dependency> <dependency> <groupId>junit</groupId> <artifactId>junit</artifactId> <version> 4.12 </version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version> 2.7 . 1 </version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version> 2.7 . 1 </version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version> 2.7 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2. 10 </artifactId> <version> 1.5 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_2. 10 </artifactId> <version> 1.5 . 1 </version> </dependency>然后刷新maven的依赖,
12、新建一个Scala Object。
测试代码为:
? 1 2 3 4 5 def main(args: Array[String]) { println( "Hello World!" ) val sparkConf = new SparkConf().setMaster( "local" ).setAppName( "test" ) val sparkContext = new SparkContext(sparkConf) }执行,
如果报了以下错误,
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 java.lang.SecurityException: class "javax.servlet.FilterRegistration" 's signer information does not match signer information of other classes in the same package at java.lang.ClassLoader.checkCerts(ClassLoader.java: 952 ) at java.lang.ClassLoader.preDefineClass(ClassLoader.java: 666 ) at java.lang.ClassLoader.defineClass(ClassLoader.java: 794 ) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java: 142 ) at java.net.URLClassLoader.defineClass(URLClassLoader.java: 449 ) at java.net.URLClassLoader.access$ 100 (URLClassLoader.java: 71 ) at java.net.URLClassLoader$ 1 .run(URLClassLoader.java: 361 ) at java.net.URLClassLoader$ 1 .run(URLClassLoader.java: 355 ) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java: 354 ) at java.lang.ClassLoader.loadClass(ClassLoader.java: 425 ) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: 308 ) at java.lang.ClassLoader.loadClass(ClassLoader.java: 358 ) at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java: 136 ) at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java: 129 ) at org.spark-project.jetty.servlet.ServletContextHandler.<init>(ServletContextHandler.java: 98 ) at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala: 110 ) at org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala: 101 ) at org.apache.spark.ui.WebUI.attachPage(WebUI.scala: 78 ) at org.apache.spark.ui.WebUI$$anonfun$attachTab$ 1 .apply(WebUI.scala: 62 ) at org.apache.spark.ui.WebUI$$anonfun$attachTab$ 1 .apply(WebUI.scala: 62 ) at scala.collection.mutable.ResizableArray$ class .foreach(ResizableArray.scala: 59 ) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala: 47 ) at org.apache.spark.ui.WebUI.attachTab(WebUI.scala: 62 ) at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala: 61 ) at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala: 74 ) at org.apache.spark.ui.SparkUI$.create(SparkUI.scala: 190 ) at org.apache.spark.ui.SparkUI$.createLiveUI(SparkUI.scala: 141 ) at org.apache.spark.SparkContext.<init>(SparkContext.scala: 466 ) at com.test.Test$.main(Test.scala: 13 ) at com.test.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 57 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 43 ) at java.lang.reflect.Method.invoke(Method.java: 606 ) at com.intellij.rt.execution.application.AppMain.main(AppMain.java: 144 )可以把servlet-api 2.5 jar删除即可:
最好的办法是删除pom.xml中相关的依赖,即
? 1 2 3 4 5 <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-client</artifactId> <version> 2.7 . 1 </version> </dependency>最后的pom.xml文件的依赖是
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 <dependencies> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-common</artifactId> <version> 2.7 . 1 </version> </dependency> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-hdfs</artifactId> <version> 2.7 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2. 10 </artifactId> <version> 1.5 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-sql_2. 10 </artifactId> <version> 1.5 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-hive_2. 10 </artifactId> <version> 1.5 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming_2. 10 </artifactId> <version> 1.5 . 2 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-mllib_2. 10 </artifactId> <version> 1.5 . 2 </version> </dependency> <dependency> <groupId>com.databricks</groupId> <artifactId>spark-avro_2. 10 </artifactId> <version> 2.0 . 1 </version> </dependency> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-streaming_2. 10 </artifactId> <version> 1.5 . 2 </version> </dependency> </dependencies>
如果是报了这个错误,也没有什么问题,程序依旧可以执行,
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 java.io.IOException: Could not locate executable null \bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java: 356 ) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java: 371 ) at org.apache.hadoop.util.Shell.<clinit>(Shell.java: 364 ) at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java: 80 ) at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java: 611 ) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java: 272 ) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java: 260 ) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java: 790 ) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java: 760 ) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java: 633 ) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$ 1 .apply(Utils.scala: 2084 ) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$ 1 .apply(Utils.scala: 2084 ) at scala.Option.getOrElse(Option.scala: 120 ) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala: 2084 ) at org.apache.spark.SparkContext.<init>(SparkContext.scala: 311 ) at com.test.Test$.main(Test.scala: 13 ) at com.test.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 57 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 43 ) at java.lang.reflect.Method.invoke(Method.java: 606 ) at com.intellij.rt.execution.application.AppMain.main(AppMain.java: 144 )最后看到的正常输出:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 Hello World! Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 16 / 09 / 19 11 : 21 : 29 INFO SparkContext: Running Spark version 1.5 . 1 16 / 09 / 19 11 : 21 : 29 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null \bin\winutils.exe in the Hadoop binaries. at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java: 356 ) at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java: 371 ) at org.apache.hadoop.util.Shell.<clinit>(Shell.java: 364 ) at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java: 80 ) at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java: 611 ) at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java: 272 ) at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java: 260 ) at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java: 790 ) at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java: 760 ) at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java: 633 ) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$ 1 .apply(Utils.scala: 2084 ) at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$ 1 .apply(Utils.scala: 2084 ) at scala.Option.getOrElse(Option.scala: 120 ) at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala: 2084 ) at org.apache.spark.SparkContext.<init>(SparkContext.scala: 311 ) at com.test.Test$.main(Test.scala: 13 ) at com.test.Test.main(Test.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 57 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 43 ) at java.lang.reflect.Method.invoke(Method.java: 606 ) at com.intellij.rt.execution.application.AppMain.main(AppMain.java: 144 ) 16 / 09 / 19 11 : 21 : 29 WARN NativeCodeLoader: Unable to load native -hadoop library for your platform... using builtin-java classes where applicable 16 / 09 / 19 11 : 21 : 30 INFO SecurityManager: Changing view acls to: pc 16 / 09 / 19 11 : 21 : 30 INFO SecurityManager: Changing modify acls to: pc 16 / 09 / 19 11 : 21 : 30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(pc); users with modify permissions: Set(pc) 16 / 09 / 19 11 : 21 : 30 INFO Slf4jLogger: Slf4jLogger started 16 / 09 / 19 11 : 21 : 31 INFO Remoting: Starting remoting 16 / 09 / 19 11 : 21 : 31 INFO Remoting: Remoting started; listening on addresses :[akka.tcp: //sparkDriver@192.168.51.143:52500] 16 / 09 / 19 11 : 21 : 31 INFO Utils: Successfully started service 'sparkDriver' on port 52500 . 16 / 09 / 19 11 : 21 : 31 INFO SparkEnv: Registering MapOutputTracker 16 / 09 / 19 11 : 21 : 31 INFO SparkEnv: Registering BlockManagerMaster 16 / 09 / 19 11 : 21 : 31 INFO DiskBlockManager: Created local directory at C:\Users\pc\AppData\Local\Temp\blockmgr-f9ea7f8c-68f9-4f9b-a31e-b87ec2e702a4 16 / 09 / 19 11 : 21 : 31 INFO MemoryStore: MemoryStore started with capacity 966.9 MB 16 / 09 / 19 11 : 21 : 31 INFO HttpFileServer: HTTP File server directory is C:\Users\pc\AppData\Local\Temp\spark-64cccfb4-46c8- 4266 -92c1-14cfc6aa2cb3\httpd-5993f955-0d92- 4233 -b366-c9a94f7122bc 16 / 09 / 19 11 : 21 : 31 INFO HttpServer: Starting HTTP Server 16 / 09 / 19 11 : 21 : 31 INFO Utils: Successfully started service 'HTTP file server' on port 52501 . 16 / 09 / 19 11 : 21 : 31 INFO SparkEnv: Registering OutputCommitCoordinator 16 / 09 / 19 11 : 21 : 31 INFO Utils: Successfully started service 'SparkUI' on port 4040 . 16 / 09 / 19 11 : 21 : 31 INFO SparkUI: Started SparkUI at http: //192.168.51.143:4040 16 / 09 / 19 11 : 21 : 31 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set. 16 / 09 / 19 11 : 21 : 31 INFO Executor: Starting executor ID driver on host localhost 16 / 09 / 19 11 : 21 : 31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52520 . 16 / 09 / 19 11 : 21 : 31 INFO NettyBlockTransferService: Server created on 52520 16 / 09 / 19 11 : 21 : 31 INFO BlockManagerMaster: Trying to register BlockManager 16 / 09 / 19 11 : 21 : 31 INFO BlockManagerMasterEndpoint: Registering block manager localhost: 52520 with 966.9 MB RAM, BlockManagerId(driver, localhost, 52520 ) 16 / 09 / 19 11 : 21 : 31 INFO BlockManagerMaster: Registered BlockManager 16 / 09 / 19 11 : 21 : 31 INFO SparkContext: Invoking stop() from shutdown hook 16 / 09 / 19 11 : 21 : 32 INFO SparkUI: Stopped Spark web UI at http: //192.168.51.143:4040 16 / 09 / 19 11 : 21 : 32 INFO DAGScheduler: Stopping DAGScheduler 16 / 09 / 19 11 : 21 : 32 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 16 / 09 / 19 11 : 21 : 32 INFO MemoryStore: MemoryStore cleared 16 / 09 / 19 11 : 21 : 32 INFO BlockManager: BlockManager stopped 16 / 09 / 19 11 : 21 : 32 INFO BlockManagerMaster: BlockManagerMaster stopped 16 / 09 / 19 11 : 21 : 32 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 16 / 09 / 19 11 : 21 : 32 INFO SparkContext: Successfully stopped SparkContext 16 / 09 / 19 11 : 21 : 32 INFO ShutdownHookManager: Shutdown hook called 16 / 09 / 19 11 : 21 : 32 INFO ShutdownHookManager: Deleting directory C:\Users\pc\AppData\Local\Temp\spark-64cccfb4-46c8- 4266 -92c1-14cfc6aa2cb3 Process finished with exit code 0至此,开发环境搭建完毕。
五、打jar包
1、新建一个Scala Object
代码是:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 package com.test import org.apache.spark.{SparkConf, SparkContext} /** * Created by pc on 2016/9/20. */ object WorldCount { def main(args: Array[String]) { val dataFile = args( 0 ) val output = args( 1 ) val sparkConf = new SparkConf().setAppName( "WorldCount" ) val sparkContext = new SparkContext(sparkConf) val lines = sparkContext.textFile(dataFile) val counts = lines.flatMap(_.split( "," )).map(s => (s, 1 )).reduceByKey((a,b) => a+b) counts.saveAsTextFile(output) sparkContext.stop() } }
2、 File -》Project Structure
3、点击ok
可以设置jar包输出目录:
4、build Artifact
5、运行:
把测试文件放到HDFS的/test/ 目录下,提交:
? 1 spark-submit -- class com.test.WorldCount --master spark: //192.168.18.151:7077 sparktest.jar /test/data.txt /test/test-016、如果出现以下错误
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java: 240 ) at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java: 193 ) at java.util.jar.JarVerifier.processEntry(JarVerifier.java: 305 ) at java.util.jar.JarVerifier.update(JarVerifier.java: 216 ) at java.util.jar.JarFile.initializeVerifier(JarFile.java: 345 ) at java.util.jar.JarFile.getInputStream(JarFile.java: 412 ) at sun.misc.JarIndex.getJarIndex(JarIndex.java: 137 ) at sun.misc.URLClassPath$JarLoader$ 1 .run(URLClassPath.java: 674 ) at sun.misc.URLClassPath$JarLoader$ 1 .run(URLClassPath.java: 666 ) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java: 665 ) at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java: 638 ) at sun.misc.URLClassPath$ 3 .run(URLClassPath.java: 366 ) at sun.misc.URLClassPath$ 3 .run(URLClassPath.java: 356 ) at java.security.AccessController.doPrivileged(Native Method) at sun.misc.URLClassPath.getLoader(URLClassPath.java: 355 ) at sun.misc.URLClassPath.getLoader(URLClassPath.java: 332 ) at sun.misc.URLClassPath.getResource(URLClassPath.java: 198 ) at java.net.URLClassLoader$ 1 .run(URLClassLoader.java: 358 ) at java.net.URLClassLoader$ 1 .run(URLClassLoader.java: 355 ) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java: 354 ) at java.lang.ClassLoader.loadClass(ClassLoader.java: 425 ) at java.lang.ClassLoader.loadClass(ClassLoader.java: 358 ) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java: 270 ) at org.apache.spark.util.Utils$.classForName(Utils.scala: 173 ) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala: 641 ) at org.apache.spark.deploy.SparkSubmit$.doRunMain$ 1 (SparkSubmit.scala: 180 ) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala: 205 ) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala: 120 ) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)就使用WinRAR打开jar包, 删除META-INF目录下的除了mainfest.mf,.rsa及maven目录以外的其他所有文件