@(SPARK)[spark]
spark按以下优先级加载配置文件: (1)用户代码中显式调用set()方法设置的选项 (2)通过spark-submit传递的参数 (3)配置文件中的值 (4)spark的默认值
以下会分别介绍各种方式。
val conf = new SparkConf() conf.set(“spark.app.name”, “ljh_test”) conf.set(“spark.master”,”yarn-client”) val sc = new SparkContext(conf)
bin/spark-submit –class com.lujinhong.MyTest –master yarn-client –name “ljh_test” myTest.jar
主要是指conf/spark-defaults.conf,如:
# For monitoring spark.eventLog.enabled true spark.eventLog.dir hdfs://mycluster/tmp/spark-events spark.history.fs.logDirectory hdfs://mycluster/tmp/spark-events spark.yarn.historyServer.address 10.1.1.100:18080 spark.ui.showConsoleProgress true spark.history.kerberos.enabled true spark.history.kerberos.principal hadoop/sparkhistoryserver@LUJINHONG.COM spark.history.kerberos.keytab /home/hadoop/conf/spark/spark.keytab # For executor spark.cores.max 300 spark.driver.memory 2g spark.executor.memory 6g spark.executor.cores 6 spark.driver.extraJavaOptions -XX:PermSize=512M -XX:MaxPermSize=2048M文件中是以空格分开的键值对,默认加载conf/spark-defaults.conf,也可以在spark-submit中通过–properties-file指定路径。
主要用于指定一些环境变量,尤其是指定YARN相关的目录,如
#!/usr/bin/env bash export SPARK_HOME=/home/hadoop/spark export SPARK_LOG_DIR=/home/hadoop/logs export SPARK_PID_DIR=/home/hadoop/pids export YARN_CONF_DIR=/home/hadoop/conf export HADOOP_CONF_DIR=/home/hadoop/conf # for exporting for enviroment, such as lib/native export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/home/hadoop/hadoop/lib/native export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/hadoop/hadoop/lib/native其它的配置文件还有log4j.properties, metircs.properties等。
