CentOS 7.2: Hadoop 2.7 + Spark 2.1 Distributed Cluster Setup


I. Environment Overview

1. Virtual machine environment
JDK: jdk1.8.0_131
Hadoop: 2.7.3
Scala: 2.12.2
Spark: 2.1.0
2. Three servers (CentOS 7.2)
192.168.1.225 (master)
192.168.1.226 (slave)
192.168.1.227 (slave)

II. Server Environment (set up the master first)

1. Configure the hosts mapping file
vim /etc/hosts
192.168.1.225 master
192.168.1.226 slave1
192.168.1.227 slave2
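A quick way to confirm the mappings resolve correctly (hostnames as defined above):

ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2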
2. Disable the firewall
systemctl stop firewalld      # stop the firewall now
systemctl disable firewalld   # disable it at boot
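To confirm the firewall is really off (assuming firewalld is the only firewall service on the machine):

systemctl is-active firewalld   # should print: inactive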
3. Disable SELinux
setenforce 0
vim /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled
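To check the current SELinux mode:

getenforce   # Permissive right after setenforce 0; Disabled after the next reboot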

4. Modify the SSH configuration file

4.1 Open the configuration file

vim /etc/ssh/sshd_config

4.2 Uncomment the following lines

RSAAuthentication yes                      # enable RSA authentication
PubkeyAuthentication yes                   # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys    # path to the authorized keys file (the same file generated later)

4.3 Restart the SSH service

systemctl restart sshd

III. Install the Base Environment (Java and Scala)

1. Set up Java 1.8

1.1 Download jdk-8u131-linux-x64.rpm and install it:

rpm -ivh jdk-8u131-linux-x64.rpm

1.2 Add the Java environment variable to /etc/profile:

export JAVA_HOME=/usr/java/jdk1.8.0_131/

1.3 Save and reload the configuration

source /etc/profile
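A quick check that the JDK is installed and on the PATH (the Oracle RPM registers java under /usr/bin):

java -version   # should report "1.8.0_131"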
2. Set up Scala 2.12.2

2.1 Download scala-2.12.2.rpm and install it:

rpm -ivh scala-2.12.2.rpm

2.2 Add the environment variable to /etc/profile:

export SCALA_HOME=/usr/share/scala

2.3 Save and reload the configuration

source /etc/profile
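Verify the Scala installation:

scala -version   # should report version 2.12.2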

IV. Configure the Cluster Environment

1. Clone the template VM (skip this step on real hardware)

1.1 Clone the template VM

1.2 Configure the network

# Edit the network configuration (if the VM was cloned, it is best to also change the UUID)
vim /etc/sysconfig/network-scripts/ifcfg-enp0s3
# Change the IP below
IPADDR=192.168.1.225
# Restart the network
systemctl restart network
2. Configure passwordless SSH login
# Run the following on all three nodes:
ssh-keygen -t rsa
# Rename each node's public key (one command per node):
cp id_rsa.pub authorized_keys_master     # on master
cp id_rsa.pub authorized_keys_slave1     # on slave1
cp id_rsa.pub authorized_keys_slave2     # on slave2
# Send the two slave nodes' public keys to /root/.ssh on the master with scp:
scp authorized_keys_slave1 root@master:/root/.ssh
scp authorized_keys_slave2 root@master:/root/.ssh
# On the master, collect all three public keys into authorized_keys:
cat authorized_keys_master >> authorized_keys
cat authorized_keys_slave1 >> authorized_keys
cat authorized_keys_slave2 >> authorized_keys
# Distribute the file to the two slave nodes:
scp authorized_keys root@slave1:/root/.ssh
scp authorized_keys root@slave2:/root/.ssh
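A quick check that the key exchange worked, run from the master (the very first connection may still ask to accept the host key):

ssh slave1 hostname   # should print slave1 without prompting for a password
ssh slave2 hostname   # should print slave2 without prompting for a password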

V. Hadoop 2.7.3 Distributed Setup

On the master node:

1. Download the binary package:
wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
2. Extract and move it to the target directory
tar -zxf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 /opt
3. Modify the configuration files:

3.1 Edit /etc/profile

vim /etc/profile
# Add the following
export HADOOP_HOME=/opt/hadoop-2.7.3/
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Reload
source /etc/profile
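A quick check that the new PATH entries are picked up (the JAVA_HOME exported in section III is enough for this even before hadoop-env.sh is edited):

hadoop version   # should report Hadoop 2.7.3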

3.2 Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME as follows:

export JAVA_HOME=/usr/java/jdk1.8.0_131/

3.3 Edit $HADOOP_HOME/etc/hadoop/slaves, delete the original localhost entry, and replace it with:

slave1
slave2

3.4 Edit $HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.3/tmp</value>
    </property>
</configuration>
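The hadoop.tmp.dir directory does not have to exist in advance, but creating it up front keeps the layout explicit (path matches the value above):

mkdir -p /opt/hadoop-2.7.3/tmp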

3.5 Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.7.3/hdfs/data</value>
    </property>
</configuration>
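Likewise, the NameNode and DataNode directories referenced above can be created ahead of time on each node (formatting the NameNode later would create the name directory anyway):

mkdir -p /opt/hadoop-2.7.3/hdfs/name /opt/hadoop-2.7.3/hdfs/data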

3.6 Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml (copy the template to create the xml):

cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

3.7 Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
4. Copy the hadoop folder from the master node to slave1 and slave2:
scp -r /opt/hadoop-2.7.3 root@slave1:/opt
scp -r /opt/hadoop-2.7.3 root@slave2:/opt
5. Edit /etc/profile on slave1 and slave2 in the same way as on the master.
6. Before starting, format the NameNode
hadoop namenode -format   # the deprecated spelling still works; hdfs namenode -format is equivalent

Start

/opt/hadoop-2.7.3/sbin/start-all.sh

Test

jps
# the master should show: SecondaryNameNode, ResourceManager, NameNode
# the slaves should show: NodeManager, DataNode
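Beyond jps, two more checks are worth running; 50070 is the Hadoop 2.x NameNode web UI default port, and 8088 matches the yarn.resourcemanager.webapp.address configured above:

hdfs dfsadmin -report   # should list two live DataNodes
curl -s http://master:50070 > /dev/null && echo "NameNode UI reachable"
curl -s http://master:8088  > /dev/null && echo "ResourceManager UI reachable"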

VI. Spark 2.1.0 Distributed Environment Setup

On the master node:

1. Download the package:
wget "http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz"
2. Extract and move it to the target directory
tar -zxf spark-2.1.0-bin-hadoop2.7.tgz
mv spark-2.1.0-bin-hadoop2.7 /opt
3. Modify the configuration files:

3.1 Edit /etc/profile

export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
export PATH="$SPARK_HOME/bin:$PATH"
# Reload
source /etc/profile

3.2 Edit $SPARK_HOME/conf/spark-env.sh

cp spark-env.sh.template spark-env.sh
# Add the following:
export SCALA_HOME=/usr/share/scala
export JAVA_HOME=/usr/java/jdk1.8.0_131/
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop

3.3 Edit $SPARK_HOME/conf/slaves

cp slaves.template slaves
# Contents:
master
slave1
slave2
4. Copy the spark folder from the master node to slave1 and slave2:
scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@slave1:/opt
scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@slave2:/opt
5. Edit /etc/profile on slave1 and slave2 in the same way as on the master.
6. Start
/opt/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh
# Open an interactive shell
spark-shell
# Exit the shell
:quit
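A few optional checks for the standalone cluster, assuming Spark's defaults (7077 for the master RPC port, 8080 for the master web UI):

jps   # the master should now also show Master and Worker; the slaves show Worker
curl -s http://master:8080 > /dev/null && echo "Spark master UI reachable"
# One-line smoke test against the cluster; should print 5050.0 before the shell exits
echo 'println(sc.parallelize(1 to 100).sum())' | spark-shell --master spark://master:7077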