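Hadoop Cluster Setup
Preparation
Set up passwordless SSH between the nodes, disable the firewall, and create four users: hadoop, hdfs, yarn, and mapred. Add the hdfs, yarn, and mapred users to the hadoop group.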
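The account setup described above can be sketched with the standard Linux groupadd and useradd tools. This is a minimal sketch, not the author's exact procedure: the DRY_RUN variable and the run helper are hypothetical names introduced here so the commands can be previewed before being executed as root on each node.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 and run as root to apply them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_hadoop_users() {
    # Create the shared group first, then the hadoop user with it as primary group.
    run groupadd hadoop
    run useradd -g hadoop hadoop
    for u in hdfs yarn mapred; do
        # -G adds each service user to the supplementary group "hadoop".
        run useradd -G hadoop "$u"
    done
}

create_hadoop_users
```

The same commands would need to be repeated on node2, node3, and node4 so that the service users exist everywhere.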
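Cluster Planning
The cluster has three nodes: node2, node3, and node4. Roles are assigned as follows.
HDFS
node2: NameNode (started by the hdfs user)
node3: SecondaryNameNode (started by the hdfs user), DataNode (started by the hdfs user)
node4: DataNode (started by the hdfs user)
YARN
node2: ResourceManager (started by the yarn user), JobHistoryServer (started by the mapred user)
node3: NodeManager (started by the yarn user)
node4: NodeManager (started by the yarn user)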
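In Hadoop 2.x the bundled sbin/start-yarn.sh and sbin/start-dfs.sh scripts read the list of worker hosts from etc/hadoop/slaves. The original text does not show this file, so the following is only a sketch of how the worker roles planned above (node3 and node4) could be written into it. The write_slaves function name is hypothetical.

```shell
#!/bin/sh
# Write the worker host list (one host per line) into the given configuration directory.
write_slaves() {
    conf_dir="$1"
    printf '%s\n' node3 node4 > "$conf_dir/slaves"
}

# Usage on node2 (path taken from the installation directory used in this guide):
#   write_slaves /home/software/hadoop-2.6.4/etc/hadoop
```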
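Cluster Installation and Configuration
Extract the Hadoop package into /home/software and distribute it to the same location on every node. Then configure core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml, and copy the configured files to /home/software/hadoop-2.6.4/etc/hadoop/ on every node. The contents of each configuration file are as follows: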
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node2:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/software/hadoop-2.6.4/data</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/software/hadoop-2.6.4/data/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/software/hadoop-2.6.4/data/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node3:50090</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///home/software/hadoop-2.6.4/data/namesecondary</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>node2</value>
  </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>node2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>node2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>node2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>node2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>node2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1900</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/home/software/hadoop-2.6.4/logs/userlogs</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node2:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node2:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/tmp/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/tmp/mr-history/done</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/software/hadoop-2.6.4/logs</value>
  </property>
</configuration>
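Once the four files are edited, copying them to the other nodes could look like the sketch below. It assumes the files were edited on node2, that passwordless SSH already works, and that scp is available. DRY_RUN, run, and push_configs are hypothetical names used only for this illustration.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the scp commands; set DRY_RUN=0 to copy for real.
DRY_RUN="${DRY_RUN:-1}"
HADOOP_CONF_DIR=/home/software/hadoop-2.6.4/etc/hadoop

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

push_configs() {
    for node in node3 node4; do
        for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml; do
            # Copy each file to the same directory on the remote node.
            run scp "$HADOOP_CONF_DIR/$f" "$node:$HADOOP_CONF_DIR/"
        done
    done
}

push_configs
```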
Starting the Hadoop Cluster
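On a brand-new cluster the NameNode metadata directory has to be formatted once, as the hdfs user on node2, before the NameNode is started for the first time. The original steps do not show this, so the guard below is only a sketch: needs_format is a hypothetical helper that checks for the current/VERSION file a formatted name directory contains, so that an existing filesystem is not reformatted by accident.

```shell
#!/bin/sh
# Value of dfs.namenode.name.dir from hdfs-site.xml, without the file:// prefix.
NAME_DIR=/home/software/hadoop-2.6.4/data/dfs/name

# Succeeds (exit status 0) when the directory has not been formatted yet.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# Usage (as the hdfs user on node2):
#   needs_format "$NAME_DIR" && /home/software/hadoop-2.6.4/bin/hdfs namenode -format
```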
1. Start HDFS
On node2, start the NameNode service:
$ cd /home/software/hadoop-2.6.4/
$ ./sbin/hadoop-daemon.sh start namenode
On node3, start the SecondaryNameNode and DataNode services:
$ ./sbin/hadoop-daemon.sh start secondarynamenode
$ ./sbin/hadoop-daemon.sh start datanode
On node4, start the DataNode service:
$ ./sbin/hadoop-daemon.sh start datanode
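After the daemons are started, the JDK's jps tool lists the running Java processes as "<pid> <MainClass>" lines, which gives a quick way to confirm that each node runs the roles planned for it. The missing_daemons function below is a hypothetical helper, offered as a sketch: it reads jps output on standard input and prints any expected daemon that is not listed.

```shell
#!/bin/sh
# Print each expected daemon name (given as arguments) that is absent from the jps output on stdin.
missing_daemons() {
    jps_out="$(cat)"
    for d in "$@"; do
        # Match the daemon name against the second column (the main class name).
        if ! printf '%s\n' "$jps_out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$d"
        fi
    done
}

# Usage on node3 (run as the user that owns the daemons):
#   jps | missing_daemons SecondaryNameNode DataNode
```

An empty result means every expected daemon was found.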
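2. Start YARN
On node2, run the following commands to start the ResourceManager, NodeManager, and JobHistoryServer services. start-yarn.sh starts the ResourceManager on node2 and the NodeManagers on the worker nodes; the JobHistoryServer is then started as the mapred user: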
$ ./sbin/start-yarn.sh
$ sudo su mapred
$ ./sbin/mr-jobhistory-daemon.sh start historyserver