Hadoop 2.8.0 + CentOS 7.3 Setup


I. Install JDK 1.8

tar zxvf jdk-8u65-linux-x64.tar.gz
mv jdk1.8.0_65 /usr/src/jdk

Add the following to /etc/profile:

JAVA_HOME=/usr/src/jdk
PATH=$JAVA_HOME/bin:/usr/local/xtrabackup/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export PATH
export CLASSPATH

II. Install Hadoop

tar zxvf hadoop-2.8.0.tar.gz
mv hadoop-2.8.0 /usr/src/hadoop

Add the following environment variables to /etc/profile:

HADOOP_LOG_DIR=/usr/src/hadoop
HADOOP_PREFIX=/usr/src/hadoop
export HADOOP_PREFIX
export HADOOP_HOME=/usr/src/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/:$LD_LIBRARY_PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
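After editing /etc/profile, it is worth reloading it and confirming that the JDK and the Hadoop binaries resolve from the new paths. This is only a sanity check added here, not one of the original steps:

source /etc/profile
java -version        # should report 1.8.0_65
hadoop version       # should report Hadoop 2.8.0
echo $HADOOP_HOME    # should print /usr/src/hadoop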
III. Server settings

1. /etc/hosts

[root@centos128 hadoop]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.44.128  centos128 localhost
192.168.44.129  centos129
192.168.44.130  centos130

centos128 is the master; centos129 and centos130 are the slaves.

2. Disable firewalld (and stop it for the current boot as well):

systemctl stop firewalld
systemctl disable firewalld

3. SSH trust

ssh-keygen -t rsa
ssh-keygen -t dsa
cd ~/.ssh
ssh-copy-id -i id_rsa.pub centos128
ssh-copy-id -i id_dsa.pub centos128
ssh-copy-id -i id_rsa.pub centos129
ssh-copy-id -i id_dsa.pub centos129
ssh-copy-id -i id_rsa.pub centos130
ssh-copy-id -i id_dsa.pub centos130

Repeat the same setup on the other servers.

4. Create the storage directories:

mkdir -p /data/hadoop/name
mkdir -p /data/hadoop/tmp
mkdir -p /Data1
mkdir -p /Data2

IV. Configuration files

The main files to edit:

a. In etc/hadoop/hadoop-env.sh, change
       export JAVA_HOME=${JAVA_HOME}
   to
       export JAVA_HOME=/usr/src/jdk

b. etc/hadoop/core-site.xml    - NameNode URI
   etc/hadoop/hdfs-site.xml    - NameNode and DataNode settings
   etc/hadoop/yarn-site.xml    - ResourceManager, NodeManager and History Server settings
   etc/hadoop/mapred-site.xml  - MapReduce applications and the MapReduce JobHistory Server
   etc/hadoop/slaves           - list of slave hosts

1. etc/hadoop/core-site.xml:

<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://centos128:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>

Here hdfs://centos128:9000 is the NameNode URI.

2. etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/data/hadoop/name</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>268435456</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>100</value>
    </property>
    <!--
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/Data1,/Data2</value>
    </property>
</configuration>

dfs.namenode.name.dir  - local path where the NameNode stores its metadata
dfs.replication        - number of replicas, 3 by default
dfs.datanode.data.dir  - local paths where the DataNodes store their blocks
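The paths referenced by dfs.datanode.data.dir and hadoop.tmp.dir must also exist on the slave nodes. One possible way to create them remotely, assuming the SSH trust from step III.3 is already in place:

for h in centos129 centos130; do
    # create the NameNode/tmp and DataNode directories on each slave
    ssh $h "mkdir -p /data/hadoop/name /data/hadoop/tmp /Data1 /Data2"
done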
3. etc/hadoop/yarn-site.xml (for the meaning of each property see http://blog.csdn.net/u010719917/article/details/73917217):

<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- ResourceManager -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>centos128</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!--
  <property>
    <name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.client.thread-count</name>
    <value>50</value>
  </property>
  -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>512</value>
  </property>
  <!--
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
    <value>1000</value>
  </property>
  -->
  <!-- NodeManager -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>${hadoop.tmp.dir}/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>${yarn.log.dir}/userlogs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- History Server -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <!--
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  -->
</configuration>

4. etc/hadoop/mapred-site.xml (same reference as above):

<configuration>
    <!-- MapReduce Applications -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>3072</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx2560M</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>
    <!-- MapReduce JobHistory Server -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>centos128:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos128:19888</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/mr-history/tmp</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/mr-history/done</value>
    </property>
</configuration>
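Note: the stock Hadoop 2.8.0 tarball normally ships only mapred-site.xml.template, not mapred-site.xml. If that is the case for your distribution, create the file from the template before editing it:

cd /usr/src/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml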
5. etc/hadoop/slaves:

[root@centos128 hadoop]# cat slaves
centos129
centos130

6. Log directory:

[root@centos128 logs]# pwd
/usr/src/hadoop/logs
[root@centos128 logs]# ll
total 456
-rw-r--r-- 1 root root 135437 Jul  1 14:06 hadoop-root-namenode-centos128.log
-rw-r--r-- 1 root root   5069 Jul  1 13:19 hadoop-root-namenode-centos128.out
-rw-r--r-- 1 root root  22419 Jul  1 12:54 hadoop-root-secondarynamenode-centos128.log
-rw-r--r-- 1 root root    716 Jul  1 12:54 hadoop-root-secondarynamenode-centos128.out
-rw-r--r-- 1 root root  34891 Jul  1 14:07 mapred-root-historyserver-centos128.log
-rw-r--r-- 1 root root   1477 Jul  1 13:18 mapred-root-historyserver-centos128.out
-rw-r--r-- 1 root root      0 Jul  1 12:53 SecurityAuth-root.audit
-rw-r--r-- 1 root root  20165 Jul  1 13:06 yarn-root-proxyserver-centos128.log
-rw-r--r-- 1 root root    702 Jul  1 13:06 yarn-root-proxyserver-centos128.out
-rw-r--r-- 1 root root  87905 Jul  1 14:06 yarn-root-resourcemanager-centos128.log
-rw-r--r-- 1 root root   1524 Jul  1 13:04 yarn-root-resourcemanager-centos128.out
-rw-r--r-- 1 root root    702 Jul  1 13:00 yarn-root-resourcemanager-centos128.out.1
-rw-r--r-- 1 root root    702 Jul  1 12:58 yarn-root-resourcemanager-centos128.out.2

V. Install Hadoop on the other servers

Pack the configured Hadoop tree, the JDK, /etc/profile and /etc/hosts, then copy the bundle to centos129 and centos130:

cd /
tar zcvf hd.tar.gz /usr/src/hadoop/ /usr/src/jdk/ /etc/profile /etc/hosts
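One possible way to push and unpack the bundle on the slaves, again relying on the SSH trust set up earlier (hd.tar.gz was created in /, and tar strips the leading / from member names, so extracting from / restores the original paths):

for h in centos129 centos130; do
    scp /hd.tar.gz $h:/
    ssh $h "cd / && tar zxvf hd.tar.gz"
done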
VI. Hadoop scripts and web pages

Hadoop Startup

The first time HDFS is brought up it must be formatted:

[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

Start the HDFS NameNode:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Start the HDFS DataNodes:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

If etc/hadoop/slaves and SSH trusted access are configured (see Single Node Setup), all of the HDFS processes can be started with a single utility script:

[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh

Start the YARN ResourceManager:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Start the YARN NodeManagers:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager

Start a standalone WebAppProxy server. Run it on the WebAppProxy server as yarn; if multiple servers are used with load balancing, run it on each of them:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver

If etc/hadoop/slaves and SSH trusted access are configured, all of the YARN processes can be started with a single utility script:

[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh

Start the MapReduce JobHistory Server:

[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

Hadoop Shutdown

Stop the HDFS NameNode:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode

Stop the HDFS DataNodes:

[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode

If etc/hadoop/slaves and SSH trusted access are configured, all of the HDFS processes can be stopped with a single utility script:

[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh

Stop the YARN ResourceManager:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager

Stop the YARN NodeManagers:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager

If etc/hadoop/slaves and SSH trusted access are configured, all of the YARN processes can be stopped with a single utility script:

[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh

Stop the WebAppProxy server. Run it on the WebAppProxy server as yarn; if multiple servers are used with load balancing, run it on each of them:

[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver

Stop the MapReduce JobHistory Server:

[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver

Web Interfaces

Once the Hadoop cluster is up and running, check the web UI of each component:

Daemon                         Web Interface            Notes
NameNode                       http://nn_host:port/     Default HTTP port is 50070.
ResourceManager                http://rm_host:port/     Default HTTP port is 8088.
MapReduce JobHistory Server    http://jhs_host:port/    Default HTTP port is 19888.
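A quick way to confirm that the expected daemons are actually running (a sanity check added here, not part of the original write-up) is jps plus an HDFS report; the exact process list depends on which optional services were started:

# on the master (centos128): expect NameNode, SecondaryNameNode, ResourceManager and, if started, JobHistoryServer
jps
# on each slave (centos129, centos130): expect DataNode and NodeManager
jps
# list the DataNodes registered with the NameNode and their capacity
hdfs dfsadmin -report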
