1. Disabling the firewall, configuring the hostname and IP, and setting up passwordless SSH login, as well as downloading and installing Hadoop and the JDK and configuring their environment variables, are covered here: http://blog.csdn.net/qq_38799155/article/details/75949250
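The passwordless-SSH step referenced above boils down to generating an RSA key pair with an empty passphrase and authorizing its public key. A minimal sketch, written to a throwaway directory for illustration (in a real setup the files live in `~/.ssh`):

```shell
# Generate an RSA key pair with no passphrase (-N '') into a scratch
# directory; for a real setup, use the default ~/.ssh/id_rsa path.
KEYDIR=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$KEYDIR/id_rsa" -q

# Authorize the public key for login; sshd insists on tight permissions.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With these files in `~/.ssh`, `ssh localhost` should log in without a password, which `start-dfs.sh` and `start-yarn.sh` rely on to launch the daemons.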
2. Pseudo-distributed Hadoop configuration
2.1 core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.3/tmp</value>
  </property>
</configuration>
2.2 hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.3/tmp/dfs/name</value>
  </property>
</configuration>
2.3 mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2.4 yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
2.5 slaves
localhost

3. Start Hadoop. Each step below has succeeded when its output matches the corresponding screenshot:
2. Run mr-jobhistory-daemon.sh start historyserver; output like the figure below means it succeeded:
3. Run start-dfs.sh; output like the figure below means it succeeded:
4. Run start-yarn.sh; output like the figure below means it succeeded:
5. In a web browser, open your host's IP on port 50070 to reach the NameNode page shown below and check the status of the nodes; job status is shown on the YARN ResourceManager UI at port 8088:
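Besides the web UI, a quick way to confirm every daemon is up is `jps`, which lists the running JVMs. A sketch of the check, using canned output in place of a live `jps` call (the PIDs and listing are illustrative; on a real node substitute `SAMPLE_JPS=$(jps)`):

```shell
# Canned stand-in for `jps` output on a healthy pseudo-distributed node.
SAMPLE_JPS='2481 NameNode
2602 DataNode
2785 SecondaryNameNode
2931 ResourceManager
3044 NodeManager
3190 JobHistoryServer'

# Report any expected daemon that is not present in the listing.
for d in NameNode DataNode SecondaryNameNode \
         ResourceManager NodeManager JobHistoryServer; do
  if printf '%s\n' "$SAMPLE_JPS" | grep -qw "$d"; then
    echo "OK      $d"
  else
    echo "MISSING $d"
  fi
done
```

A `MISSING` line usually points at the matching start script not having been run, or at an error in the corresponding *-site.xml file above.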
A test example: WordCount, a program that counts the occurrences of each word in the input text. WordCount ships in the examples jar hadoop-0.20.2-examples.jar under the Hadoop home directory. The steps, run from /usr/local/hadoop/hadoop-0.20.2/bin/, are as follows. Run hadoop fs -ls to inspect the directory tree of the HDFS distributed file system; on a fresh cluster it reports that the directory does not exist. First create a directory with hadoop fs -mkdir testdir, and then hadoop fs -ls will show /user/root/testdir. Because the current user is root, the HDFS home directory is /user/root.
Create a working directory (any name will do): hadoop fs -mkdir okdir
Leave Hadoop's safe mode: bin/hadoop dfsadmin -safemode leave
Put the input files into that directory: hadoop fs -put /usr/test_in/*.txt okdir (this copies every .txt file under the local /usr/test_in directory into /user/root/okdir on HDFS; because the current HDFS home directory is /user/root, the relative path okdir refers to /user/root/okdir).
Then, from /usr/local/hadoop/hadoop-0.20.2, submit the job:
[root@master hadoop-0.20.2]# hadoop jar hadoop-0.20.2-examples.jar wordcount okdir output
(Note that okdir and output belong to this one run; to run wordcount again, create fresh directories — the names must not collide with okdir or output, since the output directory may not already exist.)
11/05/28 22:02:34 INFO input.FileInputFormat: Total input paths to process : 0
11/05/28 22:02:34 INFO mapred.JobClient: Running job: job_201105282107_0005
11/05/28 22:02:35 INFO mapred.JobClient: map 0% reduce 0%
11/05/28 22:02:46 INFO mapred.JobClient: map 0% reduce 100%
11/05/28 22:02:48 INFO mapred.JobClient: Job complete: job_201105282107_0005
11/05/28 22:02:48 INFO mapred.JobClient: Counters: 8
11/05/28 22:02:48 INFO mapred.JobClient: Job Counters
11/05/28 22:02:48 INFO mapred.JobClient: Launched reduce tasks=1
11/05/28 22:02:48 INFO mapred.JobClient: Map-Reduce Framework
11/05/28 22:02:48 INFO mapred.JobClient: Reduce input groups=0
11/05/28 22:02:48 INFO mapred.JobClient: Combine output records=0
11/05/28 22:02:48 INFO mapred.JobClient: Reduce shuffle bytes=0
11/05/28 22:02:48 INFO mapred.JobClient: Reduce output records=0
11/05/28 22:02:48 INFO mapred.JobClient: Spilled Records=0
11/05/28 22:02:48 INFO mapred.JobClient: Combine input records=0
11/05/28 22:02:48 INFO mapred.JobClient: Reduce input records=0
When the job finishes, run the following from the Hadoop directory:
hadoop fs -ls output, which shows:
Found 2 items
drwxr-xr-x - root supergroup 0 2011-05-08 05:20 /user/root/output/_logs
-rw-r--r-- 1 root supergroup 1688 2011-05-08 05:21 /user/root/output/part-r-00000
You can then view the result of the run:
#bin/hadoop dfs -cat output/part-r-00000, which shows:
a 1
are 1
day 1
fine 1
fridey 1
is 3
name 1
not 1
today 4
you 1
your 1
You can also refresh the web UI to see the running and completed jobs.
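What the WordCount job computes can be reproduced locally with ordinary coreutils, which makes a handy sanity check against part-r-00000. A sketch (the sample sentence here is made up; the real job tokenizes the .txt files uploaded earlier):

```shell
# map: split each line into one word per line; shuffle: sort groups
# identical words together; reduce: uniq -c counts each group.
printf 'today is a fine day\ntoday today is is today\n' |
  tr -s ' \t' '\n\n' |
  sort |
  uniq -c |
  awk '{print $2 "\t" $1}'
```

On the cluster the equivalent counts come straight from the job's output file via hadoop fs -cat output/part-r-00000.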