I was using MapReduce to read data from HBase, run a computation, and write the output to HDFS. When writing to HDFS I hit a path problem that puzzled me for quite a while. Today I finally understood and solved it, so I'm writing it down in the hope that it helps anyone who runs into the same issue.
The original code is below, and it produced the exception that follows. At first I was baffled: how could an HDFS path conflict with a local Windows path? Why was a local path being read at all? After searching online and reading the HBase source, I found that HBase's TableMapReduceUtil.initTableMapperJob ships the client's local dependency jars with the job by default (the addDependencyJars parameter defaults to true), which is where the local paths come from. The fix is at the bottom: simply pass false for that parameter.
public class Origin_job {
    public static void main(String[] args) throws ClassNotFoundException, InterruptedException, IOException {
        long starttime = System.currentTimeMillis();
        String tablename = "test";
        Path outpath = new Path("/user/sky/output/");
        Configuration conf = new Configuration();
        conf.set("hbase.zookeeper.quorum", "node1");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("dfs.permissions.enabled", "false");
        conf.set("fs.defaultFS", "hdfs://node1:8020");
        conf.set("yarn.resourcemanager.hostname", "node1");
        FileSystem fs = FileSystem.get(conf);
        Job job = Job.getInstance(conf);      // instantiate a Job
        job.setJobName("ReadHbase");          // job name
        job.setJarByClass(Origin_job.class);  // job entry class
        Scan scan = new Scan();
        // scan.setCaching(500);
        // scan.setCacheBlocks(false);
        TableMapReduceUtil.initTableMapperJob(tablename, scan,
                Origin_Mapper.class, Text.class, Text.class, job);
        // ImmutableBytesWritable.class, Put.class, job, false);
        // job.setReducerClass(Origin_Reducer.class); // the job's Reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        if (fs.exists(outpath)) {
            fs.delete(outpath, true);
        }
        FileOutputFormat.setOutputPath(job, outpath);
        boolean f = job.waitForCompletion(true); // did the job succeed?
        if (f) {
            System.out.println("job succeeded!");
        } else {
            System.out.println("job failed!");
        }
        System.out.println(System.currentTimeMillis() - starttime + " ms");
    }
}

The exception log:

2017-06-08 11:17:12,640 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-06-08 11:17:14,475 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1019)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-06-08 11:17:14,475 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-06-08 11:17:14,760 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2017-06-08 11:17:14,853 WARN [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(259)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-06-08 11:17:14,869 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(441)) - Cleaning up the staging area file:/tmp/hadoop-shuke/mapred/staging/root778107143/.staging/job_local778107143_0001
Exception in thread "main" java.lang.IllegalArgumentException: Pathname /F:/HBaselib/metrics-core-2.2.0.jar from hdfs://node1:8020/F:/HBaselib/metrics-core-2.2.0.jar is not a valid DFS filename.
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
    at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at Origin_MR.Origin_job.main(Origin_job.java:58)
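To see why the exception happens, here is a small self-contained sketch (plain JDK, no Hadoop dependency) that mimics, in simplified form, what the job submitter does with a dependency jar: the client's local Windows jar path is qualified against fs.defaultFS, and DistributedFileSystem then rejects it because a DFS path component may not contain a colon. The validity check below is a simplified stand-in for Hadoop's internal one, not the real implementation.

```java
import java.net.URI;

public class DfsPathCheck {
    // Simplified stand-in for the check behind "is not a valid DFS filename":
    // a DFS path must be absolute and no component may contain ':'.
    static boolean isValidDfsName(String src) {
        if (!src.startsWith("/")) return false;
        for (String component : src.split("/")) {
            if (component.contains(":")) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // With addDependencyJars=true, the client's local classpath jars
        // (here the Windows path from the log) are added to the job's files.
        String localJar = "/F:/HBaselib/metrics-core-2.2.0.jar";

        // The job submitter qualifies the path against fs.defaultFS ...
        URI qualified = URI.create("hdfs://node1:8020").resolve(localJar);
        System.out.println(qualified);
        // -> hdfs://node1:8020/F:/HBaselib/metrics-core-2.2.0.jar

        // ... and DFS rejects it, because "F:" is not a valid path component:
        System.out.println(isValidDfsName(localJar));           // false
        System.out.println(isValidDfsName("/user/sky/output")); // true
    }
}
```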
The fix is as follows: just pass false as the final argument (the addDependencyJars parameter).
TableMapReduceUtil.initTableMapperJob(tablename, scan, Origin_Mapper.class, Text.class, Text.class, job, false);
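One caveat: with addDependencyJars set to false, the client no longer ships HBase's dependency jars with the job, so they must already be available on the cluster's classpath when the job runs there. A common way to arrange this is via HBase's `hbase mapredcp` command; the sketch below is a configuration fragment under the assumption that your job is packaged as a jar named ReadHbase.jar (a hypothetical name for this example).

```shell
# Prepend the jars HBase MapReduce jobs need to the Hadoop classpath
export HADOOP_CLASSPATH="$(hbase mapredcp):$HADOOP_CLASSPATH"

# Then submit the job as usual (jar name is an assumption for illustration)
hadoop jar ReadHbase.jar Origin_MR.Origin_job
```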
Reference: http://www.it610.com/article/3388630.htm
