Hive UDF Programming

xiaoxiao2021-02-28 69

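I. A UDF can be used directly in a select statement to format the query results before they are output.

II. Keep the following points in mind when writing a UDF:

a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.

b) It must implement an evaluate method; evaluate can be overloaded.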
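As a minimal sketch of what overloading evaluate looks like, the class below defines a one-argument and a two-argument form. To keep it runnable without Hive on the classpath it does not extend UDF and uses String rather than Text. The class name, the lookup table and its ASCII values, and the fallback parameter are all hypothetical and chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a UDF class. In a real Hive UDF this class would
// extend org.apache.hadoop.hive.ql.exec.UDF and usually take Text arguments.
public class OverloadedEvaluateSketch {
    // Illustrative lookup table; the values are placeholders, not from the article.
    private static final Map<String, String> NATION_MAP = new HashMap<String, String>();
    static {
        NATION_MAP.put("China", "CN");
        NATION_MAP.put("USA", "US");
    }

    // One-argument form: unknown names fall back to a fixed default.
    public String evaluate(String nation) {
        return evaluate(nation, "unknown");
    }

    // Two-argument overload: the caller supplies the fallback value.
    public String evaluate(String nation, String fallback) {
        if (nation == null) {
            return null; // pass a NULL input through as NULL
        }
        String name = NATION_MAP.get(nation);
        return name != null ? name : fallback;
    }

    public static void main(String[] args) {
        OverloadedEvaluateSketch udf = new OverloadedEvaluateSketch();
        System.out.println(udf.evaluate("China"));
        System.out.println(udf.evaluate("Mars", "n/a"));
    }
}
```

With both forms in one class, the number and types of the arguments at the call site decide which evaluate runs, so the same function name can serve both the one-argument and the two-argument call.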

III. Writing the UDF code

0. Extend the org.apache.hadoop.hive.ql.exec.UDF class and implement the evaluate method.

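package com.heres.hive;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class NationUDF extends UDF {
    // Lookup table from the stored country name to the display name.
    public static Map<String, String> nationMap = new HashMap<String, String>();
    static {
        nationMap.put("China", "中国");
        nationMap.put("Japan", "小日本");
        nationMap.put("USA", "美帝");
    }

    // Reused across calls so that no new Text is allocated per row.
    Text t = new Text();

    // Just as sum(income) returns a value such as 1000,
    // getNation(nation) returns a value such as 中国.
    public Text evaluate(Text nation) {
        if (nation == null) {
            return null; // a NULL column value would otherwise cause a NullPointerException
        }
        String nation_e = nation.toString();
        String name = nationMap.get(nation_e);
        if (name == null) {
            name = "火星人"; // fallback for names missing from the map
        }
        t.set(name);
        return t;
    }
}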

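0.1 Build the jar

Right-click --> Export --> Java/JAR file --> Next --> tick the classes to include --> Finish;

Note: pay attention to the JDK version when building the jar. The JDK is backward compatible, so the JDK running the Hadoop framework (on CentOS here) must be the same version as, or newer than, the one the UDF jar was compiled for. A jar compiled for a newer JDK fails to load with an UnsupportedClassVersionError.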
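One way to check which Java release a compiled class targets is to read the major version from the class-file header. The sketch below does that; the class and method names are hypothetical, and for illustration it inspects its own class file rather than a class from the UDF jar.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionCheck {
    // Reads the class-file major version from the first 8 bytes of a compiled class.
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version
    }

    // Major 49 is Java 5, 50 is Java 6, 51 is Java 7, 52 is Java 8, and so on.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // Inspect this class's own file; for a UDF, open a .class entry from the jar instead.
        InputStream in = ClassVersionCheck.class.getResourceAsStream("ClassVersionCheck.class");
        if (in == null) {
            System.out.println("class file not found on the classpath");
            return;
        }
        try {
            int major = majorVersion(in);
            System.out.println("compiled for Java " + javaRelease(major)
                    + ", running on java.version=" + System.getProperty("java.version"));
        } finally {
            in.close();
        }
    }
}
```

If the release reported for the UDF class is higher than the JDK that Hadoop runs on, recompile the jar with a lower target level.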

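IV. Calling the custom function:

1. Add the jar (run this in the Hive command line)
hive> add jar /root/NUDF.jar;

2. Create a temporary function ("temporary" means the function is valid only in the current session, that is, this Hive client)
hive> create temporary function getNation as 'com.heres.hive.NationUDF';

3. Call it
hive> select id, name, getNation(nation) from beauties;

4. Query results using the custom function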

hive> select id,name,size,getNation(nation) from beauties order by size desc;

Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1494132664539_0002, Tracking URL = http://heres04:8088/proxy/application_1494132664539_0002/
Kill Command = /heres/hadoop-2.2.0/bin/hadoop job -kill job_1494132664539_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-05-07 13:03:37,424 Stage-1 map = 0%, reduce = 0%
2017-05-07 13:03:51,342 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.57 sec
2017-05-07 13:04:00,996 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.41 sec
MapReduce Total cumulative CPU time: 4 seconds 410 msec
Ended Job = job_1494132664539_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.41 sec HDFS Read: 328 HDFS Write: 160 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 410 msec
OK
1   bgyjy   56.6565   小日本
4   bing    56.56     火星人
3   liu     45.0      火星人
3   ewrwe   43.9      小日本
1   glm     34.0      火星人
2   lina    30.9      火星人
2   jzmb    23.232    小日本
Time taken: 37.869 seconds, Fetched: 7 row(s)

5. Save the query results to HDFS
create table result row format delimited fields terminated by '\t' as select id, getNation(nation) from beauties;

6. Drop the temporary function:

hive> DROP TEMPORARY FUNCTION getNation;

When reposting, please credit the original URL: https://www.6miu.com/read-78737.html
