Hive中的简单窗口函数应用

xiaoxiao2021-02-28  141

窗口函数:lead   lag   FIRST_VALUE 分析函数: RANK ROW_NUMBER  Hive中的简单窗口函数应用 TOPN row number 说明: row_number() over ([partition col1] [order by col2]) rank() over ([partition col1] [order by col2]) dense_rank() over ([partition col1] [order by col2]) 它们都是根据col1字段分组,然后对col2字段进行排序,对排序后的每行生成一个行号,这个行号从1开始递增 col1、col2都可以是多个字段,用‘,‘分隔   区别: 1)row_number:不管col2字段的值是否相等,行号一直递增,比如:有两条记录的值相等,但一个是第一,一个是第二 2)rank:上下两条记录的col2相等时,记录的行号是一样的,但下一个col2值的行号递增N(N是重复的次数),比如:有两条并列第一,下一个是第三,没有第二 3)dense_rank:上下两条记录的col2相等时,下一个col2值的行号递增1,比如:有两条并列第一,下一个是第二  select empname,empjob,salary,deptno,row_number() over(partition by deptno order by salary desc ) rank from emp; empname empjob salary deptno rank MILLER CLERK 1300.0 10 1 CLARK MANAGER 2450.0 10 2 KING PRESIDENT 5000.0 10 3 SMITH CLERK 800.0 20 1 ADAMS CLERK 1100.0 20 2 JONES MANAGER 2975.0 20 3 SCOTT ANALYST 3000.0 20 4 FORD ANALYST 3000.0 20 5 JAMES CLERK 950.0 30 1 MARTIN SALESMAN 1250.0 30 2 WARD SALESMAN 1250.0 30 3 TURNER SALESMAN 1500.0 30 4 ALLEN SALESMAN 1600.0 30 5 BLAKE MANAGER 2850.0 30 6 //测试原数据 Hive TopN a chinese 98 a english 90 a math 90 d chinese 88 c english 82 c math 98 b math 79 b chinese 79 b english 79 z english 90 z math 89 z chinese 80 e math 99 e english 87 d english 90 create table t(name string, sub string, score int) row format delimited fields terminated by '\t'; load data local inpath "/home/user01/grades.txt" into table t; //为每个学生的各门功课成绩排名 1、row_number select *,row_number() over (partition by name order by score desc) as rank from t; t.name t.sub t.score rank a chinese 98 1 a english 90 2 a math 90 3 b chinese 79 1 b english 79 2 b math 79 3 c math 98 1 c english 82 2 d english 90 1 d chinese 88 2 e math 99 1 e english 87 2 z english 90 1 z math 89 2 z chinese 80 3 2.rank  //排序字段相同的记录使用相同的排名,下一个从值的行号递增N(N是重复的次数),比如:有两条并列第一,下一个是第三,没有第二 select *,rank() over (partition by name order by score asc) as rank from t; t.name t.sub t.score rank a english 90 1 a math 90 1 a chinese 98 3 b chinese 79 1 b english 79 1 b math 79 1 c english 82 1 c math 98 2 d chinese 88 1 d english 90 2 e english 87 1 e math 99 2 z chinese 80 1 z math 89 2 z english 90 3 3、dense_rank //排序字段相同的记录使用相同的排名,下一个值的行号递增1,如下:a的english和math并列第一,下一个chinese是第三,没有第二,b的三门都一样 select *,dense_rank() over (partition by name order by score asc) as rank from t; t.name t.sub t.score rank a english 90 1 a math 90 1 a chinese 98 2 b chinese 79 1 b english 79 1 b math 79 1 c english 82 1 c math 98 2 d chinese 88 1 d english 90 2 e english 87 1 e math 99 2 z chinese 80 1 z math 89 2 z english 90 3 业务实例: 统计每个学科的前二名   select * from (select *, row_number() over(partition by name order by score desc) as rank from t )t where rank <=2; select *,row_number() over () as rank from t rank <=3; select area, barnd, yuan,  row_number() over (partition by area order by yuan desc) as rank  from order where rank <=3; 各地区热门商品统计 按地区 分组  再按各个商品的销量进行降序排名  北京  iphone7    70000   1 北京  xiaomi5    60000   2 北京  mate9      50000   3 北京   手机膜  40000   4 ... 上海  xiaomi5    70000   1 上海  iphone7    60000   2 上海  mate9      50000   3 上海   手机膜  40000   4 ..
转载请注明原文地址: https://www.6miu.com/read-21299.html

最新回复(0)