窗口函数:lead lag FIRST_VALUE
分析函数: RANK ROW_NUMBER
Hive中的简单窗口函数应用
TOPN
row number
说明:
row_number() over ([partition col1] [order by col2])
rank() over ([partition col1] [order by col2])
dense_rank() over ([partition col1] [order by col2])
它们都是根据col1字段分组,然后对col2字段进行排序,对排序后的每行生成一个行号,这个行号从1开始递增
col1、col2都可以是多个字段,用‘,‘分隔
区别:
1)row_number:不管col2字段的值是否相等,行号一直递增,比如:有两条记录的值相等,但一个是第一,一个是第二
2)rank:上下两条记录的col2相等时,记录的行号是一样的,但下一个col2值的行号递增N(N是重复的次数),比如:有两条并列第一,下一个是第三,没有第二
3)dense_rank:上下两条记录的col2相等时,下一个col2值的行号递增1,比如:有两条并列第一,下一个是第二
select empname,empjob,salary,deptno,row_number() over(partition by deptno order by salary desc ) rank from emp;
empname
empjob
salary
deptno
rank
MILLER
CLERK
1300.0
10
1
CLARK
MANAGER
2450.0
10
2
KING
PRESIDENT
5000.0
10
3
SMITH
CLERK
800.0
20
1
ADAMS
CLERK
1100.0
20
2
JONES
MANAGER
2975.0
20
3
SCOTT
ANALYST
3000.0
20
4
FORD
ANALYST
3000.0
20
5
JAMES
CLERK
950.0
30
1
MARTIN
SALESMAN
1250.0
30
2
WARD
SALESMAN
1250.0
30
3
TURNER
SALESMAN
1500.0
30
4
ALLEN
SALESMAN
1600.0
30
5
BLAKE
MANAGER
2850.0
30
6
//测试原数据
Hive TopN
a
chinese
98
a
english
90
a
math
90
d
chinese
88
c
english
82
c
math
98
b
math
79
b
chinese
79
b
english
79
z
english
90
z
math
89
z
chinese
80
e
math
99
e
english
87
d
english
90
create table t(name string, sub string, score int) row format delimited fields terminated by '\t';
load data local inpath "/home/user01/grades.txt" into table t;
//为每个学生的各门功课成绩排名
1、row_number
select *,row_number() over (partition by name order by score desc) as rank from t;
t.name
t.sub
t.score
rank
a
chinese
98
1
a
english
90
2
a
math
90
3
b
chinese
79
1
b
english
79
2
b
math
79
3
c
math
98
1
c
english
82
2
d
english
90
1
d
chinese
88
2
e
math
99
1
e
english
87
2
z
english
90
1
z
math
89
2
z
chinese
80
3
2.rank //排序字段相同的记录使用相同的排名,下一个从值的行号递增N(N是重复的次数),比如:有两条并列第一,下一个是第三,没有第二
select *,rank() over (partition by name order by score asc) as rank from t;
t.name
t.sub
t.score
rank
a
english
90
1
a
math
90
1
a
chinese
98
3
b
chinese
79
1
b
english
79
1
b
math
79
1
c
english
82
1
c
math
98
2
d
chinese
88
1
d
english
90
2
e
english
87
1
e
math
99
2
z
chinese
80
1
z
math
89
2
z
english
90
3
3、dense_rank //排序字段相同的记录使用相同的排名,下一个值的行号递增1,如下:a的english和math并列第一,下一个chinese是第三,没有第二,b的三门都一样
select *,dense_rank() over (partition by name order by score asc) as rank from t;
t.name
t.sub
t.score
rank
a
english
90
1
a
math
90
1
a
chinese
98
2
b
chinese
79
1
b
english
79
1
b
math
79
1
c
english
82
1
c
math
98
2
d
chinese
88
1
d
english
90
2
e
english
87
1
e
math
99
2
z
chinese
80
1
z
math
89
2
z
english
90
3
业务实例:
统计每个学科的前二名
select * from (select *, row_number() over(partition by name order by score desc) as rank from t )t where rank <=2;
select *,row_number() over () as rank from t rank <=3;
select area, barnd, yuan, row_number() over (partition by area order by yuan desc) as rank from order where rank <=3;
各地区热门商品统计
按地区 分组 再按各个商品的销量进行降序排名
北京 iphone7 70000 1
北京 xiaomi5 60000 2
北京 mate9 50000 3
北京 手机膜
40000 4
...
上海 xiaomi5 70000 1
上海 iphone7 60000 2
上海 mate9 50000 3
上海 手机膜
40000 4
..
转载请注明原文地址: https://www.6miu.com/read-21299.html