1. Hive sink overview
Compared with the HDFS sink, the Hive sink can deliver data into a Hive table in near real time. With the HDFS sink you have to create a Hive external table over the HDFS path, and the latency is noticeably higher.
2. Caveats
1. The Hive table must be bucketed and stored as ORC.
2. The Hive column names in the Flume configuration must all be lowercase, i.e. every name in `fieldnames` must be lowercase.
3. Partitions must be created manually, i.e. set `autoCreatePartitions = false`.
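As a quick illustration of what the DELIMITED serializer expects, here is a minimal Python sketch that joins one record's values with the `\001` delimiter. The field names match the `fieldnames` setting in the configuration below; the sample values and the helper name are made up for illustration:

```python
# Build one \001-delimited line matching the Flume fieldnames
# (dstype,id,type,lastuploadtime). Sample values are hypothetical.
FIELDS = ["dstype", "id", "type", "lastuploadtime"]

def to_delimited(record, delimiter="\001"):
    # Field names must be lowercase to match the hive sink config
    return delimiter.join(str(record[f]) for f in FIELDS)

line = to_delimited({
    "dstype": "gps",
    "id": "1001",
    "type": "sensor",
    "lastuploadtime": "2018-05-18 12:00:00",
})
print(repr(line))
```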
3. Hive sink configuration
```
a1.sinks.k2.type = hive
a1.sinks.k2.channel = c2
# URL of the Hive metastore
a1.sinks.k2.hive.metastore = thrift://192.168.3.150:9083
# Hive database name
a1.sinks.k2.hive.database = test
# Hive table name
a1.sinks.k2.hive.table = ods_table
# Hive partition values, comma-separated; %Y expands to e.g. 2018, %y to 18
a1.sinks.k2.hive.partition = %Y-%m-%d
# Automatic partition creation must be disabled here, otherwise the sink
# throws errors; partitions are created manually instead
a1.sinks.k2.autoCreatePartitions = false
# Use local time instead of the event-header timestamp
# (false here, so the timestamp header on each event is used)
a1.sinks.k2.useLocalTimeStamp = false
#a1.sinks.k2.round = true
#a1.sinks.k2.roundValue = 1
#a1.sinks.k2.roundUnit = minute
a1.sinks.k2.serializer = DELIMITED
# Important: the delimiter must be escaped
a1.sinks.k2.serializer.delimiter = "\\001"
#a1.sinks.k2.serializer.serdeSeparator = "\\001"
# Hive column names configured in Flume must all be lowercase;
# the Hive table must be bucketed and stored as ORC
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastuploadtime
```
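The sink definition above still needs a source and a channel to form a runnable agent. A minimal sketch, assuming a hypothetical exec source `r2` tailing a log file and a memory channel `c2` (the source type, log path, and capacities are illustrative assumptions, not from the original):

```properties
# Hypothetical minimal agent wiring around the hive sink above
a1.sources = r2
a1.channels = c2
a1.sinks = k2

# Exec source tailing an application log (path is an assumption)
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /var/log/app/ods.log
a1.sources.r2.channels = c2
# With useLocalTimeStamp = false, events need a timestamp header
# for the %Y-%m-%d partition escape to resolve
a1.sources.r2.interceptors = i1
a1.sources.r2.interceptors.i1.type = timestamp

# In-memory channel buffering events between source and sink
a1.channels.c2.type = memory
a1.channels.c2.capacity = 10000
a1.channels.c2.transactionCapacity = 1000
```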
4. Hive table DDL
create table test.ods_table
(
dsType string ,
id string ,
type string ,
lastUploadTime string
)
partitioned by (dt string)
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');
alter table test.ods_table add if not exists partition ( dt='2018-05-18');
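Because `autoCreatePartitions` is disabled, a new partition must be added for each day before Flume writes into it. A small Python sketch (the function name is illustrative) that renders the same daily ALTER TABLE statement, e.g. for a scheduled job:

```python
from datetime import date

def add_partition_ddl(db: str, table: str, dt: date) -> str:
    # Renders the manual ALTER TABLE statement used above
    return (
        f"alter table {db}.{table} "
        f"add if not exists partition (dt='{dt.isoformat()}')"
    )

print(add_partition_ddl("test", "ods_table", date(2018, 5, 18)))
# → alter table test.ods_table add if not exists partition (dt='2018-05-18')
```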