Notes on pitfalls with the Flume hive sink

xiaoxiao · 2021-02-28

1. Hive sink overview

Compared with the hdfs sink, the hive sink can ingest data into a Hive table in near real time. With the hdfs sink you have to build a Hive external table over the HDFS path and query through that, and the latency is nowhere near as low.

2. Caveats

1. The Hive table must be bucketed and stored as orc (the DDL in section 4 also marks it transactional, which the Hive streaming API requires).

2. The Hive column names in the Flume configuration must all be lowercase, i.e. every entry in serializer.fieldnames must be lowercase.

3. Partitions must be created manually, i.e. set autoCreatePartitions = false.

3. Configuring the hive sink

```
a1.sinks.k2.type = hive
a1.sinks.k2.channel = c2
# URL of the Hive metastore
a1.sinks.k2.hive.metastore = thrift://192.168.3.150:9083
# Hive database name
a1.sinks.k2.hive.database = test
# Hive table name
a1.sinks.k2.hive.table = ods_table
# Hive partition, comma-separated; %Y expands to 2018, %y to 18
a1.sinks.k2.hive.partition = %Y-%m-%d
# Automatic partition creation must be disabled here, or the sink errors out.
# Create the partitions manually instead.
a1.sinks.k2.autoCreatePartitions = false
# Whether to use local time instead of the timestamp in the event header
# (false: use the event-header timestamp)
a1.sinks.k2.useLocalTimeStamp = false
#a1.sinks.k2.round = true
#a1.sinks.k2.roundValue = 1
#a1.sinks.k2.roundUnit = minute
a1.sinks.k2.serializer = DELIMITED
# Remember: the delimiter must be escaped
a1.sinks.k2.serializer.delimiter = "\\001"
#a1.sinks.k2.serializer.serdeSeparator = "\\001"
# The Hive column names configured in Flume must all be lowercase.
# The Hive table must be bucketed and stored as orc.
a1.sinks.k2.serializer.fieldnames = dstype,id,type,lastuploadtime
```
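The delimiter deserves special attention: the escaped `"\\001"` resolves to the ASCII SOH character (`\x01`), Hive's default field separator. As a minimal sketch (the field values here are made up), this is the shape an event body must have for the DELIMITED serializer with the fieldnames configured above:

```python
# Sketch: an event body for the DELIMITED serializer with delimiter "\001".
# Written as "\\001" in the Flume properties file; it is ASCII SOH (\x01).
SOH = "\001"

# Hypothetical record; field names mirror serializer.fieldnames above.
record = {
    "dstype": "sensor",
    "id": "42",
    "type": "temperature",
    "lastuploadtime": "2018-05-18 12:00:00",
}

# Fields must appear in the same order as serializer.fieldnames.
fieldnames = ["dstype", "id", "type", "lastuploadtime"]
body = SOH.join(record[f] for f in fieldnames)

print(repr(body))  # 'sensor\x0142\x01temperature\x012018-05-18 12:00:00'
```

If the delimiter is written unescaped in the properties file, the serializer splits on the wrong character and the columns land in Hive misaligned.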

4. Hive DDL

```
create table test.ods_table (
  dsType string,
  id string,
  type string,
  lastUploadTime string
)
partitioned by (dt string)
clustered by (id) into 2 buckets
stored as orc
TBLPROPERTIES ('transactional'='true');

alter table test.ods_table add if not exists partition (dt='2018-05-18');
```
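Since autoCreatePartitions is off, the partition you create by hand must match what `%Y-%m-%d` expands to for each event. With useLocalTimeStamp = false, the sink reads the millisecond `timestamp` header from the event and formats it strftime-style. A hedged sketch of that expansion (the timestamp value is hypothetical, and gmtime is used for illustration; in practice the agent's timezone applies):

```python
import time

# Hypothetical event-header timestamp in epoch milliseconds
# (Flume's "timestamp" header); this one is 2018-05-18 00:00:00 UTC.
ts_ms = 1526601600000

# %Y expands to the four-digit year (2018); %y would give 18.
dt = time.strftime("%Y-%m-%d", time.gmtime(ts_ms / 1000))
print(dt)  # 2018-05-18
```

If the formatted date and the partition added with `alter table ... add partition` disagree (for example, because of a timezone mismatch around midnight), writes fail, because the sink will not create the missing partition itself.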

Reposted from: https://www.6miu.com/read-2619842.html
