Pulling Kafka Data into HDFS with Flume


Create a new Flume .properties file:

>vim flume-conf-kafka2hdfs.properties

# ------------------ Define the data flow ------------------
# Name of the source
flume2HDFS_agent.sources = source_from_kafka
# Name of the channel (naming by type is recommended)
flume2HDFS_agent.channels = mem_channel
# Name of the sink (naming by destination is recommended)
flume2HDFS_agent.sinks = hdfs_sink

#auto.commit.enable = true

## kerberos config ##
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/datanode2.hdfs.alpha.com@OMGHADOOP.COM
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab

# -------- KafkaSource configuration --------
# Define the message source type
# For each one of the sources, the type is defined
flume2HDFS_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flume2HDFS_agent.sources.source_from_kafka.channels = mem_channel
flume2HDFS_agent.sources.source_from_kafka.batchSize = 5000
# Kafka broker addresses
#flume2HDFS_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flume2HDFS_agent.sources.source_from_kafka.kafka.bootstrap.servers = 192.168.2.86:9092,192.168.2.87:9092
# Kafka topic to consume
#flume2HDFS_agent.sources.source_from_kafka.topic = itil_topic_4097
flume2HDFS_agent.sources.source_from_kafka.kafka.topics = mtopic
# Kafka consumer group id
#flume2HDFS_agent.sources.source_from_kafka.groupId = flume4097
flume2HDFS_agent.sources.source_from_kafka.kafka.consumer.group.id = flumetest

# -------- HDFS sink configuration --------
flume2HDFS_agent.sinks.hdfs_sink.type = hdfs
# Specify the channel the sink should use (note: "channel", singular)
flume2HDFS_agent.sinks.hdfs_sink.channel = mem_channel
#flume2HDFS_agent.sinks.hdfs_sink.filePrefix = %{host}
flume2HDFS_agent.sinks.hdfs_sink.hdfs.path = hdfs://192.168.2.xx:8020/tmp/ds=%Y%m%d
# File size to trigger roll, in bytes (0: never roll based on file size)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollSize = 0
# Number of events written to file before it is rolled (0 = never roll based on number of events)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollCount = 0
# Roll the current file every 3600 seconds
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
flume2HDFS_agent.sinks.hdfs_sink.hdfs.threadsPoolSize = 30
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

# -------- Memory channel configuration --------
# Each channel's type is defined
flume2HDFS_agent.channels.mem_channel.type = memory
# Maximum number of events the channel can hold
flume2HDFS_agent.channels.mem_channel.capacity = 100000
# Maximum number of events per transaction
flume2HDFS_agent.channels.mem_channel.transactionCapacity = 10000
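The `%Y%m%d` escape sequences in `hdfs.path` are expanded by the HDFS sink from each event's `timestamp` header, so events land in one directory per day. A rough Python sketch of that expansion (an illustration of the idea, not Flume's actual code):

```python
from datetime import datetime, timezone

def expand_hdfs_path(pattern: str, event_timestamp_ms: int) -> str:
    """Approximate how the Flume HDFS sink expands date escapes
    (%Y, %m, %d) in hdfs.path using the event's timestamp header
    (milliseconds since the epoch)."""
    ts = datetime.fromtimestamp(event_timestamp_ms / 1000, tz=timezone.utc)
    return ts.strftime(pattern)

# An event stamped 2017-06-01 12:00:00 UTC lands in .../tmp/ds=20170601
path = expand_hdfs_path("hdfs://192.168.2.xx:8020/tmp/ds=%Y%m%d",
                        1496318400000)
print(path)  # → hdfs://192.168.2.xx:8020/tmp/ds=20170601
```

Note that if events arrive without a `timestamp` header, the sink cannot resolve the escapes and will fail; setting `hdfs.useLocalTimeStamp = true` on the sink is a common workaround that uses the agent's local clock instead.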

Change the broker addresses, HDFS path, topic, and so on to match your environment, and you're done.
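While editing, it's worth checking the sizing values against the usual Flume rule of thumb: the source's batchSize should not exceed the channel's transactionCapacity, which in turn should not exceed its capacity. A minimal checker sketch (the property keys are taken from the config above):

```python
def check_sizing(props: dict) -> list:
    """Check the usual Flume sizing rule:
    source batchSize <= channel transactionCapacity <= channel capacity."""
    batch = int(props["flume2HDFS_agent.sources.source_from_kafka.batchSize"])
    txn = int(props["flume2HDFS_agent.channels.mem_channel.transactionCapacity"])
    cap = int(props["flume2HDFS_agent.channels.mem_channel.capacity"])
    problems = []
    if batch > txn:
        problems.append("batchSize (%d) exceeds transactionCapacity (%d)" % (batch, txn))
    if txn > cap:
        problems.append("transactionCapacity (%d) exceeds capacity (%d)" % (txn, cap))
    return problems

# Values from the config above: 5000 <= 10000 <= 100000, so no problems.
props = {
    "flume2HDFS_agent.sources.source_from_kafka.batchSize": "5000",
    "flume2HDFS_agent.channels.mem_channel.transactionCapacity": "10000",
    "flume2HDFS_agent.channels.mem_channel.capacity": "100000",
}
print(check_sizing(props))  # → []
```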

Run the agent:

>flume-ng agent -n flume2HDFS_agent -f flume-conf-kafka2hdfs.properties

Then open a console producer and send a few messages to try it out:

>kafka-console-producer --broker-list cdh5:9092 --topic mtopic

Reproduced from: https://www.6miu.com/read-76604.html
