HBase (Part 2)

xiaoxiao, 2022-05-17

Integrating HBase with Hive

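Purpose of the integration (data storage, querying, analysis):
- data in HBase tables can be queried from Hive;
- data in Hive tables can be queried from HBase.

Integration steps:
1. Create, in Hive, a table that is also visible in HBase: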

create table if not exists hbase2hive(
  uid int,
  uname string,
  age int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,f1:name,f1:age")
tblproperties("hbase.table.name"="hh1");

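This fails with the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
The cause is a version mismatch: the hive-hbase-handler jar was compiled against a different HBase version than the one installed, so the addFamily method it calls does not exist with that exact signature.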

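Solution: rebuild the hive-hbase-handler jar from its source against the installed HBase version, upload the rebuilt jar together with its dependency jars to $HIVE_HOME/lib, and restart Hive.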
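The signature in the error message is a JVM method descriptor: the part in parentheses lists the parameter types and the trailing V means a void return type. The JVM links a call by method name plus this full descriptor, so a handler compiled against one HBase version fails at run time if the installed version declares addFamily differently. The class below is a hypothetical helper written only for this illustration (not part of Hive or HBase); it is a minimal decoder for the descriptor grammar that turns the string from the error into readable Java types.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorDecoder {

    // Decodes one JVM type descriptor starting at index i and returns a readable
    // type name; next[0] is set to the index just past the decoded type.
    static String decodeType(String d, int i, int[] next) {
        char c = d.charAt(i);
        switch (c) {
            case 'V': next[0] = i + 1; return "void";
            case 'Z': next[0] = i + 1; return "boolean";
            case 'B': next[0] = i + 1; return "byte";
            case 'C': next[0] = i + 1; return "char";
            case 'S': next[0] = i + 1; return "short";
            case 'I': next[0] = i + 1; return "int";
            case 'J': next[0] = i + 1; return "long";
            case 'F': next[0] = i + 1; return "float";
            case 'D': next[0] = i + 1; return "double";
            case '[':
                // Array: decode the element type, then append "[]".
                return decodeType(d, i + 1, next) + "[]";
            case 'L': {
                // Object type: "Lpackage/Class;" becomes "package.Class".
                int end = d.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated class name: " + d);
                }
                next[0] = end + 1;
                return d.substring(i + 1, end).replace('/', '.');
            }
            default:
                throw new IllegalArgumentException("bad descriptor at " + i + ": " + d);
        }
    }

    // Turns "(params)ret" into "ret (param1, param2, ...)".
    static String decodeMethod(String d) {
        if (d.isEmpty() || d.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + d);
        }
        int[] next = {1};
        List<String> params = new ArrayList<>();
        while (d.charAt(next[0]) != ')') {
            params.add(decodeType(d, next[0], next));
        }
        String ret = decodeType(d, next[0] + 1, next);
        return ret + " (" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        // Descriptor copied from the Hive error message.
        String desc = "(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V";
        System.out.println("addFamily expected: " + decodeMethod(desc));
        // prints: addFamily expected: void (org.apache.hadoop.hbase.HColumnDescriptor)
    }
}
```

Rebuilding the handler against the installed HBase makes the compiled descriptor match the method that actually exists on the classpath.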

Create a temporary table and load data into it.

Then load the data into the HBase-backed table:

insert into table hbase2hive select * from stu_score ;

2. Map a table that already exists in HBase and already contains data:

create external table if not exists hbase_user_info1(
  uid string,
  uname string,
  uage int
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties("hbase.columns.mapping"=":key,base_info:name,base_info:age")
tblproperties("hbase.table.name"="ns1:t_user_info");

Mapping multiple columns:

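Notes:
1. When mapping multiple HBase columns, you can either write :key explicitly or leave it out, because :key is matched to the first field by default.
2. If the table already exists in HBase, the corresponding Hive table must be created with the EXTERNAL keyword.
3. If the HBase table is dropped, its data can no longer be queried from Hive.
4. Keep the number and types of the Hive columns consistent with the HBase columns: fields in the Hive and HBase tables are matched by position, not by name.
5. Data can also be exchanged between Hive, HBase, MySQL, and similar systems with third-party tools such as 蓝灯 (Lantern) or with shell scripts.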
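Notes 1 and 4 together describe a purely positional matching rule. The sketch below models that rule only; `pair` is a hypothetical helper, not Hive's HBaseSerDe code. It splits an hbase.columns.mapping string, assumes :key for the first column when the mapping omits it (as note 1 states), and pairs entries with Hive columns by index, ignoring names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColumnMappingSketch {

    // Pairs Hive columns with hbase.columns.mapping entries purely by position.
    // Following note 1, a mapping without ":key" is treated as if ":key" came first.
    static Map<String, String> pair(List<String> hiveColumns, String mapping) {
        List<String> entries = new ArrayList<>();
        for (String e : mapping.split(",")) {
            entries.add(e.trim());
        }
        if (!entries.contains(":key")) {
            entries.add(0, ":key");
        }
        // Note 4: column counts should line up.
        if (entries.size() != hiveColumns.size()) {
            throw new IllegalArgumentException(
                hiveColumns.size() + " Hive columns but " + entries.size() + " mapping entries");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < hiveColumns.size(); i++) {
            result.put(hiveColumns.get(i), entries.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("uid", "uname", "age");
        System.out.println(pair(cols, ":key,f1:name,f1:age"));
        // {uid=:key, uname=f1:name, age=f1:age}
        System.out.println(pair(cols, "f1:name,f1:age"));
        // same pairing: ":key" is assumed for the first column
        System.out.println(pair(cols, ":key,f1:age,f1:name"));
        // {uid=:key, uname=f1:age, age=f1:name}  (names are never compared)
    }
}
```

The last call shows why note 4 matters: swapping two entries silently swaps which HBase column each Hive column reads.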

Advanced HBase usage

1. Coprocessors: the inverted-index requirement
2. Secondary indexes

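Write a class that extends BaseRegionObserver and overrides prePut or postPut. Then create the follow table (t_guanzhu) and the fans table (t_fensi) in the HBase shell:
create 't_guanzhu','cf1','cf2'
create 't_fensi','cf1'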
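The inverted-index idea behind the coprocessor can be simulated without a cluster. The sketch below assumes a hypothetical schema that the post does not spell out: in t_guanzhu the row key is the follower and each cf1 qualifier is a user they follow; in t_fensi the row key is the followed user and each cf1 qualifier is a fan. In-memory maps stand in for the two tables, and a `postPut` method plays the role of the observer hook. In a real deployment this logic would live in the class extending BaseRegionObserver (an HBase 1.x class; in HBase 2.x observers implement the RegionObserver interface directly).

```java
import java.util.Map;
import java.util.TreeMap;

public class InverseIndexSketch {

    // In-memory stand-in for an HBase table with a single family:
    // row key -> (qualifier -> value), kept sorted like HBase keys.
    static class Table {
        private final Map<String, Map<String, String>> rows = new TreeMap<>();

        void put(String row, String qualifier, String value) {
            rows.computeIfAbsent(row, k -> new TreeMap<>()).put(qualifier, value);
        }

        Map<String, String> get(String row) {
            return rows.getOrDefault(row, new TreeMap<>());
        }
    }

    // Hypothetical schema: t_guanzhu row = follower, cf1:<followee>;
    // t_fensi row = followee, cf1:<follower>.
    static class FollowStore {
        final Table guanzhu = new Table();
        final Table fensi = new Table();

        // Plays the role of the observer's postPut hook: once the put on
        // t_guanzhu has been applied, write the reversed edge into t_fensi.
        private void postPut(String follower, String followee) {
            fensi.put(followee, follower, "1");
        }

        // Client write to t_guanzhu; the "coprocessor" runs afterwards,
        // so the client never writes t_fensi itself.
        void follow(String follower, String followee) {
            guanzhu.put(follower, followee, "1");
            postPut(follower, followee);
        }
    }

    public static void main(String[] args) {
        FollowStore store = new FollowStore();
        store.follow("u1", "u3");
        store.follow("u2", "u3");
        store.follow("u1", "u2");
        System.out.println("u1 follows: " + store.guanzhu.get("u1").keySet()); // [u2, u3]
        System.out.println("fans of u3: " + store.fensi.get("u3").keySet());   // [u1, u2]
    }
}
```

Using postPut rather than prePut means the reverse entry is written only after the original put has been applied.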

Upload the jar to HDFS:
hdfs dfs -mkdir /hbaseObServer
hdfs dfs -put gp1813Demo-1.0-SNAPSHOT.jar /hbaseObServer

Register the coprocessor on the table:
alter 't_guanzhu', METHOD => 'table_att', 'coprocessor' => 'hdfs://qianfeng/hbaseObServer/gp1813Demo-1.0-SNAPSHOT.jar|com.qfedu.bigdata.hbaseObServer.InverIndexCoprocessor|1001|'
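The 'coprocessor' attribute value packs four |-separated fields: the jar path, the fully qualified observer class, a priority, and optional arguments (empty in the command above, hence the trailing |). The parser below is a hypothetical illustration of that layout, not HBase's own loader; it simply shows which part of the string means what.

```java
public class CoprocessorSpec {

    final String jarPath;     // where the RegionServer loads the jar from
    final String className;   // fully qualified observer class
    final Integer priority;   // null if the field was left empty
    final String arguments;   // optional key=value list, empty here

    CoprocessorSpec(String jarPath, String className, Integer priority, String arguments) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.arguments = arguments;
    }

    // Splits a value of the form jarPath|className|priority|arguments into its fields.
    static CoprocessorSpec parse(String spec) {
        // limit -1 keeps trailing empty fields, e.g. the empty arguments after "1001|"
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 2 || parts.length > 4 || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("unexpected coprocessor spec: " + spec);
        }
        String prio = parts.length > 2 ? parts[2].trim() : "";
        return new CoprocessorSpec(
            parts[0].trim(),
            parts[1].trim(),
            prio.isEmpty() ? null : Integer.valueOf(prio),
            parts.length > 3 ? parts[3].trim() : "");
    }

    public static void main(String[] args) {
        CoprocessorSpec s = parse("hdfs://qianfeng/hbaseObServer/gp1813Demo-1.0-SNAPSHOT.jar"
            + "|com.qfedu.bigdata.hbaseObServer.InverIndexCoprocessor|1001|");
        System.out.println("jar:       " + s.jarPath);
        System.out.println("class:     " + s.className);
        System.out.println("priority:  " + s.priority);                 // 1001
        System.out.println("arguments: '" + s.arguments + "'");         // ''
    }
}
```

A typo in the class name or jar path is only detected when the RegionServer tries to load the coprocessor, so it is worth checking each field before running alter.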

HBase optimization

Property settings to pay attention to:
- memstore flush threshold: hbase.hregion.memstore.flush.size=134217728 (128 MB)
- region split threshold: hbase.hregion.max.filesize=10737418240 (10 GB)
- number of RegionServer request-handler threads: hbase.regionserver.handler.count=30
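The byte values above are easier to sanity-check as arithmetic. The sketch below only converts the configured numbers to MB and GB and computes a deliberately rough ratio: how many memstore-sized flushes of data fit into one store before it reaches the split threshold. That ratio is a simplified model (it ignores compression, compaction, and deletes), not how HBase actually decides when to split.

```java
public class HBaseThresholds {

    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Values from the settings above.
    static final long MEMSTORE_FLUSH_SIZE = 134217728L;  // hbase.hregion.memstore.flush.size
    static final long MAX_FILESIZE = 10737418240L;       // hbase.hregion.max.filesize
    static final int HANDLER_COUNT = 30;                 // hbase.regionserver.handler.count

    // Rough model only: memstore-sized flushes per store before the split threshold.
    static long flushesBeforeSplit() {
        return MAX_FILESIZE / MEMSTORE_FLUSH_SIZE;
    }

    public static void main(String[] args) {
        System.out.println("flush size    = " + MEMSTORE_FLUSH_SIZE / MB + " MB");       // 128 MB
        System.out.println("max file size = " + MAX_FILESIZE / GB + " GB");              // 10 GB
        System.out.println("flushes before split (rough) = " + flushesBeforeSplit());    // 80
        System.out.println("handler threads per RegionServer = " + HANDLER_COUNT);
    }
}
```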


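Client-side optimizations:
1. Turn off auto-flush so that writes are buffered on the client: htable.setAutoFlush(false)
2. Write in batches (put, delete) whenever possible.
3. Be careful about disabling the WAL (HLog): ht.setDurability(Durability.SKIP_WAL); edits that skip the WAL are lost if the RegionServer crashes before the memstore is flushed.
4. Keep frequently read data in the cache where possible: hc1.setInMemory(true);
5. Avoid many column families; use two at most, because when HBase flushes a region it flushes the other column families of that region as well.
6. Keep rowkeys as short as possible; the hard limit is 32,767 bytes (Short.MAX_VALUE).
7. Close objects that need closing: Admin, Table, ResultScanner.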
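Items 1, 2, and 7 are related: turning auto-flush off only helps if writes are batched, and the last partial batch is sent only when the client flushes or is closed. The simulation below is a hypothetical model (no HBase classes involved) that counts round trips for 1,000 puts with auto-flush on versus off, and uses try-with-resources so that close() always sends the remaining buffered rows. Newer HBase clients provide BufferedMutator for this kind of buffered writing; the class here only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

public class WriteBufferSketch {

    // Hypothetical client that counts round trips ("RPCs") to the server.
    static class BufferedClient implements AutoCloseable {
        private final boolean autoFlush;
        private final int bufferSize;
        private final List<String> buffer = new ArrayList<>();
        int rpcCount = 0;
        int rowsSent = 0;

        BufferedClient(boolean autoFlush, int bufferSize) {
            this.autoFlush = autoFlush;
            this.bufferSize = bufferSize;
        }

        void put(String rowKey) {
            buffer.add(rowKey);
            // autoFlush = true: send each put immediately.
            // autoFlush = false: send only when the buffer is full.
            if (autoFlush || buffer.size() >= bufferSize) {
                flush();
            }
        }

        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            rpcCount++;               // one round trip for the whole batch
            rowsSent += buffer.size();
            buffer.clear();
        }

        // Closing flushes what is left; without close() the tail of the batch is never sent.
        @Override
        public void close() {
            flush();
        }
    }

    static BufferedClient write(boolean autoFlush, int bufferSize, int rows) {
        BufferedClient client = new BufferedClient(autoFlush, bufferSize);
        try (BufferedClient c = client) {
            for (int i = 0; i < rows; i++) {
                c.put("row-" + i);
            }
        }
        return client;
    }

    public static void main(String[] args) {
        System.out.println("autoFlush on:  " + write(true, 100, 1000).rpcCount + " RPCs");  // 1000
        System.out.println("autoFlush off: " + write(false, 100, 1000).rpcCount + " RPCs"); // 10
    }
}
```

With a buffer of 300 rows, the same 1,000 puts take four round trips: three full batches plus the 100 rows flushed by close().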

When reposting, please credit the original article: https://www.6miu.com/read-4884247.html
