Hive Data Import and Export

xiaoxiao · 2021-02-28

Importing data into Hive

1. load from the local filesystem:
       load data local inpath 'local_path' into table tb_name;
   This copies the file from the local filesystem into the table's directory on HDFS.
   Use case: the most common approach, since source files usually start out on the local machine.

2. load from HDFS:
       load data inpath 'hdfs_path' into table tb_name;
   This moves the file into the table's directory (no copy is left at the source).
   Use case: better suited to large volumes of data that already live on HDFS.

3. load with overwrite:
       load data inpath 'hdfs_path' overwrite into table tb_name;
   Use case: tables that are rewritten repeatedly, such as temporary or staging tables used as an intermediate step.

4. Subquery (create table ... as select):
   Use case: saving the result of a query as a new table.

5. insert ... select:
   In a traditional relational database, insert inserts a row of literal values. In Hive, insert into table is instead followed by a select statement:
       insert into table tb_name select ...;
   Example:
       create table emp_insert like emp;
       insert into table emp_insert select * from emp;
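Taken together, a typical import session might look like the following sketch. The table name, column layout, and file paths here are illustrative, not from the original post:

```sql
-- Hypothetical demo table; the column layout is an assumption.
create table emp_demo (
  empno  int,
  ename  string,
  sal    double,
  deptno int
)
row format delimited fields terminated by '\t';

-- 1. Copy a local file into the table's directory on HDFS.
load data local inpath '/opt/datas/emp.txt' into table emp_demo;

-- 2. Move a file that is already on HDFS, replacing any existing data.
load data inpath '/user/hadoop/emp.txt' overwrite into table emp_demo;

-- 3. Save a query result as a new table (create table ... as select).
create table emp_high_sal as
select * from emp_demo where sal > 3000;
```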

Exporting data from Hive

1. insert ... directory. Format:
       insert overwrite [local] directory 'path' select ...;
   Export to the local filesystem:
       insert overwrite local directory '/opt/datas/emp_in01' select * from emp;
   The target directory may already exist; under the hood Hive deletes it and recreates it.
   Specifying a field delimiter:
       insert overwrite local directory '/opt/datas/emp_in01' row format delimited fields terminated by '\t' select * from emp;
   Export to HDFS:
       insert overwrite directory '/emp_insert' select * from emp;
   Note: the parent directory must already exist.

2. HDFS shell command -get. Download the table's files directly to the local machine:
       bin/hdfs dfs -get hdfs_path local_path

3. On the Linux command line, use hive's -e or -f option and redirect the output to a local file.
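Note that the insert ... directory form can export any query result, not just a whole table. A minimal sketch, with an illustrative output path and filter:

```sql
-- Export only department 20, tab-delimited, to a local directory.
-- Hive removes and recreates the target directory, so point it at a
-- scratch location you do not mind losing.
insert overwrite local directory '/opt/datas/emp_dept20'
row format delimited fields terminated by '\t'
select empno, ename, sal
from emp
where deptno = 20;
```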

-e passes the SQL directly on the command line; -f executes SQL stored in a file. In either case the query result can be redirected into a local file.
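A sketch of the -f variant: save the query in a file, then execute the file and redirect stdout. The file path and query are illustrative:

```sql
-- Hypothetical file: /opt/datas/emp_query.sql
-- Run with:  bin/hive -f /opt/datas/emp_query.sql > /opt/datas/emp_query_result.txt
select empno, ename, sal from emp where sal > 3000;
```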

Example with -e:
       [hadoop@hadoop apache-hive-0.13.1-bin]$ bin/hive -e 'select * from emp' > /opt/moduels/emp_01/ef.txt

Sqoop can also be used to move data between Hive and external stores.

Hive additionally supports export and import:
       export table tb_name to 'hdfs_path';
       import table tb_name from 'hdfs_path';

Common Hive SQL (filtering and aggregation)
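An export/import round trip copies both a table's data and its metadata through an HDFS directory, which is useful for moving a table between clusters or databases. A minimal sketch; the staging path and table names are illustrative:

```sql
-- Write emp's data plus metadata to an HDFS staging directory.
export table emp to '/tmp/emp_export';

-- Recreate the table from that directory (e.g. after copying the
-- directory to another cluster with distcp).
import table emp_restored from '/tmp/emp_export';
```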

12. Common HQL statements

1. Filtering: where, limit, distinct, between ... and, is null, is not null
       select * from emp where sal > 3000;
       select * from emp limit 1;
       select distinct deptno from emp;
       select * from emp where sal between 2000 and 3000;
       select ename from emp where comm is null;
       select ename from emp where comm is not null;

2. Aggregation: count, sum, avg, max, min, group by, having
       select count(1) from emp;
       select count(*) from emp;    -- runs less efficiently
       select avg(sal) avg_sal from emp;
       select deptno, avg(sal) from emp group by deptno;
       select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;

3. Joins: equi-join (join), left join (left), right join (right), full join (full)
   Table A:
       ID  NAME
       1   张三
       2   李四
       3   王五
       5   赵六
   Table B:
       ID  phone
       1   1111
       2   2222
       3   3333
       4   4444
       select e.empno,e.ename,d.deptno,e.sal from emp e join dept d on e.deptno=d.deptno;
       select e.empno,e.ename,d.deptno,e.sal from emp e left join dept d on e.deptno=d.deptno;
       select e.empno,e.ename,d.deptno,e.sal from emp e right join dept d on e.deptno=d.deptno;
       select e.empno,e.ename,d.deptno,e.sal from emp e full join dept d on e.deptno=d.deptno;

13. Common Hive and MapReduce parameter settings

Bytes of input processed per reducer (with the 1 GB default, a 10 GB input yields 10 reducers):
       set hive.exec.reducers.bytes.per.reducer=<number>

   <property>
     <name>hive.exec.reducers.bytes.per.reducer</name>
     <value>1000000000</value>
     <description>size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
   </property>

Maximum number of reducers that may run:
       set hive.exec.reducers.max=<number>

   <property>
     <name>hive.exec.reducers.max</name>
     <value>999</value>
     <description>max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.</description>
   </property>

Actual number of reducers:
       set mapreduce.job.reduces=<number>

   <property>
     <name>mapreduce.job.reduces</name>
     <value>1</value>
     <description>The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapreduce.jobtracker.address is "local".</description>
   </property>
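Using the A and B tables above, the four join types differ only in which unmatched rows they keep. A sketch, assuming tables named A and B with the IDs shown (1, 2, 3, 5 versus 1, 2, 3, 4):

```sql
-- Inner join: only IDs present in both tables (1, 2, 3).
select a.id, a.name, b.phone from A a join B b on a.id = b.id;

-- Left join: every row of A; ID 5 gets a NULL phone.
select a.id, a.name, b.phone from A a left join B b on a.id = b.id;

-- Right join: every row of B; ID 4 gets a NULL name.
select a.id, a.name, b.phone from A a right join B b on a.id = b.id;

-- Full join: all IDs from either side (1, 2, 3, 4, 5),
-- with NULLs where one side has no match.
select a.id, a.name, b.phone from A a full join B b on a.id = b.id;
```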
Please credit the original when reprinting: https://www.6miu.com/read-2628713.html
