Hive 基本操作

xiaoxiao2021-02-28 81

目录 1.显示地展示当前使用的数据库 2.建表 3.导入本地文件 4.数据库多用户共享表 5.创建数据库 5.1仓库数据库判断数据库是否存在 5.2.显示现在有的数据库 5.3条件查询数据库 5.4创建指定存放文件位置数据库 5.5创建数据库时添加注释信息 5.6创建数据库添加扩展信息 5.7使用数据库 5.8删除数据库 5.9当数据库存在表时，先要删除表再能删除数据库 5.10修改数据库 6.1创建表 6.2创建表-复制表结构 6.3显示某个数据库中的表 6.4显示指定筛选条件表名 6.5显示表扩展信息 6.6指定显示某个字段的信息 7.1External Tables -- 外部表 7.2复制表结构仓库外部表 7.3Partitioned, Managed Tables --分区表 7.3.1建议安全措施 7.3.2查看现有分区 7.3.3查看分区详细分区键 7.3.3通过 DESC 显示分区键 7.3.4从文件读入分区表 7.4.1External Partitioned Tables 外部分区表 7.4.1.1.先创建外部表结构 7.4.2.为外部表增加指定分区 7.4.3.把数据表复制外部表目录结构中 7.4.3显示表分区信息 1.显示地展示当前使用的数据库 :hive> set hive.cli.print.current.db=true; 2.建表：create table t_order(id int,age int) row format delimited fields terminated by ','; 3.导入本地文件：load data local inpath '/home/hadoop/hivedata/data' into table t_order; 4.数据库多用户共享表（解决不同用户不能共享数据库）:mysqld_safe --skip-grant-tables 注意：Hive 不支持行级插入，更新，删除。也不支持事务 5.创建数据库 hive> CREATE DATABASE financials; 5.1仓库数据库判断数据库是否存在 hive> CREATE DATABASE IF NOT EXISTS financials; 5.2.显示现在有的数据库 hive> SHOW DATABASES; hive> CREATE DATABASE human_resources; hive> SHOW DATABASES; 5.3条件查询数据库 hive> SHOW DATABASES LIKE 'h.*'; 5.4创建指定存放文件位置数据库 hive> CREATE DATABASE financials > LOCATION '/my/preferred/directory'; 5.5创建数据库时添加注释信息 hive> CREATE DATABASE financials > COMMENT 'Holds all financial tables'; hive> DESCRIBE DATABASE financials; financials Holds all financial tables hdfs://master-server/user/hive/warehouse/financials.db 5.6创建数据库添加扩展信息 hive> CREATE DATABASE financials > WITH DBPROPERTIES ('creator' = 'Mark Moneybags', 'date' = '2012-01-02'); hive> DESCRIBE DATABASE financials; financials hdfs://master-server/user/hive/warehouse/financials.db hive> DESCRIBE DATABASE EXTENDED financials; financials hdfs://master-server/user/hive/warehouse/financials.db {date=2012-01-02, creator=Mark Moneybags); 5.7使用数据库 hive> USE financials; 5.8删除数据库 hive> DROP DATABASE IF EXISTS financials; 5.9当数据库存在表时，先要删除表再能删除数据库 hive> DROP DATABASE IF EXISTS financials CASCADE; 5.10修改数据库 hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'Joe Dba'); There is no way to delete or “unset” a DBPROPERTY 没有方法删除或重置 DBPROPERTY 6.1创建表 CREATE TABLE IF NOT EXISTS mydb.employees ( name STRING COMMENT 'Employee name', salary FLOAT COMMENT 'Employee salary', subordinates ARRAY<STRING> COMMENT 'Names of subordinates', deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address') COMMENT 'Description of the table' TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...) LOCATION '/user/hive/warehouse/mydb.db/employees';

6.2创建表-复制表结构 CREATE TABLE IF NOT EXISTS mydb.employees2 LIKE mydb.employees;

6.3显示某个数据库中的表 hive> USE mydb; hive> SHOW TABLES;

6.4显示指定筛选条件表名 hive> USE mydb; hive> SHOW TABLES 'empl.*'; employees

6.5显示表扩展信息 hive> DESCRIBE EXTENDED mydb.employees; name string Employee name salary float Employee salary subordinates array<string> Names of subordinates deductions map<string,float> Keys are deductions names, values are percentages address struct<street:string,city:string,state:string,zip:int> Home address Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me, ... location:hdfs://master-server/user/hive/warehouse/mydb.db/employees, parameters:{creator=me, created_at='2012-01-02 10:00:00', last_modified_user=me, last_modified_time=1337544510, comment:Description of the table, ...}, ...)

6.6指定显示某个字段的信息 hive> DESCRIBE mydb.employees.salary; salary float Employee salary

7.1External Tables -- 外部表外部表，删除表不删除数据 CREATE EXTERNAL TABLE IF NOT EXISTS stocks ( exchange STRING, symbol STRING, ymd STRING, price_open FLOAT, price_high FLOAT, price_low FLOAT, price_close FLOAT, volume INT, price_adj_close FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data/stocks';

7.2复制表结构仓库外部表 CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3 LIKE mydb.employees LOCATION '/path/to/data';

7.3Partitioned, Managed Tables --分区表 CREATE TABLE employees ( name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> ) PARTITIONED BY (country STRING, state STRING); However, Hive will now create subdirectories reflecting the partitioning structure. For example: ... .../employees/country=CA/state=AB .../employees/country=CA/state=BC ... .../employees/country=US/state=AL .../employees/country=US/state=AK ...

7.3.1建议安全措施把HIVE 设置成“严格”模式,禁止分区表的查询没有一个WHERE子句 hive> set hive.mapred.mode=strict; hive> SELECT e.name, e.salary FROM employees e LIMIT 100; FAILED: Error in semantic analysis: No partition predicate found for Alias "e" Table "employees" hive> set hive.mapred.mode=nonstrict; hive> SELECT e.name, e.salary FROM employees e LIMIT 100;

7.3.2查看现有分区 hive> SHOW PARTITIONS employees; ... Country=CA/state=AB country=CA/state=BC ... country=US/state=AL country=US/state=AK

7.3.3查看分区详细分区键 hive> SHOW PARTITIONS employees PARTITION(country='US'); country=US/state=AL country=US/state=AK ... hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK'); country=US/state=AK

7.3.3通过 DESC 显示分区键 hive> DESCRIBE EXTENDED employees; name string, salary float, ... address struct<...>, country string, state string Detailed Table Information... partitionKeys:[FieldSchema(name:country, type:string, comment:null), FieldSchema(name:state, type:string, comment:null)], ...

7.3.4从文件读入分区表 LOAD DATA LOCAL INPATH '${env:HOME}/california-employees' INTO TABLE employees PARTITION (country = 'US', state = 'CA');

7.4.1External Partitioned Tables 外部分区表 1.先创建外部表结构 CREATE EXTERNAL TABLE IF NOT EXISTS log_messages ( hms INT, severity STRING, server STRING, process_id INT, message STRING) PARTITIONED BY (year INT, month INT, day INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'; 7.4.2.为外部表增加指定分区 ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2) LOCATION 'hdfs://master_server/data/log_messages/2012/01/02'; 7.4.3.把数据表复制外部表目录结构中 Copy the data for the partition being moved to S3. For example, you can use the Hadoop distcp command: hadoop distcp /data/log_messages/2011/12/02 s3n://ourbucket/logs/2011/12/02 •Alter the table to point the partition to the S3 location: ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2) SET LOCATION 's3n://ourbucket/logs/2011/01/02'; •Remove the HDFS copy of the partition using the hadoop fs -rmr command: hadoop fs -rmr /data/log_messages/2011/01/02 7.4.3显示表分区信息 hive> SHOW PARTITIONS log_messages; ... year=2011/month=12/day=31 year=2012/month=1/day=1 year=2012/month=1/day=2 hive> DESCRIBE EXTENDED log_messages; ... message string, year int, month int, day int Detailed Table Information... partitionKeys:[FieldSchema(name:year, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:day, type:int, comment:null)], ... hive> DESCRIBE EXTENDED log_messages PARTITION (year=2012, month=1, day=2); ... location:s3n://ourbucket/logs/2011/01/02, ... 8.1Customizing Table Storage Formats -- 表存储格式(表操作) CREATE TABLE employees ( name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' COLLECTION ITEMS TERMINATED BY '\002' MAP KEYS TERMINATED BY '\003' LINES TERMINATED BY '\n' STORED AS TEXTFILE; Dropping Tables -- 删除表 DROP TABLE IF EXISTS employees; For external tables, the metadata is deleted but the data is not. Alter Table --修改表结构 ALTER TABLE modifies table metadata only. The data for the table is untouched. It’s up to you to ensure that any modifications are consistent with the actual data. Renaming a Table -- 修改表名 ALTER TABLE log_messages RENAME TO logmsgs; Adding, Modifying, and Dropping a Table Partition -- 增加，修改，删除表分区 ALTER TABLE log_messages ADD IF NOT EXISTS PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01' PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02' PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03' ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2) SET LOCATION 's3n://ourbucket/logs/2011/01/02'; ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2); Changing Columns --修改列 ALTER TABLE log_messages CHANGE COLUMN hms hours_minutes_seconds INT COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity; Adding Columns --增加列 ALTER TABLE log_messages ADD COLUMNS ( app_name STRING COMMENT 'Application name', session_id LONG COMMENT 'The current session id'); Deleting or Replacing Columns --删除替换列 ALTER TABLE log_messages REPLACE COLUMNS ( hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp', severity STRING COMMENT 'The message severity' message STRING COMMENT 'The rest of the message'); This statement effectively renames the original hms column and removes the server and process_id columns from the original schema definition. As for all ALTER statements, only the table metadata is changed. Alter Table Properties --修改表属性 ALTER TABLE log_messages SET TBLPROPERTIES ( 'notes' = 'The process id is no longer captured; this column is always NULL'); Alter Storage Properties --修改存储属性 ALTER TABLE log_messages PARTITION(year = 2012, month = 1, day = 1) SET FILEFORMAT SEQUENCEFILE; You can specify a new SerDe along with SerDe properties or change the properties for the existing SerDe. The following example specifies that a table will use a Java class named com.example.JSONSerDe to process a file of JSON-encoded records ALTER TABLE table_using_JSON_storage SET SERDE 'com.example.JSONSerDe' WITH SERDEPROPERTIES ( 'prop1' = 'value1', 'prop2' = 'value2'); 来源： http://blog.csdn.net/iquicksandi/article/details/8522691

转载请注明原文地址: https://www.6miu.com/read-78001.html

技术

最新回复(0)