Hive Services

xiaoxiao2021-02-28  19

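I. Hive Concepts

Hive was open-sourced by Facebook to compute statistics over massive volumes of structured log data.

Hive is a data warehouse built on top of the Hadoop Distributed File System. Its underlying storage layer is HDFS and its compute framework is MapReduce. Hive stores databases and tables as directories on HDFS or on the local file system, and stores the table data as files inside those directories.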
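As a rough illustration of that directory layout, the sketch below computes where a managed table would end up. It assumes the stock value of hive.metastore.warehouse.dir (/user/hive/warehouse) and a table created without an explicit LOCATION clause. The class and method names (WarehousePaths, databaseDir, tableDir) and the sample database and table names are made up for this example.

```java
import java.util.Locale;

final class WarehousePaths {
    // Assumed stock value of hive.metastore.warehouse.dir; a cluster may override it.
    static final String DEFAULT_WAREHOUSE_DIR = "/user/hive/warehouse";

    // Directory of a database: "default" maps to the warehouse root,
    // any other database gets a "<name>.db" subdirectory.
    static String databaseDir(String warehouseDir, String database) {
        String db = database.toLowerCase(Locale.ROOT);
        if (db.equals("default")) {
            return warehouseDir;
        }
        return warehouseDir + "/" + db + ".db";
    }

    // Directory of a managed table: one subdirectory per table, holding its data files.
    static String tableDir(String warehouseDir, String database, String table) {
        return databaseDir(warehouseDir, database) + "/" + table.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "page_views" and "logs" are hypothetical names.
        System.out.println(tableDir(DEFAULT_WAREHOUSE_DIR, "default", "page_views"));
        System.out.println(tableDir(DEFAULT_WAREHOUSE_DIR, "logs", "page_views"));
    }
}
```

Listing the resulting directory with hadoop fs -ls should then show the data files of the table, provided the cluster uses the default warehouse location.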

Hive architecture: https://cwiki.apache.org/confluence/display/Hive/Design#Design-HiveArchitecture


II. Hive Services

Hive provides the following services:

beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version

[yeemi@yeemi01 apache-hive-0.13.1-bin]$ bin/hive --help

Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version

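1. Hive metastore

Overview:

Hive metadata is stored by default in the bundled Derby database. Embedded Derby allows only one database instance per directory, which makes shared access to Hive inconvenient, so the metadata is usually stored in another relational database such as MySQL.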

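Storage modes:

- Embedded / local Derby: mainly used for unit testing. Only one database instance can be started at a time, so multiple clients cannot access it.
- Local database: the metastore database and Hive run on the same machine.
- Remote database: the metastore database can be accessed remotely.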
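One way to tell which mode a given hive-site.xml selects is to look at two properties: hive.metastore.uris (when set, Hive talks to a separate metastore service) and javax.jdo.option.ConnectionURL (the JDBC URL of the backing database, which defaults to embedded Derby). The sketch below applies that rule to a plain map of properties. How these two properties line up with the three modes listed above is an interpretation, and the class name, the enum, and the MySQL and Thrift URLs in main are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class MetastoreModes {
    enum Mode { EMBEDDED_DERBY, LOCAL_DATABASE, REMOTE_METASTORE }

    // Stock default of javax.jdo.option.ConnectionURL (embedded Derby).
    static final String DEFAULT_JDBC_URL = "jdbc:derby:;databaseName=metastore_db;create=true";

    // Classifies a set of hive-site.xml properties (passed as a simple map).
    static Mode classify(Map<String, String> hiveSite) {
        String uris = hiveSite.getOrDefault("hive.metastore.uris", "").trim();
        if (!uris.isEmpty()) {
            // A Thrift URI means the metastore runs as its own service.
            return Mode.REMOTE_METASTORE;
        }
        String jdbcUrl = hiveSite.getOrDefault("javax.jdo.option.ConnectionURL", DEFAULT_JDBC_URL);
        // "jdbc:derby://host:port/..." is the Derby network client, not the embedded engine.
        boolean embeddedDerby = jdbcUrl.startsWith("jdbc:derby:") && !jdbcUrl.startsWith("jdbc:derby://");
        return embeddedDerby ? Mode.EMBEDDED_DERBY : Mode.LOCAL_DATABASE;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(classify(conf));
        // Hypothetical MySQL-backed metastore database.
        conf.put("javax.jdo.option.ConnectionURL", "jdbc:mysql://yeemi01:3306/metastore");
        System.out.println(classify(conf));
        // Hypothetical standalone metastore service.
        conf.put("hive.metastore.uris", "thrift://yeemi01:9083");
        System.out.println(classify(conf));
    }
}
```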

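2. hiveserver2

Built on the Thrift protocol, hiveserver2 exposes Hive as a service that clients connect to.

Start the hiveserver2 service with any of the following:

bin/hive --service hiveserver2 --hiveconf hive.server2.thrift.port=14000 (both options start with two hyphens)
bin/hiveserver2
nohup bin/hiveserver2 & (runs in the background)

Enter beeline:

bin/beeline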

Connect from beeline:

!connect jdbc:hive2://yeemi01:14000

!connect jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
(this form is used when hiveserver2 is deployed for high availability)

bin/beeline -u jdbc:hive2://yeemi01:10000 -n root -p wjm

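Enter the username and password. (If they are omitted or entered incorrectly, the session falls back to the default user, which can run queries but gets a permission error when an insert triggers a MapReduce job.)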
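The two URL forms shown above (direct host:port and ZooKeeper service discovery) are easy to mistype, so a small helper that assembles them can be useful in client code. This is only a sketch: the class and method names are invented here, and it covers just the two session parameters that appear in the URLs above.

```java
import java.util.Arrays;
import java.util.List;

final class HiveJdbcUrls {
    // Direct connection to a single hiveserver2 instance.
    static String direct(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port;
    }

    // Connection through a ZooKeeper quorum, as used for a high-availability setup.
    // Each quorum entry is a "host:port" string.
    static String viaZooKeeper(List<String> quorum, String namespace) {
        return "jdbc:hive2://" + String.join(",", quorum)
                + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=" + namespace;
    }

    public static void main(String[] args) {
        System.out.println(direct("yeemi01", 14000));
        System.out.println(viaZooKeeper(
                Arrays.asList("zookeeper1:2181", "zookeeper2:2181", "zookeeper3:2181"),
                "hiveserver2"));
    }
}
```

Either string can be passed to !connect in beeline or to DriverManager.getConnection in a JDBC client.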

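Note: limitations of hiveserver2:

- hiveserver2 is not very stable; a connection that stays open for a long time is dropped automatically. The timeout can be changed with the parameter: set hive.server2.long.polling.timeout=5000 (drop the trailing L from the value when putting it into hive-site.xml)
- When there are many jobs, hiveserver2 becomes very inefficient and is better avoided??? --> What should be used instead?
- In some situations the username and password are just for show and have no effect?? (an anonymous user can view data but cannot run MapReduce jobs)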

The help command

[root@yeemi01 hive-0.13.1-cdh5.3.6]# bin/beeline -help
Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -i <init file>                  script file for initialization
   -e <query>                      query that should be executed
   -f <exec file>                  script file that should be executed
   --hiveconf property=value       Use value for given property
   --hivevar name=value            hive variable name and value
                                   This is Hive specific settings in which variables
                                   can be set at session level and referenced in Hive
                                   commands or queries.
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv2/tsv2/dsv/csv/tsv]  format mode for result display
                                   Note that csv, and tsv are deprecated - use csv2, tsv2 instead
   --truncateTable=[true/false]    truncate table column when it exceeds length
   --delimiterForDSV=DELIMITER     specify the delimiter for delimiter-separated values output format (default: |)
   --isolation=LEVEL               set the transaction isolation level
   --nullemptystring=[true/false]  set to true to get historic behavior of printing null as empty string
   --help                          display this message
Beeline version 0.13.1-cdh5.3.6 by Apache Hive

Built-in help inside beeline

beeline> help
!all                Execute the specified SQL against all the current connections
!autocommit         Set autocommit mode on or off
!batch              Start or execute a batch of statements
!brief              Set verbose mode off
!call               Execute a callable statement
!close              Close the current connection to the database
!closeall           Close all current open connections
!columns            List all the columns for the specified table
!commit             Commit the current transaction (if autocommit is off)
!connect            Open a new connection to the database.
!dbinfo             Give metadata information about the database
!describe           Describe a table
!dropall            Drop all tables in the current database
!exportedkeys       List all the exported keys for the specified table
!go                 Select the current connection
!help               Print a summary of command usage
!history            Display the command history
!importedkeys       List all the imported keys for the specified table
!indexes            List all the indexes for the specified table
!isolation          Set the transaction isolation for this connection
!list               List the current connections
!manual             Display the BeeLine manual
!metadata           Obtain metadata information
!nativesql          Show the native SQL for the specified statement
!nullemptystring    Set to true to get historic behavior of printing null as empty string. Default is false.
!outputformat       Set the output format for displaying results (table,vertical,csv2,dsv,tsv2,xmlattrs,xmlelements, and deprecated formats(csv, tsv))
!primarykeys        List all the primary keys for the specified table
!procedures         List all the procedures
!properties         Connect to the database specified in the properties file(s)
!quit               Exits the program
!reconnect          Reconnect to the database
!record             Record all output to the specified file
!rehash             Fetch table and column names for command completion
!rollback           Roll back the current transaction (if autocommit is off)
!run                Run a script from the specified file
!save               Save the current variabes and aliases
!scan               Scan for installed JDBC drivers
!script             Start saving a script to a file
!set                Set a beeline variable
!sql                Execute a SQL command
!tables             List all the tables in the database
!typeinfo           Display the type map for the current connection
!verbose            Set verbose mode on

Comments, bug reports, and patches go to ???

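JDBC connection entry point

hiveserver2 JDBC sample code

SQL statements in the code must not end with a semicolon. The username is checked, but the password is not.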

package hive.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            // The Hive JDBC driver jar is not on the classpath.
            e.printStackTrace();
            System.exit(1);
        }

        // replace "root" here with the name of the user the queries should run as
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://172.16.217.111:10000", "root", "wjm");
             Statement stmt = con.createStatement()) {
            String tableName = "default";
            stmt.execute("show databases");
            stmt.execute("create table " + tableName + " (key int, value string)");

            // show tables
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            ResultSet res = stmt.executeQuery(sql);
            if (res.next()) {
                System.out.println(res.getString(1));
            }

            // describe table
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1) + "\t" + res.getString(2));
            }

            // load data into table
            // NOTE: filepath has to be local to the hive server
            // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
            String filepath = "/tmp/a.txt";
            sql = "load data local inpath '" + filepath + "' into table " + tableName;
            System.out.println("Running: " + sql);
            stmt.execute(sql);

            // select * query
            sql = "select * from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
            }

            // regular hive query
            sql = "select count(1) from " + tableName;
            System.out.println("Running: " + sql);
            res = stmt.executeQuery(sql);
            while (res.next()) {
                System.out.println(res.getString(1));
            }
        }
    }
}
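The semicolon rule mentioned before the sample code is easy to trip over when statements are pasted from a beeline session, where the trailing semicolon is required. A small guard like the one below could be applied before calling stmt.execute or stmt.executeQuery. It is only a sketch: the class and method names are invented, and it strips trailing semicolons only, so it does not split a string that contains several statements.

```java
final class SqlText {
    // Removes whitespace and any trailing semicolons so the text can be sent over JDBC.
    // Semicolons inside the statement (for example in a string literal) are left alone.
    static String stripTrailingSemicolons(String sql) {
        String s = sql.trim();
        while (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingSemicolons("show tables;"));
        System.out.println(stripTrailingSemicolons("select count(1) from t ;; "));
    }
}
```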