Hive Built-in String Processing Functions: The concat Character Function

xiaoxiao 2021-02-28

1. Hive built-in string processing functions, part 1

Return Type

Name(Signature)

Description

int

ascii(string str)

Returns the numeric (ASCII) value of the first character of str.
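As a quick illustration of the point that only the first character is examined, here is a minimal sketch. It assumes a Hive version that accepts SELECT without a FROM clause (0.13.0 or later); on older versions, select from any one-row table instead.

```sql
select
  -- 'A' has ASCII code 65
  ascii('A'),
  -- only the leading 'a' is examined: 97
  ascii('abc'),
  -- the digit character '0', not the number zero: 48
  ascii('0');
```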

string

base64(binary bin)

Converts the argument from binary to a base-64 string (as of Hive 0.12.0).

binary

unbase64(string str)

Converts the argument from a base-64 string to BINARY (as of Hive 0.12.0).

string

concat(string|binary A, string|binary B...)

Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters, in order. For example, concat('foo', 'bar') results in 'foobar'. This function can take any number of input strings.

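Tip:

Commonly used. concat accepts any number of arguments and joins several strings at once, but it cannot take a separator; it works much like the "+" string operator in other languages.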

hive> select concat('a','bc','+','de') as a,

concat('1','2') from a;

Result: abc+de 12
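One behavior the example above does not show is NULL handling: in Hive, concat returns NULL as soon as any argument is NULL, so nullable columns are usually wrapped in coalesce first. A minimal sketch, assuming SELECT without FROM is available (Hive 0.13.0 or later):

```sql
select
  -- any NULL argument makes the whole result NULL
  concat('a', cast(null as string), 'b'),
  -- replacing NULL with '' keeps the other parts: ab
  concat('a', coalesce(cast(null as string), ''), 'b');
```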

array<struct<string,double>>

context_ngrams(array<array<string>>, array<string>, int K, int pf)

Returns the top-K contextual N-grams from a set of tokenized sentences, given a string of "context". Similar to ngrams(), but context_ngrams() lets you specify the context (an array) in advance and searches for the sequences that fit it. See StatisticsAndDataMining for more information; the explanation there is easier to follow.

string

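concat_ws(string SEP, string A, string B...)

Like concat() above, but joins the arguments with the custom separator SEP. Note that concat_ws is a regular UDF, not an aggregate function (UDAF), although it is often applied to the array produced by an aggregate such as collect_set.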

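Tip:

concat_ws also concatenates strings, but its first argument specifies the separator, which concat cannot do, so it complements concat. With the empty string '' as the first argument it behaves like concat for non-NULL inputs.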

hive> select concat_ws('','a','bc','+','de') a,

concat_ws(',','a','b');

Result: abc+de a,b
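The two functions also differ in how they treat NULL: concat_ws is expected to skip NULL arguments instead of returning NULL (it returns NULL only when the separator itself is NULL). The sketch below illustrates this under the assumption that SELECT without FROM is available (Hive 0.13.0 or later); the expected values in the comments follow that understanding and are worth confirming on your own version.

```sql
select
  -- NULL arguments are skipped rather than nulling the result: a,b
  concat_ws(',', 'a', cast(null as string), 'b'),
  -- a typical use, assembling a date-like key: 2021-02-28
  concat_ws('-', '2021', '02', '28');
```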

string

concat_ws(string SEP, array<string>)

Like concat_ws() above, but taking an array of strings (as of Hive 0.9.0): it walks the array and joins its elements with the given separator.

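Tip:

In other words, concat_ws can join the elements of an array into a single string, which concat cannot do. This is used very often, especially together with collect_set and similar functions.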

hive> select concat_ws('@',company_used_name) from a;

For example, if the column company_used_name in the table is an array with the following contents:

["铁岭市宏基砖业有限公司","铁岭市华隆装饰有限责任公司","铁岭宏太粮油有限公司"]

The result is:

铁岭市宏基砖业有限公司@铁岭市华隆装饰有限责任公司@铁岭宏太粮油有限公司
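The pairing with collect_set mentioned in the tip can be sketched as follows. The table company_names and its columns are hypothetical, introduced only for illustration: one row per former company name, collapsed into one delimited string per company.

```sql
-- Hypothetical table: company_names(company_id string, used_name string),
-- holding one row per former name of a company.
select company_id,
       -- collect_set gathers the group's names into an array<string>,
       -- which concat_ws then joins with '@'
       concat_ws('@', collect_set(used_name)) as company_used_names
from company_names
group by company_id;
```

collect_set removes duplicates and does not guarantee element order, so the order of names in the joined string may vary between runs.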

string

decode(binary bin, string charset)

Decodes the first argument (a binary value) into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is NULL, the result is also NULL. (As of Hive 0.12.0.)

binary

encode(string src, string charset)

Encodes the first argument (a string) into a BINARY using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is NULL, the result is also NULL. (As of Hive 0.12.0.)
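Because base64 takes a binary argument and unbase64 returns one, encode and decode are typically what connects them to ordinary strings. A minimal sketch, assuming Hive 0.12.0 or later for these functions and 0.13.0 or later for SELECT without FROM:

```sql
select
  -- string -> binary -> string round trip: abc
  decode(encode('abc', 'UTF-8'), 'UTF-8'),
  -- base64 expects binary, so the string is encoded first: YWJj
  base64(encode('abc', 'UTF-8')),
  -- unbase64 yields binary; decode turns it back into a string: abc
  decode(unbase64('YWJj'), 'UTF-8');
```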

int

find_in_set(string str, string strList)

In practice this function is not used very often.

Returns the position of the first occurrence of str in strList, where strList is a comma-delimited string (the delimiter must be a comma). Returns NULL if either argument is NULL. Returns 0 if the first argument contains any commas or if str is not found. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3.

hive> select find_in_set('abc','dabc,ab,c,abcd,abc,babcd'),

find_in_set('abc,','dabc,ab,c,abcd,abc,babcd'),

find_in_set(null,'dabc,ab,c,abcd,abc,babcd'),

find_in_set('de','dabc,ab,c,abcd,abc,babcd');

Result: 5 0 NULL 0
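Where find_in_set does earn its place is in filtering on a comma-delimited column, because it matches whole list elements only. The table user_tags and its columns below are hypothetical, used purely for illustration.

```sql
-- Hypothetical table: user_tags(user_id string, tags string), where tags
-- holds a comma-delimited list such as 'vip,new,trial'.
select user_id
from user_tags
-- a positive position means 'vip' is one of the list elements
where find_in_set('vip', tags) > 0;
```

Unlike a pattern such as tags like '%vip%', this does not match longer elements that merely contain the text, for example 'vipx'.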

string

format_number(number x, int d)

Formats the number X to a format like '#,###,###.##', rounded to D decimal places, and returns the result as a string. If D is 0, the value is rounded to the nearest integer and the result has no decimal point or fractional part. (As of Hive 0.10.0; bug with float types fixed in Hive 0.14.0, decimal type support added in Hive 0.14.0.)
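The effect of the d argument can be seen in a short sketch, assuming SELECT without FROM is available (Hive 0.13.0 or later):

```sql
select
  -- two decimal places with thousands separators: 1,234,567.89
  format_number(1234567.891, 2),
  -- d = 0 rounds to an integer and drops the decimal point: 1,234,568
  format_number(1234567.891, 0);
```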

int

length(string A)

Returns the length of the string as an int. Note that a NULL string returns NULL, and the empty string '' returns 0. Example:

hive> select length('abcedfg') from lxw_dual;

7

int

locate(string substr, string str[, int pos])

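The parameter in square brackets is optional.

Note that substr is the first argument.

Returns the position of the first occurrence of substr in str, searching from position pos onward (positions are 1-based, and the search includes position pos itself).

Tip: with only the two string arguments, locate returns the position of the first occurrence of the substring in the parent string that follows it, or 0 if there is none; if either argument is NULL, it returns NULL. With the third int argument, the search starts from that character of the parent string and still returns the position of the first match found. Without the third argument, it does the same job as instr.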

hive> select

locate('abc','dfdabc,ab,abcdfd@abc'), --4

locate('abc','dfdabc,ab,abcdfd@abc',4), --4

locate('abc','dfdabc,ab,abcdfd@abc',5), --11

locate('abcee','dfdabc,ab,abcdfd@abc'), --0

locate(null,'dfdabc,ab,abcdfd@abc') --null

from FDM_SSA.T_PLPLPSS_PAC_REPAY_APPLY_PAY_ED a wher

instr(string str, string substr)

Note that here substr is the second argument.

Returns the position of the first occurrence of substr in str. Returns NULL if either argument is NULL, and 0 if substr could not be found in str. Be aware that this is not zero-based: the first character in str has index 1.

Tip:

instr is similar to locate, but the argument order differs: in instr the substring is the second argument, whereas in locate it is the first.

hive> select

instr('abc','afdababcdddabceee'), --0

instr('afdababcdddabceee','abc') --6
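The opposite argument order of locate and instr is easy to get wrong, so a side-by-side sketch may help; the last expression shows one common use, combining instr with substr to take the text after a marker. It assumes SELECT without FROM is available (Hive 0.13.0 or later), and the e-mail address is made-up sample data.

```sql
select
  -- locate: substring first, then the string to search: 4
  locate('bar', 'foobar'),
  -- instr: string to search first, then the substring: 4
  instr('foobar', 'bar'),
  -- everything after the first '@': example.com
  substr('user@example.com', instr('user@example.com', '@') + 1);
```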

string

lower(string A) lcase(string A)

Returns the string resulting from converting all characters of A to lower case. For example, lower('fOoBaR') results in 'foobar'.

hive> select lower('abSEd') from lxw_dual;

absed

hive> select lcase('abSEd') from lxw_dual;

absed

string

lpad(string str, int len, string pad)

Returns str, left-padded with pad to a length of len. If str is longer than len, the excess is cut off so that the result is exactly len characters long. Commonly used to bring strings to a fixed length before slicing them.

hive> select lpad('abc',10,'td') from lxw_dual;

tdtdtdtabc

string

rpad(string str, int len, string pad)

Returns str, right-padded with pad to a length of len. As with lpad, if str is longer than len, the excess is cut off.
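Padding and truncation can be seen together in a short sketch, assuming SELECT without FROM is available (Hive 0.13.0 or later):

```sql
select
  -- zero-pad a number string to a fixed width: 007
  lpad('7', 3, '0'),
  -- pad on the right instead: abc**
  rpad('abc', 5, '*'),
  -- str longer than len is cut down to len characters: abc
  lpad('abcdef', 3, 'x');
```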

string

ltrim(string A)

Returns the string resulting from trimming spaces from the beginning (left-hand side) of A. For example, ltrim(' foobar ') results in 'foobar '.

hive> select ltrim(' abc ') from lxw_dual;

abc (followed by the untouched trailing space)

string

trim(string A)

Returns the string resulting from trimming spaces from both ends of A. For example, trim(' foobar ') results in 'foobar'.

string

rtrim(string A)

Returns the string resulting from trimming spaces from the end (right-hand side) of A. For example, rtrim(' foobar ') results in ' foobar'.
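Since trimmed spaces are invisible in query output, wrapping the calls in length makes the difference between the three functions observable. A minimal sketch, assuming SELECT without FROM is available (Hive 0.13.0 or later); the input has two spaces on each side of 'abc':

```sql
select
  -- both sides trimmed, only 'abc' remains: 3
  length(trim('  abc  ')),
  -- leading spaces removed, two trailing spaces remain: 5
  length(ltrim('  abc  ')),
  -- trailing spaces removed, two leading spaces remain: 5
  length(rtrim('  abc  '));
```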

string

upper(string A)

ucase(string A)

Returns the string resulting from converting all characters of A to upper case. For example, upper('fOoBaR') results in 'FOOBAR'.

hive> select upper('abSEd') from lxw_dual;

ABSED

hive> select ucase('abSEd') from lxw_dual;

ABSED

string

initcap(string A)

Returns the string with the first letter of each word in uppercase and all other letters in lowercase. Words are delimited by whitespace. (As of Hive 1.1.0.)
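The difference between upper and initcap on mixed-case input can be sketched as follows, assuming Hive 1.1.0 or later (needed for initcap):

```sql
select
  -- every letter upper-cased: HELLO WORLD
  upper('hello wORLD'),
  -- first letter of each word upper-cased, the rest lower-cased: Hello World
  initcap('hello wORLD');
```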

string

space(int n)

Returns a string of n spaces.

string

soundex(string A)

Returns the soundex code of the string (as of Hive 1.2.0). For example, soundex('Miller') results in M460.

int

levenshtein(string A, string B)

Returns the Levenshtein distance between two strings, a measure of how different they are (as of Hive 1.2.0). For example, levenshtein('kitten', 'sitting') results in 3.
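soundex and levenshtein are mostly useful for approximate matching. The sketch below shows one possible way to combine them; the table customers and its columns are hypothetical, the threshold of 1 is an arbitrary choice for illustration, and Hive 1.2.0 or later is assumed.

```sql
-- Hypothetical table: customers(id string, last_name string).
select id, last_name
from customers
-- names that sound like 'Miller' ...
where soundex(last_name) = soundex('Miller')
   -- ... or that are at most one edit away from it, ignoring case
   or levenshtein(lower(last_name), 'miller') <= 1;
```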

When reposting, please credit the original address: https://www.6miu.com/read-1750181.html
