Using HBase Filters

xiaoxiao  2021-02-28

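Parameter basics

Two parameter types appear in most filters, so they are introduced once here. The code in this article uses the HBase 1.x client API; HBase 2.x deprecates CompareFilter.CompareOp in favor of CompareOperator.

(1) Comparison operator: CompareFilter.CompareOp

The comparison operator defines the relation to test. It takes one of the following values:

    EQUAL               equal to
    GREATER             greater than
    GREATER_OR_EQUAL    greater than or equal to
    LESS                less than
    LESS_OR_EQUAL       less than or equal to
    NOT_EQUAL           not equal to

(2) Comparator: ByteArrayComparable

The comparator controls how the target is matched. The following subclasses are available:

    BinaryComparator          matches the complete byte array
    BinaryPrefixComparator    matches a byte-array prefix
    BitComparator             bitwise comparison
    NullComparator            tests whether the value is null
    RegexStringComparator     regular-expression match
    SubstringComparator       substring match

1. FilterList

FilterList represents a chain of filters that is applied to the target data set. The filters in the list are combined with either AND (FilterList.Operator.MUST_PASS_ALL) or OR (FilterList.Operator.MUST_PASS_ONE). The example from the official documentation combines two filters with OR (cf and column are byte[] values defined elsewhere):

    // A row passes if it satisfies at least one filter in the list.
    FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    SingleColumnValueFilter filter1 = new SingleColumnValueFilter(
            cf, column, CompareOp.EQUAL, Bytes.toBytes("my value"));
    list.addFilter(filter1);
    SingleColumnValueFilter filter2 = new SingleColumnValueFilter(
            cf, column, CompareOp.EQUAL, Bytes.toBytes("my other value"));
    list.addFilter(filter2);
    Scan scan = new Scan();
    scan.setFilter(list);

2. Column value filter: SingleColumnValueFilter

SingleColumnValueFilter tests a column value for equality (CompareOp.EQUAL), inequality (CompareOp.NOT_EQUAL), or a one-sided range (e.g., CompareOp.GREATER).

Constructors:

(1) The value to compare against is a byte array:

    SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, byte[] value)

(2) The value to compare against is a comparator (see the comparator list above):

    SingleColumnValueFilter(byte[] family, byte[] qualifier, CompareFilter.CompareOp compareOp, ByteArrayComparable comparator)

Note: the column value decides whether the whole row is returned. The unit of filtering is the row, not the column. The behavior for rows that lack the column is controlled with filter.setFilterIfMissing(true). With true, a row that does not contain the column is not returned. With false (the default), a row that does not contain the column passes the filter and all of its cells are returned.

The examples run against a test table named user; its full contents are listed for comparison in sections 6 and 7.

Java test code:

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    // connection is an already opened org.apache.hadoop.hbase.client.Connection
    Table table = connection.getTable(TableName.valueOf("user"));
    SingleColumnValueFilter scvf = new SingleColumnValueFilter(
            Bytes.toBytes("account"), Bytes.toBytes("name"),
            CompareOp.EQUAL, Bytes.toBytes("zhangsan"));
    // Default is false: rows without account:name are returned as well.
    // With true, only rows whose account:name equals "zhangsan" are returned.
    scvf.setFilterIfMissing(true);
    Scan scan = new Scan();
    scan.setFilter(scvf);
    ResultScanner resultScanner = table.getScanner(scan);
    for (Result result : resultScanner) {
        String row = Bytes.toString(result.getRow());
        for (Cell cell : result.listCells()) {
            String family1 = Bytes.toString(CellUtil.cloneFamily(cell));
            String qualifier = Bytes.toString(CellUtil.cloneQualifier(cell));
            String value = Bytes.toString(CellUtil.cloneValue(cell));
            System.out.println("[row:" + row + "],[family:" + family1 + "],[qualifier:" + qualifier + "]"
                    + ",[value:" + value + "],[time:" + cell.getTimestamp() + "]");
        }
    }
    resultScanner.close();

With setFilterIfMissing(true), only rows that contain a matching account:name are returned. Because the filter decides per row, the account:country cell is returned too: it shares its row key with the matching cell.

    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

With setFilterIfMissing(false), rows whose account:name matches are returned, and rows that have no account:name column at all are returned as well. Rows whose account:name does not match are dropped. In the output below, the account:name cell of row zhangsan_1495527850824 is the match; every other row is returned only because it lacks account:name. Row lisi_1495527850081 (name=lisi) is absent because its value does not match.

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
    [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
    [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]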
    [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
    [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]   <-- matching cell
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

3. Key-value metadata

HBase stores its data internally as key-value pairs. Key-value metadata filters evaluate whether a key (ColumnFamily:Qualifier) exists in a row, rather than testing cell values.

3.1. FamilyFilter: filtering by column family

Constructor:

    FamilyFilter(CompareFilter.CompareOp familyCompareOp, ByteArrayComparable familyComparator)

Code:

    public static ResultScanner getDataFamilyFilter(String tableName, String family) throws IOException {
        Table table = connection.getTable(TableName.valueOf(tableName));
        // If the table has no such column family, the result is empty.
        FamilyFilter ff = new FamilyFilter(CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes(family)));
        // Alternative comparators:
        //   new BinaryPrefixComparator(value)   byte-array prefix match
        //   new RegexStringComparator(expr)     regular-expression match
        //   new SubstringComparator(substr)     substring match
        Scan scan = new Scan();
        // scan.addFamily(Bytes.toBytes(family)) achieves the same restriction.
        scan.setFilter(ff);
        return table.getScanner(scan);
    }

Result of calling it with table user and family account: only cells of the account column family are returned.

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

3.2. QualifierFilter: filtering by qualifier (column name)

Constructor:

    QualifierFilter(CompareFilter.CompareOp op, ByteArrayComparable qualifierComparator)

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    QualifierFilter ff = new QualifierFilter(
            CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("name")));
    // Alternative comparators:
    //   new BinaryPrefixComparator(value)   byte-array prefix match
    //   new RegexStringComparator(expr)     regular-expression match
    //   new SubstringComparator(substr)     substring match
    Scan scan = new Scan();
    // When the family is known, scan.addColumn(family, qualifier) restricts the scan in a similar way.
    scan.setFilter(ff);
    ResultScanner resultScanner = table.getScanner(scan);

Result: only cells of the name column are returned.

    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

3.3. ColumnPrefixFilter: filtering by column name (qualifier) prefix

The same effect can be achieved with QualifierFilter.

Constructor:

    ColumnPrefixFilter(byte[] prefix)

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    ColumnPrefixFilter ff = new ColumnPrefixFilter(Bytes.toBytes("name"));
    Scan scan = new Scan();
    // A QualifierFilter with a BinaryPrefixComparator achieves the same result.
    scan.setFilter(ff);
    ResultScanner resultScanner = table.getScanner(scan);

Result:

    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

3.4. MultipleColumnPrefixFilter: filtering by several column name (qualifier) prefixes

MultipleColumnPrefixFilter behaves like ColumnPrefixFilter but accepts several prefixes.

    byte[][] prefixes = new byte[][] {Bytes.toBytes("name"), Bytes.toBytes("age")};
    // Returns, for every row, the cells whose qualifier starts with "name" or "age".
    MultipleColumnPrefixFilter ff = new MultipleColumnPrefixFilter(prefixes);
    Scan scan = new Scan();
    scan.setFilter(ff);
    ResultScanner rs = table.getScanner(scan);

Result:

    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]

3.5. ColumnRangeFilter: filtering by column range

Constructor:

    ColumnRangeFilter(byte[] minColumn, boolean minColumnInclusive, byte[] maxColumn, boolean maxColumnInclusive)

Parameters:

    minColumn             lower bound of the column range; null means no lower bound
    minColumnInclusive    whether the range includes minColumn
    maxColumn             upper bound of the column range; null means no upper bound
    maxColumnInclusive    whether the range includes maxColumn

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    byte[] startColumn = Bytes.toBytes("a");
    byte[] endColumn = Bytes.toBytes("d");
    // Returns the cells whose qualifier sorts between "a" and "d" (both inclusive).
    ColumnRangeFilter ff = new ColumnRangeFilter(startColumn, true, endColumn, true);
    Scan scan = new Scan();
    scan.setFilter(ff);
    ResultScanner rs = table.getScanner(scan);

Result: the cells whose qualifier sorts between a and d are returned (here age, city and country). A longer qualifier that starts with d sorts after d and would be excluded.

    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]

4. RowKey

When a range of rows has to be selected by row key, setting startRow and stopRow on the Scan is the more efficient choice. However, startRow and stopRow only bound the scan by the leading bytes of the row key; they cannot match characters in the middle of a key.

    byte[] startRow = Bytes.toBytes("azha");
    byte[] stopRow = Bytes.toBytes("dddf");
    // startRow is inclusive, stopRow is exclusive.
    Scan scan = new Scan(startRow, stopRow);

For more complex row key conditions, use RowFilter.

Constructor:

    RowFilter(CompareFilter.CompareOp rowCompareOp, ByteArrayComparable rowComparator)

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    RowFilter rf = new RowFilter(CompareOp.EQUAL,
            new SubstringComparator("zhangsan"));
    // Alternative comparators:
    //   new BinaryPrefixComparator(value)   byte-array prefix match
    //   new RegexStringComparator(expr)     regular-expression match
    //   new SubstringComparator(substr)     substring match
    Scan scan = new Scan();
    scan.setFilter(rf);
    ResultScanner rs = table.getScanner(scan);

Result:

    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

5. PageFilter

PageFilter takes a page size in rows and returns a result set of that many rows. Note that the filter does not guarantee that the client receives at most pageSize rows in total. It is applied separately on each region server, so it only guarantees that no single region returns more than pageSize rows.

Constructor:

    PageFilter(long pageSize)

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    PageFilter pf = new PageFilter(2L);
    Scan scan = new Scan();
    scan.setFilter(pf);
    scan.setStartRow(Bytes.toBytes("zhangsan_"));
    ResultScanner rs = table.getScanner(scan);

Result: four lines are printed, but they are four cells belonging to two rows (zhangsan_1495527850759 and zhangsan_1495527850824). PageFilter counts rows, not cells, so the page size of 2 is respected here.

    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan],[time:1495556648729]

6. SkipFilter

SkipFilter applies the wrapped filter to every cell of a row. If any single cell fails the wrapped filter, the whole row is filtered out.

For example, suppose every column of a row holds the weight of a different item. In a real data set all of these values must be greater than zero, so we want to drop every row that contains a column with the value 0. Combining ValueFilter with SkipFilter does this:

    scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL, new BinaryComparator(Bytes.toBytes(0)))));

Constructor:

    SkipFilter(Filter filter)

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    // Drop every row that contains a cell with the value "zhangsan".
    SkipFilter sf = new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL,
            new BinaryComparator(Bytes.toBytes("zhangsan"))));
    Scan scan = new Scan();
    scan.setFilter(sf);
    ResultScanner rs = table.getScanner(scan);

Result:

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
    [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
    [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
    [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
    [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898],[time:1495636370056]
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

Compared with the original data, the row with key zhangsan_1495527850824, which holds name=zhangsan, has been filtered out of the result above. Original data, with the dropped cells marked:

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔]
    [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
    [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
    [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
    [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]   <-- dropped by SkipFilter
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]   <-- dropped by SkipFilter
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]

7. FirstKeyOnlyFilter

This filter returns only the first cell of each row. It can be used to count rows efficiently, because one cell per row is enough to know that the row exists.

Constructor:

    public FirstKeyOnlyFilter()

Code:

    Table table = connection.getTable(TableName.valueOf("user"));
    FirstKeyOnlyFilter fkof = new FirstKeyOnlyFilter();
    Scan scan = new Scan();
    scan.setFilter(fkof);
    ResultScanner rs = table.getScanner(scan);

Result: the effect is not obvious from the output alone; each row key appears exactly once, with its first cell.

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230],[time:1495556647872]
    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi],[time:1495556647984]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236],[time:1495556648013]
    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔],[time:1495556648017]
    [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai],[time:1495556648041]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21],[time:1495556648045]
    [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女],[time:1495556648056]
    [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002],[time:1495556648060]
    [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009],[time:1495624624131]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646],[time:1495556648664]
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china],[time:1495636452285]
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong],[time:1495556648855]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100],[time:1495556648878]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男],[time:1495556648983]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001],[time:1495556648996]

Compare with the original data, in which the cells that FirstKeyOnlyFilter does not return are marked:

    [row:lisi_1495527849910],[family:account],[qualifier:idcard],[value:42963319861234561230]
    [row:lisi_1495527850081],[family:account],[qualifier:name],[value:lisi]
    [row:lisi_1495527850111],[family:account],[qualifier:password],[value:123451231236]
    [row:lisi_1495527850114],[family:address],[qualifier:city],[value:黄埔]
    [row:lisi_1495527850136],[family:address],[qualifier:province],[value:shanghai]
    [row:lisi_1495527850144],[family:info],[qualifier:age],[value:21]
    [row:lisi_1495527850154],[family:info],[qualifier:sex],[value:女]
    [row:lisi_1495527850159],[family:userid],[qualifier:id],[value:002]
    [row:wangwu_1495595824517],[family:userid],[qualifier:id],[value:009]
    [row:zhangsan_1495527850759],[family:account],[qualifier:idcard],[value:9897645464646]
    [row:zhangsan_1495527850759],[family:account],[qualifier:passport],[value:5689879898]   <-- not returned
    [row:zhangsan_1495527850824],[family:account],[qualifier:country],[value:china]
    [row:zhangsan_1495527850824],[family:account],[qualifier:name],[value:zhangsan]   <-- not returned
    [row:zhangsan_1495527850951],[family:address],[qualifier:province],[value:guangdong]
    [row:zhangsan_1495527850975],[family:info],[qualifier:age],[value:100]
    [row:zhangsan_1495527851080],[family:info],[qualifier:sex],[value:男]
    [row:zhangsan_1495527851095],[family:userid],[qualifier:id],[value:001]

The comparison makes the behavior clear: when several cells share a row key, only the first cell of that row is returned.
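To make the row-counting use of FirstKeyOnlyFilter concrete, here is a minimal plain-Java sketch of the semantics. It is a model only and does not call the HBase client: the class FirstKeyOnlyModel, its nested Cell type and the methods firstCellPerRow and countRows are hypothetical names introduced for illustration. It assumes the cells arrive sorted by row key, as a scan delivers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of FirstKeyOnlyFilter semantics; it does not use the HBase client. */
final class FirstKeyOnlyModel {

    /** Hypothetical stand-in for an HBase cell: row key plus "family:qualifier". */
    static final class Cell {
        final String row;
        final String column;

        Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /** Keeps only the first cell of each row; the input must be sorted by row key. */
    static List<Cell> firstCellPerRow(List<Cell> sortedCells) {
        List<Cell> kept = new ArrayList<>();
        String previousRow = null;
        for (Cell cell : sortedCells) {
            if (!cell.row.equals(previousRow)) {
                kept.add(cell);          // first cell seen for this row key
                previousRow = cell.row;
            }
        }
        return kept;
    }

    /** The row count equals the number of cells that survive the filter. */
    static int countRows(List<Cell> sortedCells) {
        return firstCellPerRow(sortedCells).size();
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("zhangsan_1495527850759", "account:idcard"),
                new Cell("zhangsan_1495527850759", "account:passport"),
                new Cell("zhangsan_1495527850824", "account:country"),
                new Cell("zhangsan_1495527850824", "account:name"));
        for (Cell cell : firstCellPerRow(cells)) {
            System.out.println(cell.row + " " + cell.column);
        }
        System.out.println("rows: " + countRows(cells));
    }
}
```

With the four sample cells taken from the table above, the model keeps account:idcard and account:country and reports two rows, matching the filter output shown earlier. Against a real table the same count would come from iterating the ResultScanner of a scan that carries a FirstKeyOnlyFilter and incrementing a counter per Result.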
When reposting, please cite the original URL: https://www.6miu.com/read-2600049.html
