Elasticsearch (5): Multiple Search Methods in Kibana


query string search

Search all products: GET /ecommerce/product/_search

took: how many milliseconds the request took
timed_out: whether the request timed out (it did not here)
_shards: the data is split into 5 shards, so the search request is sent to all of the primary shards (or to one of their replica shards, which works just as well)
hits.total: the number of matching results, 3 documents here
hits.max_score: the score measures how relevant a document is to this search; the better the match, the higher the score
hits.hits: the detailed data of the documents that matched the search

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "2",
        "_score": 1,
        "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "1",
        "_score": 1,
        "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }
      },
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "3",
        "_score": 1,
        "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }
      }
    ]
  }
}

The name "query string search" comes from the fact that all search parameters are attached to the HTTP request as its query string.

Search products whose name contains yagao, sorted by price in descending order:

GET /ecommerce/product/_search?q=name:yagao&sort=price:desc

This style is handy for ad-hoc use on the command line with a tool such as curl, to fire off a quick request and retrieve the information you want. Complex queries, however, are very hard to build this way, so query string search is almost never used in production.
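For reference, the same query-string search issued from the command line with curl might look like this (a sketch, assuming Elasticsearch is running locally on the default port 9200):

curl -XGET 'http://localhost:9200/ecommerce/product/_search?q=name:yagao&sort=price:desc&pretty'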

query DSL

DSL: Domain Specific Language. The query is built as JSON in the HTTP request body, which is convenient and lets you express all kinds of complex syntax, far more powerful than query string search.

Query all products:

GET /ecommerce/product/_search
{
  "query": { "match_all": {} }
}

Query products whose name contains yagao, sorted by price in descending order:

GET /ecommerce/product/_search
{
  "query": {
    "match": { "name": "yagao" }
  },
  "sort": [
    { "price": "desc" }
  ]
}

Paginated query: there are 3 products in total; assume 1 product per page and that we want page 2, so we fetch the 2nd product:

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}

Return only the product name and price:

GET /ecommerce/product/_search
{
  "query": { "match_all": {} },
  "_source": ["name", "price"]
}

Query DSL is far better suited to production use, because it can express complex queries.
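As a rough illustration of how these pieces compose (a sketch, not from the original text), one request can combine a match query, sorting, pagination, and source filtering:

GET /ecommerce/product/_search
{
  "query": { "match": { "name": "yagao" } },
  "sort": [ { "price": "desc" } ],
  "from": 0,
  "size": 2,
  "_source": ["name", "price"]
}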

multi match

Query for documents whose test_field or test_field1 field contains test:

GET /test_index/test_type/_search { "query": { "multi_match": { "query": "test", "fields": ["test_field", "test_field1"] } } }

bool

Use bool to combine multiple search conditions (must, must_not, should) into a single search:

GET /ecommerce/product/_search { "query": { "bool": { "must": { "match": { "name": "gaolujie" }}, "must_not": { "match": { "name": "jiajieshi" }}, "should": [ { "match": { "title": "gaolujie" }}, { "match": { "title": "lengsuanling" }} ] } } }

The second step in controlling the precision of search results: require that at least 50% of the given keywords match before a document is returned as a result.

GET /ecommerce/product/_search { "query": { "match": { "title": { "query": "gaolujie zhonghua yagao", "minimum_should_match": "50%" } } } }

query filter

Search for products whose name contains yagao and whose price is greater than 25:

GET /ecommerce/product/_search { "query" : { "bool" : { "must" : { "match" : { "name" : "yagao" } }, "filter" : { "range" : { "price" : { "gt" : 25 } } } } } }

full-text search

GET /ecommerce/product/_search { "query" : { "match" : { "producer" : "yagao producer" } } } producer这个字段,会先被拆解,建立倒排索引 special 4 yagao 4 producer 1,2,3,4 gaolujie 1 zhognhua 3 jiajieshi 2 yagao producer ---> yagao 和 producer { "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0.70293105, "hits": [ { "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105, "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607, "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607, "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } } ] } } 搜索结果精准控制的第一步:灵活使用and关键字,如果你是希望所有的搜索关键字都要匹配的,那么就用and,可以实现单纯match query无法实现的效果 GET /ecommerce/product/_search { "query": { "match": { "title": { "query": "java elasticsearch", "operator": "and" } } } }如果对一个string field进行排序,结果往往不准确,因为分词后是多个单词,再排序就不是我们想要的结果了 通常解决方案是,将一个string field建立两次索引,一个分词,用来进行搜索;一个不分词,用来进行排序(后续篇章讲解)

相当于

{ "bool": { "must": [ { "term": { "title": "java" }}, { "term": { "title": "elasticsearch" }} ] } }

phrase search

Phrase search is the counterpart of full-text search. Full-text search breaks the search string into individual terms, looks each one up in the inverted index, and returns a document as soon as any single term matches. Phrase search instead requires that the search string appear in the specified field's text exactly as given; only then does the document count as a match and get returned.

GET /ecommerce/product/_search
{
  "query": {
    "match_phrase": { "producer": "yagao producer" }
  }
}

{
  "took": 11,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.70293105,
    "hits": [
      {
        "_index": "ecommerce",
        "_type": "product",
        "_id": "4",
        "_score": 0.70293105,
        "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] }
      }
    ]
  }
}

proximity match

The slop is the number of positions the terms of the search text (the query string) have to be moved before they line up with a document.

Take the document text "hello world, java is very good, spark is also very good." A match_phrase query for "java spark" finds nothing. If we specify a slop, the terms java and spark are allowed to move in order to try to match the document.

Document text:  java is very good spark is ...
Query phrase:   java spark

java spark
java _ spark        (spark moves 1 position)
java _ _ spark      (2 moves)
java _ _ _ spark    (3 moves, now it lines up with "java is very good spark")

Here the slop is 3, because for the phrase "java spark", spark has to move 3 times before it matches the document. Strictly speaking, slop does not just describe how many moves the query string terms made to match a document; it is the maximum number of moves the terms are allowed to make while trying to match one. With slop set to 3, this document matches.

GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}

In fact, a phrase match with slop is exactly what proximity match means: (1) "java spark" must appear in the doc as an exact phrase, that is phrase match; (2) "java spark" may appear with some distance between the terms, but the closer together they are, the higher the doc ranks, that is proximity match.
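One common way to get that "closer terms rank higher" behavior (a sketch, not from the original text) is to require a plain match and add a match_phrase with slop as an optional should clause, so documents where the terms sit close together receive a higher score:

GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "title": "java spark" }
      },
      "should": {
        "match_phrase": {
          "title": { "query": "java spark", "slop": 50 }
        }
      }
    }
  }
}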

highlight search

Matching terms in the search results are highlighted by wrapping them in <em></em> tags:
GET /ecommerce/product/_search { "query" : { "match" : { "producer" : "producer" } }, "highlight": { "fields" : { "producer" : {} } } } { "took": 6, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 0.51623213, "hits": [ { "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.51623213, "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] }, "highlight": { "producer": [ "<em>zhonghua</em> <em>producer</em>" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.25811607, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] }, "highlight": { "producer": [ "jiajieshi <em>producer</em>" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607, "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] }, "highlight": { "producer": [ "gaolujie <em>producer</em>" ] } } ] } }

mget (batch retrieval)

1、The benefit of batch retrieval

Querying documents one at a time, say 100 documents, means sending 100 network requests, which is a lot of overhead. With a batch query, the same 100 documents need only 1 network request, cutting the network overhead by a factor of roughly 100.

2、mget syntax

(1) Querying one document at a time:

GET /test_index/test_type/1
GET /test_index/test_type/2

(2) Batch query with mget:

GET /_mget
{
  "docs": [
    { "_index": "test_index", "_type": "test_type", "_id": 1 },
    { "_index": "test_index", "_type": "test_type", "_id": 2 }
  ]
}

{
  "docs": [
    {
      "_index": "test_index",
      "_type": "test_type",
      "_id": "1",
      "_version": 2,
      "found": true,
      "_source": { "test_field1": "test field1", "test_field2": "test field2" }
    },
    {
      "_index": "test_index",
      "_type": "test_type",
      "_id": "2",
      "_version": 1,
      "found": true,
      "_source": { "test_content": "my test" }
    }
  ]
}

(3) If the documents live under the same index but in different types:

GET /test_index/_mget
{
  "docs": [
    { "_type": "test_type", "_id": 1 },
    { "_type": "test_type", "_id": 2 }
  ]
}

(4) If all the documents are in the same index and the same type, it is simplest of all:

GET /test_index/test_type/_mget
{
  "ids": [1, 2]
}

3、Why mget matters

mget is genuinely important: whenever a query needs to fetch multiple documents at once, always use a batch API. Cutting down the number of network round trips can improve performance several times over, sometimes by an order of magnitude, so it is very much worth doing.

bulk syntax

POST /_bulk
{ "delete": { "_index": "test_index", "_type": "test_type", "_id": "3" }}
{ "create": { "_index": "test_index", "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_index": "test_index", "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_retry_on_conflict": 3 }}
{ "doc": { "test_field2": "bulk test1" }}

Each operation takes two JSON lines, with the following syntax:

{"action": {"metadata"}}
{"data"}

For example, creating a document inside a bulk request looks like this:

{"index": {"_index": "test_index", "_type": "test_type", "_id": "1"}}
{"test_field1": "test1", "test_field2": "test2"}

Which operation types are available?

(1) delete: deletes a document; it needs only a single JSON line
(2) create: equivalent to PUT /index/type/id/_create, a forced create
(3) index: an ordinary PUT, which can either create a document or fully replace it
(4) update: performs a partial update

The bulk API is strict about its JSON format: each JSON object must sit on exactly one line with no line breaks inside it, and consecutive JSON objects must be separated by a newline. A malformed request fails with an error such as:

{
  "error": {
    "root_cause": [
      {
        "type": "json_e_o_f_exception",
        "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 3]"
      }
    ],
    "type": "json_e_o_f_exception",
    "reason": "Unexpected end-of-input: expected close marker for Object (start marker at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 1])\n at [Source: org.elasticsearch.transport.netty4.ByteBufStreamInput@5a5932cd; line: 1, column: 3]"
  },
  "status": 500
}

A well-formed bulk request returns a per-item response like this:

{
  "took": 41,
  "errors": true,
  "items": [
    { "delete": { "found": true, "_index": "test_index", "_type": "test_type", "_id": "10", "_version": 3, "result": "deleted", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 200 }},
    { "create": { "_index": "test_index", "_type": "test_type", "_id": "3", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true, "status": 201 }},
    { "create": { "_index": "test_index", "_type": "test_type", "_id": "2", "status": 409, "error": { "type": "version_conflict_engine_exception", "reason": "[test_type][2]: version conflict, document already exists (current version [1])", "index_uuid": "6m0G7yx7R1KECWWGnfH1sw", "shard": "2", "index": "test_index" }}},
    { "index": { "_index": "test_index", "_type": "test_type", "_id": "4", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true, "status": 201 }},
    { "index": { "_index": "test_index", "_type": "test_type", "_id": "2", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": false, "status": 200 }},
    { "update": { "_index": "test_index", "_type": "test_type", "_id": "1", "_version": 3, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 200 }}
  ]
}

In a bulk request, the failure of any single operation does not affect the other operations, but the response reports the error for that item.

As with mget, the index (and type) can be moved into the URL:

POST /test_index/_bulk
{ "delete": { "_type": "test_type", "_id": "3" }}
{ "create": { "_type": "test_type", "_id": "12" }}
{ "test_field": "test12" }
{ "index": { "_type": "test_type" }}
{ "test_field": "auto-generate id test" }
{ "index": { "_type": "test_type", "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_type": "test_type", "_id": "1", "_retry_on_conflict": 3 }}
{ "doc": { "test_field2": "bulk test1" }}

POST /test_index/test_type/_bulk
{ "delete": { "_id": "3" }}
{ "create": { "_id": "12" }}
{ "test_field": "test12" }
{ "index": {} }
{ "test_field": "auto-generate id test" }
{ "index": { "_id": "2" }}
{ "test_field": "replaced test2" }
{ "update": { "_id": "1", "_retry_on_conflict": 3 }}
{ "doc": { "test_field2": "bulk test1" }}

Best bulk size: the bulk request is loaded into memory, so if it is too large, performance actually drops. You need to experiment to find the best bulk size. Generally start with 1,000 to 5,000 documents per request and increase gradually; in terms of payload size, somewhere between 5 and 15 MB works best.

scroll

If you need to pull out, say, 100,000 documents in one go, performance will be very poor. The usual approach is a scroll search: retrieve one batch of data, then the next, and so on, until all the data has been fetched and processed.

A scroll search saves a snapshot of the view at the time of the first search, and all subsequent scroll requests serve data from that same old snapshot; any changes made to the data in the meantime are not visible. Sorting by _doc gives the best performance. Every scroll request also has to specify a scroll parameter, a time window within which that individual request must complete.

GET /test_index/test_type/_search?scroll=1m
{
  "query": { "match_all": {} },
  "sort": [ "_doc" ],
  "size": 3
}

{
  "_scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAACxeFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYBY0b25zVFlWWlRqR3ZJajlfc3BXejJ3AAAAAAAALF8WNG9uc1RZVlpUakd2SWo5X3NwV3oydwAAAAAAACxhFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYhY0b25zVFlWWlRqR3ZJajlfc3BXejJ3",
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 10,
    "max_score": null,
    "hits": [
      { "_index": "test_index", "_type": "test_type", "_id": "8", "_score": null, "_source": { "test_field": "test client 2" }, "sort": [ 0 ] },
      { "_index": "test_index", "_type": "test_type", "_id": "6", "_score": null, "_source": { "test_field": "tes test" }, "sort": [ 0 ] },
      { "_index": "test_index", "_type": "test_type", "_id": "AVp4RN0bhjxldOOnBxaE", "_score": null, "_source": { "test_content": "my test" }, "sort": [ 0 ] }
    ]
  }
}

The result contains a _scroll_id; every subsequent scroll request must carry it:

GET /_search/scroll
{
  "scroll": "1m",
  "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAACxeFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYBY0b25zVFlWWlRqR3ZJajlfc3BXejJ3AAAAAAAALF8WNG9uc1RZVlpUakd2SWo5X3NwV3oydwAAAAAAACxhFjRvbnNUWVZaVGpHdklqOV9zcFd6MncAAAAAAAAsYhY0b25zVFlWWlRqR3ZJajlfc3BXejJ3"
}

Scroll looks a lot like pagination, but the use cases are different. Pagination is for showing results page by page to users; scroll is for retrieving data batch by batch so that a system can process it.
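A scroll context holds resources on the cluster until its timeout expires, so once you are done it is good practice to release it explicitly. A sketch (the scroll_id shown is a truncated placeholder for the _scroll_id returned by your own search):

DELETE /_search/scroll
{
  "scroll_id": [ "DnF1ZXJ5VGhlbkZldGNo..." ]
}

DELETE /_search/scroll/_all can be used to clear all open scroll contexts at once.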

Please credit the original source when reposting: https://www.6miu.com/read-2624940.html
