2. Solr 6.6.0: uploading and querying content

xiaoxiao2025-10-18  6

Continued from the previous post: https://blog.csdn.net/danjuanzi2684/article/details/83385831

 

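1. Uploading text content to Solr:

Taking a core named try as an example: every field you upload must first be declared in the configuration file try\conf\managed-schema. For the three fields old, WebpageURL and imgWebURL, the basic configuration is: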

<field name="old" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="WebpageURL" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="imgWebURL" type="string" indexed="true" stored="true" multiValued="false"/>

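If a field needs word segmentation at query time, change its type. The text_ikk type can do this: find where text_ikk is defined in the configuration file and declare the fields you need with that type, as follows: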

<fieldType name="text_ikk" class="solr.TextField">
    <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
<field name="summary" type="text_ikk" indexed="true" stored="true" multiValued="false"/>
<field name="name" type="text_ikk" indexed="true" stored="true" multiValued="false"/>
<field name="description" type="text_ikk" indexed="true" stored="true" multiValued="false"/>
<field name="label" type="text_ikk" indexed="true" stored="true" multiValued="false"/>
<field name="test" type="text_ikk" indexed="true" stored="true" multiValued="false"/>

The basic upload code is as follows:

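import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class upload {
    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try");
        // Create the document object
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "c001");
        doc.addField("name", "xiao ming");
        doc.addField("age", "22");
        // Add the document to the index
        server.add(doc);
        // Commit so the document becomes searchable
        server.commit();
        // Release the underlying HTTP resources
        server.close();
    }
}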

    

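2. Uploading image features to Solr:

LIRE can be integrated into Solr to extract and search image features. For liresolr, see: https://bitbucket.org/dermotte/liresolr

Image features can be uploaded directly through the *_hi and *_ha dynamic fields. The corresponding entries in managed-schema are the following three lines; add them yourself if they are missing: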

<dynamicField name="*_ha" type="text_ws" indexed="true" stored="true"/> <!-- if you are using BitSampling -->
<dynamicField name="*_hi" type="binaryDV" indexed="true" stored="true"/>
<fieldtype name="binaryDV" class="net.semanticmetadata.lire.solr.BinaryDocValuesField"/>

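If Chinese word segmentation or adding a *_hi field fails with an error such as: Error loading class 'net.semanticmetadata.lire.solr.BinaryDocValuesField', copy the four jars lire.jar, liresolr.jar, IKAnalyzer2012FF_u1.jar and ik-analyzer-solr5-5.x.jar into the solr-6.6.0\server\solr-webapp\webapp\WEB-INF\lib folder. See GitHub for details.

Uploading the hash code of an image feature from Java works as follows: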

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Base64;

import javax.imageio.ImageIO;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

// The LIRE package names below are those of LIRE 1.0; adjust them to your LIRE version.
import net.semanticmetadata.lire.imageanalysis.features.GlobalFeature;
import net.semanticmetadata.lire.imageanalysis.features.global.PHOG;
import net.semanticmetadata.lire.indexers.hashing.BitSampling;
import net.semanticmetadata.lire.utils.SerializationUtils;

public class imgindexing {
    public static void main(String[] args) throws Exception {
        System.out.println("Images Indexing!");
        int number = 0;
        String path = "C:\\Users\\ttt\\Desktop\\图片";
        File file = new File(path);
        File[] tempList = file.listFiles();
        if (tempList == null) {
            // listFiles() returns null when the path is not a readable directory
            throw new IllegalArgumentException("Not a readable directory: " + path);
        }
        System.out.println("The number of Files: " + tempList.length);
        for (int i = 0; i < tempList.length; i++) {
            String imgpath = tempList[i].toString();
            String name = tempList[i].getName();
            String hashcode = HashExtraction(imgpath);
            String feature = FeatureExtraction(imgpath);
            String id = String.valueOf(number);
            number++;
            System.out.println(number);
            solradd(hashcode, feature, name, id);
        }
    }

    // Sends one image (id, file name, feature, hashes) to the core and commits.
    public static void solradd(String hashcode, String feature, String name, String id) throws Exception {
        String url = "http://localhost:8983/solr/try";
        HttpSolrClient server = new HttpSolrClient(url);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);
        doc.addField("name", name);
        doc.addField("feature_hi", feature);
        doc.addField("mvcnn_ha", hashcode);
        server.add(doc);
        // Commit
        server.commit();
        server.close();
    }

    // Extracts the PHOG feature vector and returns it Base64-encoded.
    public static String FeatureExtraction(String path) throws Exception {
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        return Base64.getEncoder().encodeToString(SerializationUtils.toByteArray(f.getFeatureVector()));
    }

    // Extracts the PHOG feature vector and returns its BitSampling hashes as hex tokens.
    public static String HashExtraction(String path) throws Exception {
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        double[] temp = f.getFeatureVector();
        BitSampling.readHashFunctions();
        int[] intArr = BitSampling.generateHashes(temp);
        return arrayToString(intArr);
    }

    // Joins the hashes as space-separated hexadecimal strings.
    public static String arrayToString(int[] array) {
        StringBuilder sb = new StringBuilder(array.length * 8);
        for (int i = 0; i < array.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(array[i]));
        }
        return sb.toString();
    }
}
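To make the value stored in mvcnn_ha concrete, here is a small standalone sketch of the encoding step that arrayToString performs: each integer hash is written as lowercase hexadecimal and the values are joined by single spaces, so a whitespace-tokenized field such as text_ws indexes every hash as its own term. The class name HashTokens and the sample values are made up for illustration; real hashes come from BitSampling.generateHashes.

```java
import java.util.StringJoiner;

// Illustrative only: HashTokens and the sample values are hypothetical.
final class HashTokens {

    // Same encoding as arrayToString: hex values joined by single spaces.
    static String toHexTokens(int[] hashes) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int h : hashes) {
            joiner.add(Integer.toHexString(h));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        int[] sample = {255, 16, 0};
        System.out.println(toHexTokens(sample)); // prints "ff 10 0"
    }
}
```

Note that Integer.toHexString treats its argument as unsigned, so a negative hash such as -1 becomes ffffffff rather than a value with a minus sign.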

 

2. Importing data from an XML file into Solr:

Reference blog post: http://blog.csdn.net/vtopqx/article/details/73229080

First configure the fields to import by adding the following to managed-schema:

<field name="name" type="string" stored="true" indexed="true" omitNorms="false"/>
<field name="isbn" type="string" stored="true" indexed="true"/>

Create the import file:

<add overwrite="true" commitWithin="10000">
    <doc>
        <field name="id">1</field>
        <field name="isbn">ABC1234</field>
        <field name="name" boost="2">Some Book</field>
    </doc>
    <doc boost="2.5">
        <field name="id">2</field>
        <field name="isbn">ZYVW9821</field>
        <field name="name" boost="2">Important Book</field>
    </doc>
    <doc>
        <field name="id">3</field>
        <field name="isbn">NXJS1234</field>
        <field name="name" boost="2">Some other book</field>
    </doc>
</add>

Put the XML file above (saved as book.xml) into the solr\example\exampledocs directory, open a command-line window there, and run:

java -Dtype=text/xml -Durl=http://localhost:8983/solr/solr_xml/update -jar post.jar book.xml
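When the records come from a program rather than a hand-written file, the same kind of add document can be generated. The sketch below builds it with plain string handling and escapes the XML special characters; the class and method names (AddXmlBuilder, toAddXml) are hypothetical, and it only assumes the id, isbn and name fields from the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: AddXmlBuilder is a hypothetical helper, not part of Solr.
final class AddXmlBuilder {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Each map is one document: field name -> value, in insertion order.
    static String toAddXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>\n");
        for (Map<String, String> doc : docs) {
            sb.append("  <doc>\n");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                sb.append("    <field name=\"").append(escape(f.getKey())).append("\">")
                  .append(escape(f.getValue())).append("</field>\n");
            }
            sb.append("  </doc>\n");
        }
        return sb.append("</add>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> book = new LinkedHashMap<>();
        book.put("id", "1");
        book.put("isbn", "ABC1234");
        book.put("name", "Some Book");
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(book);
        // Write this output to a file and post it with post.jar as shown above.
        System.out.print(toAddXml(docs));
    }
}
```

Escaping matters as soon as a title contains a character such as & or <; without it the update request is rejected as malformed XML.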

Once the import finishes, the data can be queried from the Query page of the admin UI.

 

3. Importing MySQL data:

This section assumes Solr is already configured and a new core has been created; the core is named try in this example.

① Edit solrconfig.xml and add the following above <requestHandler name="/select" class="solr.SearchHandler">:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

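Note: make sure the file contains no other dataimport handler; if one exists, simply replace it.

② Create data-config.xml in the same directory and configure the database properties. Here column is the column name in the database and name is the corresponding field name in Solr: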

<?xml version="1.0" encoding="UTF-8" ?>
<dataConfig>
    <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://127.0.0.1:3306/most3d_db" user="root" password="123456" />
    <!--SELECT most3d_new_8_23.,most3d_new_8_23.,most3d_new_8_23.address,most3d_new_8_23.city_name,most3d_new_8_23.create_time FROM most3d_new_8_23-->
    <document>
        <entity name="most3d_new_8_23" dataSource="source1" pk="ID"
                query="SELECT ID,WebpageURL,ImgWebURL,author, timePost,browse,description,source,label FROM most3d_new_8_23">
            <field column='ID' name='id' />
            <field column='WebpageURL' name='WebpageURL' />
            <field column='ImgWebURL' name='ImgWebURL' />
            <field column='author' name='author' />
            <field column='timePost' name='timePost' />
            <field column='browse' name='browse' />
            <field column='description' name='description' />
            <field column='source' name='source' />
            <field column='label' name='label' />
            <!-- <field column='create_time' name='createtime' dateTimeFormat='yyyy-MM-dd HH:mm:ss' /> -->
        </entity>
    </document>
</dataConfig>

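③ Copy the jars: copy solr-dataimporthandler-6.6.0.jar and solr-dataimporthandler-extras-6.6.0.jar from solr-6.6.0\dist into Tomcat's WEB-INF\lib directory, together with mysql-connector-java-5.1.40.jar from the JDBC package. (Skip this step if Solr is not deployed under Tomcat; for deploying it under Tomcat, see my blog.)

④ Adjust the jar paths: open solrconfig.xml, find the lib tags, and change the jar paths. My locally modified paths are shown as an example (skip this step if Solr is deployed under Tomcat):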

<lib dir="D:\solr-6.6.0\solr-6.6.0/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/contrib/langid/lib/" regex=".*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/contrib/velocity/lib" regex=".*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/dist/" regex="solr-velocity-\d.*\.jar" />
<lib dir="D:\solr-6.6.0\solr-6.6.0/contrib/IKAnalyzer2/lib" regex=".*\.jar" />
<lib dir="F:\solr-6.6.0\solr-6.6.0/dist/" regex="solr-dataimporthandler-\d.*\.jar" />

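Restart Solr, click dataimport to run the import, then refresh the page to see the result.

If you import several MySQL tables into Solr, rows will overwrite one another because id is the unique key. The fix is as follows:

Leave everything else unchanged and modify only the SQL statement: replace ID with concat("book_",ID), where book can be replaced by the name of the table in question. Pay attention to the difference between single and double quotes. The part to change is: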

<entity name="local_mohou" dataSource="source1" pk="ID"
        query='SELECT concat("book_",ID),objurl,imgWebURL,name,category,subclass,description,source FROM local_mohou'>
    <field column='concat("book_",ID)' name='id' />
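The prefix helps because Solr keeps one document per unique id, so a later document with the same id replaces the earlier one. The sketch below imitates that behaviour with a plain map keyed by id. It does not talk to Solr, and the table names book and model as well as the class IdPrefixDemo are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a HashMap stands in for the index, keyed by the unique id.
final class IdPrefixDemo {

    // Mirrors concat("book_", ID) in the SQL: table name, underscore, row ID.
    static String prefixedId(String table, long rowId) {
        return table + "_" + rowId;
    }

    public static void main(String[] args) {
        // Without a prefix, row 1 of the second table replaces row 1 of the first.
        Map<String, String> plain = new HashMap<>();
        plain.put("1", "row 1 of table book");
        plain.put("1", "row 1 of table model");
        System.out.println(plain.size()); // prints 1

        // With a per-table prefix, both rows keep distinct ids.
        Map<String, String> prefixed = new HashMap<>();
        prefixed.put(prefixedId("book", 1), "row 1 of table book");
        prefixed.put(prefixedId("model", 1), "row 1 of table model");
        System.out.println(prefixed.size()); // prints 2
    }
}
```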

 

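4. Adding a field to data that has already been uploaded:

To add a field to data that is already in Solr, the field must already be declared in Solr's managed-schema (see the upload section).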

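import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class addField {
    public static void main(String[] args) {
        HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try");
        // Create the document object
        SolrInputDocument solrInputDocument = new SolrInputDocument();
        // Add a new field "old" with value 22 to the document whose id is 1
        solrInputDocument.addField("id", "1");
        // A map with the key "set" turns this into an atomic update of that one field
        Map<String, String> operation = new HashMap<>();
        operation.put("set", "22");
        solrInputDocument.addField("old", operation);
        try {
            server.add(solrInputDocument);
            server.commit();
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
    }
}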
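For reference, the same atomic update can be expressed as a JSON body for the /update handler, where the field value is an object whose key is the operation (set). The sketch below only builds that body as a string; AtomicUpdateJson is a hypothetical helper, and its escaping is deliberately minimal (quotes and backslashes only).

```java
// Illustrative only: AtomicUpdateJson is a hypothetical helper, not part of SolrJ.
final class AtomicUpdateJson {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds [{"id":"<id>","<field>":{"set":"<value>"}}]
    static String setField(String id, String field, String value) {
        return "[{\"id\":" + quote(id) + "," + quote(field)
                + ":{\"set\":" + quote(value) + "}}]";
    }

    public static void main(String[] args) {
        System.out.println(setField("1", "old", "22"));
        // prints [{"id":"1","old":{"set":"22"}}]
    }
}
```

Atomic updates rebuild the document from its stored values, so the other fields of the document should be stored for them to survive the update.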

 

5. Deleting some of the uploaded data:

   1) In Request-Handler, select /update

   2) For Document Type, select XML

   3) In Document(s), enter the statement below and submit it to delete the document with id=1:

<delete><query>id:1</query></delete> <commit/>

 

6. Querying data:

    1) A basic query from Java:

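import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class search {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Create the Solr client
        HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try");
        // Create the SolrQuery object
        SolrQuery query = new SolrQuery();
        // Set the query condition
        query.setQuery("name:部件");
        // Run the query
        QueryResponse response = server.query(query);
        // Get the matching documents
        SolrDocumentList list = response.getResults();
        // Total number of matches
        long count = list.getNumFound();
        System.out.println("Total matches: " + count);
        for (SolrDocument doc : list) {
            System.out.println(doc.get("id"));
            System.out.println(doc.get("name"));
            System.out.println(doc.get("label"));
            System.out.println("=====================");
        }
    }
}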

      2) For image search, the hash code cannot be put into the query as a single value; it has to be split into segments and queried term by term:

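import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// The LIRE package names below are those of LIRE 1.0; adjust them to your LIRE version.
import net.semanticmetadata.lire.imageanalysis.features.GlobalFeature;
import net.semanticmetadata.lire.imageanalysis.features.global.PHOG;
import net.semanticmetadata.lire.indexers.hashing.BitSampling;

public class imgsearch {
    public static void main(String[] args) throws Exception {
        // Extract features from the query image and compute its hash code
        String path = "C:\\Users\\ttt\\Desktop\\图片\\2.jpg";
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        double[] temp = f.getFeatureVector();
        BitSampling.readHashFunctions();
        int[] intArr = BitSampling.generateHashes(temp);
        String hashcode = arrayToString(intArr);
        System.out.println(hashcode);

        // Create the Solr client
        String url = "http://localhost:8983/solr/try";
        HttpSolrClient server = new HttpSolrClient(url);
        // Create the SolrQuery object
        SolrQuery query = new SolrQuery();
        String[] split = hashcode.split(" ");
        StringBuilder querys = new StringBuilder();
        // Use at most the first 100 hash terms, without running past the end of the array
        int limit = Math.min(100, split.length);
        for (int j = 0; j < limit; j++) {
            String s = split[j].trim();
            if (s.length() > 0) querys.append(" mvcnn_ha:").append(s);
        }
        // Set the query condition
        query.setQuery(querys.toString());
        // Return up to 50 results
        query.setRows(50);
        QueryResponse response = server.query(query);
        // Get the matching documents
        SolrDocumentList list = response.getResults();
        // Total number of matches
        long count = list.getNumFound();
        System.out.println("the number:" + count);
        for (SolrDocument doc2 : list) {
            System.out.println(doc2.get("name"));
            System.out.println("=====================");
        }
    }

    // Joins the hashes as space-separated hexadecimal strings.
    public static String arrayToString(int[] array) {
        StringBuilder sb = new StringBuilder(array.length * 8);
        for (int i = 0; i < array.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(array[i]));
        }
        return sb.toString();
    }
}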
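The query-building loop can be pulled out into a small helper that never reads past the end of the token array. This is a sketch rather than liresolr's own API: HashQueryBuilder and buildQuery are hypothetical names, and the field name and the term limit are passed in as parameters.

```java
// Illustrative only: HashQueryBuilder is a hypothetical helper.
final class HashQueryBuilder {

    // Builds "field:h1 field:h2 ..." from whitespace-separated hash tokens,
    // using at most maxTerms tokens and skipping empty ones.
    static String buildQuery(String field, String hashes, int maxTerms) {
        String[] tokens = hashes.trim().split("\\s+");
        StringBuilder query = new StringBuilder();
        int used = 0;
        for (String token : tokens) {
            if (used >= maxTerms) break;
            if (token.isEmpty()) continue;
            if (query.length() > 0) query.append(' ');
            query.append(field).append(':').append(token);
            used++;
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("mvcnn_ha", "ff 10 0 a3", 3));
        // prints "mvcnn_ha:ff mvcnn_ha:10 mvcnn_ha:0"
    }
}
```

Because the terms are simply separated by spaces, they are combined with the default operator; assuming that operator is OR, every document sharing at least one hash matches, and documents sharing more hashes tend to score higher.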

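   3) Querying multiple fields with different weights:

Modify /browse in solrconfig.xml as follows (in this example the query searches the three fields name, description and label, with text-relevance score weights of 1, 0.4 and 0.6 respectively):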

<requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
    <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="pf">name label description</str>
        <str name="qf">name^1 description^0.4 label^0.6</str>
    </lst>
</requestHandler>

Restart Solr, change request-handler to /browse in the browser, and run the query.

To run the query from a program, use the following statements:

HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try");
SolrQuery query = new SolrQuery();
query.setRequestHandler("/browse");
query.setQuery("部件");
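The same weights can also be sent with each request instead of being fixed in solrconfig.xml, since defType and qf are ordinary request parameters. The sketch below only assembles the URL query string with the JDK. The class name EdismaxParams is hypothetical, and the core name try and the /select handler are assumptions carried over from the earlier examples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: EdismaxParams is a hypothetical helper.
final class EdismaxParams {

    // Joins name=value pairs with '&', URL-encoding each part as UTF-8.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                if (sb.length() > 0) sb.append('&');
                sb.append(URLEncoder.encode(p.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(p.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("qf", "name^1 description^0.4 label^0.6");
        params.put("q", "部件");
        System.out.println("http://localhost:8983/solr/try/select?" + toQueryString(params));
    }
}
```

Per-request parameters are convenient while tuning the weights, because no restart is needed between experiments.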

 

When reposting, please credit the original address: https://www.6miu.com/read-5038122.html
