2. Solr 6.6.0: Uploading and Querying Content

xiaoxiao2025-10-18 6

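Continued from the previous post: https://blog.csdn.net/danjuanzi2684/article/details/83385831

1. Uploading text content to Solr:

Taking a core named try as an example: before uploading, every field you upload must be declared in the configuration file try\conf\managed-schema. For the three fields old, WebpageURL, and imgWebURL, the basic configuration is as follows:

If a field may need word segmentation at query time, change its type. In Solr the text_ikk type handles this: first find where the text_ikk type is defined in the configuration file, then add the fields that need it there, as follows:

The basic upload code is as follows: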

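import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class upload {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Connect to the core named "try"; try-with-resources closes the client afterwards
        try (HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try")) {
            // Create the document
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "c001");
            doc.addField("name", "xiao ming");
            doc.addField("age", "22");
            // Add the document to the index
            server.add(doc);
            // Commit
            server.commit();
        }
    }
}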

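2. Uploading image features to Solr:

Solr can integrate LIRE to extract and search image features. For liresolr, see https://bitbucket.org/dermotte/liresolr.

Image features can be uploaded directly through the *_hi and *_ha fields that come with Solr. The corresponding entries in managed-schema are the following three lines; add them yourself if they are missing:

If Chinese word segmentation or adding a _hi field fails with the error Error loading class 'net.semanticmetadata.lire.solr.BinaryDocValuesField', copy the four JARs lire.jar, liresolr.jar, IKAnalyzer2012FF_u1.jar, and ik-analyzer-solr5-5.x.jar into the solr-6.6.0\server\solr-webapp\webapp\WEB-INF\lib folder. See GitHub for details.

Uploading the hash codes of image features with Java works as follows: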

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Base64;

import javax.imageio.ImageIO;

// LIRE package names below follow LIRE 1.0; adjust them to the version you use
import net.semanticmetadata.lire.imageanalysis.features.GlobalFeature;
import net.semanticmetadata.lire.imageanalysis.features.global.PHOG;
import net.semanticmetadata.lire.indexers.hashing.BitSampling;
import net.semanticmetadata.lire.utils.SerializationUtils;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class imgindexing {
    public static void main(String[] args) throws Exception {
        System.out.println("Images Indexing!");
        int number = 0;
        String path = "C:\\Users\\ttt\\Desktop\\图片";
        File file = new File(path);
        File[] tempList = file.listFiles();
        if (tempList == null) {
            throw new IllegalArgumentException("Not a readable directory: " + path);
        }
        System.out.println("The number of Files: " + tempList.length);
        for (int i = 0; i < tempList.length; i++) {
            String imgpath = tempList[i].toString();
            // File name without the directory part
            String name = tempList[i].getName();
            String hashcode = HashExtraction(imgpath);
            String feature = FeatureExtraction(imgpath);
            // Use a running counter as the document id
            String id = String.valueOf(number);
            number++;
            System.out.println(number);
            solradd(hashcode, feature, name, id);
        }
    }

    // Adds one image document (Base64 feature + hash string) to the core and commits
    public static void solradd(String hashcode, String feature, String name, String id) throws Exception {
        String url = "http://localhost:8983/solr/try";
        try (HttpSolrClient server = new HttpSolrClient(url)) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            doc.addField("name", name);
            doc.addField("feature_hi", feature);
            doc.addField("mvcnn_ha", hashcode);
            server.add(doc);
            // Commit
            server.commit();
        }
    }

    // Extracts the PHOG feature vector and returns it Base64-encoded
    public static String FeatureExtraction(String path) throws Exception {
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        return Base64.getEncoder().encodeToString(SerializationUtils.toByteArray(f.getFeatureVector()));
    }

    // Extracts the PHOG feature vector and turns it into a space-separated hash string
    public static String HashExtraction(String path) throws Exception {
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        double[] temp = f.getFeatureVector();
        BitSampling.readHashFunctions();
        int[] intArr = BitSampling.generateHashes(temp);
        return arrayToString(intArr);
    }

    // Joins the hash values as hexadecimal strings separated by single spaces
    public static String arrayToString(int[] array) {
        StringBuilder sb = new StringBuilder(array.length * 8);
        for (int i = 0; i < array.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(array[i]));
        }
        return sb.toString();
    }
}

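2. Importing data from an XML file into Solr:

Reference blog: http://blog.csdn.net/vtopqx/article/details/73229080

First configure the fields to import by adding the following to the managed-schema file:

Edit the import file: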

<add overwrite="true" commitWithin="10000">
  <doc>
    <field name="id">1</field>
    <field name="isbn">ABC1234</field>
    <field name="name" boost="2">Some Book</field>
  </doc>
  <doc boost="2.5">
    <field name="id">2</field>
    <field name="isbn">ZYVW9821</field>
    <field name="name" boost="2">Important Book</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="isbn">NXJS1234</field>
    <field name="name" boost="2">Some other book</field>
  </doc>
</add>

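Put the XML file above (book.xml in the command below) into the solr\example\exampledocs directory, open a command-line window there, and run the following command: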

java -Dtype=text/xml -Durl=http://localhost:8983/solr/solr_xml/update -jar post.jar book.xml
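For larger data sets, an import file in this format could also be generated by a program instead of being written by hand. The following is a minimal sketch in plain Java under stated assumptions: the class SolrAddXml and its methods are hypothetical names chosen for illustration (they are not part of Solr or post.jar), and the sketch only emits plain fields without the boost attributes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a Solr <add> message from simple field maps. */
class SolrAddXml {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Each map is one document; keys are field names, values are field values. */
    static String build(List<Map<String, String>> docs) {
        StringBuilder xml = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            xml.append("<doc>");
            for (Map.Entry<String, String> field : doc.entrySet()) {
                xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                   .append(escape(field.getValue()))
                   .append("</field>");
            }
            xml.append("</doc>");
        }
        return xml.append("</add>").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order
        Map<String, String> book = new LinkedHashMap<>();
        book.put("id", "1");
        book.put("isbn", "ABC1234");
        book.put("name", "Some Book");
        System.out.println(build(Arrays.asList(book)));
    }
}
```

The output can be saved to a file and posted with the same post.jar command. Escaping matters as soon as a value contains characters such as & or <, which would otherwise make the file invalid XML.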

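After the import finishes, you can query the data on the Query page of the admin UI, for example:

3. Importing MySQL data:

This section assumes that Solr is already configured and that a new core has been created; the core is named try in this example:

① Edit solrconfig.xml: add the following above the line <requestHandler name="/select" class="solr.SearchHandler">: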

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

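Note: make sure the file does not already contain another dataimport handler; if it does, simply replace it.

② Create data-config.xml in the same directory and configure the database properties. column is the column name in the database, and name is the corresponding field name in Solr: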

<dataConfig>
  <dataSource name="source1" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/most3d_db" user="root" password="123456" />
  <document>
    <entity name="most3d_new_8_23" dataSource="source1" pk="ID"
            query="SELECT ID,WebpageURL,ImgWebURL,author, timePost,browse,description,source,label FROM most3d_new_8_23">
      <field column='ID' name='id' />
      <field column='WebpageURL' name='WebpageURL' />
      <field column='ImgWebURL' name='ImgWebURL' />
      <field column='author' name='author' />
      <field column='timePost' name='timePost' />
      <field column='browse' name='browse' />
      <field column='description' name='description' />
      <field column='source' name='source' />
      <field column='label' name='label' />
    </entity>
  </document>
</dataConfig>

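③ Copy the JARs: copy solr-dataimporthandler-6.6.0.jar and solr-dataimporthandler-extras-6.6.0.jar from solr-6.6.0\dist into Tomcat's WEB-INF\lib directory, and also copy mysql-connector-java-5.1.40.jar from the JDBC package into that directory. (If Solr is not deployed under Tomcat, skip this step; for deploying under Tomcat, see my blog.)

④ Adjust the JAR paths: open solrconfig.xml, find the lib tags, and change the JAR paths; my locally modified paths serve as an example. (If Solr is deployed under Tomcat, skip this step.)

Restart Solr and click dataimport to run the import; after refreshing the page you will see the following screen.

When several MySQL tables are imported into Solr, id is the primary key, so rows from different tables that share the same id overwrite each other. The fix is as follows:

Leave the other settings unchanged and modify only the SQL statement: replace ID in the original statement with concat("book_",ID), where book can be replaced with the corresponding table name. Pay attention to the difference between single and double quotes. The part to modify is as follows: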
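To make the quoting issue concrete, here is a minimal sketch that assembles such a SELECT statement in plain Java. The class PrefixedIdQuery is a hypothetical helper, not part of Solr, and the table and column names in the example are placeholders. It uses single quotes around the prefix so that the statement can sit inside a double-quoted query attribute in data-config.xml, and it aliases the result back to the original column name so the existing field mapping still applies. It assumes the table and column names are trusted identifiers, not user input.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds a SELECT whose id column is prefixed with the table name. */
class PrefixedIdQuery {

    /**
     * Returns e.g. SELECT CONCAT('book_', ID) AS ID, name FROM book.
     * Table and column names are assumed to be trusted identifiers.
     */
    static String build(String table, String idColumn, List<String> otherColumns) {
        StringBuilder sql = new StringBuilder("SELECT CONCAT('")
                .append(table).append("_', ").append(idColumn)
                .append(") AS ").append(idColumn);
        for (String column : otherColumns) {
            sql.append(", ").append(column);
        }
        return sql.append(" FROM ").append(table).toString();
    }

    public static void main(String[] args) {
        // Placeholder table and columns, for illustration only
        System.out.println(build("book", "ID", Arrays.asList("name", "author")));
    }
}
```

With one such statement per table, a row with ID 1 in table book gets the Solr id book_1, so it no longer collides with ID 1 from another table.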

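4. Adding a field to already uploaded data:

To add a field to data that already exists in Solr, the field must first be declared in Solr's managed-schema configuration file (see the upload section for details).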

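import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class addField {
    public static void main(String[] args) {
        try (HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try")) {
            // Create the document
            SolrInputDocument solrInputDocument = new SolrInputDocument();
            // On the document with id=1, add a new field "old" with the value 22
            solrInputDocument.addField("id", "1");
            // A map with the key "set" marks this as an atomic update of that field
            Map<String, String> operation = new HashMap<>();
            operation.put("set", "22");
            solrInputDocument.addField("old", operation);
            server.add(solrInputDocument);
            server.commit();
        } catch (SolrServerException | IOException e) {
            e.printStackTrace();
        }
    }
}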
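The same "set" operation can also be expressed as a JSON body sent to the core's /update handler. The sketch below only builds that body as a string, so it runs without SolrJ or a Solr server; the class AtomicUpdateJson is a hypothetical helper, and its escaping covers only backslashes and double quotes, which is an assumption that holds for simple values like the ones used here.

```java
/** Hypothetical helper: builds the JSON body of a Solr atomic "set" update. */
class AtomicUpdateJson {

    /** Minimal JSON string quoting: handles backslash and double quote only. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Body that sets one field on the document with the given id. */
    static String setField(String id, String field, String value) {
        return "[{\"id\":" + quote(id) + ","
                + quote(field) + ":{\"set\":" + quote(value) + "}}]";
    }

    public static void main(String[] args) {
        // Mirrors the SolrJ example: set old=22 on the document with id=1
        System.out.println(setField("1", "old", "22"));
    }
}
```

Posting this body with Content-Type application/json to http://localhost:8983/solr/try/update?commit=true should have the same effect as the SolrJ code. Note that atomic updates rebuild the document from its stored values, so the other fields generally need to be stored (or have docValues) to survive the update.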

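5. Deleting part of the uploaded data:

1) Under request-handler, select /update

2) For document type, select XML

3) In documents, enter a delete command such as <delete><id>1</id></delete> and submit it; this deletes the document with id=1.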

6. Querying data:

1) A basic query in Java looks like this:

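import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class search {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Create the HttpSolrClient
        try (HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try")) {
            // Create the SolrQuery object
            SolrQuery query = new SolrQuery();
            // Set the query condition
            query.setQuery("name:部件");
            // Run the query and get the response
            QueryResponse response = server.query(query);
            // Get the returned documents
            SolrDocumentList list = response.getResults();
            // Total number of matches
            long count = list.getNumFound();
            System.out.println("Total matches: " + count);
            for (SolrDocument doc : list) {
                System.out.println(doc.get("id"));
                System.out.println(doc.get("name"));
                System.out.println(doc.get("label"));
                System.out.println("=====================");
            }
        }
    }
}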

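2) For image retrieval, the hash code cannot be put into the query as a single value; it has to be split into segments, each of which becomes its own query term: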

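import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

// LIRE package names below follow LIRE 1.0; adjust them to the version you use
import net.semanticmetadata.lire.imageanalysis.features.GlobalFeature;
import net.semanticmetadata.lire.imageanalysis.features.global.PHOG;
import net.semanticmetadata.lire.indexers.hashing.BitSampling;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

public class imgsearch {
    public static void main(String[] args) throws Exception {
        // Extract the feature of the query image and compute its hash code
        String path = "C:\\Users\\ttt\\Desktop\\图片\\2.jpg";
        BufferedImage img = ImageIO.read(new File(path));
        GlobalFeature f = new PHOG();
        f.extract(img);
        double[] temp = f.getFeatureVector();
        BitSampling.readHashFunctions();
        int[] intArr = BitSampling.generateHashes(temp);
        String hashcode = arrayToString(intArr);
        System.out.println(hashcode);

        // Create the HttpSolrClient
        String url = "http://localhost:8983/solr/try";
        try (HttpSolrClient server = new HttpSolrClient(url)) {
            // Create the SolrQuery object
            SolrQuery query = new SolrQuery();
            String[] split = hashcode.split(" ");
            StringBuilder querys = new StringBuilder();
            // Use at most the first 100 hash segments (fewer if the hash is shorter)
            int limit = Math.min(100, split.length);
            for (int j = 0; j < limit; j++) {
                String s = split[j].trim();
                if (s.length() > 0) querys.append(" mvcnn_ha:").append(s);
            }
            // Set the query condition
            query.setQuery(querys.toString());
            // Return at most 50 results
            query.setRows(50);
            // Run the query and get the response
            QueryResponse response = server.query(query);
            // Get the returned documents
            SolrDocumentList list = response.getResults();
            // Total number of matches
            long count = list.getNumFound();
            System.out.println("the number:" + count);
            for (SolrDocument doc2 : list) {
                System.out.println(doc2.get("name"));
                System.out.println("=====================");
            }
        }
    }

    // Joins the hash values as hexadecimal strings separated by single spaces
    public static String arrayToString(int[] array) {
        StringBuilder sb = new StringBuilder(array.length * 8);
        for (int i = 0; i < array.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(array[i]));
        }
        return sb.toString();
    }
}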
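The segment-splitting step can be isolated into a small function that is easy to test without LIRE or a Solr server. This is a minimal sketch; HashQueryBuilder is a hypothetical class name, and the field name and term limit are passed in as parameters instead of being hard-coded.

```java
import java.util.StringJoiner;

/** Hypothetical helper: turns a space-separated hash string into a Solr query string. */
class HashQueryBuilder {

    /**
     * Emits one "field:token" term per hash segment, using at most maxTerms segments.
     * Returns an empty string when the input contains no segments.
     */
    static String build(String field, String hashes, int maxTerms) {
        StringJoiner query = new StringJoiner(" ");
        int used = 0;
        for (String token : hashes.trim().split("\\s+")) {
            if (used >= maxTerms) break;
            if (token.isEmpty()) continue; // happens only for an empty input string
            query.add(field + ":" + token);
            used++;
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Made-up hash segments, limited to the first two
        System.out.println(build("mvcnn_ha", "1f 2a 3c", 2));
    }
}
```

Because the loop stops at whichever comes first, the term limit or the end of the input, it cannot run past the end of a short hash string. The resulting string can be passed to query.setQuery(...) as in the example above.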

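3) Querying multiple fields with different weights:

Modify /browse in solrconfig.xml as follows (in this example, the query searches the three fields name, description, and label, with text-relevance score weights of 1, 0.4, and 0.6 respectively):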

<requestHandler name="/browse" class="solr.SearchHandler" useParams="query,facets,velocity,browse">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="pf">name label description</str>
    <str name="qf">name^1 description^0.4 label^0.6</str>
  </lst>
</requestHandler>

Restart Solr, set the request-handler to /browse on the query page in the browser, and run the query.

To run the query from a program, use the following statements:

HttpSolrClient server = new HttpSolrClient("http://localhost:8983/solr/try");
SolrQuery query = new SolrQuery();
// Send the query to the /browse handler configured above
query.setRequestHandler("/browse");
query.setQuery("部件");
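If the weights need to change per request, the qf value can also be assembled in code and sent as a request parameter instead of being fixed in solrconfig.xml. The sketch below only builds the qf string; QfBuilder is a hypothetical helper name, and the weights are the ones from the example configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: formats field weights as an edismax qf parameter value. */
class QfBuilder {

    /** Joins entries as "field^weight", separated by spaces, in map iteration order. */
    static String build(Map<String, Double> weights) {
        StringJoiner qf = new StringJoiner(" ");
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            qf.add(entry.getKey() + "^" + entry.getValue());
        }
        return qf.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in the order they were added
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("name", 1.0);
        weights.put("description", 0.4);
        weights.put("label", 0.6);
        System.out.println(build(weights));
    }
}
```

With SolrJ, the result could then be applied with query.set("defType", "edismax") and query.set("qf", ...), which overrides the handler defaults for that request only.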
