Nutch 0.9 Custom Development: Page Snapshots

xiaoxiao, 2026-05-21

When Nutch searches for pages by keyword, it returns the information associated with each matching page.

For example: title, url, content, and so on.

The URL lets us link through to the real, live page.

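The page snapshot, by contrast, is the content of the page as it was when Nutch indexed it.

So when the snapshot link is clicked, we use the ID of the indexed document to retrieve that original page content.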

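// getIndexNo and getIndexDocNo are placeholders for the hit's index number and document number
Hit hit = new Hit(getIndexNo, getIndexDocNo);
HitDetails details = bean.getDetails(hit);  // stored details for that hit
String content = new String(bean.getContent(details));  // decodes with the platform default charset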
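The lookup above is keyed by two integers, which in a web page normally arrive as request-parameter strings. The following is a minimal sketch of that step using only the Java standard library. SnapshotLookupSketch, put, and lookup are hypothetical names invented for illustration and are not Nutch classes or methods. The in-memory map merely stands in for whatever store holds the fetched page bytes.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotLookupSketch {

    // Hypothetical stand-in for the stored page bytes, keyed by "indexNo:docNo".
    private final Map<String, byte[]> pages = new HashMap<>();

    private static String key(int indexNo, int docNo) {
        return indexNo + ":" + docNo;
    }

    /** Stores the raw bytes of a page under its (indexNo, docNo) pair. */
    public void put(int indexNo, int docNo, byte[] rawPage) {
        pages.put(key(indexNo, docNo), rawPage);
    }

    /**
     * Parses the two identifiers from their string form and returns the
     * stored bytes, or null when a parameter is missing, is not a number,
     * or no page is stored under that pair.
     */
    public byte[] lookup(String indexNoParam, String docNoParam) {
        final int indexNo;
        final int docNo;
        try {
            indexNo = Integer.parseInt(indexNoParam);
            docNo = Integer.parseInt(docNoParam);
        } catch (NumberFormatException e) {
            // Integer.parseInt throws this for null as well as non-numeric input.
            return null;
        }
        return pages.get(key(indexNo, docNo));
    }
}
```

Returning null for bad input keeps the sketch short. A real page would more likely answer with an error message or an HTTP 404.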

Garbled Chinese in Nutch page snapshots

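In the ROOT directory under Tomcat (the directory where Nutch is deployed), edit cached.jsp. In the else branch, change

content = new String( bean.getContent(details) );

to

content = new String( bean.getContent(details), "utf-8");

and the problem is solved. The one-argument constructor decodes the bytes with the platform default charset, so Chinese text comes out garbled whenever that default is not UTF-8.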
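The effect of passing an explicit charset can be reproduced without Nutch. The sketch below is a standalone illustration, not the code in cached.jsp. SnapshotDecoding and decode are hypothetical names, and falling back to UTF-8 when the charset name is missing or unsupported is an assumption that matches the fix described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SnapshotDecoding {

    /**
     * Decodes raw page bytes with the named charset when it is known and
     * supported, and falls back to UTF-8 otherwise (an assumed default).
     */
    public static String decode(byte[] raw, String charsetName) {
        Charset charset = StandardCharsets.UTF_8;
        if (charsetName != null) {
            try {
                charset = Charset.forName(charsetName);
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep the UTF-8 fallback.
            }
        }
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Four Chinese characters written as Unicode escapes so the source
        // file compiles the same way under any source encoding.
        String original = "\u7f51\u9875\u5feb\u7167";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as ISO-8859-1 yields one char per byte: garbled text.
        String garbled = new String(raw, StandardCharsets.ISO_8859_1);
        String correct = decode(raw, "utf-8");

        System.out.println("explicit UTF-8 round-trips: " + correct.equals(original));
        System.out.println("wrong charset round-trips:  " + garbled.equals(original));
    }
}
```

Using a Charset object rather than a charset-name string also avoids the checked UnsupportedEncodingException that the String(byte[], String) constructor declares.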

