Installing sparkcrawler (Sparkler)

xiaoxiao, 2021-02-27

#1. Create the working directory
mkdir -p ~/sparkler
#2. Download the Solr package
cd ~/sparkler
wget http://mirror.bit.edu.cn/apache/lucene/solr/6.5.1/solr-6.5.1.tgz
#3. Download the Sparkler source
cd ~/sparkler
git clone https://github.com/USCDataScience/sparkler.git
#4. Extract solr-6.5.1.tgz
tar -zxvf solr-6.5.1.tgz
#5. Delete solr-6.5.1.tgz
rm -f solr-6.5.1.tgz
#6. Enter the sparkler-ui directory of the Sparkler source (an absolute path on my machine)
cd ~/sparkler/sparkler/sparkler-ui
#7. Create the banana folder and enter it, then initialize and update the git submodules
mkdir banana
cd banana
git submodule init
git submodule update
#8. Go back to the sparkler-ui source directory and run mvn clean package
cd ~/sparkler/sparkler/sparkler-ui
mvn clean package
#9. Go to the Sparkler source root and run mvn clean install -DskipTests
cd ~/sparkler/sparkler/
mvn clean install -DskipTests
#10. Enter the Solr directory
cd ~/sparkler/solr-6.5.1
#11. Copy the files
cp -r ~/sparkler/sparkler/sparkler-ui/sparkler-dashboard ~/sparkler/solr-6.5.1/server/solr-webapp/
cp -r ~/sparkler/sparkler/conf/solr/sparkler-jetty-context.xml ~/sparkler/solr-6.5.1/server/contexts/
cp -rv ~/sparkler/sparkler/conf/solr/crawldb ~/sparkler/solr-6.5.1/server/solr/configsets/
cp -r ~/sparkler/solr-6.5.1/server/solr/configsets/crawldb ~/sparkler/solr-6.5.1/server/solr/
#12. Start Solr
~/sparkler/solr-6.5.1/bin/solr start
13. In a browser, open http://localhost:8983/solr/#/~cores/
Click Add Core and set both the name and instanceDir fields to crawldb
14. In a browser, open http://localhost:8983/banana/#/dashboard
  1. Click the file icon at the top right (the third small icon)
  2. Select the file ~/sparkler/sparkler/sparkler-ui/dashboard/Sparkler-Dashboard-Basic
  3. Click the save icon at the top right (the fourth small icon)
  4. Click the Set as Browser Default option
15. Go to the Sparkler source root, inject a seed URL, and run the crawl
cd ~/sparkler/sparkler/
bin/sparkler.sh inject -su http://www.sina.com.cn/
# The inject command returns a jobId value; note it down (here: sjob-1496713811764)
bin/sparkler.sh crawl -id sjob-1496713811764 -m local[*] -i 1
The crawled data can then be viewed at http://localhost:8983/banana/#/dashboard
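Step 15 asks you to copy the jobId by hand from the inject output into the crawl command. A minimal sketch of automating that is shown below. It assumes only that the id has the form sjob-<digits>, as in the example above; the exact text that inject prints is not shown in this post, and extract_job_id is a hypothetical helper name, not part of Sparkler.

```shell
#!/usr/bin/env bash
# Sketch: pull the first job id of the form sjob-<digits> out of text on stdin.
# extract_job_id is a hypothetical helper, not something Sparkler provides.
extract_job_id() {
  grep -oE 'sjob-[0-9]+' | head -n 1
}

# Demo on a made-up line that contains the id from step 15.
echo "job id: sjob-1496713811764" | extract_job_id

# Possible use with Sparkler (assumes inject prints the id to stdout or stderr):
#   job_id=$(bin/sparkler.sh inject -su http://www.sina.com.cn/ 2>&1 | extract_job_id)
#   bin/sparkler.sh crawl -id "$job_id" -m 'local[*]' -i 1
```

Quoting 'local[*]' in the commented usage keeps the shell from treating the brackets as a filename pattern.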

You can also inspect the data at http://localhost:8983/solr/#/~cores/crawldb
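The same core can also be checked from the command line. The sketch below builds a query URL for Solr's standard select handler on the crawldb core and prints it. The helper name solr_select_url and the default row count are illustrative choices, and the host and port are taken from the URLs in this post.

```shell
#!/usr/bin/env bash
# Sketch: build a Solr select URL that matches all documents in a core.
# solr_select_url is a hypothetical helper; $1 is the core name, $2 the row count.
solr_select_url() {
  printf 'http://localhost:8983/solr/%s/select?q=*:*&rows=%s&wt=json\n' "$1" "${2:-5}"
}

# Print the URL for the first 3 documents of the crawldb core.
solr_select_url crawldb 3

# With Solr running, the URL can be fetched like this:
#   curl -s "$(solr_select_url crawldb 3)"
```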

A local installation requires git and a JDK. The build steps also call mvn, so Maven must be installed as well.
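Before starting, it may help to confirm that the required tools are on the PATH. The following is a small sketch, not part of Sparkler: missing_tools is a hypothetical helper, and the list it checks (git, java, mvn, wget) is inferred from the commands used in the steps of this post.

```shell
#!/usr/bin/env bash
# Sketch: print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools inferred from the install steps: git clone, the JDK, mvn, wget.
missing=$(missing_tools git java mvn wget)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found."
fi
```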

When reposting, please cite the original URL: https://www.6miu.com/read-10364.html
