nutch-0.9: Configuring build.xml and the .bat File

xiaoxiao, 2026-05-22

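Running Nutch on Windows is simple: all you need is a way to run the Crawl class. Write an Ant script, put it in the Nutch root directory, and run it. The script looks like this: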

<project name="nutch-crawl" default="crawl" basedir=".">

    <property name="lib.dir" location="lib"/>
    <property name="conf.dir" location="conf"/>

    <!-- Classpath: the Nutch jar, the libraries under lib, the root directory, and conf -->
    <path id="project.classpath">
        <fileset dir="." includes="nutch-*.jar"/>
        <fileset dir="${lib.dir}"/>
        <pathelement path="."/>
        <pathelement path="${conf.dir}"/>
    </path>

    <target name="crawl">
        <echo>crawling starting</echo>
        <property name="JVM.extra.args" value="-Xmx512m"/>
        <java classname="org.apache.nutch.crawl.Crawl" classpathref="project.classpath" fork="true">
            <jvmarg line="${JVM.extra.args}"/>
            <arg value="C:/dev-tools/nutch-0.9/urls"/>  <!-- directory containing url.txt -->
            <arg value="-dir"/>
            <arg value="C:/dev-tools/nutch-0.9/crawl"/>  <!-- directory where the crawl output is stored -->
            <arg value="-depth"/>
            <arg value="3"/>
            <arg value="-threads"/>
            <arg value="15"/>
        </java>
        <echo>crawling finished</echo>
    </target>

</project>
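The script above hard-codes the crawl paths, depth, and thread count. As a minimal sketch (not part of the original script), the same target could read these values from Ant properties so they can be changed without editing the file. The property names `urls.dir`, `crawl.dir`, `crawl.depth`, and `crawl.threads` and the project name are hypothetical, chosen only for illustration; the defaults are the values used above. The `failonerror` attribute is an addition as well.

```xml
<project name="nutch-crawl-params" default="crawl" basedir=".">

    <!-- Hypothetical property names; defaults match the values in the script above -->
    <property name="urls.dir" value="C:/dev-tools/nutch-0.9/urls"/>
    <property name="crawl.dir" value="C:/dev-tools/nutch-0.9/crawl"/>
    <property name="crawl.depth" value="3"/>
    <property name="crawl.threads" value="15"/>

    <path id="project.classpath">
        <fileset dir="." includes="nutch-*.jar"/>
        <fileset dir="lib"/>
        <pathelement path="."/>
        <pathelement path="conf"/>
    </path>

    <target name="crawl">
        <!-- failonerror makes the build fail if the Crawl class exits with an error -->
        <java classname="org.apache.nutch.crawl.Crawl" classpathref="project.classpath"
              fork="true" failonerror="true">
            <jvmarg value="-Xmx512m"/>
            <arg value="${urls.dir}"/>
            <arg value="-dir"/>
            <arg value="${crawl.dir}"/>
            <arg value="-depth"/>
            <arg value="${crawl.depth}"/>
            <arg value="-threads"/>
            <arg value="${crawl.threads}"/>
        </java>
    </target>

</project>
```

Because properties set on the Ant command line take precedence over those declared in the build file, a run such as `ant -Dcrawl.depth=2 -Dcrawl.threads=5` would override the defaults for that run only.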

 

The batch file run.bat, which launches build.xml (place it in the Nutch root directory; this assumes the project is on drive E:):

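@echo off
rem Run this file from the Nutch root. "cd e:" alone does not switch drives.
cd e:
rem ant is itself a batch file, so "call" is needed to return here for "pause"
call ant
pause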

When reposting, please credit the original URL: https://www.6miu.com/read-5049172.html
