Typing scrapy with no arguments shows the help output:
$ scrapy
Scrapy 1.3.3 - project: chinese

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  commands
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command

bench runs a quick benchmark test:

$ scrapy bench

crawl runs a spider. The following example spider lives in the project's spiders folder:

import scrapy

class EduSpider(scrapy.Spider):
    name = "edu"
    allowed_domains = ["xx.xxx.com"]
    start_urls = ['http://xx.xxx.com/pinyi.html']

    def parse(self, response):
        # Follow every detail link found in the table
        for url in response.xpath('//table[@id="table1"]//a[@class="fontbox"]/@href').extract():
            yield scrapy.Request('http://xx.xxx.com/' + url, callback=self.parse_item)

crawl has to be run from inside a project.
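The parse callback above hands each detail page to self.parse_item, which the snippet does not define. A minimal sketch of such a callback inside EduSpider, with purely hypothetical XPath expressions and field names:

    def parse_item(self, response):
        # Hypothetical extraction logic; adjust the XPath and field names to the real pages
        yield {
            'title': response.xpath('//h1/text()').extract_first(),
            'url': response.url,
        }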
$ scrapy crawl edu    # edu is the value of name in the spider class

fetch downloads a URL with the Scrapy downloader:

$ scrapy fetch http://www.baidu.com

view opens a URL in the browser, exactly as Scrapy sees it:

$ scrapy view http://www.baidu.com

version prints the Scrapy version number.
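Two variations worth knowing (both flags are standard Scrapy options; treat the exact commands as a sketch): --nolog suppresses the log output of fetch, and version -v also prints the versions of the libraries Scrapy depends on.

$ scrapy fetch --nolog http://www.baidu.com > baidu.html
$ scrapy version -v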
list shows the spiders defined in the current project.
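A sketch of running it in the project that contains the EduSpider above, assuming it is the only spider:

$ scrapy list
edu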
genspider creates a new spider (a very important command). It takes several options:

1. --list, -l: list the available templates

$ scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

That is, four templates are available: basic, crawl, csvfeed and xmlfeed.

2. --edit, -e: open the spider in an editor after creating it
3. --dump, -d: dump the template to standard output instead of creating a file
4. --template, -t: specify which template to use (it must be one of the templates listed by -l)
$ scrapy genspider -t crawl baidu www.baidu.com    # see the sketch of the generated file below

5. --force: overwrite the spider if one with the same name already exists

$ scrapy genspider -t crawl --force baidu www.baidu.com
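The crawl template produces a CrawlSpider skeleton roughly like the following (a sketch of what Scrapy 1.3 generates; the placeholder rule and comments may differ slightly):

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class BaiduSpider(CrawlSpider):
    name = 'baidu'
    allowed_domains = ['www.baidu.com']
    start_urls = ['http://www.baidu.com/']

    rules = (
        # Placeholder rule: follow matching links and parse them with parse_item
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = {}
        # Fill the item fields from the response here
        return i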
shell starts the interactive console. It is very useful for debugging XPath expressions and other helpers; installing IPython beforehand is recommended.

$ scrapy shell http://www.baidu.com

The shell exposes many built-in objects; response is the one used most often.
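A quick session sketch (the extracted title is only illustrative); view() is a shell helper that opens the current response in a browser:

>>> response.status
200
>>> response.xpath('//title/text()').extract_first()
'百度一下，你就知道'
>>> view(response)    # open the downloaded page in the default browser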
check runs the spider contracts (contract tests).

def parse(self, response):
    """ This function parses a sample response. Some
    contracts are mingled with this docstring.

    @url http://www.amazon.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """

@url (required): the URL the contract is tested against
@returns items 1 16: the lower and upper bounds on the number of items returned
@returns requests 0 0: the lower and upper bounds on the number of requests returned
@scrapes Title Author Year Price: the fields the returned items must contain
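Contracts are executed with the check command, either for one spider by name or for every spider in the project when no name is given. The summary below is only a sketch of the unittest-style output:

$ scrapy check edu
...
----------------------------------------------------------------------
Ran 3 contracts in 2.3s

OK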