Using the Scrapy Command Line

xiaoxiao  2021-02-27

Running scrapy with no arguments prints the help text:

```
$ scrapy
Scrapy 1.3.3 - project: chinese

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list          List available spiders
  parse         Parse URL (using its spider) and print the results
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command
```

bench: run a quick benchmark test

```
$ scrapy bench
```

crawl: run a spider. Here is an example spider, located in the project's spiders folder:

```python
import scrapy


class EduSpider(scrapy.Spider):
    name = "edu"
    allowed_domains = ["xx.xxx.com"]
    start_urls = ['http://xx.xxx.com/pinyi.html']

    def parse(self, response):
        for url in response.xpath('//table[@id="table1"]//a[@class="fontbox"]/@href').extract():
            yield scrapy.Request('http://xx.xxx.com/' + url, callback=self.parse_item)

    def parse_item(self, response):
        # the original snippet referenced parse_item without defining it;
        # the per-page parsing logic would go here
        pass
```

crawl must be run from inside a project directory:

```
$ scrapy crawl edu    # edu is the value of `name` in the spider class
```

fetch: download a URL using the Scrapy downloader

```
$ scrapy fetch http://www.baidu.com
```

view: open a URL in the browser, as Scrapy sees it

```
$ scrapy view http://www.baidu.com
```

version: print the Scrapy version number

list: show the spiders in the current project

```
$ scrapy list
```

genspider: create a new spider (a very important command)

genspider takes several options:

1. --list, -l: show the available templates

```
$ scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed
```

That is, four templates are available: basic, crawl, csvfeed and xmlfeed.

2. --edit, -e: open the new spider in an editor after creating it
3. --dump, -d: print the generated spider to standard output instead of writing a file
4. --template, -t: specify which template to use (must be one of those listed by -l)

```
$ scrapy genspider -t crawl baidu www.baidu.com
```

5. --force: overwrite the spider if it already exists

```
$ scrapy genspider -t crawl --force baidu www.baidu.com
```

shell: the interactive scraping console

Very useful for debugging XPath expressions and the like; installing IPython beforehand is recommended.

```
$ scrapy shell http://www.baidu.com
```

Many built-in objects are then available; the one used most often is response.

check: contract tests

Contracts are declared in a spider callback's docstring and verified with scrapy check:

```python
def parse(self, response):
    """ This function parses a sample response. Some
    contracts are mingled with this docstring.

    @url http://www.amazon.com/s?field-keywords=selfish+gene
    @returns items 1 16
    @returns requests 0 0
    @scrapes Title Author Year Price
    """
```

- @url (required): the URL to test against
- @returns items 1 16: lower and upper bounds on the number of items returned
- @returns requests 0 0: lower and upper bounds on the number of requests returned
- @scrapes Title Author Year Price: fields each scraped item must contain

