Scrapy 2.6 Exceptions: A Guide to Exception Handling

Tech 08-04 Source: Mr数据杨

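This guide covers the built-in exceptions of Scrapy, the Python 3 crawling framework. Raising them while a crawl is running tells Scrapy how to react to a special situation: close the spider, keep it open, drop an item, ignore a request, disable a component, or stop a download.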

Scrapy version: 2.6+

Common exceptions

Closing the spider

Exception type.

exception scrapy.exceptions.CloseSpider(reason='cancelled')

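Example: raise the exception from a spider callback when your own business logic decides the crawl should stop.

from scrapy.exceptions import CloseSpider

def parse_page(self, response):
    # Custom business logic
    # response.text is the decoded body; response.body is bytes and cannot be searched with a str
    if 'xxxxxx' in response.text:
        raise CloseSpider('xxxxxx')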

Keeping the spider open

Exception type.

exception scrapy.exceptions.DontCloseSpider

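Example: prevent the spider from being closed in special situations. The exception only has an effect when raised from a spider_idle signal handler, not from a response callback.

from scrapy.exceptions import DontCloseSpider

def on_spider_idle(self, spider):
    # Custom business logic
    # Raise this exception in a spider_idle signal handler to prevent the spider from being closed.
    if self.pending_urls:  # hypothetical attribute holding work that is still queued
        raise DontCloseSpider('xxxxxxx')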

Dropping an item

Exception type.

exception scrapy.exceptions.DropItem

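Example: an item pipeline stage must raise this exception to stop an Item from being processed any further.

from scrapy.exceptions import DropItem

def process_item(self, item, spider):
    # Custom business logic
    if 'xxxxxxx' not in item:
        raise DropItem('drop_item')
    return item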

Ignoring a request

Exception type.

exception scrapy.exceptions.IgnoreRequest

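Example: the scheduler or any downloader middleware can raise this exception to indicate that a request should be ignored.

from scrapy.exceptions import IgnoreRequest

def process_response(self, request, response, spider):
    # Downloader middleware hook; response.body is bytes, so compare against a bytes literal
    if b'Bandwidth exceeded' in response.body:
        raise IgnoreRequest('ignore_request')
    return response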

Not configured

Exception type.

exception scrapy.exceptions.NotConfigured

Example: some components can raise this exception to indicate that they will remain disabled. These components include:

Extensions
Item pipelines
Downloader middlewares
Spider middlewares

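from scrapy.exceptions import NotConfigured

def __init__(self, settings):
    # Raised while the component is being built; 'XXXXXX_ENABLED' is a placeholder setting name
    if not settings.getbool('XXXXXX_ENABLED'):
        raise NotConfigured('not_configured')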

Not supported

Exception type.

exception scrapy.exceptions.NotSupported

Example: this exception is raised to indicate that a feature is not supported.

from scrapy.exceptions import NotSupported

def parse_page(self, response):
    if 'xxxxxx' not in response.text:
        raise NotSupported('not_supported')

Stopping a download

Exception type.

exception scrapy.exceptions.StopDownload(fail=True)

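Example: raise the exception from a bytes_received or headers_received signal handler. The fail argument is keyword-only.

fail=True (the default): the request errback is called
fail=False: the request callback is called

from scrapy.exceptions import StopDownload

def on_bytes_received(self, data, request, spider):
    # Handler connected to the bytes_received signal; data is a chunk of bytes
    if b'xxxxxx' not in data:
        raise StopDownload(fail=True)

# With fail=True the request errback is called and can read the partial response
def errback(self, failure):
    response = failure.value.response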