Using the Scrapy shell
2024-10-19 04:21:00
Note: requests made from the shell are prone to 403 errors that do not occur during an actual crawl.
response - a Response object containing the last fetched page
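>>> response.xpath('//title/text()').extract()
xpath() returns a list of selectors, and extract() converts them to a list of strings. In the loop below, links is assumed to be such a list of selectors for <a> elements, for example links = response.xpath('//a[contains(@href, "image")]'):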
>>> for index, link in enumerate(links):
...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
...     print('Link number %d points to url %s and image %s' % args)
...
Link number 0 points to url ['image1.html'] and image ['image1_thumb.jpg']
Link number 1 points to url ['image2.html'] and image ['image2_thumb.jpg']
Link number 2 points to url ['image3.html'] and image ['image3_thumb.jpg']
Link number 3 points to url ['image4.html'] and image ['image4_thumb.jpg']
Link number 4 points to url ['image5.html'] and image ['image5_thumb.jpg']
The enumerate() function is typically used in a for loop.
Without enumerate(), the index has to be tracked by hand:
>>> i = 0
>>> seq = ['one', 'two', 'three']
>>> for element in seq:
...     print(i, seq[i])
...     i += 1
...
0 one
1 two
2 three
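As a small sketch of the point about enumerate(), the snippet below builds the same (index, element) pairs twice: once with a hand-maintained counter and once with enumerate(). The variable names manual, paired and from_one are chosen here for illustration only.

```python
seq = ['one', 'two', 'three']

# Manual counter: the index must be initialised and incremented by hand.
manual = []
i = 0
for element in seq:
    manual.append((i, element))
    i += 1

# enumerate() yields (index, element) pairs directly.
paired = list(enumerate(seq))

# A different starting index can be requested with the start argument.
from_one = list(enumerate(seq, start=1))

for index, element in paired:
    print(index, element)
```

Both approaches give the same pairs; enumerate() simply removes the bookkeeping, which is why it fits loops such as the one over links.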
Suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
>>> divs = response.xpath('//div')
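Then select the <p> elements relative to those <div> elements (note the dot prefixing the .//p XPath; without it, //p would match every <p> in the document):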
>>> for p in divs.xpath('.//p'):  # extracts all <p> inside
...     print(p.extract())
Another common case would be to extract all direct <p> children:
>>> for p in divs.xpath('p'):
...     print(p.extract())
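The difference between .//p (all descendants) and p (direct children only) can be tried outside Scrapy as well. The sketch below is not Scrapy code: it swaps in the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: one <p> directly inside a <div>,
# one nested deeper, and one outside any <div>.
HTML = """
<html><body>
  <div><p>direct child</p><span><p>nested</p></span></div>
  <p>outside any div</p>
</body></html>
"""

root = ET.fromstring(HTML)
divs = root.findall('.//div')

# './/p' relative to each <div>: every <p> descendant of that <div>.
descendants = [p.text for d in divs for p in d.findall('.//p')]

# 'p' relative to each <div>: only the <p> elements that are direct children.
children = [p.text for d in divs for p in d.findall('p')]

# './/p' from the document root: every <p>, including the one outside <div>.
everything = [p.text for p in root.findall('.//p')]

print(descendants)
print(children)
print(everything)
```

The third list shows what a query that is not anchored to the <div> elements picks up, which is the mistake the leading dot is meant to avoid.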
Using the shell from within a spider
from scrapy.shell import inspect_response
inspect_response(response, self)
Press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl.
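Tip: use single quotes as the outermost quotes of an XPath expression, which leaves double quotes free for string values inside it.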
Opening a local HTML file in the shell, which is convenient for debugging (but do not name the file index.html)
The ./ prefix is required even for a file in the current directory; a bare scrapy shell file.html does not work:
scrapy shell ./path/to/file.html
scrapy shell ../other/path/to/file.html
scrapy shell /absolute/path/to/file.html
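Why a bare file name fails can be sketched roughly: the shell favours URLs over file paths, so an argument without a path prefix is read as a host name. The function below is a simplified illustration of that behaviour, not Scrapy's actual implementation; guess_target is a hypothetical helper name.

```python
from pathlib import Path


def guess_target(arg):
    """Hypothetical helper: decide whether arg is read as a file or a URL."""
    if "://" in arg:
        return arg  # already has an explicit scheme
    if arg.startswith(("./", "../", "/")):
        # Looks like a filesystem path: convert it to an absolute file URI.
        return Path(arg).resolve().as_uri()
    # Anything else (including a bare file.html) is read as a host name.
    return "http://" + arg


for arg in ["./path/to/file.html", "index.html", "http://example.com"]:
    print(arg, "->", guess_target(arg))
```

Under this reading, index.html looks just like a domain such as example.com, which is consistent with the advice to prefix local files with ./ and to avoid the name index.html.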