Downloading images with Scrapy
References: https://www.jianshu.com/p/6c8d2730d088
https://docs.scrapy.org/en/latest/topics/item-pipeline.html#writing-your-own-item-pipeline
import os

import requests
import scrapy


class MeinvSpider(scrapy.Spider):
    name = "get_meinv"
    start_urls = [
        'https://www.du114.com/',
    ]

    def parse(self, response):
        # Save images into a directory named after the spider.
        dir_path = '%s/%s' % (".", self.name)
        if not os.path.exists(dir_path):
            os.makedirs(dir_path)

        for imggroup in response.css('div.Column-picBox'):
            imgset = imggroup.css('ul>li img::attr("src")')
            for image_url in imgset.extract():
                # Resolve relative src values against the page URL.
                image_url = response.urljoin(image_url)
                print("image_url=%s" % image_url)
                # Build the file name from the URL segments after the host.
                us = image_url.split('/')[3:]
                image_file_name = '_'.join(us)
                file_path = '%s/%s' % (dir_path, image_file_name)
                # Skip images that were already downloaded.
                if os.path.exists(file_path):
                    continue
                # Use a separate name so the Scrapy response is not shadowed.
                image_response = requests.get(image_url, stream=True)
                with open(file_path, 'wb') as handle:
                    for block in image_response.iter_content(1024):
                        if not block:
                            break
                        handle.write(block)
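The file name step in the spider drops the scheme and host from the image URL and joins the remaining path segments with underscores. Below is a minimal sketch of that step on its own. The helper name image_file_name and the example.com URLs are hypothetical, and the sketch uses urllib.parse.urlsplit instead of the spider's split('/')[3:], so it also drops any query string.

```python
from urllib.parse import urlsplit


def image_file_name(image_url: str) -> str:
    """Join the URL's path segments with underscores (hypothetical helper)."""
    # urlsplit separates scheme, host, path, and query; only the path is used.
    path = urlsplit(image_url).path
    # Drop empty segments produced by leading or doubled slashes.
    segments = [segment for segment in path.split('/') if segment]
    return '_'.join(segments)


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    print(image_file_name('https://example.com/uploads/2019/01/cat.jpg'))
    # prints "uploads_2019_01_cat.jpg"
```

For a plain absolute URL this gives the same name as the spider's own expression. Flattening the path this way keeps images from different directories on the site from overwriting each other in the single output directory.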