Saving Data to a Database in Scrapy
2024-08-31 15:50:02
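1. Configure the database connection parameters in settings.py
# Database connection parameters
DB_HOST = '192.168.183.1'
DB_PORT = 3306
DB_USER = 'root'
DB_PASSWORD = ''  # the value is missing in the source; supply your own
DB_DATABASE = 'a'
DB_CHARSET = 'utf8'

# Register the pipelines; DushuMysql is the one that inserts items into the database
ITEM_PIPELINES = {
    'dushu.pipelines.DushuPipeline': 300,
    'dushu.pipelines.DushuMysql': 301,
}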
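The integers in ITEM_PIPELINES set the order in which Scrapy runs the pipelines: lower values run first, so DushuPipeline (300) sees each item before DushuMysql (301). The sketch below illustrates that rule in plain Python using the dictionary from the settings. The helper pipeline_order is a hypothetical name for illustration only and is not part of Scrapy.

```python
# The pipeline registration from settings.py
ITEM_PIPELINES = {
    'dushu.pipelines.DushuPipeline': 300,
    'dushu.pipelines.DushuMysql': 301,
}


def pipeline_order(pipelines):
    """Return the pipeline paths sorted by ascending priority value."""
    return sorted(pipelines, key=pipelines.get)


order = pipeline_order(ITEM_PIPELINES)
for path in order:
    print(ITEM_PIPELINES[path], path)
```

Because each pipeline receives whatever the previous one returned, a pipeline that writes to the database is usually given a higher number than the ones that clean or validate the item.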
2. Define the pipeline in pipelines.py
Read the parameters from the settings file:
from scrapy.utils.project import get_project_settings
settings = get_project_settings()
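As an alternative to calling get_project_settings(), Scrapy can pass the settings to a pipeline through a from_crawler class method, which receives the running crawler and its settings object. The following is a minimal sketch of that approach, not the method used in this post. The class name DushuMysqlFromCrawler and the db_params dictionary are hypothetical, and pymysql is imported lazily so the class can be inspected without a database driver installed.

```python
from types import SimpleNamespace


class DushuMysqlFromCrawler:
    """Hypothetical pipeline variant that gets its settings from the crawler."""

    def __init__(self, db_params):
        self.db_params = db_params
        self.conn = None
        self.cursor = None

    @classmethod
    def from_crawler(cls, crawler):
        # crawler.settings holds the values defined in settings.py
        s = crawler.settings
        return cls({
            'host': s.get('DB_HOST'),
            'port': s.get('DB_PORT'),
            'user': s.get('DB_USER'),
            'password': s.get('DB_PASSWORD'),
            'database': s.get('DB_DATABASE'),
            'charset': s.get('DB_CHARSET'),
        })

    def open_spider(self, spider):
        # Imported here so the connection is only attempted when a spider starts
        import pymysql
        self.conn = pymysql.connect(**self.db_params)
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()


# Stand-in for a real crawler, only to show what from_crawler reads.
# The password below is a placeholder, not a value from the post.
fake_crawler = SimpleNamespace(settings={
    'DB_HOST': '192.168.183.1',
    'DB_PORT': 3306,
    'DB_USER': 'root',
    'DB_PASSWORD': 'placeholder',
    'DB_DATABASE': 'a',
    'DB_CHARSET': 'utf8',
})
pipeline = DushuMysqlFromCrawler.from_crawler(fake_crawler)
print(pipeline.db_params['host'], pipeline.db_params['port'])
```

Opening the connection in open_spider rather than in __init__ keeps construction cheap and pairs naturally with close_spider.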
# Load the project settings into this module
from scrapy.utils.project import get_project_settings
import pymysql


class DushuMysql(object):
    def __init__(self):
        # settings exposes the values defined in settings.py
        settings = get_project_settings()
        self.host = settings['DB_HOST']
        self.port = settings['DB_PORT']
        self.user = settings['DB_USER']
        self.password = settings['DB_PASSWORD']
        self.database = settings['DB_DATABASE']
        self.charset = settings['DB_CHARSET']
        self.connect()

    def connect(self):
        # Open the connection and create a cursor for executing statements
        self.conn = pymysql.connect(
            host=self.host,
            port=self.port,
            user=self.user,
            password=self.password,
            database=self.database,
            charset=self.charset,
        )
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # Pass the values as parameters so pymysql escapes them;
            # formatting them into the SQL string breaks on quotes in the data
            self.cursor.execute(
                'insert into books values (%s, %s, %s)',
                (item['src'], item['alt'], item['author']),
            )
            # The insert is not persisted until it is committed
            self.conn.commit()
        except Exception as e:
            # Undo the failed statement and report the error
            self.conn.rollback()
            print(str(e))
        return item

    def close_spider(self, spider):
        # Release the cursor and the connection when the spider finishes
        self.cursor.close()
        self.conn.close()
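The insert in process_item takes its values straight from scraped text, which can contain quotes. Binding the values as query parameters, instead of formatting them into the SQL string, keeps such text from breaking the statement. The sketch below demonstrates this with the standard-library sqlite3 module as a stand-in for MySQL, so it runs without a database server. The three-column books table and the sample item are assumptions made for the demo. Note that sqlite3 uses ? as its placeholder, whereas pymysql uses %s.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this demo)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

# Hypothetical table layout matching the three inserted values
cursor.execute('create table books (src text, alt text, author text)')

# Sample item whose fields contain both kinds of quotes
item = {
    'src': 'http://example.com/cover.jpg',
    'alt': 'The "Quoted" Title',
    'author': "O'Brien",
}

# The driver escapes each bound value; the SQL text itself never changes
cursor.execute(
    'insert into books values (?, ?, ?)',
    (item['src'], item['alt'], item['author']),
)
conn.commit()

rows = cursor.execute('select src, alt, author from books').fetchall()
print(rows)
conn.close()
```

With string formatting, the same item would produce a statement with unbalanced quotes and the insert would fail.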