python 爬虫 重复下载 二次请求
2024-08-26 17:44:50
在写爬虫的时候,难免会遇到报错,比如 4XX ,5XX,有些可能是网络的原因,或者一些其他的原因,这个时候我们希望程序去做第二次下载,
有一种很low的解决方案,比如是用 try except
try:
-------
except:
try:
--------
except:
try:
------
except:
try:
------
except:
try:
------
except:
try:
------
except:
------
有没有看起来更舒服的写法呢?
我们可以用递归实现这个过程
代码如下
request_urls = [
"https://www.baidu.com/",
"https://www.baidu.com/",
"https://www.baidu.com/",
"https://www.ba111111idu.com/",
"https://www.baidu.com/",
"https://www.baidu.com/",
] def down_load(url,request_max=3):
print "正在请求的URL是:",url
result_html = ""
result_status_code = ""
try:
result = session.get(url=url)
result_html = result.content
result_status_code = result.status_code
print result_status_code
except Exception as e:
print e
if request_max >0:
if result_status_code != 200:
return down_load(url,request_max-1)
return result_html for url in request_urls:
down_load(url=url,request_max=13)
输出结果:
C:\Python27\python.exe C:/Users/xuchunlin/PycharmProjects/A9_25/auction/test.py
正在请求的URL是: https://www.baidu.com/
200
正在请求的URL是: https://www.baidu.com/
200
正在请求的URL是: https://www.baidu.com/
200
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6208>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6438>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA65F8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6828>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6A90>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA62E8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6D30>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003AA6DD8>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B682B0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68080>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B685C0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B687F0>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68A20>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.ba111111idu.com/
HTTPSConnectionPool(host='www.ba111111idu.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000003B68C50>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))
正在请求的URL是: https://www.baidu.com/
200
正在请求的URL是: https://www.baidu.com/
200 Process finished with exit code 0
最新文章
- SSH开源框架的优缺点
- swift开发多线程篇 - 多线程基础
- Nginx的配置文件
- Leetcode: Guess Number Higher or Lower II
- [51NOD1959]循环数组最大子段和(dp,思路)
- comparison of floating point numbers with equality operator. possible loss of precision while rounding values
- 小细节:Java中split()中的特殊分隔符 小数点
- javascript控制图片等比例缩放
- uC/OS 的任务调度解析 (转)
- bzoj1753 [Usaco2005 qua]Who&#39;s in the Middle
- mongodb GUI
- 1002. 写这个号码 (20)(数学啊 ZJU_PAT)
- Javascript CustomEvent
- es6 的循环
- Python上下文管理器
- 2018-2019-2 20165311《网络对抗技术》Exp5 MSF基础应用
- es6(三)
- QQ空间说说如何批量删除
- codeforces 671D Roads in Yusland &; hdu 5293 Tree chain problem
- 乞丐版servlet容器第3篇