Web Scraping Notes: The Requests Library in Detail (Day 2)
2024-09-01 15:50:47
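What is Requests
Requests is an HTTP library written in Python and released under the Apache 2.0 license. It is built on urllib3 and is more convenient than the standard library's urllib: it saves a lot of work and fully covers HTTP testing needs.
In one sentence: Requests is a simple, easy-to-use HTTP library for Python.
Installing Requests
pip install requests
If the command finishes without errors, Requests has been installed successfully.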
A first example
import requests
response = requests.get('https://www.baidu.com')
print(type(response))
print(response.status_code)  # status code
print(type(response.text))
print(response.text)  # response body as text
print(response.cookies)  # cookies set by the server
Request methods
import requests
print(requests.post('http://httpbin.org/post'))  # each call returns a Response object
print(requests.put('http://httpbin.org/put'))
print(requests.delete('http://httpbin.org/delete'))
print(requests.head('http://httpbin.org/get'))
print(requests.options('http://httpbin.org/get'))
How to use Requests in practice
Basic GET requests
Basic usage
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)  # httpbin echoes the request headers, the client IP address, and the requested URL
GET request with parameters
import requests
response = requests.get('http://httpbin.org/get?name=germey&age=22')
print(response.text)
Parameters can also be passed as a dictionary through params
import requests
data = {
    'name': 'xiaohu',
    'age': ''
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
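To see what a params dictionary does to the URL without sending anything, the sketch below builds the query string with the standard library's urllib.parse.urlencode. It only approximates how Requests encodes params, and the age value 22 is an assumed placeholder borrowed from the earlier URL example.

```python
from urllib.parse import urlencode

base_url = 'http://httpbin.org/get'
# Hypothetical values for illustration; 22 is borrowed from the earlier URL example.
data = {'name': 'xiaohu', 'age': 22}

# urlencode turns the dictionary into a percent-encoded query string.
query = urlencode(data)
full_url = f'{base_url}?{query}'
print(full_url)  # http://httpbin.org/get?name=xiaohu&age=22

# Characters that are not URL-safe are escaped automatically.
print(urlencode({'q': 'a b&c'}))  # q=a+b%26c
```

Passing a dictionary is usually safer than concatenating the URL by hand, because the escaping is done for you.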
Parsing JSON
import requests
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(type(response.json()))  # a dict
How does json.loads differ from calling .json() directly? The results are the same.
import json
import requests
response = requests.get('http://httpbin.org/get')
print(type(response.text))
print(response.json())
print(json.loads(response.text))  # same result as response.json()
print(type(response.json()))  # a dict
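The parsing step can be tried without a network connection. In this minimal sketch, body_text is a hard-coded stand-in for response.text (an assumption, not real httpbin output). It also shows what happens when the body is not JSON; response.json() fails in a similar way on such a body.

```python
import json

# A hard-coded stand-in for response.text (an assumption, not real httpbin output).
body_text = '{"args": {}, "url": "http://httpbin.org/get"}'

parsed = json.loads(body_text)
print(type(body_text))  # <class 'str'>
print(type(parsed))     # <class 'dict'>
print(parsed['url'])    # http://httpbin.org/get

# A body that is not valid JSON raises an error instead of returning a dict.
try:
    json.loads('<html>not json</html>')
except json.JSONDecodeError as exc:
    error_message = f'invalid JSON: {exc.msg}'
    print(error_message)
```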
Getting binary data
import requests
response = requests.get('https://github.com/favicon.ico')
print(type(response.text), type(response.content))  # .content is the raw bytes of the body
print(response.text)
print(response.content)
Saving the downloaded image
import requests
response = requests.get('https://github.com/favicon.ico')
with open('favicon', 'wb') as f:
    f.write(response.content)  # the with block closes the file, so no f.close() is needed
Adding headers
If no headers are supplied, this request fails with a 500 error.
import requests
headers = {
    'User-Agent': 'Mozilla/5.0(Macintosh;intel Mac OS X 10_11_4)AppleWebKit/537.36(KHTML,like Gecko)Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.text)
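One likely reason a site rejects a request without custom headers is the default User-Agent, which identifies the client as python-requests. The sketch below inspects that default and the effect of the headers argument by preparing requests through a Session without sending them. The browser string is a made-up placeholder.

```python
import requests

# The User-Agent that Requests sends when none is given, e.g. "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Prepare (but do not send) requests through a Session to see the merged headers.
headers = {'User-Agent': 'Mozilla/5.0 (hypothetical browser string)'}
session = requests.Session()
plain = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore')
)
custom = session.prepare_request(
    requests.Request('GET', 'https://www.zhihu.com/explore', headers=headers)
)
print(plain.headers['User-Agent'])   # the python-requests default
print(custom.headers['User-Agent'])  # the custom value
```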
Basic POST requests
import requests
data = {
    'name': 'xiaohu',
    'age': '',
    'job': 'IT'
}
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
The same request with custom headers:
import requests
data = {
    'name': 'xiaohu',
    'age': '',
    'job': 'IT'
}
headers = {
    'User-Agent': 'Mozilla/5.0(Macintosh;intel Mac OS X 10_11_4)AppleWebKit/537.36(KHTML,like Gecko)Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.post('http://httpbin.org/post', data=data, headers=headers)
print(response.json())
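To see what data= actually puts on the wire, a request can be prepared without being sent. This sketch compares data= with the json= argument, which the notes above do not cover and which is shown here only for contrast. The dictionary is a trimmed, illustrative version of the one used above.

```python
import json
import requests

url = 'http://httpbin.org/post'
data = {'name': 'xiaohu', 'job': 'IT'}  # trimmed example payload

# data= sends the dictionary as an HTML form body.
form_request = requests.Request('POST', url, data=data).prepare()
print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # name=xiaohu&job=IT

# json= serializes the dictionary as a JSON document instead.
json_request = requests.Request('POST', url, json=data).prepare()
print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))
```

With httpbin, a form body shows up under "form" in the echoed response, while a JSON body shows up under "json".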
Responses
Response attributes
import requests
response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)  # status code
print(type(response.headers), response.headers)  # response headers
print(type(response.cookies), response.cookies)  # cookies
print(type(response.url), response.url)  # final URL
print(type(response.history), response.history)  # redirect history
Advanced usage
File upload
import requests
with open('favicon', 'rb') as f:  # the file saved in the earlier example
    files = {'file': f}
    response = requests.post('http://httpbin.org/post', files=files)
print(response.text)
Getting cookies
import requests
response = requests.get('https://www.baidu.com')
print(response.cookies)
print(type(response.cookies))
for key, value in response.cookies.items():
    print(key + '=' + value)
Session persistence
import requests
requests.get('http://httpbin.org/cookies/set/number/1165872335')  # ask the site to set a cookie
response = requests.get('http://httpbin.org/cookies')  # then request the current cookies
print(response.text)  # empty: the two get calls are independent, like one browser setting the cookie and a different browser reading it
Improved version
import requests
s = requests.Session()  # create a Session object
s.get('http://httpbin.org/cookies/set/number/1165872335')  # set and read the cookie within the same "browser"
response = s.get('http://httpbin.org/cookies')  # then request the current cookies
print(response.text)  # no longer empty
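The difference between the two versions can be observed offline. In this sketch a cookie is placed in the session's jar by hand to simulate what the server would have set, and the follow-up request is prepared without being sent so its Cookie header can be inspected.

```python
import requests

session = requests.Session()
# Simulate a cookie that a server set earlier (value taken from the example above).
session.cookies.set('number', '1165872335')

# Prepare a follow-up request through the session without sending it.
request = requests.Request('GET', 'http://httpbin.org/cookies')
prepared = session.prepare_request(request)
print(prepared.headers.get('Cookie'))  # number=1165872335

# The same request prepared without a session carries no Cookie header.
standalone = request.prepare()
print(standalone.headers.get('Cookie'))  # None
```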
Certificate verification
import requests
from requests.packages import urllib3
urllib3.disable_warnings()  # suppress the warning
response = requests.get('https://www.12306.cn', verify=False)  # verify=False skips certificate verification, but triggers a warning
print(response.status_code)
To use a specific certificate instead:
import requests
response = requests.get('https://www.12306.cn', cert=('/path/server.crt', '/path/key'))  # the specified certificate and key
print(response.status_code)
Proxy settings
import requests
# SOCKS proxies need the optional dependency: pip install requests[socks]
proxies = {
    'http': 'socks5://127.0.0.1:8080',
    'https': 'socks5://127.0.0.1:8080',
}
response = requests.get('https://www.taobao.com', proxies=proxies)
print(response.status_code)
Timeout settings
import requests
from requests.exceptions import ReadTimeout
try:
    response = requests.get('https://www.httpbin.org/get', timeout=0.2)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
Authentication
import requests
from requests.auth import HTTPBasicAuth
r = requests.get('http://127.27.34.24:9001', auth=HTTPBasicAuth('user', ''))
print(r.status_code)
Second way: pass a (username, password) tuple
import requests
r = requests.get('http://127.27.34.24:9001', auth=('user', ''))
print(r.status_code)
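Both forms use HTTP Basic authentication, which places a base64-encoded "username:password" pair in the Authorization header. The helper below is a hypothetical standard-library sketch of that encoding, not the Requests implementation, and the password 'secret' is made up.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by HTTP Basic authentication."""
    credentials = f'{username}:{password}'.encode('utf-8')
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


header_value = basic_auth_header('user', 'secret')  # 'secret' is a made-up password
print(header_value)  # Basic dXNlcjpzZWNyZXQ=
```

Because base64 is an encoding and not encryption, Basic authentication should only be used over HTTPS.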
Exception handling
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException, ConnectionError
try:
    response = requests.get('http://httpbin.org/get', timeout=0.2)
    response.raise_for_status()  # raises HTTPError for 4xx/5xx responses
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except HTTPError:
    print('Http error')
except ConnectionError:
    print('Connection Error')
except RequestException:  # base class of all Requests exceptions, so it goes last
    print('error')
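The order of the except clauses matters because the exceptions form a class hierarchy. This sketch checks those relationships and shows, with a hand-built Response object (no network involved, purely for demonstration), that HTTPError only appears once raise_for_status() is called.

```python
import requests
from requests.exceptions import (
    ConnectionError, HTTPError, ReadTimeout, RequestException, Timeout,
)

# Every Requests exception derives from RequestException, so it must be caught last.
print(issubclass(ReadTimeout, Timeout))               # True
print(issubclass(Timeout, RequestException))          # True
print(issubclass(HTTPError, RequestException))        # True
print(issubclass(ConnectionError, RequestException))  # True

# HTTPError is only raised when raise_for_status() is called on a 4xx/5xx response.
# The Response here is built by hand (no network) purely to demonstrate that.
fake_response = requests.Response()
fake_response.status_code = 404
try:
    fake_response.raise_for_status()
    outcome = 'ok'
except HTTPError as exc:
    outcome = f'Http error: {exc.response.status_code}'
print(outcome)  # Http error: 404
```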