Getting Started with Python Web Scraping (1): requests
2024-09-03 18:09:22
Introduction
requests is a simple, easy-to-use HTTP library written in Python. It is considerably more concise to work with than urllib.
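As a rough illustration of the difference in verbosity, the sketch below builds the same GET request with both libraries without sending anything over the network. The search path `/s` and the `wd` parameter are assumptions chosen for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import Request as UrllibRequest

import requests

BASE_URL = "http://www.baidu.com/s"  # assumed path, for illustration only
params = {"wd": "python"}            # assumed query parameter

# urllib: the query string has to be encoded and appended by hand.
urllib_req = UrllibRequest(BASE_URL + "?" + urlencode(params))

# requests: pass the dict and let the library build the URL.
requests_req = requests.Request("GET", BASE_URL, params=params).prepare()

print(urllib_req.full_url)
print(requests_req.url)
```

Both lines print the same URL; the gap widens once headers, cookies, and response decoding enter the picture.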
Basic usage
import requests

requests.get("http://www.baidu.com")
requests.post("http://www.baidu.com")
requests.put("http://www.baidu.com")
requests.delete("http://www.baidu.com")
requests.request("get", "http://www.baidu.com")
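To inspect what each of these calls would send without contacting the server, one option is to build a `requests.Request` and prepare it. This is only a sketch for exploration; the calls above already perform the preparation and the sending in one step.

```python
import requests

URL = "http://www.baidu.com"

prepared_requests = []
for method in ("get", "post", "put", "delete"):
    # prepare() normalizes the method and URL but does not send anything.
    prepared = requests.Request(method, URL).prepare()
    prepared_requests.append(prepared)
    print(prepared.method, prepared.url)
```

The method name is upper-cased during preparation, which is why `requests.request("get", ...)` and `requests.get(...)` end up issuing the same HTTP verb.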
get
def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
The example below uses the template-fetching endpoint of 凡科微传单.
import requests

param = {
    "cmd": "getTemplate",
    "scrollIndex": 0
}
# The server uses the User-Agent header to decide whether the client is a crawler
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
rep = requests.get("https://cd.fkw.com/ajax/flyerhome.jsp", params=param, headers=header)
rep.encoding = 'utf8'
print(rep.text)
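The sketch below shows, without sending a request, how `params` ends up in the URL and why the User-Agent header is overridden: by default requests identifies itself as `python-requests/<version>`. Preparing through a `Session` is an assumption made here only so that the library's default headers are visible.

```python
import requests

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
)

session = requests.Session()
# What the server would see if no User-Agent were supplied.
default_ua = session.headers["User-Agent"]
print(default_ua)

req = requests.Request(
    "GET",
    "https://cd.fkw.com/ajax/flyerhome.jsp",
    params={"cmd": "getTemplate", "scrollIndex": 0},
    headers={"User-Agent": BROWSER_UA},
)
# prepare_request merges the session defaults with the per-request values.
prepared = session.prepare_request(req)
print(prepared.url)
print(prepared.headers["User-Agent"])
```

The per-request header replaces the session default, so the prepared request carries the browser string rather than the library's own identifier.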
post
def post(url, data=None, json=None, **kwargs):
    r"""Sends a POST request.

    :param url: URL for the new :class:`Request` object.
    :param data: (optional) Dictionary, list of tuples, bytes, or file-like
        object to send in the body of the :class:`Request`.
    :param json: (optional) json data to send in the body of the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    return request('post', url, data=data, json=json, **kwargs)
The same 凡科微传单 endpoint serves as the example here.
import requests

data = {
    "cmd": "getTemplate",
    "scrollIndex": 0
}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
rep = requests.post("https://cd.fkw.com/ajax/flyerhome.jsp", data=data, headers=header)
rep.encoding = 'utf8'
print(rep.text)
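The `post` signature accepts both `data` and `json`. The sketch below prepares the same payload both ways, without sending it, to show how the body and Content-Type differ. Whether this particular endpoint accepts a JSON body is not something the example tests; the JSON variant is shown purely for comparison.

```python
import json

import requests

URL = "https://cd.fkw.com/ajax/flyerhome.jsp"
payload = {"cmd": "getTemplate", "scrollIndex": 0}

# data=dict is form-encoded, like an HTML form submission.
form_req = requests.Request("POST", URL, data=payload).prepare()
# json=dict is serialized to a JSON document.
json_req = requests.Request("POST", URL, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json_req.body)

# The JSON body round-trips back to the original dict.
decoded = json.loads(json_req.body)
```

The example above uses `data=`, so the server receives an ordinary form-encoded body.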
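Session objects
In the examples above, requests does not hold on to cookies, so every request starts a new session. The requests library provides Session objects to solve this. The example below logs in to 凡科 and then fetches templates while logged in.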
import hashlib
import json
import re

import requests

s = requests.Session()

# The password is sent as its MD5 hex digest
pwd = hashlib.md5("pwd".encode("utf8")).hexdigest()
data = {
    "cmd": "loginCorpNew",
    "cacct": "username",
    "pwd": pwd
}
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"
}
rep = s.post("https://i.fkw.com/ajax/login_h.jsp?dogSrc=3", data=data, headers=header)
login = json.loads(rep.text)
tokenStr = login.get("_TOKEN")
print(tokenStr)
# Extract the token value from the response text
pattern = "value='(.+)'"
matcher = re.search(pattern, rep.text)
if matcher:
    token = matcher.group(1)
    print(token)
    param = {
        "cmd": "getTemplate",
        "_TOKEN": token,
        "scrollIndex": 0
    }
    # The session sends the cookies set at login automatically
    rep = s.get("https://i.cd.fkw.com/ajax/flyerTemplate_h.jsp", params=param, headers=header)
    print(rep.text)
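The effect of the session can be seen offline by comparing a request prepared through a `Session` with one prepared on its own. In this sketch the cookie name `sid` and its value are hypothetical and stand in for whatever the login response actually sets.

```python
import requests

URL = "https://i.cd.fkw.com/ajax/flyerTemplate_h.jsp"

session = requests.Session()
# Hypothetical cookie, standing in for one set by the login response.
session.cookies.set("sid", "abc123")

# Prepared through the session: the stored cookie is attached.
with_session = session.prepare_request(requests.Request("GET", URL))
# Prepared on its own: no cookie jar, so no Cookie header.
without_session = requests.Request("GET", URL).prepare()

session_cookie_header = with_session.headers.get("Cookie")
plain_cookie_header = without_session.headers.get("Cookie")
print(session_cookie_header)
print(plain_cookie_header)
```

This is why the second call in the login example goes through `s.get` rather than `requests.get`: only the session carries the login cookies forward.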