Python crawler - Juejin
2024-10-11 06:38:45
import json
from time import sleep

import requests

url = "https://web-api.juejin.im/query"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36",
    "Referer": "https://juejin.im/",
    "X-Agent": "Juejin/Web",
    "Content-Type": "application/json",
}


def get_content(after=''):
    # Request one page (20 items) of the popular article feed.
    # `after` is the cursor returned by the previous page; '' requests the first page.
    info = {"operationName": "", "query": "",
            "variables": {"first": 20, "after": after, "order": "POPULAR"},
            "extensions": {"query": {"id": "21207e9ddb1de777adeaca7a2fb38030"}}}
    resp = requests.post(url, headers=headers, data=json.dumps(info))
    content = resp.content.decode('utf-8')
    content = json.loads(content)
    edges = content['data']['articleFeed']['items']['edges']
    pageInfo = content['data']['articleFeed']['items']['pageInfo']
    return edges, pageInfo


def getList(edges):
    # Keep only the title of each article node.
    tmp = []
    for item in edges:
        one = {}
        node = item['node']
        one['title'] = node['title']
        # one['links'] = node['originalUrl']
        # one['content'] = node['content']
        tmp.append(one)
    return tmp


data = []

# First page.
content = get_content()
edges = content[0]
pageInfo = content[1]
tmpList = getList(edges)
#data = data + tmpList
print(tmpList)

# Follow the cursor until the API reports that no further page exists.
while (pageInfo['hasNextPage']):
    content = get_content(pageInfo['endCursor'])
    edges = content[0]
    pageInfo = content[1]
    tmpList = getList(edges)
    #data = data + tmpList
    print(tmpList)
    sleep(2)  # pause between requests
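
The loop above walks the feed page by page, passing `pageInfo['endCursor']` back to the API until `hasNextPage` is false. Below is a minimal sketch of the same cursor-based pagination, written as a generator so that it can be tried without network access. The names `iter_titles`, `fake_fetch`, `delay` and `max_pages` are hypothetical and introduced only for illustration. The fetch function is assumed to return `(edges, pageInfo)` in the same shape as `get_content`, and the two pages of data are made up.

```python
from time import sleep


def iter_titles(fetch, delay=0.0, max_pages=None):
    """Yield article titles page by page, following endCursor.

    `fetch` is assumed to behave like get_content: it takes a cursor
    string and returns (edges, pageInfo).
    """
    after = ''
    pages = 0
    while True:
        edges, page_info = fetch(after)
        for item in edges:
            yield item['node']['title']
        pages += 1
        if not page_info.get('hasNextPage'):
            break
        # Optional safety cap so a misbehaving API cannot loop forever.
        if max_pages is not None and pages >= max_pages:
            break
        after = page_info['endCursor']
        sleep(delay)


def fake_fetch(after=''):
    """Stand-in for get_content with two hard-coded pages (illustrative data)."""
    pages = {
        '': ([{'node': {'title': 'first'}}],
             {'hasNextPage': True, 'endCursor': 'c1'}),
        'c1': ([{'node': {'title': 'second'}}],
               {'hasNextPage': False, 'endCursor': 'c2'}),
    }
    return pages[after]


titles = list(iter_titles(fake_fetch))
print(titles)  # ['first', 'second']
```

With the real scraper, `iter_titles(get_content, delay=2)` should visit the same pages as the original loop, assuming the API still answers in the shape shown above.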