requests-html: a simple crawler
2024-10-21 12:41:20
import asyncio
from requests_html import HTMLSession
url = 'http://www.xiaohuar.com/hua/'
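session = HTMLSession(
    browser_args=[
        '--no-sandbox',  # the Chromium flag is --no-sandbox, not --no-sand
        '--disable-infobars',  # a comma was missing here, which silently merged this flag with the next string
        '--user-agent=Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36',
    ],
    # headless=False is kept from the original post; if the installed
    # requests-html rejects this keyword, remove it.
    headless=False,
)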
res = session.request(url=url,method='GET')
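# JavaScript that reports the viewport size. It is not used below, but it
# can be evaluated in the page by passing it as res.html.render(script=script).
script = """
    () => {
        return {
            width: document.documentElement.clientWidth,
            height: document.documentElement.clientHeight,
            deviceScaleFactor: window.devicePixelRatio,
        }
    }
"""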
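try:
    # keep_page=True keeps the Chromium page open after rendering,
    # so it can be driven directly through res.html.page.
    res.html.render(keep_page=True)

    async def main():
        page = res.html.page
        await page.waitFor(1000)  # wait 1000 ms for the page to settle
        await page.setViewport({'width': 1366, 'height': 768})
        # Select every <a> that is a direct child of <div class="img">.
        anchors = await page.xpath('//div[@class="img"]/a')
        for anchor in anchors:
            # Named href/anchor so the module-level `url` is not shadowed.
            href = await (await anchor.getProperty('href')).jsonValue()
            print(href)

    asyncio.get_event_loop().run_until_complete(main())
except Exception as e:
    print(e)
finally:
    session.close()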
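
To check the link selection without launching Chromium, the same idea can be tried offline on a static snippet. The sketch below is not part of the original script: it swaps the browser-side XPath for the standard library's `html.parser`, and `ImgLinkParser`, `SAMPLE_HTML`, and the paths inside it are hypothetical names and data made up for illustration. It only approximates `//div[@class="img"]/a` by collecting `<a>` tags whose nearest enclosing `<div>` has `class="img"`, and it resolves relative links against the page URL, much as the browser does when reading the `href` property.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://www.xiaohuar.com/hua/'

# Hypothetical markup, made up for illustration; the real page may differ.
SAMPLE_HTML = """
<div class="img"><a href="/p-1-100.html"><img src="/a.jpg"></a></div>
<div class="title"><a href="/p-1-100.html">title</a></div>
<div class="img"><a href="http://www.xiaohuar.com/p-1-101.html"><img src="/b.jpg"></a></div>
"""


class ImgLinkParser(HTMLParser):
    """Collect hrefs of <a> tags whose nearest enclosing <div> has class="img"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.div_stack = []  # one bool per open <div>: is it class="img"?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            self.div_stack.append(attrs.get('class') == 'img')
        elif tag == 'a' and self.div_stack and self.div_stack[-1] and 'href' in attrs:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, attrs['href']))

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_stack:
            self.div_stack.pop()


parser = ImgLinkParser(BASE_URL)
parser.feed(SAMPLE_HTML)
links = parser.links
for link in links:
    print(link)
```

Because this parser never executes JavaScript, it is only useful when the links are already present in the raw HTML; for content injected by scripts, the rendered-page approach above is still needed.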