Python Web Scraper (Part 7)
2024-10-06 19:28:49
Source code:
import re

import requests

from my_mysql import MysqlConnect


# Get the detail-page links and movie titles from one listing page
def get_urls(page):
    url = 'http://www.dytt8.net/html/gndy/dyzz/list_23_{}.html'.format(page)
    response = requests.get(url)
    response.encoding = 'gbk'  # decode the page as GBK
    pat = r'<a href="(.*?)" class="ulink">(.*?)</a>'
    # Each match is a (detail_url, movie_name) tuple
    res = re.findall(pat, response.text)
    return res


# Get the magnet link and the FTP link from a detail page
def get_links(url):
    response = requests.get(url)
    response.encoding = 'gbk'
    html = response.text
    # re.search returns None when nothing matches, so check before calling group()
    pat = r'href="(magnet.*?)"'
    res = re.search(pat, html)
    magnet = res.group(1) if res else ''
    pat = r'href="(ftp.*?)"'
    res = re.search(pat, html)
    ftp = res.group(1) if res else ''
    return magnet, ftp


if __name__ == '__main__':
    mc = MysqlConnect('127.0.0.1', 'root', '', 'homework')
    for page in range(1, 4):
        res = get_urls(page)
        for url, name in res:
            url = 'http://www.dytt8.net/' + url
            magnet, ftp = get_links(url)
            # repr() only wraps the values in quotes; it is not real SQL escaping
            sql = 'insert into dytt(id,name,magnet,ftp) values(null,{},{},{})'.format(
                repr(name), repr(magnet), repr(ftp))
            print(sql)
            mc.exec(sql)
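
The insert statement above is built with str.format and repr, which can break on titles that contain quotes or backslashes and leaves the query open to SQL injection. A minimal sketch of the same insert using placeholders is shown below. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for the MySQL `homework` database; the table layout is an assumption inferred from the insert statement, and `save_movie` is a hypothetical helper that does not appear in the original code. Whether `MysqlConnect.exec` accepts parameters is not known from the source. MySQL drivers such as PyMySQL use `%s` as the placeholder instead of `?`.

```python
import sqlite3


def save_movie(conn, name, magnet, ftp):
    """Insert one row, letting the driver bind the values (hypothetical helper)."""
    conn.execute(
        'INSERT INTO dytt (name, magnet, ftp) VALUES (?, ?, ?)',
        (name, magnet, ftp),
    )
    conn.commit()


# In-memory database standing in for the MySQL `homework` database (assumption)
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE dytt ('
    'id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, magnet TEXT, ftp TEXT)'
)

# A title containing both quote characters is stored unchanged, with no manual quoting
save_movie(conn, 'It\'s a "test" movie',
           'magnet:?xt=urn:btih:abc', 'ftp://example.com/a.mkv')
rows = conn.execute('SELECT id, name, magnet, ftp FROM dytt').fetchall()
print(rows)
```

In the scraper's main loop, the call would take the place of the format-and-exec pair, for example `save_movie(conn, name, magnet, ftp)`, with `conn` being whatever connection object the chosen driver provides.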