2019-02-01 Python crawler: scraping the Douban Top 250
2024-09-07 14:05:11
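After a few days of learning web scraping, I wrote some code that scrapes the Douban movie Top 250, once with the requests library and once with the urllib library, to see whether I could build something on my own. It is very simple, but I am still a little pleased with it.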
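import re
import requests

# Page URL pattern: https://movie.douban.com/top250?start=25&filter=
# Title markup:     <span class="title">勇士</span>
count = 1  # running rank across all pages


def getdata(url):
    # Fetch one page and return its HTML as text.
    data = requests.get(url)
    return data.text


def showdata(data):
    # Print every primary title on the page with its running rank.
    global count
    regex = re.compile(r"<span class=\"title\">(.*?)</span>")
    data = regex.findall(data)
    # Matches containing "nbsp" are not primary titles; drop them.
    newdata = data.copy()
    for dataa in newdata:
        if "nbsp" in dataa:
            data.remove(dataa)
    for i in data:
        print(count, i)
        count = count + 1


# 10 pages of 25 movies each: start = 0, 25, ..., 225
for i in range(0, 10):
    i = i * 25
    url = "https://movie.douban.com/top250?start={}&filter=".format(str(i))
    data = getdata(url)
    showdata(data)
# Implemented with requests; the page is parsed with a regular expression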
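The title extraction can be tried without touching the network. This is only a small sketch: SAMPLE_HTML is a hand-written snippet I made up to imitate the markup in the comment above, and extract_titles is a hypothetical helper name, not part of the original script. The real page may look different.

```python
import re

# Hypothetical, hand-written sample of the markup; the live page may differ.
SAMPLE_HTML = (
    '<span class="title">肖申克的救赎</span>'
    '<span class="title">&nbsp;/&nbsp;The Shawshank Redemption</span>'
    '<span class="title">霸王别姬</span>'
)

# Same pattern as in the script: non-greedy match between the span tags.
TITLE_RE = re.compile(r'<span class="title">(.*?)</span>')


def extract_titles(html):
    """Return the matched titles, skipping any match that contains "nbsp"."""
    return [t for t in TITLE_RE.findall(html) if "nbsp" not in t]


titles = extract_titles(SAMPLE_HTML)
for rank, title in enumerate(titles, start=1):
    print(rank, title)
```

A list comprehension replaces the copy-and-remove loop, and enumerate replaces the global counter within a single page. Across pages you would still need to carry the rank forward, for example by passing a start value.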
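import re
import urllib.request

# Page URL pattern: https://movie.douban.com/top250?start=25&filter=
# Title markup:     <span class="title">勇士</span>
count = 1  # running rank across all pages


def getdata(url):
    # Fetch one page and decode the response bytes as UTF-8.
    data = urllib.request.urlopen(url).read().decode("utf-8")
    return data


def showdata(data):
    # Print every primary title on the page with its running rank.
    global count
    regex = re.compile(r"<span class=\"title\">(.*?)</span>")
    data = regex.findall(data)
    # Matches containing "nbsp" are not primary titles; drop them.
    newdata = data.copy()
    for dataa in newdata:
        if "nbsp" in dataa:
            data.remove(dataa)
    for i in data:
        print(count, i)
        count = count + 1


# 10 pages of 25 movies each: start = 0, 25, ..., 225
for i in range(0, 10):
    i = i * 25
    url = "https://movie.douban.com/top250?start={}&filter=".format(str(i))
    data = getdata(url)
    showdata(data)
# Implemented with urllib; the page is parsed with a regular expression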
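Some sites turn away requests that do not carry a browser-like User-Agent header, and a bare urlopen(url) sends Python's default one. I have not checked how Douban responds today, so treat the following as a sketch under that assumption: it builds the ten page URLs and attaches a header to a urllib request without sending anything. The header value and the names page_urls and build_request are illustrative, not from the original script.

```python
import urllib.request

BASE_URL = "https://movie.douban.com/top250?start={}&filter="
# Assumption: a browser-like User-Agent; the exact string is illustrative.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def page_urls(pages=10, per_page=25):
    """Return the URL of every page: start = 0, 25, ..., 225 by default."""
    return [BASE_URL.format(n * per_page) for n in range(pages)]


def build_request(url):
    """Build (but do not send) a request carrying the custom header."""
    return urllib.request.Request(url, headers=HEADERS)


urls = page_urls()
req = build_request(urls[1])
print(len(urls), req.full_url)
# To actually fetch the page (needs network access):
# html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8")
```

Passing a Request object instead of a plain URL string is the only change needed in getdata; the timeout argument keeps the script from hanging if the server does not answer.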
emmmmmmm