1.4.1 Downloading a web page with Python (updated daily)
2024-10-02 04:44:29
# -*- coding: utf-8 -*-
'''
Created on April 27, 2019
@author: lenovo
'''

# Version 1: urllib3 (connection_from_url returns a connection pool, not the page body).
# import urllib3
# def download(url):
#     return urllib3.connection_from_url(url)
#
# print(download('http://now.qq.com'))

# Version 2: in Python 3, urllib2 has been replaced by urllib.request.
# import urllib.request
# def download(url):
#     return urllib.request.urlopen(url).read()
#
# print(download('https://baijiahao.baidu.com/s?id=1632775818269407606&wfr=spider&for=pc'))

# Version 3: catch download errors and return None instead of crashing.
# import urllib.request
# def download(url):
#     print("Downloading:" + url)
#     try:
#         html = urllib.request.urlopen(url).read()
#     except urllib.request.URLError as e:
#         print("Download error:", e.reason)
#         html = None
#     return html
#
# print(download("htp://www.baidu.co"))  # malformed URL, exercises the error path

# Version 4: retry on 5xx server errors.
# import urllib.request
# def download(url, num_retries=2):
#     try:
#         html = urllib.request.urlopen(url).read()
#     except urllib.request.URLError as e:
#         print("Download error:", e.reason)
#         html = None
#         if num_retries > 0:
#             if hasattr(e, "code") and 500 <= e.code < 600:
#                 return download(url, num_retries-1)
#     return html
#
# # print(download("http://httpstat.us/500"))
# print(download("http://www.meetup.com/"))

# Final version: also send a custom User-agent header.
import urllib.error
import urllib.request


def download(url, user_agent="wswp", num_retries=2):
    print("Downloading:", url)
    headers = {'User-agent': user_agent}
    request = urllib.request.Request(url, headers=headers)
    try:
        html = urllib.request.urlopen(request).read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            # Retry only on 5xx server errors; HTTPError carries a `code` attribute.
            if hasattr(e, 'code') and 500 <= e.code < 600:
                return download(url, user_agent, num_retries - 1)
    return html


print(download("http://www.meetup.com/"))
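
The retry behaviour is awkward to check against a live site, since a real server rarely returns a 5xx error on demand. The sketch below is one possible way to exercise it offline. It assumes a hypothetical `opener` parameter and a hypothetical `FlakyOpener` test double, neither of which is part of the script above, and it uses the placeholder URL `http://example.invalid/`, which is never actually contacted.

```python
import io
import urllib.error
import urllib.request


def download(url, user_agent="wswp", num_retries=2, opener=urllib.request.urlopen):
    """Fetch url, retrying up to num_retries times on 5xx errors.

    `opener` is a hypothetical hook (not in the original script) that lets
    the function be exercised without touching the network.
    """
    request = urllib.request.Request(url, headers={"User-agent": user_agent})
    try:
        return opener(request).read()
    except urllib.error.URLError as e:
        print("Download error:", e.reason)
        # Only HTTPError has a `code`; a plain URLError does not.
        code = getattr(e, "code", None)
        if num_retries > 0 and code is not None and 500 <= code < 600:
            return download(url, user_agent, num_retries - 1, opener)
        return None


class FlakyOpener:
    """Test double: raises HTTP 500 `failures` times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, request):
        self.calls += 1
        if self.calls <= self.failures:
            raise urllib.error.HTTPError(
                request.full_url, 500, "Internal Server Error", None, io.BytesIO(b"")
            )
        # Any object with a read() method is enough for download().
        return io.BytesIO(b"<html>ok</html>")


if __name__ == "__main__":
    opener = FlakyOpener(failures=2)
    print(download("http://example.invalid/", opener=opener))
    print("attempts:", opener.calls)
```

With `failures=2` and the default `num_retries=2`, the first two attempts fail and the third succeeds. With three or more failures the function gives up and returns `None`.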